HDFS HttpFS in Practice

HttpFS is a server that provides a REST HTTP interface supporting all HDFS file system operations (read and write), and it is interoperable with the webhdfs REST HTTP API. The functionality was contributed by Cloudera to the Apache main branch.
HttpFS can be used to transfer data between clusters running different Hadoop versions (avoiding RPC version incompatibilities), for example with Hadoop DistCp.
HttpFS can be used to access data on an HDFS cluster behind a firewall (HttpFS acts as a gateway and is the only system allowed through the firewall into the internal cluster).
HttpFS can be accessed with HTTP tools (such as curl and wget) and with HTTP libraries in many programming languages (not just Java).
The webhdfs client file system implementation lets HttpFS be accessed with HDFS command line tools (such as hadoop fs) as well as the HDFS Java API; see the examples below.
HttpFS has built-in security supporting Hadoop pseudo authentication, HTTP SPNEGO Kerberos, and other pluggable authentication mechanisms. It also supports Hadoop proxy users.
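The two access paths above can be sketched as follows. The hostnames are placeholders for illustration, assuming an HttpFS server listening on httpfs-host:14000 and a destination NameNode at namenode:8020:

# Sketch only: httpfs-host and namenode are assumed hostnames.
# List a directory through HttpFS using the webhdfs:// file system scheme:
$ hadoop fs -ls webhdfs://httpfs-host:14000/user/hive
# Copy data between clusters of different Hadoop versions via HttpFS with DistCp
# (run from the destination cluster; the source is reached over HTTP):
$ hadoop distcp webhdfs://httpfs-host:14000/user/foo hdfs://namenode:8020/user/foo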

How does HttpFS work?

HttpFS is a standalone Java web application that runs in a bundled Tomcat, independent of the Hadoop NameNode.
Each HttpFS web-service API call is an HTTP REST call that maps to an HDFS file system operation, for example:

$ curl http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt returns the contents of the HDFS file /user/foo/README.txt.
$ curl http://httpfs-host:14000/webhdfs/v1/user/foo?op=list returns the contents of the HDFS directory /user/foo in JSON format.
$ curl -X POST http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=mkdirs creates the HDFS directory /user/foo/bar.
What is the difference between HttpFS and Hadoop HDFS proxy?

->HttpFS was inspired by Hadoop HDFS Proxy.
->HttpFS can be considered a complete rewrite of Hadoop HDFS Proxy.
->Hadoop HDFS Proxy provides only a subset of file system operations (read only), while HttpFS provides all file system operations.
->HttpFS uses a clean HTTP REST API, so standard HTTP tools can be used against it directly.
->HttpFS provides Hadoop pseudo authentication, Kerberos SPNEGO authentication, and Hadoop proxy users, while Hadoop HDFS Proxy does not (see the proxy-user example after this list).
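As a sketch of the proxy-user support mentioned above (hdp02 and the user names are illustrative; doas is the standard WebHDFS proxy-user parameter, and the proxying user must be allowed by the hadoop.proxyuser settings in core-site.xml):

# Assumed setup: superuser "hue" acts on behalf of user "foo" via doas.
$ curl "http://hdp02:14000/webhdfs/v1/user/foo?op=LISTSTATUS&user.name=hue&doas=foo"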
OPERATIONS:

# Enable HttpFS (server: hdp02) in Cloudera Manager. For details on the API, see the links below.
# Pseudo authentication in HttpFS is supplied through the user.name query parameter.
# Get the home directory of the hive user:
curl "http://hdp02:14000/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=hive"
{"Path":"\/user\/hive"}

# Create a /user/hive/tmp directory and set its permission to 777:
curl -i -X PUT "http://hdp02:14000/webhdfs/v1/user/hive/tmp?op=MKDIRS&user.name=hive"
{"boolean":true}
curl -i -X PUT "http://hdp02:14000/webhdfs/v1/user/hive/tmp?op=SETPERMISSION&permission=777&user.name=hive"

# List files under /user/hive:
curl "http://hdp02:14000/webhdfs/v1/user/hive?op=LISTSTATUS&user.name=hive"
{"FileStatuses":{"FileStatus":[{"pathSuffix":"tmp","type":"DIRECTORY","length":0,"owner":"hive","group":"hadoop","permission":"777","accessTime":0,"modificationTime":1419234576458,"blockSize":0,"replication":0}]}}

# Upload a file:
curl -i -X PUT -T /tmp/test.txt "http://hdp02:14000/webhdfs/v1/user/hive/tmp/test.txt?op=CREATE&data=true&user.name=hive" -H "Content-Type: application/octet-stream"
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
......

# Delete the tmp directory:
curl -i -X DELETE "http://hdp02:14000/webhdfs/v1/user/hive/tmp?op=DELETE&recursive=true&user.name=hive"
HTTP/1.1 200 OK
...
{"boolean":true}
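To round out the cycle, a file can be read back with a plain GET and op=OPEN, the standard WebHDFS read operation (a sketch against the same hdp02 server; run it before the DELETE step above, while test.txt still exists):

# Read the uploaded file back:
curl "http://hdp02:14000/webhdfs/v1/user/hive/tmp/test.txt?op=OPEN&user.name=hive"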

Related documents:
Hadoop HDFS over HTTP
WebHDFS REST API
Hadoop HDFS over HTTP 2.6.0 - Server Setup
Hadoop HDFS over HTTP 2.6.0 - Using HTTP Tools
