Hadoop security: hftp
By default, hftp is enabled, which lets users access and download files from a browser. In this way, all files can be read, which leaves a security risk.
The test is as follows:
/user/hive/warehouse/cdntest.db/selfreadonly/hosts: the owner of the parent directory selfreadonly is zhouyang and its permission is 700. However, if other users enter the following address in a browser, they can still read the file.
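The exact address from the original test is not preserved here. As a rough sketch of the same risk (the hostname is a placeholder, and whether the read succeeds for another user depends on how the cluster's non-secure web identity is configured), the file behind the 700-permission directory can be fetched through the hftp scheme from the command line:

$ hadoop fs -ls /user/hive/warehouse/cdntest.db/selfreadonly
# drwx------   - zhouyang supergroup ... /user/hive/warehouse/cdntest.db/selfreadonly
$ hadoop fs -cat hftp://namenode:50070/user/hive/warehouse/cdntest.db/selfreadonly/hosts
# on a cluster affected by this issue, the file contents are returned over HTTP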
1) Transferring data between two HDFS clusters, based on hftp:
$ hadoop distcp hftp://hadoop1:50070/weather/middle ...
As shown below, there are three points to note: 1. the command must be run on the target cluster, so that the HDFS RPC versions are compatible; 2. the hftp address is determined by dfs.http.address, whose default port is 50070;
When reprinting, please indicate the source: http://blog.csdn.net/lastsweetop/article/details/9086695
The previous articles covered single-threaded operations. To copy many files in parallel, Hadoop provides a small tool, distcp. The most common usage is copying files between two Hadoop clusters; its help documentation is very detailed, so I will not explain it here. There are not two clusters in the development environment, so the same cluster is used for the demonstration: hadoop distcp hdfs://namenode:...
-update and -overwrite: by default, if a file with the same name already exists at the destination of the copy, that file is skipped. You can overwrite files of the same name with the -overwrite option, or update files of the same name with the -update option. For more usage of distcp, run the hadoop distcp command without parameters to see its usage. 6) Copying between different versions of HDFS: if the Hadoop versions of the two clusters are inconsistent, an hdfs:// source cannot be used to copy the files, because the RPC systems of the two clusters are incompatible. You can instead use the read-only, HTTP-based hftp protocol for the source.
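For illustration, a minimal sketch of the three behaviours (cluster names and paths are placeholders, not taken from the original article):

$ hadoop distcp hdfs://nn1/foo hdfs://nn2/bar              # default: files that already exist at the destination are skipped
$ hadoop distcp -overwrite hdfs://nn1/foo hdfs://nn2/bar   # existing files with the same name are overwritten
$ hadoop distcp -update hdfs://nn1/foo hdfs://nn2/bar      # only files that differ from the destination are copied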
Hadoop has an abstract file system concept, and HDFS is just one of its implementations. The Java abstract class org.apache.hadoop.fs.FileSystem represents a file system in Hadoop and has several implementations, as shown in Table 3-1.
File system | URI scheme | Java implementation (all under org.apache.hadoop) | Description
Local | file | fs.LocalFileSystem | A file system for a locally connected disk with client-side checksums (a variant without checksums also exists).
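As a quick sketch of how the URI scheme selects this implementation (the paths are just examples), the same shell commands work against the checksummed local file system when a file:// URI is used:

$ hadoop fs -ls file:///tmp          # served by fs.LocalFileSystem
$ hadoop fs -cat file:///etc/hosts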
Distcp is mainly used to copy data between hadoop clusters.
1. If the Hadoop versions are the same, you can use the following format:
hadoop distcp hdfs://...
2. If you copy data between Hadoop clusters of different versions, you can use the following format:
hadoop distcp -i hftp://...
Note: in this case, distcp must be run on the target cluster; -i means ignore errors.
Note that hftp has not
flowchart:
You can set the number of map tasks manually with -m; if distcp is being used to help balance HDFS, it is best to use more maps so that the blocks are spread out across the cluster, as sketched below.
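A small sketch of that option (source and destination are placeholders):

$ hadoop distcp -m 100 hdfs://nn1/data hdfs://nn2/data   # force 100 map tasks so the copied blocks land on more nodes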
If the versions of the two clusters are inconsistent, using hdfs:// for the source may cause an error because the RPC systems are incompatible. In that case you can use the HTTP-based hftp protocol, but the destination address must still be hdfs://, like this:
hadoop distcp hftp://namenode:50070/user/h...
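A fuller sketch of the same pattern (hostnames and paths are placeholders); the command must be run on the target cluster, and the source URI uses the namenode's HTTP port (50070 by default):

$ hadoop distcp -i hftp://source-nn:50070/user/hadoop/input hdfs://target-nn/user/hadoop/input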
I. Introduction to HDFS
1. Full name: HDFS stands for Hadoop Distributed File System. Hadoop has an abstract file system concept: it provides the abstract class org.apache.hadoop.fs.FileSystem, and HDFS is one implementation of this abstract class. Others are:
File system | URI scheme | Java implementation (under org.apache.hadoop)
Local | file | fs.LocalFileSystem
HDFS | hdfs | hdfs.DistributedFileSystem
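As with the local file system, the scheme in the URI selects the implementation; a small sketch (the namenode address is a placeholder):

$ hadoop fs -ls hdfs://namenode:9000/user/hadoop   # served by hdfs.DistributedFileSystem
$ hadoop fs -ls /user/hadoop                       # same thing when the default file system points at this namenode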
This reflects the scalability of Hadoop. Hadoop defines an abstract file system concept; specifically, it defines a Java abstract class, org.apache.hadoop.fs.FileSystem, which describes a file system interface in Hadoop. As long as a file system implements this interface, it can be used as a file system supported by Hadoop. The file systems that currently implement this abstract class are shown in the following table:
File system | URI scheme | ...
This fails because of a mismatch in the RPC systems. To work around the error, you can access the source over the HTTP-based hftp protocol. Because the task must be executed on the target cluster, where the HDFS RPC versions match, the command in hftp mode is as follows: hadoop distcp hftp://namenode1:50070/foo hdfs://namenode2/bar. Note that the namenode's web interface port must be specified in the source URI.
The following environment variables are used by lftp:
HOME - used for ~ expansion
SHELL - the shell used to run commands with !
PAGER - used as the pager for the more and zmore commands
http_proxy, https_proxy - initial values for the http:proxy, hftp:proxy, and https:proxy settings
ftp_proxy - initial value for ftp:proxy or hftp:proxy (depending on the URL protocol given in the variable)
no_proxy - used as the ...
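A small sketch of how these variables seed lftp's proxy settings (the hosts are placeholders):

$ export http_proxy=http://proxy.example.com:3128   # becomes the initial http:proxy / hftp:proxy value
$ lftp http://mirror.example.com/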
...hdfs://192.168.80.11:9000/user/hadoop/in. MapReduce is used to run the copy. Purpose: to copy files in parallel within the same file system, and the clusters must be of the same version. If the versions are inconsistent, hdfs:// may cause an error because the RPC systems are incompatible. In this case you can use the HTTP-based hftp protocol; however, the target must still be HDFS:
hadoop distcp hftp://namenode:50070/user/hadoop/input hdfs://...
In Hadoop, the concept of an abstract file system is defined. Specifically, Hadoop defines a Java abstract class, org.apache.hadoop.fs.FileSystem, which is used to define a file system interface in Hadoop. As long as a file system implements this interface, it can be used as a file system supported by Hadoop. The following table lists the file systems that currently implement this abstract class:
File system | URI scheme | Java implementation (org.apache.h...
Today, while solving a remote server backup problem, I used some lftp knowledge, summarized as follows. lftp is more powerful and much more convenient than the plain ftp client.
1. Login:
lftp ftp://[email protected]  (pwd: *****)
or: open ftp://[email protected]
2. Basic operations (reposted) - lftp usage introduction:
lftp is a powerful download tool that supports the file access protocols FTP, FTPS, HTTP, HTTPS, hftp, and fish (FTPS and HTTPS require the OpenSSL library at compile time)...
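A brief sketch of a typical session (the host, user, and paths are placeholders):

$ lftp ftp://user@192.168.1.10                  # prompts for the password
lftp user@192.168.1.10:~> ls                    # list the remote directory
lftp user@192.168.1.10:~> get backup.tar.gz     # download a single file
lftp user@192.168.1.10:~> mirror logs ./logs    # download a whole directory tree
lftp user@192.168.1.10:~> bye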
We used distcp on the CDH4 version of Hadoop to copy data from a CDH5 cluster to CDH4, with the following command:
hadoop distcp -update -skipcrccheck hftp://cdh5:50070/xxxx hdfs://cdh4/xxx
When a file is very large, an error like the following appears:
2017-12-15 10:47:24,506 INFO execute.BulkLoadHBase - Caused by: java.io.IOException: Got EOF but currentPos = 2278825984
2017-12-15 10:47:24,506 INFO execute.BulkLoadHBase -     at org.apache.hadoop.hd...
lftp is a powerful download tool that supports the file access protocols FTP, SFTP, FTPS, HTTP, HTTPS, hftp, and fish (FTPS and HTTPS require the OpenSSL library at compile time). The lftp interface is much like a shell: it has command completion, command history, and multiple background tasks, and is very convenient to use. It also has features such as bookmarks, queues, mirroring, resumable transfers, and m...