HDFS installation, configuration, and basic use
HDFS is a distributed file system. After installation it looks much like a local file system, but because it is a network file system, it is accessed differently: a local file system is accessed through system calls, and a network file system such as NFS can be accessed the same way only because NFS support is built into the kernel, whereas HDFS is just a service program running at the application layer. Even so, its commands look very similar to common shell commands.
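For example, once HDFS is up (the startup commands appear at the end of this section), a directory listing goes through the hadoop client program instead of the kernel:

ls /tmp                   # local file system: an ordinary system call
bin/hadoop fs -ls /tmp    # HDFS: the hadoop client talks to the namenode over the network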
First, we need to download a Hadoop package. Hadoop is distributed as two archives: the source code and a compiled binary package; the compiled package (hadoop-2.6.0 here) is all you need to run HDFS.
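Assuming you downloaded the compiled archive hadoop-2.6.0.tar.gz from an Apache mirror, unpacking it is a single command:

tar -xzf hadoop-2.6.0.tar.gz   # unpack into ./hadoop-2.6.0

The extracted directory looks like this: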
bkjia@bkjia-VirtualBox:~/workplace/hadoop/hadoop-2.6.0$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
This layout resembles the directory structure of other software installed on Linux: all the configuration files are under etc/, the executables are in bin/, and sbin/ holds administrative scripts.
To start HDFS on its own and try out its file storage function, you need to configure the following files:
1. Configure the etc/hadoop/hadoop-env.sh file. This script exports a few environment variables that the startup scripts rely on.
You need to set the JAVA_HOME variable to the Java installation path.
The default is export JAVA_HOME=${JAVA_HOME}, which picks up the system's JAVA_HOME; check what that points to:
bkjia@bkjia-VirtualBox:~/workplace/hadoop/hadoop-2.6.0/etc/hadoop$ echo ${JAVA_HOME}
/home/bkjia/java/jdk1.7.0_60
Of course, you can also add HADOOP_HOME=<hadoop installation directory> here to make the Hadoop root directory easy to reference.
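A minimal sketch of the lines to set in etc/hadoop/hadoop-env.sh, assuming the paths shown above (substitute your own):

export JAVA_HOME=/home/bkjia/java/jdk1.7.0_60                  # required: where the JDK lives
export HADOOP_HOME=/home/bkjia/workplace/hadoop/hadoop-2.6.0   # optional: the Hadoop root directory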
2. Configure the etc/hadoop/core-site.xml file. As the name suggests, this holds the core configuration items. Hadoop configuration consists of key/value pairs, but the configuration files are XML, so the basic structure looks like this:
<configuration>
  <property>
    <name>key</name>
    <value>value</value>
  </property>
</configuration>
The key to be configured here is hadoop.tmp.dir, the base directory of the HDFS system. If it is not configured, it defaults to /tmp, and because files under /tmp are not persistent, problems may occur. In addition, if the namenode and datanode directories of HDFS are not configured, they also default to locations under this directory.
The configuration item fs.default.name is set to the access address of the HDFS namenode. The namenode stores all of the system's metadata, i.e. it is the entry point to the file system, so this must be configured. Here it is set to hdfs://hzfengyu.netease.com:9000; make sure this domain name can be resolved by the local machine. The configuration file is as follows:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hzfengyu/workplace/hadoop/data</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hzfengyu.netease.com:9000</value>
  </property>
</configuration>
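If the domain name does not resolve, a common workaround for a single-machine test setup (not part of the original walkthrough) is to map it to the loopback address in /etc/hosts:

echo "127.0.0.1 hzfengyu.netease.com" | sudo tee -a /etc/hosts   # map the namenode hostname locally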
3. Configure the etc/hadoop/hdfs-site.xml file. This is the HDFS-specific configuration file; the following items need to be set:
dfs.replication: as the name suggests, the number of replicas kept for each block. For testing, simply set it to 1.
dfs.namenode.name.dir: the local directory where the namenode stores its data.
dfs.datanode.data.dir: the local directory where the datanode stores its data.
The complete configuration is as follows:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hzfengyu/workplace/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hzfengyu/workplace/hadoop/hdfs/data</value>
  </property>
</configuration>
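With these three files configured, you can format the namenode and start HDFS. A short sketch using the standard Hadoop 2.6 scripts (run from the hadoop-2.6.0 directory; start-dfs.sh uses ssh, so passwordless ssh to localhost is assumed, and the file name in the upload example is arbitrary):

bin/hdfs namenode -format             # initialize the namenode storage directory (first run only)
sbin/start-dfs.sh                     # start the namenode and datanode daemons
bin/hadoop fs -mkdir /test            # basic use: create a directory in HDFS
bin/hadoop fs -put README.txt /test   # upload a local file
bin/hadoop fs -ls /test               # list it back

If everything started correctly, jps should show NameNode, DataNode, and SecondaryNameNode processes.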