HDFS installation, configuration, and basic use
HDFS is a distributed file system. After installation it looks much like a local file system, but because it is a network file system, it is accessed differently: a local file system is accessed through system calls, and a network file system such as NFS can be accessed the same way only because NFS support is built into the kernel, whereas HDFS is just a service program running at the application layer. Even so, its commands look very similar to common shell commands.
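For example, once HDFS is up (the startup commands appear at the end of this section), a directory listing goes through the hadoop client program instead of the kernel:

ls /tmp                   # local file system: an ordinary system call
bin/hadoop fs -ls /tmp    # HDFS: the hadoop client talks to the namenode over the network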
First, we need to download a Hadoop package. Hadoop is distributed as two archives: the source code and a compiled binary package; the compiled package (hadoop-2.6.0 here) is all you need to run HDFS.
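Assuming you downloaded the compiled archive hadoop-2.6.0.tar.gz from an Apache mirror, unpacking it is a single command:

tar -xzf hadoop-2.6.0.tar.gz   # unpack into ./hadoop-2.6.0

The extracted directory looks like this: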
bkjia@bkjia-VirtualBox:~/workplace/hadoop/hadoop-2.6.0$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
This layout resembles the directory structure of other software installed on Linux: all the configuration files are under etc/, the executables are in bin/, and sbin/ holds administrative scripts.
To start HDFS on its own and try out its file storage function, you need to configure the following files:
1. Configure the etc/hadoop/hadoop-env.sh file. This script exports a few environment variables that the startup scripts rely on.
You need to set the JAVA_HOME variable to the Java installation path.
The default is export JAVA_HOME=${JAVA_HOME}, which picks up the system's JAVA_HOME; check what that points to:
bkjia@bkjia-VirtualBox:~/workplace/hadoop/hadoop-2.6.0/etc/hadoop$ echo ${JAVA_HOME}
/home/bkjia/java/jdk1.7.0_60
Of course, you can also add HADOOP_HOME=<hadoop installation directory> here to make the Hadoop root directory easy to reference.
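A minimal sketch of the lines to set in etc/hadoop/hadoop-env.sh, assuming the paths shown above (substitute your own):

export JAVA_HOME=/home/bkjia/java/jdk1.7.0_60                  # required: where the JDK lives
export HADOOP_HOME=/home/bkjia/workplace/hadoop/hadoop-2.6.0   # optional: the Hadoop root directory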
2. Configure the etc/hadoop/core-site.xml file. As the name suggests, this holds the core configuration items. Hadoop configuration consists of key/value pairs, but the configuration files are XML, so the basic structure looks like this:
<configuration>
  <property>
    <name>key</name>
    <value>value</value>
  </property>
</configuration>
The key to be configured here is hadoop.tmp.dir, the base directory of the HDFS system. If it is not configured, it defaults to /tmp, and because files under /tmp are not persistent, problems may occur. In addition, if the namenode and datanode directories of HDFS are not configured, they also default to locations under this directory.
The configuration item fs.default.name is set to the access address of the HDFS namenode. The namenode stores all of the system's metadata, i.e. it is the entry point to the file system, so this must be configured. Here it is set to hdfs://hzfengyu.netease.com:9000; make sure this domain name can be resolved by the local machine. The configuration file is as follows:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hzfengyu/workplace/hadoop/data</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hzfengyu.netease.com:9000</value>
  </property>
</configuration>
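If the domain name does not resolve, a common workaround for a single-machine test setup (not part of the original walkthrough) is to map it to the loopback address in /etc/hosts:

echo "127.0.0.1 hzfengyu.netease.com" | sudo tee -a /etc/hosts   # map the namenode hostname locally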
3. Configure the etc/hadoop/hdfs-site.xml file. This is the HDFS-specific configuration file; the following items need to be set:
dfs.replication: as the name suggests, the number of replicas kept for each block. For testing, simply set it to 1.
dfs.namenode.name.dir: the local directory where the namenode stores its data.
dfs.datanode.data.dir: the local directory where the datanode stores its data.
The complete configuration is as follows:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hzfengyu/workplace/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hzfengyu/workplace/hadoop/hdfs/data</value>
  </property>
</configuration>
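With these three files configured, you can format the namenode and start HDFS. A short sketch using the standard Hadoop 2.6 scripts (run from the hadoop-2.6.0 directory; start-dfs.sh uses ssh, so passwordless ssh to localhost is assumed, and the file name in the upload example is arbitrary):

bin/hdfs namenode -format             # initialize the namenode storage directory (first run only)
sbin/start-dfs.sh                     # start the namenode and datanode daemons
bin/hadoop fs -mkdir /test            # basic use: create a directory in HDFS
bin/hadoop fs -put README.txt /test   # upload a local file
bin/hadoop fs -ls /test               # list it back

If everything started correctly, jps should show NameNode, DataNode, and SecondaryNameNode processes.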