HDFS installation, configuration, and basic use


HDFS is a distributed file system. Once installed, HDFS looks much like a local file system, but because it is a network file system, it is accessed differently. A local file system is accessed through system calls; a network file system such as NFS can be used the same way only because NFS support is built into the kernel, whereas HDFS is just a service program at the application layer. Even so, its file commands look very similar to ordinary shell commands.
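For example, listing and reading files through the HDFS client closely mirrors the familiar shell commands (a minimal sketch, assuming a running HDFS; the file path is illustrative and not from the original):

# List the root directory -- the HDFS counterpart of `ls`
bin/hdfs dfs -ls /
# Print a file's contents -- the HDFS counterpart of `cat` (path is illustrative)
bin/hdfs dfs -cat /user/bkjia/hello.txt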

First, we need to download a Hadoop package. Hadoop is distributed as two archives: the source code and a pre-compiled binary package. After downloading and extracting the compiled package, its contents look like this:

bkjia@bkjia-VirtualBox:~/workplace/hadoop/hadoop-2.6.0$ ls

bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share

This layout is similar to the directory structure of other software installed on Linux: the configuration files are under etc/, the executable binaries are under bin/, and sbin/ holds the startup and shutdown scripts.

To start HDFS on its own and try out its file-storage features, the following files need to be configured:

1. Configure the etc/hadoop/hadoop-env.sh file. This script exports a number of environment variables; check that the directory settings exported here are correct.

In particular, you need to configure the JAVA_HOME variable and set it to the Java installation path.

The default line is export JAVA_HOME=${JAVA_HOME}, which simply inherits the system's JAVA_HOME. Check what that resolves to:

bkjia@bkjia-VirtualBox:~/workplace/hadoop/hadoop-2.6.0/etc/hadoop$ echo ${JAVA_HOME}

/home/bkjia/java/jdk1.7.0_60

Of course, you can also set HADOOP_HOME to the Hadoop installation directory here, so that the Hadoop root directory is easy to reference.
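The relevant lines in etc/hadoop/hadoop-env.sh might then look like this (a minimal sketch: the JDK path matches the echo output above, and the HADOOP_HOME path is an assumption, not from the original):

# etc/hadoop/hadoop-env.sh
# Point JAVA_HOME at the JDK explicitly instead of relying on the inherited value
export JAVA_HOME=/home/bkjia/java/jdk1.7.0_60
# Optional: make the Hadoop root easy to reference (assumed path)
export HADOOP_HOME=/home/bkjia/workplace/hadoop/hadoop-2.6.0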

2. Configure the etc/hadoop/core-site.xml file. As the name suggests, this file holds the core configuration items. Hadoop configuration is expressed as key/value pairs, but the configuration files use XML, so the basic structure looks like this:

<configuration>
    <property>
        <name>key</name>
        <value>value</value>
    </property>
</configuration>

The first key to configure here is hadoop.tmp.dir, the base directory of the HDFS system. If it is not configured, it defaults to /tmp, and because files under /tmp are not persistent, problems may occur. In addition, if the namenode and datanode directories of HDFS are not configured explicitly, they are also placed under this directory by default.

The configuration item fs.default.name is set to the access address of the HDFS namenode. Because the namenode stores all of the system's metadata, it is the entry point to the file system, so this item must be configured. Here it is set to hdfs://hzfengyu.netease.com:9000; make sure the host name resolves on the local machine (see the note after the configuration file). The configuration file is as follows:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hzfengyu/workplace/hadoop/data</value>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://hzfengyu.netease.com:9000</value>
    </property>
</configuration>
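If the host name does not already resolve, one common approach for a single-machine test (an assumption, not part of the original) is to map it to the machine's address in /etc/hosts:

# /etc/hosts -- map the namenode host name to this machine (the IP is illustrative)
127.0.0.1    hzfengyu.netease.com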


3. Configure the etc/hadoop/hdfs-site.xml file, the configuration file for HDFS itself. The following items need to be configured:
dfs.replication: as the name suggests, the number of replicas kept for each block. For testing, it is simply set to 1.
dfs.namenode.name.dir: the local directory where the namenode keeps its data.
dfs.datanode.data.dir: the local directory where the datanode keeps its data.
The complete configuration is as follows:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hzfengyu/workplace/hadoop/hdfs/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hzfengyu/workplace/hadoop/hdfs/data</value>
    </property>
</configuration>
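With these three files in place, a typical next step is to format the namenode, start the daemons, and try a few basic file operations (a sketch of the standard Hadoop 2.x workflow, not shown in the original; paths are illustrative):

# Format the namenode (first run only -- this initializes dfs.namenode.name.dir)
bin/hdfs namenode -format

# Start the namenode and datanode daemons
sbin/start-dfs.sh

# Basic use: create a directory, upload a local file, list it, download it back
bin/hdfs dfs -mkdir -p /user/hzfengyu
bin/hdfs dfs -put README.txt /user/hzfengyu/
bin/hdfs dfs -ls /user/hzfengyu
bin/hdfs dfs -get /user/hzfengyu/README.txt /tmp/README.txt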

