Hadoop Distributed File System (HDFS)


1. Introduction to Hadoop versions

In versions prior to 0.20.2 (not including that version), the configuration was kept in a single default.xml file.

The 0.20.x versions do not ship an Eclipse plug-in JAR; because Eclipse versions differ, you need to compile the plug-in from source to match your Eclipse installation.

In versions 0.20.2 through 0.22.x, the configuration is split across three files: conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml.

Version 0.23.x introduced YARN, and the configuration is spread across four files: conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml, and conf/mapred-site.xml.

Because 0.23.x changed so much and introduced new technology, many projects built on Hadoop, such as Hive, HBase, and Pig, had trouble staying compatible, since they were developed against versions before 0.23.x.

Apache therefore unified the version numbering, so that Hadoop's functionality can be told apart by version number alone:

0.22.x was renumbered directly to 1.0.0.

0.23.x was renumbered directly to 2.0.0.

This splits Hadoop into two major lines: version 1 and version 2.

Version 1: mainly continues to upgrade and develop the original technology, while remaining compatible with the surrounding ecosystem. If you want to use HBase, Hive, and similar tools, version 1 is the one to choose.

Version 2: mainly advances the new technology. If you are developing against Hadoop alone, it is a good choice.

The official site's download page currently describes the releases as follows:

1.2.x - current stable version, 1.2 release

2.4.x - current stable 2.x version

0.23.x - similar to 2.x.x but missing NN HA (NameNode high availability).

2. Hadoop Installation and Modes

My experimental environment currently uses hadoop-0.20.2, so the description below is based on that version.

The configuration for Hadoop's components lives in the conf folder. Earlier Hadoop releases used a single file, hadoop-site.xml, to configure the Common, HDFS, and MapReduce components; starting with version 0.20.0, it was divided into three files.

core-site.xml: configures properties of the Common component.

hdfs-site.xml: configures HDFS properties.

mapred-site.xml: configures MapReduce properties.
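
All three files share the same format: property name/value pairs wrapped in a <configuration> element. A minimal skeleton is shown below; the property name and value here are placeholders, not real settings:

<?xml version="1.0"?>
<configuration>
  <!-- one <property> element per setting -->
  <property>
    <name>some.property.name</name>
    <value>some-value</value>
  </property>
</configuration>

Concrete properties for pseudo-distributed mode appear in section 2.3.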

2.1. Hadoop operation modes

Hadoop can operate in any of the following three modes:

Standalone mode (also called local mode): no daemons run; all programs execute in a single JVM. It is mainly used during development. The default properties are set for this mode, so no additional configuration is required.

Pseudo-distributed mode: the Hadoop daemons all run on the local machine, simulating a small cluster.

Fully distributed mode: the Hadoop daemons run across a cluster of machines.

Key configuration properties per mode:

Component   | Property           | Standalone mode    | Pseudo-distributed mode | Fully distributed mode
Common      | fs.default.name    | file:/// (default) | hdfs://localhost:9000   | hdfs://namenode:9000
HDFS        | dfs.replication    | N/A                | 1                       | 3 (default)
MapReduce   | mapred.job.tracker | local (default)    | localhost:9001          | jobtracker:9001

2.2. Standalone mode installation

Because the default properties are set specifically for this mode and no daemons need to run, there is nothing to do here beyond setting the dfs.replication value to 1.

Test:

Go to the $HADOOP_HOME directory and execute the following commands to verify that the installation works:


$ mkdir input

$ cp conf/*.xml input

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

$ cat output/*

Output:

1 dfsadmin

If the steps above run without errors, the installation succeeded.

2.3. Pseudo-distributed mode installation steps

Installation steps:

1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)

2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml)

3. Set up passwordless SSH login

4. Format the file system: hadoop namenode -format

5. Start the daemons: start-all.sh

6. Stop the daemons: stop-all.sh

An example for step 2, with each property in its corresponding file (a shell sketch of steps 1 and 3-6 follows the example):


core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
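
The remaining steps are sketched below in shell. The JDK and install paths are assumptions for illustration; substitute your own. bin/start-all.sh and bin/stop-all.sh live under $HADOOP_HOME.

# Step 1: environment variables (e.g., appended to ~/.bashrc);
# both paths below are assumed locations, not requirements
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/local/hadoop-0.20.2
export PATH=$PATH:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/hadoop-0.20.2-core.jar

# Step 3: passwordless SSH to localhost
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost    # should log in without prompting for a password

# Step 4: format the file system
$ bin/hadoop namenode -format

# Step 5: start the daemons
$ bin/start-all.sh

# Step 6: stop the daemons when done
$ bin/stop-all.sh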

After startup, the NameNode and JobTracker status can be viewed in a browser:

NameNode - http://localhost:50070/

JobTracker - http://localhost:50030/

Test:

Copy files into the distributed file system:


$ bin/hadoop fs -put conf input

Run the test:


$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Retrieve the test program's output:


$ bin/hadoop fs -cat output/*

Output:


3 dfs.class

2 dfs.period

1 dfs.file

1 dfs.replication

1 dfs.servers

1 dfsadmin

1 dfsmetrics.log

If the steps above run without errors, the installation succeeded.

2.4. Fully distributed mode installation steps

Installation steps:

1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)

2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves)

3. Set up passwordless SSH login

4. Format the file system: hadoop namenode -format

5. Start the daemons: start-all.sh

6. Stop the daemons: stop-all.sh

A sketch of the step 2 differences from pseudo-distributed mode, including the masters and slaves files, follows below.
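
For step 2, compared with pseudo-distributed mode, point fs.default.name at hdfs://namenode:9000 in core-site.xml, point mapred.job.tracker at jobtracker:9001 in mapred-site.xml, and set dfs.replication to 3 in hdfs-site.xml (matching the table in section 2.1). The hostnames namenode, jobtracker, slave1, and slave2 below are hypothetical; use your own. The masters and slaves files are plain host lists:

# conf/masters - host(s) that run the secondary NameNode
namenode

# conf/slaves - hosts that run a DataNode and TaskTracker, one per line
slave1
slave2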

After startup, the NameNode and JobTracker status can be viewed in a browser:

NameNode - http://namenode:50070/

JobTracker - http://jobtracker:50030/

Note: Hadoop must be installed at the same path on every machine, under the same user name.
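
One way to satisfy this, assuming Hadoop is unpacked at /usr/local/hadoop-0.20.2 on the master and using the hypothetical slave hostnames from above, is to copy the tree out with rsync:

$ rsync -av /usr/local/hadoop-0.20.2/ slave1:/usr/local/hadoop-0.20.2/
$ rsync -av /usr/local/hadoop-0.20.2/ slave2:/usr/local/hadoop-0.20.2/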

3. Eclipse Plug-in Installation

The Eclipse Hadoop plug-in is designed for developing MapReduce programs quickly. It provides:

a MapReduce Locations view for setting MapReduce variables;

a Hadoop installation location setting under Windows->Preferences;

a DFS Locations entry in the Project Explorer view, for browsing the contents of the HDFS file system and uploading and downloading files;

a MapReduce Project type in the New Project wizard;

a Run on Hadoop launch option.

Note that the bundled contrib\eclipse-plugin\hadoop-0.20.2-eclipse-plugin.jar is outdated and a newer build needs to be downloaded from the Internet; otherwise nothing happens when you run a MapReduce program.

  

Original link: http://blog.csdn.net/chaofanwei/article/details/40209527