Hadoop Cluster Environment configuration

Last Update: 2014-12-19 Source: Internet

Author: User

This configuration uses Hadoop 1.2.1. In 2013 the Hadoop project released Hadoop 2.0, which builds on Hadoop 1.0 and improves the efficiency of task scheduling, resource allocation, and fault handling in a Hadoop cluster.

The first change Hadoop 2.0 makes to Hadoop 1.0 is in HDFS. In Hadoop 1.0, an HDFS cluster may have only one NameNode. The GFS paper describes "shadow" masters that back up the master, and the closest counterpart in a Hadoop 1.0 configuration is presumably the SecondaryNameNode, although the SecondaryNameNode only checkpoints the NameNode's metadata and is not a standby that can take over. In Hadoop 2.0, an HDFS cluster can have multiple NameNodes that are independent of one another, and every DataNode registers with all of them, which improves both the scalability and the availability of the system.
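To make the federation idea concrete, here is a toy Python model, not Hadoop code: the `NameNode`, `DataNode`, and `build_federated_cluster` names are hypothetical. Each NameNode keeps its own independent namespace, while every DataNode registers with all NameNodes, so the same storage nodes serve several namespaces at once.

```python
from dataclasses import dataclass, field

# Toy model of HDFS federation; class and function names are hypothetical,
# not Hadoop's API.


@dataclass
class NameNode:
    """Owns one independent namespace and tracks which DataNodes registered."""
    name: str
    namespace: set = field(default_factory=set)
    datanodes: set = field(default_factory=set)

    def register(self, datanode_id: str) -> None:
        self.datanodes.add(datanode_id)


@dataclass
class DataNode:
    node_id: str

    def register_with_all(self, namenodes: list) -> None:
        # Under federation a DataNode reports to every NameNode, not just one.
        for nn in namenodes:
            nn.register(self.node_id)


def build_federated_cluster(nn_names: list, dn_ids: list) -> list:
    namenodes = [NameNode(n) for n in nn_names]
    for dn_id in dn_ids:
        DataNode(dn_id).register_with_all(namenodes)
    return namenodes


if __name__ == "__main__":
    nn1, nn2 = build_federated_cluster(["nn1", "nn2"], ["dn1", "dn2"])
    nn1.namespace.add("/logs")   # visible only in nn1's namespace
    nn2.namespace.add("/data")   # visible only in nn2's namespace
    for nn in (nn1, nn2):
        print(nn.name, sorted(nn.datanodes), sorted(nn.namespace))
```

Because no NameNode depends on another, adding a NameNode adds namespace capacity without touching the existing ones.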

Another change in Hadoop 2.0 concerns the MapReduce runtime framework. In Hadoop 1.0, after the JobClient submits a job to the master server, the JobTracker splits it into tasks and dispatches them to the slave servers for computation. The master is responsible for assigning tasks, allocating resources, tracking task execution, and handling failed tasks. Concentrating all of this on the master makes it a single point of failure and increases the probability that task assignment fails.

Hadoop 2.0 reworks resource management and task management: the master node only allocates resources and monitors the state of jobs, while other work, such as splitting jobs into tasks and tracking task status, moves to the slave nodes. This is the new YARN framework, whose architecture is shown in the figure:

As the figure shows, the master server runs only the ResourceManager, which manages resources; the tasks of each job are managed by an ApplicationMaster, which handles task assignment, status tracking, and error handling.
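The division of labour between the two roles can be sketched as a small simulation. This is a toy model under simplifying assumptions (a fixed pool of identical containers, tasks that simply succeed or fail, and a hypothetical `max_attempts` retry limit); the class names mirror YARN's roles, but none of this is YARN's actual API.

```python
from collections import deque


class ResourceManager:
    """Toy RM: only hands out containers from a fixed pool; knows nothing about tasks."""

    def __init__(self, total_containers: int):
        self.free = total_containers

    def allocate(self, requested: int) -> int:
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count: int) -> None:
        self.free += count


class ApplicationMaster:
    """Toy AM: assigns tasks to granted containers, tracks status, retries failures."""

    def __init__(self, rm: ResourceManager, tasks, max_attempts: int = 2):
        self.rm = rm
        self.pending = deque(tasks)
        self.attempts = {t: 0 for t in tasks}
        self.max_attempts = max_attempts  # hypothetical retry limit
        self.status = {}

    def run(self, execute) -> dict:
        while self.pending:
            granted = self.rm.allocate(len(self.pending))
            if granted == 0:
                raise RuntimeError("no containers available")
            batch = [self.pending.popleft() for _ in range(granted)]
            for task in batch:
                self.attempts[task] += 1
                if execute(task):
                    self.status[task] = "SUCCEEDED"
                elif self.attempts[task] < self.max_attempts:
                    self.pending.append(task)  # error handling: re-queue the task
                else:
                    self.status[task] = "FAILED"
            self.rm.release(granted)  # give containers back after the batch
        return self.status


if __name__ == "__main__":
    am = ApplicationMaster(ResourceManager(total_containers=2), ["map-0", "map-1", "map-2"])
    print(am.run(lambda task: True))
```

The ResourceManager never sees task names; all task-level bookkeeping lives in the ApplicationMaster, which is the separation the paragraph describes.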

Configuring a Hadoop environment mainly involves a few files: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.

core-site.xml: mainly configures cluster-wide settings such as the address of the NameNode (the default file system URI) that clients use to reach the cluster.

hdfs-site.xml: configures HDFS itself, including the directories where the NameNode stores its metadata and where the DataNodes store data blocks.

mapred-site.xml: configures the JobTracker address and port, and the local directory where intermediate map output is stored.

yarn-site.xml: a configuration file introduced in Hadoop 2 for the YARN framework. The specific settings are documented on the official Hadoop website and are not repeated here.
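As a concrete illustration of what goes into the three Hadoop 1.x files, the Python sketch below renders them in Hadoop's `<configuration>`/`<property>` XML format. The property names (`fs.default.name`, `hadoop.tmp.dir`, `dfs.name.dir`, `dfs.data.dir`, `dfs.replication`, `mapred.job.tracker`, `mapred.local.dir`) are standard Hadoop 1.x keys, but the host name `master`, the ports, the replication factor, and every directory path are assumptions, not the author's actual settings. The helper name `hadoop_conf_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_conf_xml(props: dict) -> str:
    """Render a Hadoop-style <configuration> document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hadoop 1.x property names; host "master", ports, and paths are assumptions.
HADOOP1_CONF = {
    "core-site.xml": {
        "fs.default.name": "hdfs://master:9000",   # NameNode URI
        "hadoop.tmp.dir": "/home/hadoop/tmp",
    },
    "hdfs-site.xml": {
        "dfs.name.dir": "/home/hadoop/hdfs/name",  # NameNode metadata
        "dfs.data.dir": "/home/hadoop/hdfs/data",  # DataNode blocks
        "dfs.replication": 2,                      # assumed: one copy per slave
    },
    "mapred-site.xml": {
        "mapred.job.tracker": "master:9001",       # JobTracker host:port
        "mapred.local.dir": "/home/hadoop/mapred/local",  # intermediate map output
    },
}

if __name__ == "__main__":
    # Print rather than write, so nothing in a real conf/ directory is touched.
    for filename, props in HADOOP1_CONF.items():
        print(f"--- {filename} ---")
        print(hadoop_conf_xml(props))
```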

Configuring a Hadoop 1 environment is fairly straightforward, but the contents of the configuration files change considerably in Hadoop 2, and my attempt to configure Hadoop 2.6 was not successful.
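One concrete reason the Hadoop 2 files look so different is that many property names were renamed and MapReduce must be told to run on YARN. The sketch below is not a reconstruction of the author's 2.6 attempt: it maps a few Hadoop 1.x keys to their Hadoop 2.x names and lists the minimal YARN-related settings commonly documented for 2.x. The mapping is deliberately partial, the host name is an assumption, and the function names are hypothetical.

```python
# Partial mapping of Hadoop 1.x property names to their Hadoop 2.x names
# (a few entries from Hadoop's deprecated-properties list; not exhaustive).
RENAMED = {
    "fs.default.name": "fs.defaultFS",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "mapred.local.dir": "mapreduce.cluster.local.dir",
}

# Under YARN there is no JobTracker, so its address is simply dropped here.
DROPPED = {"mapred.job.tracker"}


def migrate_properties(old_props: dict) -> dict:
    """Rename Hadoop 1.x keys to Hadoop 2.x keys and drop JobTracker settings."""
    new_props = {}
    for key, value in old_props.items():
        if key in DROPPED:
            continue
        new_props[RENAMED.get(key, key)] = value
    return new_props


def yarn_settings(rm_host: str) -> dict:
    """Minimal per-file settings that make MapReduce run on YARN."""
    return {
        "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
        "yarn-site.xml": {
            "yarn.resourcemanager.hostname": rm_host,
            "yarn.nodemanager.aux-services": "mapreduce_shuffle",
        },
    }


if __name__ == "__main__":
    old = {"fs.default.name": "hdfs://master:9000", "mapred.job.tracker": "master:9001"}
    print(migrate_properties(old))
    print(yarn_settings("master"))  # "master" is an assumed host name
```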

I ran some performance tests on the Hadoop 1 cluster. The cluster consists of one master and two slaves, each slave with a single core and 2 GB of memory. The test program is the WordCount example that ships with Hadoop.
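For readers unfamiliar with WordCount, its logic can be sketched in plain Python as map, shuffle, and reduce steps. This is a single-process illustration with hypothetical function names, not the Java example that ships with Hadoop; it splits on whitespace, which roughly matches the `StringTokenizer` tokenization that example uses.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Mapper: emit (word, 1) for each whitespace-separated token."""
    for word in line.split():
        yield word, 1


def shuffle(pairs) -> dict:
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reducer: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines) -> dict:
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello yarn"]))
```

On a cluster, the map calls run in parallel on the slaves and the shuffle moves data over the network, which is where the extra cost relative to a single local process comes from.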

Locally, I generated a single 134 MB file and counted its words with a standalone word-count program; that one file took 17 seconds. On the cluster, the same single file took 1 minute 44 seconds. Judging from the single-file result, Hadoop does not show the performance it should. I then tested 10 such files: locally this took 3 minutes 34 seconds, and the first run on the cluster took 3 minutes 44 seconds, so the two were already quite close. Considering that the two slaves together are less powerful than my laptop, this result is acceptable. Immediately after the first job finished, I ran it a second time, and it took 2 minutes 50 seconds, which is an encouraging result.
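The reported timings can be turned into cluster-to-local ratios with a few lines of arithmetic. The script only converts the figures quoted above into seconds and divides them; `mmss` and `slowdown` are names chosen for this sketch.

```python
def mmss(minutes: int, seconds: int) -> int:
    """Convert an 'M minutes S seconds' reading to seconds."""
    return minutes * 60 + seconds


# Timings reported in the text.
local_1 = mmss(0, 17)             # one 134 MB file, local
cluster_1 = mmss(1, 44)           # one file, cluster
local_10 = mmss(3, 34)            # ten files, local
cluster_10_first = mmss(3, 44)    # ten files, cluster, first run
cluster_10_second = mmss(2, 50)   # ten files, cluster, second run


def slowdown(cluster_s: int, local_s: int) -> float:
    """Cluster time divided by local time (>1 means the cluster was slower)."""
    return cluster_s / local_s


if __name__ == "__main__":
    print(f"1 file:          {slowdown(cluster_1, local_1):.2f}x")
    print(f"10 files, run 1: {slowdown(cluster_10_first, local_10):.2f}x")
    print(f"10 files, run 2: {slowdown(cluster_10_second, local_10):.2f}x")
    print("per-file seconds (local, run 1, run 2):",
          local_10 / 10, cluster_10_first / 10, cluster_10_second / 10)
```

With these numbers, the cluster goes from about 6.1 times slower than the laptop on one file to about 1.05 times slower on ten files, and on the second run it needs about 79% of the local time.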

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
