Hadoop 2.x: Detailed Steps for Building a Pseudo-Distributed Environment


This article describes in detail the entire process of building a Hadoop 2.x pseudo-distributed environment, for your reference. The specific steps are as follows:

1. Modify hadoop-env.sh, yarn-env.sh, mapred-env.sh

Method: Open all three files with Notepad++ (as the beifeng user).

Add the line: export JAVA_HOME=/opt/modules/jdk1.7.0_67
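
If you prefer the shell to an editor, the same line can be appended to all three files in one pass; a minimal sketch, assuming the Hadoop installation directory used throughout this article (if a file already sets JAVA_HOME, edit that line in place instead of appending a duplicate):

$ cd /opt/modules/hadoop-2.5.0
$ for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
    echo 'export JAVA_HOME=/opt/modules/jdk1.7.0_67' >> etc/hadoop/$f
  done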

2. Modify the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml configuration files

1) Modify core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Hadoop-senior02.beifeng.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0/data</value>
  </property>
</configuration>
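
Note that hadoop.tmp.dir points to a directory that should exist and be writable by the beifeng user before the namenode is formatted; a quick sketch:

$ mkdir -p /opt/modules/hadoop-2.5.0/data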

2) Modify hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>Hadoop-senior02.beifeng.com:50070</value>
  </property>
</configuration>

3) Modify yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Hadoop-senior02.beifeng.com</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>

4) Modify mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>
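
Note: a fresh Hadoop 2.5.0 tarball usually ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template before editing:

$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml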

3. Start HDFS

1) Format the namenode: $ bin/hdfs namenode -format

2) Start the namenode: $ sbin/hadoop-daemon.sh start namenode

3) Start the datanode: $ sbin/hadoop-daemon.sh start datanode

4) HDFS monitoring web page: http://hadoop-senior02.beifeng.com:50070
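
Once both daemons are up, jps should show them running; a sketch of the expected output (the process IDs will differ on your machine):

$ jps
3245 NameNode
3367 DataNode
3412 Jps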

4. Start YARN

1) Start the resourcemanager: $ sbin/yarn-daemon.sh start resourcemanager

2) Start the nodemanager: $ sbin/yarn-daemon.sh start nodemanager

3) YARN monitoring web page: http://hadoop-senior02.beifeng.com:8088
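
To confirm that the nodemanager has registered with the resourcemanager, list the cluster's nodes; the reported node state should be RUNNING:

$ bin/yarn node -list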

5. Test the wordcount jar package.

1) Change to the Hadoop installation directory: /opt/modules/hadoop-2.5.0
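
The job below expects its input to already be in HDFS. A minimal sketch for preparing it (the exact contents of sort.txt are an assumption, inferred from the word counts shown in the result below):

$ bin/hdfs dfs -mkdir -p /input
$ echo -e "hadoop yarn\nhadoop mapreduce\nmapreduce jps" > /tmp/sort.txt
$ bin/hdfs dfs -put /tmp/sort.txt /input/sort.txt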

2) Run the test: $ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/sort.txt /output6/

Running process:

16/05/08 06:39:13 INFO client.RMProxy: Connecting to ResourceManager at Hadoop-senior02.beifeng.com/192.168.241.130:8032
16/05/08 06:39:15 INFO input.FileInputFormat: Total input paths to process: 1
16/05/08 06:39:15 INFO mapreduce.JobSubmitter: number of splits: 1
16/05/08 06:39:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462660542807_0001
16/05/08 06:39:16 INFO impl.YarnClientImpl: Submitted application application_1462660542807_0001
16/05/08 06:39:16 INFO mapreduce.Job: The url to track the job: http://Hadoop-senior02.beifeng.com:8088/proxy/application_1462660542807_0001/
16/05/08 06:39:16 INFO mapreduce.Job: Running job: job_1462660542807_0001
16/05/08 06:39:36 INFO mapreduce.Job: Job job_1462660542807_0001 running in uber mode: false
16/05/08 06:39:36 INFO mapreduce.Job: map 0% reduce 0%
16/05/08 06:39:48 INFO mapreduce.Job: map 100% reduce 0%
16/05/08 06:40:04 INFO mapreduce.Job: map 100% reduce 100%
16/05/08 06:40:04 INFO mapreduce.Job: Job job_1462660542807_0001 completed successfully
16/05/08 06:40:04 INFO mapreduce.Job: Counters: 49

3) View the result: $ bin/hdfs dfs -text /output6/part*

Running result:

hadoop 2
jps 1
mapreduce 2
yarn 1

6. MapReduce History Server

1) Start: $ sbin/mr-jobhistory-daemon.sh start historyserver

2) Web UI: http://hadoop-senior02.beifeng.com:19888

7. The functions of HDFS, YARN, and MapReduce

1) HDFS: a distributed, highly fault-tolerant file system suited to deployment on inexpensive commodity machines.

HDFS has a master-slave architecture consisting of a namenode and datanodes: the namenode manages the file system namespace (metadata), while the datanodes provide the storage, holding data as blocks of 128 MB each (the Hadoop 2.x default).
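
You can query the block size a running cluster is actually using; a quick check (134217728 bytes = 128 MB, the 2.x default):

$ bin/hdfs getconf -confKey dfs.blocksize
134217728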

2) YARN: a general-purpose resource management system that provides unified resource management and scheduling for the applications running on top of it.

YARN consists of a resourcemanager and nodemanagers: the resourcemanager is responsible for scheduling and allocating cluster resources, while each nodemanager manages the resources of a single node and runs the tasks assigned to it.
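
Because the resourcemanager is the central tracker for every application in the cluster, you can ask it what is currently running; a quick sketch:

$ bin/yarn application -list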

3) MapReduce: a computing model divided into two phases, Map and Reduce.

The Map phase processes the input one row at a time, emitting its output as key-value pairs that are passed on to Reduce; the Reduce phase then aggregates and summarizes the data the Map phase produced.
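
The wordcount job from section 5 follows exactly this model: map emits a (word, 1) pair for every word it sees, and reduce sums the ones for each word. As a rough single-machine analogy (plain Unix pipes, not Hadoop, using the hypothetical /tmp/sort.txt sketched in section 5):

$ tr ' ' '\n' < /tmp/sort.txt | sort | uniq -c
  # tr      = map: emit one word per line
  # sort    = shuffle: bring identical keys together
  # uniq -c = reduce: count the occurrences of each key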

The above is all the content of this article; I hope it helps you learn Hadoop.
