Hadoop 2.x: detailed steps for building a pseudo-distributed environment
This article describes in detail the entire process of building a Hadoop 2.x pseudo-distributed environment, for your reference. The specific steps are as follows:
1. Modify hadoop-env.sh, yarn-env.sh, mapred-env.sh
Method: open these three files with Notepad++ (as the beifeng user)
Add the line: export JAVA_HOME=/opt/modules/jdk1.7.0_67
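The same line can also be appended from the command line instead of Notepad++; a minimal sketch, assuming Hadoop is unpacked under /opt/modules/hadoop-2.5.0 with its configuration files in etc/hadoop:

cd /opt/modules/hadoop-2.5.0
# append the JAVA_HOME setting to all three env scripts in one go
for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
  echo 'export JAVA_HOME=/opt/modules/jdk1.7.0_67' >> etc/hadoop/$f
done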
2. Modify the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml configuration files
1) Modify core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Hadoop-senior02.beifeng.com:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data</value>
    </property>
</configuration>
2) Modify hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>Hadoop-senior02.beifeng.com:50070</value>
    </property>
</configuration>
3) Modify yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Hadoop-senior02.beifeng.com</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
</configuration>
4) Modify mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>0.0.0.0:19888</value>
    </property>
</configuration>
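Two small preparations are worth doing before starting the daemons; a sketch, assuming the same paths as above (the 2.5.0 tarball only ships mapred-site.xml.template, so mapred-site.xml may need to be created from it first):

cd /opt/modules/hadoop-2.5.0
# create mapred-site.xml from the shipped template if it does not exist yet
cp -n etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
# create the directory configured as hadoop.tmp.dir in core-site.xml
mkdir -p /opt/modules/hadoop-2.5.0/data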
3. Start hdfs
1) format namenode: $ bin/hdfs namenode -format
2) start namenode: $ sbin/hadoop-daemon.sh start namenode
3) start datanode: $ sbin/hadoop-daemon.sh start datanode
4) hdfs monitoring web page: http://hadoop-senior02.beifeng.com:50070
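If both daemons came up, the JDK's jps tool should list them; a quick check:

$ jps | grep -E 'NameNode|DataNode'
# expected to print one NameNode line and one DataNode line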
4. Start yarn
1) start resourcemanager: $ sbin/yarn-daemon.sh start resourcemanager
2) start nodemanager: $ sbin/yarn-daemon.sh start nodemanager
3) yarn monitoring web page: http://hadoop-senior02.beifeng.com:8088
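Besides the web page, the registered nodemanager can also be checked from the command line with the yarn CLI:

$ bin/yarn node -list
# should report Total Nodes:1, with the single node in RUNNING state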
5. Test the wordcount jar package.
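The example expects its input to already exist in HDFS, and the output directory must not exist yet. A minimal preparation sketch, assuming a local text file named sort.txt containing a few words (matching the path used in the command below):

$ cd /opt/modules/hadoop-2.5.0
$ bin/hdfs dfs -mkdir -p /input
$ bin/hdfs dfs -put sort.txt /input/
# do not create /output6/ in advance; the job fails if the output directory already exists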
1) Change to the Hadoop home directory: /opt/modules/hadoop-2.5.0
2) run the test: bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/sort.txt /output6/
Running process:
16/05/08 06:39:13 INFO client.RMProxy: Connecting to ResourceManager at Hadoop-senior02.beifeng.com/192.168.241.130:8032
16/05/08 06:39:15 INFO input.FileInputFormat: Total input paths to process: 1
16/05/08 06:39:15 INFO mapreduce.JobSubmitter: number of splits: 1
16/05/08 06:39:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462660542807_0001
16/05/08 06:39:16 INFO impl.YarnClientImpl: Submitted application application_1462660542807_0001
16/05/08 06:39:16 INFO mapreduce.Job: The url to track the job: http://Hadoop-senior02.beifeng.com:8088/proxy/application_1462660542807_0001/
16/05/08 06:39:16 INFO mapreduce.Job: Running job: job_1462660542807_0001
16/05/08 06:39:36 INFO mapreduce.Job: Job job_1462660542807_0001 running in uber mode: false
16/05/08 06:39:36 INFO mapreduce.Job: map 0% reduce 0%
16/05/08 06:39:48 INFO mapreduce.Job: map 100% reduce 0%
16/05/08 06:40:04 INFO mapreduce.Job: map 100% reduce 100%
16/05/08 06:40:04 INFO mapreduce.Job: Job job_1462660542807_0001 completed successfully
16/05/08 06:40:04 INFO mapreduce.Job: Counters: 49
3) view the result: bin/hdfs dfs -text /output6/part*
Running result:
Hadoop 2
Jps 1
Mapreduce 2
Yarn 1
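The part* glob in step 3) matches the reducer output file; listing the output directory shows what the job wrote (assuming the run above completed successfully):

$ bin/hdfs dfs -ls /output6/
# typically contains an empty _SUCCESS marker and the reducer output file part-r-00000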
6. MapReduce history server
1) start: sbin/mr-jobhistory-daemon.sh start historyserver
2) web UI: http://hadoop-senior02.beifeng.com:19888
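Once the history server is running, the finished wordcount job from step 5 should appear in this web UI. A quick check that the daemon is up:

$ jps | grep JobHistoryServer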
7. Functions of hdfs, yarn, and mapreduce
1) hdfs: a distributed, highly fault-tolerant file system suitable for deployment on cheap machines.
HDFS has a master-slave architecture made up of a namenode and datanodes: the namenode manages the namespace, while the datanodes provide the storage and hold the data as blocks of 128 MB each (the Hadoop 2.x default).
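The effective block size and replication factor can be read back with hdfs getconf; a quick check against the values mentioned above (dfs.blocksize is reported in bytes):

$ bin/hdfs getconf -confKey dfs.blocksize    # 134217728 bytes = 128 MB
$ bin/hdfs getconf -confKey dfs.replication  # 1, as set in hdfs-site.xml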
2) yarn: a general-purpose resource management system that provides unified resource management and scheduling for the applications running on top of it.
YARN consists of a resourcemanager and nodemanagers: the resourcemanager is responsible for scheduling and allocating cluster resources, while each nodemanager manages the resources of its own node and runs the tasks assigned to it.
3) mapreduce: MapReduce is a computing model divided into a Map phase and a Reduce phase.
Map processes the input one record (row) at a time and emits key-value pairs, which are passed to reduce; reduce then aggregates and counts the data emitted by map.
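As a rough local analogy of what the wordcount job does (map each word to a key, sort/shuffle by key, then reduce by counting), the same result can be produced for a small file with ordinary shell tools; this only illustrates the model and is not how the job actually runs:

$ tr -s ' ' '\n' < sort.txt | sort | uniq -c
# "map": split each line into words; "shuffle": sort groups equal keys together; "reduce": uniq -c counts each group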
That is all the content of this article; I hope it helps with your study of Hadoop.