I recently wanted to set up a Hadoop environment to try out YARN. Hadoop 2.0 has changed a great deal compared with the old Hadoop 0.1x releases, and it took quite a while before the environment finally worked. I set up a two-node cluster and configured only the parameters required to barely get it running.
1. core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.232.42.91:19000/</value>
  </property>
</configuration>
2. mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
3. yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hdfs://10.232.42.91:19001/</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hdfs://10.232.42.91:19002/</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>10.232.42.91:8030</value>
  </property>
</configuration>
Set JAVA_HOME and HADOOP_HOME in .bashrc and run sbin/start-all.sh. Using jps, you can see the processes running on the two nodes as follows.
[master] jps
31318 ResourceManager
28981 DataNode
11580 JobHistoryServer
28858 NameNode
29155 SecondaryNameNode
31426 NodeManager
11016 Jps

[slave] jps
12592 NodeManager
11711 DataNode
17699 Jps
The JobHistoryServer listed above needs to be started separately; it is where the detailed logs of each application can be viewed. The startup command is as follows.
sbin/mr-jobhistory-daemon.sh start historyserver
Go to http://10.232.42.91:8088/cluster to view the cluster overview. The slot-related data from the old MapReduce UI is no longer shown here.
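The same overview is also available as JSON through the ResourceManager's REST interface, which is handier for scripting. A minimal sketch, assuming the /ws/v1 web-service paths shipped with this Hadoop line (I have not re-checked them against this exact alpha build):

# cluster-wide metrics: node count, memory, running applications
$ curl http://10.232.42.91:8088/ws/v1/cluster/metrics
# list of applications the ResourceManager knows about
$ curl http://10.232.42.91:8088/ws/v1/cluster/apps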
Everything is ready. Put some text data into the hdfs://10.232.42.91:19000/input directory and run wordcount to check the result.
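A minimal sketch for loading the input, assuming a few local text files under ./data (the local path and file names are placeholders):

# create the input directory and upload some text files into it
$ hadoop fs -mkdir hdfs://10.232.42.91:19000/input
$ hadoop fs -put ./data/*.txt hdfs://10.232.42.91:19000/input/

With the input in place, run the wordcount example: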
$ cd hadoop/share/hadoop/mapreduce
$ hadoop jar hadoop-mapreduce-examples-2.0.3-alpha.jar wordcount hdfs://10.232.42.91:19000/input hdfs://10.232.42.91:19000/output
13/03/07 21:08:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/07 21:08:26 INFO input.FileInputFormat: Total input paths to process : 3
13/03/07 21:08:26 INFO mapreduce.JobSubmitter: number of splits:3
13/03/07 21:08:26 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/07 21:08:26 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/07 21:08:26 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/07 21:08:26 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/07 21:08:26 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/07 21:08:26 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/07 21:08:26 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/07 21:08:26 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/07 21:08:26 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/07 21:08:26 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/07 21:08:26 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/07 21:08:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1362658309553_0019
13/03/07 21:08:26 INFO client.YarnClientImpl: Submitted application application_1362658309553_0019 to ResourceManager at /10.232.42.91:19001
13/03/07 21:08:26 INFO mapreduce.Job: The url to track the job: http://search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0019/
13/03/07 21:08:26 INFO mapreduce.Job: Running job: job_1362658309553_0019
13/03/07 21:08:33 INFO mapreduce.Job: Job job_1362658309553_0019 running in uber mode : false
13/03/07 21:08:33 INFO mapreduce.Job:  map 0% reduce 0%
13/03/07 21:08:39 INFO mapreduce.Job:  map 100% reduce 0%
13/03/07 21:08:44 INFO mapreduce.Job:  map 100% reduce 100%
13/03/07 21:08:44 INFO mapreduce.Job: Job job_1362658309553_0019 completed successfully
13/03/07 21:08:44 INFO mapreduce.Job: Counters: 43
  File System Counters
    FILE: Number of bytes read=12698
    FILE: Number of bytes written=312593
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=16947
    HDFS: Number of bytes written=8739
    HDFS: Number of read operations=12
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Launched map tasks=3
    Launched reduce tasks=1
    Rack-local map tasks=3
    Total time spent by all maps in occupied slots (ms)=10750
    Total time spent by all reduces in occupied slots (ms)=4221
  Map-Reduce Framework
    Map input records=317
    Map output records=2324
    Map output bytes=24586
    Map output materialized bytes=12710
    Input split bytes=316
    Combine input records=2324
    Combine output records=885
    Reduce input groups=828
    Reduce shuffle bytes=12710
    Reduce input records=885
    Reduce output records=828
    Spilled Records=1770
    Shuffled Maps =3
    Failed Shuffles=0
    Merged Map outputs=3
    GC time elapsed (ms)=376
    CPU time spent (ms)=4480
    Physical memory (bytes) snapshot=557428736
    Virtual memory (bytes) snapshot=2105122816
    Total committed heap usage (bytes)=254607360
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=16631
  File Output Format Counters
    Bytes Written=8739
Next, let's play with YARN itself. The WritingYarnApplications page in the official Hadoop documentation is too sketchy to be much help. Fortunately, I learned that distributedshell is written directly against YARN, so to study YARN I went straight to the corresponding code in the Hadoop source.
$ hadoop jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar --jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --shell_command uname --shell_args '-a'
13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/07 21:42:44 INFO distributedshell.Client: Initializing Client
13/03/07 21:42:44 INFO distributedshell.Client: Running Client
13/03/07 21:42:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/07 21:42:44 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=2
13/03/07 21:42:44 INFO distributedshell.Client: Got Cluster node info from ASM
13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search042091.sqa.cm4:39557, nodeAddresssearch042091.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663711950,
13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search041134.sqa.cm4:49313, nodeAddresssearch041134.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663712038,
13/03/07 21:42:44 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=17, queueChildQueueCount=0
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
13/03/07 21:42:44 INFO distributedshell.Client: Min mem capabililty of resources in this cluster 1024
13/03/07 21:42:44 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 8192
13/03/07 21:42:44 INFO distributedshell.Client: AM memory specified below min threshold of cluster. Using min value., specified=10, min=1024
13/03/07 21:42:44 INFO distributedshell.Client: Setting up application submission context for ASM
13/03/07 21:42:44 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
13/03/07 21:42:45 INFO distributedshell.Client: Set the environment for the application master
13/03/07 21:42:45 INFO distributedshell.Client: Setting up app master command
13/03/07 21:42:45 INFO distributedshell.Client: Completed setting up app master command ${JAVA_HOME}/bin/java -Xmx1024m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 1 --priority 0 --shell_command uname --shell_args -a --debug 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
13/03/07 21:42:45 INFO distributedshell.Client: Submitting application to ASM
13/03/07 21:42:45 INFO client.YarnClientImpl: Submitted application application_1362658309553_0020 to ResourceManager at /10.232.42.91:19001
13/03/07 21:42:46 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:47 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:48 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:49 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:50 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:51 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:52 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:52 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop
13/03/07 21:42:52 INFO distributedshell.Client: Application completed successfully
After the run finished, the output was nowhere to be found at first. After a great deal of effort, it finally turned up under hadoop/logs/userlogs. I am not sure why two containers were run.
$ tree hadoop/logs/userlogs/application_1362658309553_0018
application_1362658309553_0018
|-- container_1362658309553_0018_01_000001
|   |-- AppMaster.stderr
|   `-- AppMaster.stdout
`-- container_1362658309553_0018_01_000002
    |-- stderr
    `-- stdout

$ cat hadoop/logs/userlogs/application_1362658309553_0018/container_1362658309553_0018_01_000002/stdout
Linux search042091.sqa.cm4 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
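An alternative I did not verify on this build: with log aggregation turned on (yarn.log-aggregation-enable set to true in yarn-site.xml), the per-container logs should be retrievable with the yarn logs command instead of walking hadoop/logs/userlogs by hand:

# assumption: this subcommand is available in 2.0.3-alpha; it only works after the application finishes and with log aggregation enabled
$ yarn logs -applicationId application_1362658309553_0018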
Well, let's now use YARN to schedule a program of our own. I wrote a script to start the server.
$ cat ~/start_sp.sh
#!/bin/env bash
source /home/admin/.bashrc
/home/admin/sp/bin/sap_server -c /home/admin/sp/sp_worker/etc/sap_server_app.cfg -l /home/admin/sp/sp_worker/etc/sap_server_log.cfg -k restart
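The script is submitted through the same distributedshell Client as above. A sketch of the submission (the absolute path passed to --shell_command is an assumption; the script has to exist at that path on whichever NodeManager host ends up running the container):

# path below is assumed; adjust it to wherever start_sp.sh lives on the NodeManager hosts
$ hadoop jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar --jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --shell_command /home/admin/start_sp.sh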
After the process is started, the process diagram is as follows.
Then I killed the script's process directly, expecting YARN to restart it for me. The AppMaster.stderr log contains the following content.
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1362747551045_0017_01_000002, state=COMPLETE, exitStatus=137, diagnostics=Killed by external signal
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Current application state: loop=464, appDone=true, total=1, requested=1, completed=1, failed=1, currentAllocated=1
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM
13/03/08 21:40:02 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.AMRMClientImpl is stopped.
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application Master failed. exiting
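To double-check what the ResourceManager recorded for the application, the application CLI can be queried. A sketch (I believe these subcommands exist in this Hadoop line, but treat the exact flags as assumptions):

# list applications and their final state as seen by the ResourceManager
$ yarn application -list
# or ask about one application directly
$ yarn application -status application_1362747551045_0017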
Original article address: Hadoop 2.0 configuration, thanks to the original author for sharing.