Calling Hadoop 2.6 from Java and Web Programs


1. Hadoop cluster:

1.1 System and hardware configuration:

Hadoop version 2.6, on three virtual machines: node101 (192.168.0.101), node102 (192.168.0.102), node103 (192.168.0.103); each machine has 2 GB of memory and 1 CPU core.

node101: NodeManager, NameNode, ResourceManager, DataNode;

node102: NodeManager, DataNode, SecondaryNameNode, JobHistoryServer;

node103: NodeManager, DataNode;

1.2 Problems encountered during configuration:

1) The NodeManager cannot start

The virtual machines were initially configured with 512 MB of memory, so yarn.nodemanager.resource.memory-mb in yarn-site.xml was set to 512 (the default is 1024). The NodeManager log showed this error:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Received SHUTDOWN signal from ResourceManager, Registration of NodeManager failed, Message from ResourceManager: NodeManager from node101 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
Changing it to 1024 or above lets the NodeManager start normally; I set it to 2048.
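In yarn-site.xml this fix corresponds to the following entry (the value is the one used on this cluster):

```xml
<property>
  <!-- Must be at least 1024 (the default minimum allocation),
       otherwise the ResourceManager rejects the NodeManager. -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
```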

2) Tasks can be submitted, but do not continue to run

A. Since each virtual machine here has only one core, while the default for yarn.nodemanager.resource.cpu-vcores in yarn-site.xml is 8, resource allocation fails; set this parameter to 1.
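The corresponding yarn-site.xml entry for this fix:

```xml
<property>
  <!-- Default is 8; each VM in this cluster has only 1 core,
       so advertise a single vcore to the scheduler. -->
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>1</value>
</property>
```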

B. The following error occurred:

is running beyond virtual memory limits. Current usage: 96.6 MB of 1.5 GB physical memory used; 1.6 GB of 1.5 GB virtual memory used. Killing container.
This looked like a mis-sized memory configuration for map, reduce, or the NodeManager, but after adjusting those settings for a long time the error persisted. In the end I simply disabled the check: setting yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml let tasks be submitted and run.
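One alternative worth noting (my suggestion, not from the original post): the virtual-memory limit in that error is the container's physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio (default 2.1), so raising the ratio may keep the check enabled while avoiding the kill:

```xml
<property>
  <!-- Hypothetical alternative to disabling the vmem check:
       with 1024 MB containers, a ratio of 2.5 allows ~2.5 GB
       of virtual memory, above the 1.6 GB observed in the error. -->
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.5</value>
</property>
```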

1.3 Configuration files (if an expert can suggest a resource configuration that avoids error B above, rather than simply disabling the check, please point it out):

1) Configure the JDK in hadoop-env.sh and yarn-env.sh, and set HADOOP_HEAPSIZE and YARN_HEAPSIZE to 512;
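In hadoop-env.sh and yarn-env.sh this amounts to lines like the following (the JDK path is an illustrative assumption; adjust to your install):

```shell
# hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79   # assumed JDK install path
export HADOOP_HEAPSIZE=512               # daemon heap size in MB

# yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
export YARN_HEAPSIZE=512
```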

2) hdfs-site.xml configures the data storage paths and the SecondaryNameNode:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node102:50090</value>
  </property>
</configuration>
3) core-site.xml configures the NameNode:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node101:8020</value>
  </property>
</configuration>
4) mapred-site.xml configures the resources for map and reduce:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
  </property>
  <!-- jobhistory properties -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node102:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
5) yarn-site.xml configures the ResourceManager and related resources:

<configuration>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>node101</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/yarn/local</value>
  </property>
  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/data/tmp/logs</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>1.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <!--
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
  </property>
  -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
  <property>
    <description>The valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

2. Calling Hadoop 2.6 from Java and running an MR program:

Two places need to be modified:

1) The main program's Configuration needs the following settings:

Configuration conf = new Configuration();
conf.setBoolean("mapreduce.app-submission.cross-platform", true); // submit tasks cross-platform
conf.set("fs.defaultFS", "hdfs://node101:8020");                  // specify the NameNode
conf.set("mapreduce.framework.name", "yarn");                     // specify the YARN framework
conf.set("yarn.resourcemanager.address", "node101:8032");         // specify the ResourceManager
conf.set("yarn.resourcemanager.scheduler.address", "node101:8030"); // specify the resource scheduler
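For context, a minimal driver around these settings might look like the following. This is a sketch assuming the Hadoop 2.6 client jars are on the classpath; MyMapper, MyReducer, the job name, and the HDFS paths are hypothetical placeholders, not from the original post:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.app-submission.cross-platform", true);
        conf.set("fs.defaultFS", "hdfs://node101:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "node101:8032");
        conf.set("yarn.resourcemanager.scheduler.address", "node101:8030");

        Job job = Job.getInstance(conf, "remote-mr-demo");
        job.setJarByClass(RemoteJobDriver.class);   // the job jar must be reachable by the cluster
        job.setMapperClass(MyMapper.class);         // hypothetical mapper
        job.setReducerClass(MyReducer.class);       // hypothetical reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/tmp/input"));     // placeholder paths
        FileOutputFormat.setOutputPath(job, new Path("/tmp/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```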
2) Add the required classes to the classpath (the original post showed these in screenshots, which are not reproduced in this copy).

Nothing else needs to be modified for the program to run;


3. Calling Hadoop 2.6 from a Web program and running an MR program:

The program can be downloaded from the companion "Java Web program calls Hadoop 2.6" post;

The Web program's invocation code is the same as the Java version above and is essentially unmodified; the jar packages it uses are likewise placed under lib.

Finally, I ran jobs with three map tasks each, but the three maps were not evenly distributed:


You can see that in one run node103 was assigned two map tasks and node101 one, while in another run node101 got two and node103 one; in both runs node102 received no map task at all. The resource management and assignment here still seems to have some issues.


Share, grow, be happy

When reposting, please cite the original blog address: http://blog.csdn.net/fansy1990

