Installing a single-node pseudo-distributed CDH Hadoop cluster


The original installation was a three-node cluster; today I set up a single node. Once it was up, MapReduce jobs could not be submitted to YARN, and an afternoon of fiddling did not fix it.

Under MR1 a job is submitted to the JobTracker; under YARN it should go to the ResourceManager. Instead I got a local job, and the following configuration appeared to have no effect:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
In fact YARN does not use the following property at all, but after reading the JobClient code I set it anyway, pointing it at the ResourceManager address:

   <property>
        <name>mapred.job.tracker</name>
        <value>com3:8031</value>
   </property>

This time the ResourceManager was found, but every submission hit the error "Unknown rpc kind RPC_WRITABLE".

Some digging showed that on the server side (here the ResourceManager), each RPC engine registers its RPC kind in a map, and the server can only handle RPC kinds that have already been registered.

There are only two kinds: Google's protobuf and Hadoop's Writable.

public class ProtobufRpcEngine implements RpcEngine {
  public static final Log LOG = LogFactory.getLog(ProtobufRpcEngine.class);

  static { // Register the rpcRequest deserializer for WritableRpcEngine
    org.apache.hadoop.ipc.Server.registerProtocolEngine(
        RPC.RpcKind.RPC_PROTOCOL_BUFFER, RpcRequestWritable.class,
        new Server.ProtoBufRpcInvoker());
  }

The server, however, only registers the protobuf kind, so it cannot accept the Writable-kind messages the client sends when submitting a job, which produces the error above.

Checking the client code, I then configured the protocol used for job submission, JobSubmissionProtocol, to use the protobuf RPC engine:

  <property>
   	<name>rpc.engine.org.apache.hadoop.mapred.JobSubmissionProtocol</name>
   	<value>org.apache.hadoop.ipc.ProtobufRpcEngine</value>
   </property>

The result is the following error:

Exception in thread "main" java.lang.NullPointerException
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:138)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:193)
	at org.apache.hadoop.mapred.$Proxy10.getStagingAreaDir(Unknown Source)
	at org.apache.hadoop.mapred.JobClient.getStagingAreaDir(JobClient.java:1340)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:102)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:954)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
	at mr.ref.WordCount.main(WordCount.java:90)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

The reason is that the RPC kind now matched, but wrong assumptions were made about the content of the message, so the request still could not be processed.

If the client protocol's RPC engine is not specified at all, the error is:

14/03/31 11:16:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE
	at org.apache.hadoop.ipc.Client.call(Client.java:1238)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)


Changing the RPC engine on the client side got me nowhere, so I turned to the server side. The default value is:

<property>
	<name>yarn.ipc.rpc.class</name>
	<value>org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC</value>
</property>
I tried changing this to org.apache.hadoop.hbase.ipc.WritableRpcEngine and other values, which only caused a variety of other problems; there are several more settings tied to the server-side protocols, and other avenues besides.

To find out why mapreduce.framework.name was not taking effect, I traced where that value is read and found a JobClient whose init method differs from the JobClient pulled in by my Eclipse project's dependencies. Note this method:

  /**
   * Connect to the default cluster
   * @param conf the job configuration.
   * @throws IOException
   */
  public void init(JobConf conf) throws IOException {
    setConf(conf);
    cluster = new Cluster(conf);
    clientUgi = UserGroupInformation.getCurrentUser();
  }

The JobClient actually being picked up at runtime was still the MR1-era one. Both /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.5.0.jar and /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.5.0.jar contain a JobClient class; the former is the YARN-era one.
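A quick way to see which JobClient is actually on the runtime classpath is to dump the effective classpath and look inside the two candidate jars. This is just a sketch using standard tools (not part of the original troubleshooting), with the jar paths taken from the CDH 4.5 layout above:

hadoop classpath | tr ':' '\n' | grep mapreduce
# Both jars should list org/apache/hadoop/mapred/JobClient.class; whichever comes first on the classpath wins
unzip -l /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.5.0.jar | grep 'mapred/JobClient.class'
unzip -l /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.5.0.jar | grep 'mapred/JobClient.class'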

Checking the CLASSPATH used to run the job confirmed this, so the fix is to edit /usr/lib/hadoop/libexec/hadoop-layout.sh and change

HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-0.20-mapreduce"}

to

HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-mapreduce"}

That works, but at this point submitting a job still failed, and the cause was very hard to find: even with the YARN logs at DEBUG, neither the client nor the daemons reported an error. More on that problem below.


All of the above really comes down to one environment variable, which I only discovered later while installing the pseudo-distributed cluster; the CDH documentation spells it out.

If you want to submit jobs to YARN, you need to set:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

If you want to submit jobs to the JobTracker, use the following, which is also the default:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
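To make the YARN setting survive new shells, one option (my addition, not from the CDH documentation quoted above) is to drop it into a profile script; the file name here is just an example:

echo 'export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce' > /etc/profile.d/hadoop-mapred.sh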

This is exactly why, after installing the single-node cluster today, jobs could not be submitted to YARN and the problem was so hard to track down: even at DEBUG level there is no ERROR in the logs, only one WARN.

After the job is submitted, the client console stops producing output. The ResourceManager log at that moment looks like this; note the WARN and the FAILED state change:

2014-03-31 19:50:50,870 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1396266549856_0001 State change from ACCEPTED to FAILED
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppRemovedSchedulerEvent.EventType: APP_REMOVED
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeCleanAppEvent.EventType: CLEANUP_APP
2014-03-31 19:50:50,870 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing com2:55147 of type CLEANUP_APP
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: post-assignContainers
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: showRequests: application=application_1396266549856_0001 headroom=memory:6144 currentConsumption=0
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: showRequests: application=application_1396266549856_0001 request={priority: 0, capability: memory:2048}
2014-03-31 19:50:50,871 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Node after allocation com2:55147 resource = memory:8192
2014-03-31 19:50:50,872 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1396266549856_0001 requests cleared
2014-03-31 19:50:50,872 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEvent.EventType: APP_COMPLETED
2014-03-31 19:50:50,872 DEBUG org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RMAppManager processing event for application_1396266549856_0001 of type APP_COMPLETED
2014-03-31 19:50:50,872 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1396266549856_0001 failed 1 times due to AM Container for appattempt_1396266549856_0001_000001 exited with exitCode: 1 due to: . Failing this attempt. Failing the application. APPID=application_1396266549856_0001
2014-03-31 19:50:50,876 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1396266549856_0001,name=word count,user=root,queue=default,state=FAILED,trackingUrl=com2:8088/proxy/application_1396266549856_0001/,appMasterHost=N/A,startTime=1396266647295,finishTime=1396266650870
2014-03-31 19:50:51,519 DEBUG org.apache.hadoop.ipc.Server: got #55

NodeManager logs at the same time:

2014-03-31 19:50:50,528 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1396266549856_0001 CONTAINERID=container_1396266549856_0001_01_000001
2014-03-31 19:50:50,280 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /opt/data/hadoop/hadoop-yarn/nm-local-dir/usercache/root/appcache/application_1396266549856_0001/container_1396266549856_0001_01_000001/default_container_executor.sh]
2014-03-31 19:50:50,493 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from task is : 1
2014-03-31 19:50:50,493 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2014-03-31 19:50:50,494 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1396266549856_0001_01_000001 of type UPDATE_DIAGNOSTICS_MSG
2014-03-31 19:50:50,494 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1396266549856_0001_01_000001 completed with exit code 1
2014-03-31 19:50:50,495 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2014-03-31 19:50:50,495 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType: CONTAINER_EXITED_WITH_FAILURE
2014-03-31 19:50:50,495 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1396266549856_0001_01_000001 of type CONTAINER_EXITED_WITH_FAILURE
2014-03-31 19:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1396266549856_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2014-03-31 19:50:50,496 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: CLEANUP_CONTAINER
2014-03-31 19:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1396266549856_0001_01_000001

The time was not entirely wasted, though, because it led me to discover that CDH ships a ready-made pseudo-distributed configuration package, hadoop-conf-pseudo.x86_64, which can simply be installed. The installation steps, briefly:

Prepare the CDH repository configuration (a yum repo file, e.g. under /etc/yum.repos.d/ on the physical machine):

[cloudera-cdh4.2.1]
name=Cloudera's Distribution for Hadoop, Version 4.2.1
baseurl=http://archive-primary.cloudera.com/cdh4/redhat/6/x86_64/cdh/4.2.1/
gpgkey=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1
The physical machine runs Fedora and needs these packages installed:

yum -y install createrepo yum-utils

Sync the repository to local disk and build a local repository:

mkdir -p /var/www/cloudera-cdh4/cdh4/4.2.1/rpms
reposync -p /var/www/cloudera-cdh4/cdh4/4.2.1/rpms --repoid=cloudera-cdh4.2.1
createrepo -o /var/www/cloudera-cdh4/cdh4/4.2.1 /var/www/cloudera-cdh4/cdh4/4.2.1/rpms

Expose the repository over HTTP so the virtual machine can install from it. My Apache is version 2.4.7; RHEL 6.x ships an older release, so the access-control directives differ slightly.

On CentOS 6.4 (Apache 2.2.15) the 2.4-only "Require all granted" line must be removed; a 2.2-style sketch follows the config below.

# cat /etc/httpd/conf.d/cloudera.conf

NameVirtualHost 192.168.3.1:80
<VirtualHost 192.168.3.1:80>
    DocumentRoot /var/www/cloudera-cdh4
    ServerName 192.168.3.1
    <Directory />
        Options All
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
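For Apache 2.2 the usual equivalent of that access rule is the old Order/Allow directives; this is my assumption of what the 2.2-style block would look like, not something from the original (which simply drops the Require line):

    <Directory />
        Options All
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>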
One more package is needed on the machine where Hadoop will be installed; it is available from the OS installation ISO:

yum -y install nc
Start httpd, then configure yum inside the virtual machine:

# cat /etc/yum.repos.d/cloudera-cdh4.2.1.repo
[cloudera-cdh4.2.1]
name=cdh4.2.1
baseurl=http://192.168.3.1/cdh4/4.2.1/
gpgcheck=0
enabled=1
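After writing the repo file, it does not hurt to refresh the yum metadata and confirm the repository is visible (an extra check on my part, not in the original steps):

yum clean all
yum repolist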
Now you can install Hadoop:

yum -y install hadoop.x86_64 hadoop-hdfs-namenode.x86_64 hadoop-hdfs-datanode.x86_64
yum -y install hadoop-client.x86_64 hadoop-mapreduce.x86_64 hadoop-conf-pseudo.x86_64
yum -y install hadoop-yarn-resourcemanager.x86_64 hadoop-yarn-nodemanager.x86_64
The ready-made conf.pseudo configuration is a nice touch; activate it with alternatives:

alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.pseudo 30
alternatives --set hadoop-conf /etc/hadoop/conf.pseudo
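An optional check (my addition, not in the original steps): confirm which configuration directory is now active.

alternatives --display hadoop-conf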
Format the NameNode and start HDFS:

sudo -u hdfs hdfs namenode -format
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-hdfs-datanode start
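A quick sanity check that the daemons came up (not part of the original steps; port 50070 is the default NameNode web UI port in this release):

/etc/init.d/hadoop-hdfs-namenode status
curl -s http://localhost:50070/ > /dev/null && echo "NameNode web UI is up"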
Create the working directories:

sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
Start YARN

/etc/init.d/hadoop-yarn-resourcemanager start
/etc/init.d/hadoop-yarn-nodemanager start
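Similarly, as an extra check of my own, the ResourceManager web UI should answer on port 8088, which matches the trackingUrl seen in the ResourceManager log above:

curl -s http://localhost:8088/ > /dev/null && echo "ResourceManager web UI is up"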
Create the user directories:

sudo -u hdfs hadoop fs -mkdir /user/hdfs
sudo -u hdfs hadoop fs -chown hdfs /user/hdfs
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
sudo -u hdfs hadoop fs -mkdir /user/mapred
sudo -u hdfs hadoop fs -chown mapred /user/mapred
sudo -u hdfs hadoop fs -mkdir /user/yarn
sudo -u hdfs hadoop fs -chown yarn /user/yarn

Installation Complete

Finally, test job submission with an ordinary user; as in the CDH document, the user is called joe.

[root@com2 mr]# useradd joe
[root@com2 mr]# passwd joe
[root@com2 mr]# su joe
[joe@com2 mr]$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
[joe@com2 mr]# sudo -u hdfs hadoop fs -mkdir /user/joe
[joe@com2 mr]# sudo -u hdfs hadoop fs -chown joe /user/joe
[joe@com2 mr]$ hadoop fs -mkdir input
[joe@com2 mr]$ hadoop fs -put /etc/hadoop/conf/*.xml input
[joe@com2 mr]$ hadoop fs -ls input
Found 4 items
-rw-r--r--   1 joe supergroup       1461 2014-03-31 21:35 input/core-site.xml
-rw-r--r--   1 joe supergroup       1854 2014-03-31 21:35 input/hdfs-site.xml
-rw-r--r--   1 joe supergroup       1325 2014-03-31 21:35 input/mapred-site.xml
-rw-r--r--   1 joe supergroup       2262 2014-03-31 21:35 input/yarn-site.xml

Run the MapReduce example and view the results:

[joe@com2 mr]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
[joe@com2 mr]$ hadoop fs -ls output23
Found 2 items
-rw-r--r--   1 joe supergroup          0 2014-03-31 21:37 output23/_SUCCESS
-rw-r--r--   1 joe supergroup            2014-03-31 21:37 output23/part-r-00000
[joe@com2 mr]$ hadoop fs -cat output23/part-r-00000 | head
1	dfs.safemode.min.datanodes
1	dfs.safemode.extension
1	dfs.replication
1	dfs.namenode.name.dir
1	dfs.namenode.checkpoint.dir
1	dfs.datanode.data.dir



Reference:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_3.html

