Installing Hadoop 2.2 on CentOS 6.5


1. Configure passwordless SSH login between the cluster machines

(1) Generate a DSA key pair:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

(2) Append the id_dsa.pub public key to the authorized keys. This command adds the public key to the public key file used for authentication; authorized_keys is that file:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

(3) The machine's own key is now in its authorized keys, so logging in to this machine no longer needs a password, but the cluster as a whole is still not passwordless: the login key (id_dsa.pub) of every other machine in the cluster must also be added to authorized_keys. Our cluster consists of three machines: master, slave1, and slave2. Execute the commands above on all three hosts so that each generates its own id_dsa.pub, then append the contents of slave1's and slave2's id_dsa.pub files to the authorized_keys file on master. After this, master's authorized_keys contains three ssh-dss entries, one per host (the key material is abbreviated here):

ssh-dss AAAAB3NzaC1kc3MAAACBA... [email protected]
ssh-dss AAAAB3NzaC1kc3MAAACBA... [email protected]
ssh-dss AAAAB3NzaC1kc3MAAACBA... [email protected]

Then overwrite ~/.ssh/authorized_keys on slave1 and slave2 with master's combined authorized_keys file, so that every host in the cluster can log in to every other host without a password. Reboot, pick any host, and ssh to each of the other two; if you get a shell without being asked for a password, the configuration succeeded.
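The key exchange can also be scripted from master. The following is a minimal sketch, assuming the same user account exists on all three hosts and that each host has already run ssh-keygen as in step (1); the first loop still prompts for each slave's password because passwordless login is not yet in place at that point.

# Run on master: collect each slave's public key into master's
# authorized_keys, then push the combined file back to the slaves.
for host in slave1 slave2; do
    ssh "$host" 'cat ~/.ssh/id_dsa.pub' >> ~/.ssh/authorized_keys
done
chmod 600 ~/.ssh/authorized_keys
for host in slave1 slave2; do
    scp ~/.ssh/authorized_keys "$host":~/.ssh/authorized_keys
done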
2. Edit the Hadoop configuration files

Extract the Hadoop installation archive to the /cloud directory, then edit the files below.

(1) hadoop-env.sh: specify the JAVA_HOME directory. First look up the JAVA_HOME path:

echo $JAVA_HOME

On this system it is /usr/lib/jvm/java-1.7.0-openjdk.x86_64, so edit the file and set it accordingly:

vi /cloud/hadoop-2.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64

(2) core-site.xml: add the following content:

vi /cloud/hadoop-2.2/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/cloud/hadoopData</value>
  </property>
</configuration>

① fs.default.name sets the HDFS access address to hdfs://master:9000 (in this cluster, master is 110.64.76.130). ② hadoop.tmp.dir sets the temporary file storage directory to /cloud/hadoopData; be careful to create this directory.

(3) hdfs-site.xml: add the following content:

vi /cloud/hadoop-2.2/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/cloud/hadoopData/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/cloud/hadoopData/data</value>
  </property>
</configuration>

(4) yarn-site.xml: add the following content:

vi /cloud/hadoop-2.2/etc/hadoop/yarn-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
    <description>host is the hostname of the resource manager and
    port is the port on which the NodeManagers contact the Resource Manager.
    </description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
    <description>host is the hostname of the ResourceManager and port is the port
    on which the Applications in the cluster talk to the Resource Manager.
    </description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
    <description>the host is the hostname of the ResourceManager and the port is the port
    on which the clients can talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:8034</value>
    <description>the NodeManagers bind to this port</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in GB</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
</configuration>

(5) slaves: modify the file to contain the following:

slave1
slave2
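Mis-typed XML in these files is a common cause of start-up failures. As a quick sanity check, a sketch assuming the libxml2 command-line tools are installed (e.g. via yum install libxml2), you can run xmllint over the edited files:

# xmllint --noout prints nothing for well-formed XML and reports the
# first syntax error otherwise.
for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    xmllint --noout /cloud/hadoop-2.2/etc/hadoop/"$f" && echo "$f OK"
done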
3. Add Hadoop to the environment variables

Add the following to the /etc/profile file:

export HADOOP_HOME=/cloud/hadoop-2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then execute the following command to make the environment variable settings take effect:

source /etc/profile

4. Copy the Hadoop installation and configuration to the other hosts in the cluster

cd /cloud
scp -r hadoop-2.2 slave1:/cloud
scp -r hadoopData slave1:/cloud
scp -r hadoop-2.2 slave2:/cloud
scp -r hadoopData slave2:/cloud

5. Format the HDFS file system

Perform the following on the master host (run it only once):

cd /cloud/hadoop-2.2/bin
./hdfs namenode -format

6. Start the Hadoop services

On master:

cd /cloud/hadoop-2.2/sbin
./start-dfs.sh
./start-yarn.sh

On slave1 and slave2: nothing needs to be started by hand. In Hadoop 2.x a MapReduce job no longer requires a separate per-job daemon; when a job starts, a NodeManager launches a MapReduce Application Master for it (roughly a streamlined JobTracker), which shuts down automatically when the job ends. So there is no need to execute any command on slave1 and slave2 to start those nodes.
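To confirm that the daemons came up, a quick check is to run jps on every host, a sketch assuming the OpenJDK jps tool is on each machine's PATH and the passwordless SSH from step 1 is working. master should typically show NameNode, SecondaryNameNode, and ResourceManager, while slave1 and slave2 should show DataNode and NodeManager.

# List the Java daemons running on every host in the cluster.
for host in master slave1 slave2; do
    echo "== $host =="
    ssh "$host" jps
done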

7. Test the Hadoop cluster

The NameNode, ResourceManager, and NodeManager web interfaces can be opened in a browser:

- NameNode web UI: http://master:50070/
- ResourceManager web UI: http://master:8088/
- NodeManager web UI: http://slave1:8042/
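If the cluster hosts have no graphical browser, the same pages can be probed from the command line; a minimal sketch, assuming curl is installed:

# A 200 status code means the corresponding daemon's web UI is reachable.
curl -s -o /dev/null -w "NameNode:        %{http_code}\n" http://master:50070/
curl -s -o /dev/null -w "ResourceManager: %{http_code}\n" http://master:8088/
curl -s -o /dev/null -w "NodeManager:     %{http_code}\n" http://slave1:8042/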

8. Start the JobHistory Server

You can also start the JobHistory Server to view the cluster's job history through a web page. Execute the following command:

mr-jobhistory-daemon.sh start historyserver

By default it listens on port 19888, so the history information can be viewed by accessing http://master:19888/ .

To stop the JobHistory Server, execute the following command:

mr-jobhistory-daemon.sh stop historyserver
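Job history can also be queried without a browser through the history server's REST interface; a sketch, assuming curl is installed and using the standard /ws/v1/history paths of the Hadoop 2.x MapReduce History Server REST API:

# History server build and start-time info.
curl -s http://master:19888/ws/v1/history/info
# JSON list of finished MapReduce jobs.
curl -s http://master:19888/ws/v1/history/mapreduce/jobs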

9. Run the WordCount sample program

hdfs dfs -mkdir /user

hdfs dfs -mkdir /user/root creates the user's home folder in HDFS; if no path is specified in the commands below, files are stored under this directory by default.

hdfs dfs -put ./test.txt input copies test.txt from the local directory into the user's HDFS directory as the input file.

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output runs the WordCount example (run it from /cloud/hadoop-2.2, since the jar path is relative).

hdfs dfs -cat output/* prints the resulting word counts.
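Put together, a minimal end-to-end run might look like the sketch below. The sample test.txt content is hypothetical, and the commands assume they are executed as root on master with the environment from step 3 in effect.

cd /cloud/hadoop-2.2
# Create a small hypothetical input file locally.
printf 'hello hadoop\nhello hdfs\nhello yarn\n' > test.txt
# Create the HDFS home directory and upload the input.
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/root
hdfs dfs -put ./test.txt input
# Run WordCount and print the result.
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
hdfs dfs -cat output/*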

10. Stop the Hadoop cluster

Execute the following on master:

cd /cloud/hadoop-2.2/sbin

./stop-yarn.sh

./stop-dfs.sh
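Note that stop-yarn.sh and stop-dfs.sh do not stop the JobHistory Server; if it was started in step 8, stop it separately. A combined shutdown might look like this sketch:

cd /cloud/hadoop-2.2/sbin
# Stop the JobHistory Server first, if it was started.
./mr-jobhistory-daemon.sh stop historyserver
./stop-yarn.sh
./stop-dfs.sh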


