1. Configure SSH password-free login between cluster machines
(1) Generate a DSA key pair:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
(2) Append the id_dsa.pub public key to authorized_keys, the file used for public-key authentication:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(3) This adds the local machine's key to its own authorized_keys, so logging in to this machine no longer requires a password. The cluster as a whole is still not password-free, however: the id_dsa.pub of the other machines in the cluster must also be added to authorized_keys. Our cluster consists of three machines, master, slave1 and slave2. Run the two commands above on all three hosts so that each host has its own id_dsa.pub file, then append the contents of slave1's and slave2's id_dsa.pub files to the master host's authorized_keys. After this step the master host's authorized_keys file contains three ssh-dss entries, one per host, each of the form:
ssh-dss AAAAB3NzaC1kc3M... [email protected]
Then overwrite the authorized_keys file in the ~/.ssh/ directory of slave1 and slave2 with the master host's processed authorized_keys, so that every host inside the cluster can log in to every other host without a password. Reboot, pick any host, and SSH to the other two hosts; if you can log in without being asked for a password, the configuration succeeded.
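As an equivalent shortcut to the manual copying described above, here is a minimal sketch assuming the openssh-clients ssh-copy-id utility is installed and that the same account (for example root) is used on every host; file permissions matter because sshd refuses keys in world-readable files:

# run on slave1 and on slave2: append this host's public key to master's authorized_keys
ssh-copy-id -i ~/.ssh/id_dsa.pub root@master
# run on master once both slaves have pushed their keys: distribute the merged file back
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
# on every host, make sure the permissions are strict enough for sshd
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys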
2. Configure the Hadoop configuration files
Extract the Hadoop installation files to the /cloud directory, then edit the files as follows:
(1) Edit hadoop-env.sh to specify the JAVA_HOME directory. First look up the JAVA_HOME path:
echo $JAVA_HOME
On this system it is /usr/lib/jvm/java-1.7.0-openjdk.x86_64, so open the file
vi /cloud/hadoop-2.2/etc/hadoop/hadoop-env.sh
and set export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64 there.
(2) Configuration file core-site.xml. Edit it with
vi /cloud/hadoop-2.2/etc/hadoop/core-site.xml
and add the following:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/cloud/hadoopData</value>
  </property>
</configuration>
① fs.default.name sets the HDFS access address to hdfs://master:9000 (master resolves to 110.64.76.130 in this cluster). ② hadoop.tmp.dir sets the temporary file directory to /cloud/hadoopData; be careful to create this directory beforehand.
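To create the temporary directory and check which value Hadoop actually picks up, a quick sketch follows; it assumes Hadoop's bin directory is on the PATH (section 3 below) or that bin/hdfs is run from the install directory, and uses the hdfs getconf tool shipped with Hadoop 2.x:

mkdir -p /cloud/hadoopData               # create the hadoop.tmp.dir location on every host
hdfs getconf -confKey fs.default.name    # should print hdfs://master:9000 once the configuration is in place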
(3) Configuration file hdfs-site.xml. Edit it with
vi /cloud/hadoop-2.2/etc/hadoop/hdfs-site.xml
and add the following:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/cloud/hadoopData/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/cloud/hadoopData/data</value>
  </property>
</configuration>
(4) Configuration file yarn-site.xml. Edit it with
vi /cloud/hadoop-2.2/etc/hadoop/yarn-site.xml
and add the following:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
    <description>host is the hostname of the ResourceManager and port is the port on which the NodeManagers contact the ResourceManager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
    <description>host is the hostname of the ResourceManager and port is the port on which the ApplicationMasters talk to the Scheduler of the ResourceManager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>in case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
    <description>host is the hostname of the ResourceManager and port is the port on which the clients can talk to the ResourceManager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:8034</value>
    <description>the NodeManagers bind to this port</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager, in MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for MapReduce to run</description>
  </property>
</configuration>
(5) Configuration file slaves. Modify it to contain the following:
slave1
slave2
3. Add Hadoop to the environment variables
Add the following lines to the /etc/profile file:
export HADOOP_HOME=/cloud/hadoop-2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then execute the following command to make the environment variable settings take effect:
source /etc/profile
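A quick sanity check of the environment can be done with standard commands (a sketch; the exact version string depends on the build):

echo $HADOOP_HOME    # should print /cloud/hadoop-2.2
which hadoop         # should point into /cloud/hadoop-2.2/bin
hadoop version       # prints the Hadoop version, e.g. 2.2.0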
4. Copy the Hadoop installation and data directories to the other hosts in the cluster:
cd /cloud
scp -r hadoop-2.2 root@slave1:/cloud
scp -r hadoopData root@slave1:/cloud
scp -r hadoop-2.2 root@slave2:/cloud
scp -r hadoopData root@slave2:/cloud
5. Format the HDFS file system. The following is performed on the master host only:
cd /cloud/hadoop-2.2/bin
./hdfs namenode -format    (run this only once)
6. Start the Hadoop services. On master:
cd /cloud/hadoop-2.2/sbin
./start-dfs.sh
./start-yarn.sh
On slave1 and slave2: in Hadoop 2.x a MapReduce job no longer needs a separate daemon process; when a job starts, the NodeManager launches a MapReduce Application Master for it (roughly a slimmed-down JobTracker), and it shuts down automatically when the job ends. There is therefore no command to execute on slave1 and slave2 to start the node.
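To confirm the daemons came up, the JDK's jps tool can be run on each host; a rough sketch of what to expect (process lists can vary slightly, e.g. where the SecondaryNameNode runs):

# on master
jps    # expect NameNode, SecondaryNameNode and ResourceManager (plus Jps itself)
# on slave1 and slave2
jps    # expect DataNode and NodeManager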
7. Test the Hadoop cluster
The NameNode, ResourceManager and NodeManager web interfaces can be opened in a browser:
- NameNode Web UI: http://master:50070/
- ResourceManager Web UI: http://master:8088/
- NodeManager Web UI: http://slave1:8042/
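Besides the web pages, the cluster state can also be checked from the command line on master; a sketch using standard Hadoop 2.x commands (output details vary):

hdfs dfsadmin -report    # lists the live DataNodes, which should include slave1 and slave2
yarn node -list          # lists the NodeManagers registered with the ResourceManager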
8. You can also start the JobHistory Server to view the history of completed jobs on the cluster through a web page. Execute the following command:
mr-jobhistory-daemon.sh start historyserver
By default it uses port 19888; the history information can be viewed by accessing http://master:19888/.
To stop the JobHistory Server, execute the following command:
mr-jobhistory-daemon.sh stop historyserver
9. Run the WordCount sample program
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/root creates the user's home folder in HDFS; if no path is specified in later commands, files are placed under this directory by default.
hdfs dfs -put ./test.txt input copies test.txt from the local directory into the user's HDFS directory under the name input, to be used as the job's input.
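Before running the job, it may help to confirm the upload with a quick, optional check:

hdfs dfs -ls /user/root    # an entry named input should be listed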
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
hdfs dfs -cat output/*
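Optionally, the results can be copied back to the local filesystem, and the HDFS output directory must be removed before re-running the example, because the job fails if it already exists. A short sketch (the local directory name wordcount-output is just an example):

hdfs dfs -get output ./wordcount-output    # copy the result files to a local directory
hdfs dfs -rm -r output                     # remove the HDFS output directory before running the job again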
10. Stop the Hadoop cluster
Execute on master:
cd /cloud/hadoop-2.2/sbin
./stop-yarn.sh
./stop-dfs.sh
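After stopping, a brief check on each host should show that the Hadoop daemons are gone; note that the JobHistory Server, if it was started in step 8, must be stopped separately with the command shown there:

jps                                           # only Jps itself should remain on each host
mr-jobhistory-daemon.sh stop historyserver    # only needed if the history server was started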