1. Installation
Here we assume the Hadoop cluster runs on three machines named fanbinx1, fanbinx2, and fanbinx3, where fanbinx1 is the master node and fanbinx2 and fanbinx3 are slave nodes.
The Hadoop 2.5.1 installation package is installed into the /opt/hadoop directory on each machine; for convenience, we use $HADOOP_HOME below to refer to /opt/hadoop.
Create the following three directories under that directory:
$ mkdir -p $HADOOP_HOME/dfs/name
$ mkdir -p $HADOOP_HOME/dfs/data
$ mkdir -p $HADOOP_HOME/temp
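The three commands above can also be combined into one loop; a minimal sketch, assuming $HADOOP_HOME is already exported (defaulting to /opt/hadoop per the installation above):

```shell
# Create the NameNode, DataNode, and temp directories in one pass.
# HADOOP_HOME defaults to /opt/hadoop here; adjust if your install differs.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
for dir in dfs/name dfs/data temp; do
  mkdir -p "$HADOOP_HOME/$dir"
done
```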
2. Configure
The following Hadoop configuration files and script files need to be changed:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
2.1 $HADOOP_HOME/etc/hadoop/hadoop-env.sh: specify the JAVA_HOME environment variable
export JAVA_HOME=/opt/jdk7
2.2 $HADOOP_HOME/etc/hadoop/yarn-env.sh: specify the JAVA_HOME environment variable
export JAVA_HOME=/opt/jdk7
2.3 $HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://fanbinx1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/temp</value>
    </property>
</configuration>
2.4 $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/dfs/data</value>
    </property>
</configuration>
2.5 $HADOOP_HOME/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2.6 $HADOOP_HOME/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>fanbinx1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>fanbinx1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>fanbinx1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>fanbinx1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>fanbinx1:8088</value>
    </property>
</configuration>
2.7 $HADOOP_HOME/etc/hadoop/slaves: this file lists the slave nodes
fanbinx2
fanbinx3
2.8 Finally, copy these configuration files to the other two slave nodes.
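One way to copy them (a sketch, assuming SSH access to the slaves and the same /opt/hadoop layout on every node) is scp in a loop:

```shell
# Push the edited configuration directory from the master to each slave.
# Assumes SSH access to fanbinx2/fanbinx3 and /opt/hadoop on every node.
for node in fanbinx2 fanbinx3; do
  scp -r /opt/hadoop/etc/hadoop "$node":/opt/hadoop/etc/
done
```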
3. Set up passwordless SSH login for the user running Hadoop
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
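The cat command above only authorizes logins to the local machine; the master's public key also needs to reach each slave. A sketch using ssh-copy-id (part of OpenSSH), assuming password login to the slaves still works at this point:

```shell
# Append the master's public key to each slave's authorized_keys so the
# start scripts can log in to fanbinx2/fanbinx3 without a password.
for node in fanbinx2 fanbinx3; do
  ssh-copy-id -i ~/.ssh/id_dsa.pub "$node"
done
```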
4. Start the Hadoop cluster
4.1 First, format the NameNode
$ bin/hdfs namenode -format
4.2 Start HDFS
Run the following command on the master machine:
$ sbin/start-dfs.sh
Running "ps -ef | grep hadoop" on the master machine shows two Hadoop processes: NameNode and SecondaryNameNode.
Running "ps -ef | grep hadoop" on each slave machine shows one Hadoop process: DataNode.
4.3 Start YARN
Run the following command on the master machine:
$ sbin/start-yarn.sh
Running "ps -ef | grep hadoop" on the master machine now shows three Hadoop processes: NameNode, SecondaryNameNode, and ResourceManager.
Running "ps -ef | grep hadoop" on each slave machine shows two Hadoop processes: DataNode and NodeManager.
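As an alternative to grepping ps output, the JDK's jps tool lists running Java processes by class name. A small check sketch for the master node (daemon names as described above):

```shell
# Verify the expected daemons on the master via jps (ships with the JDK).
for daemon in NameNode SecondaryNameNode ResourceManager; do
  if jps | grep -q "$daemon"; then
    echo "$daemon is running"
  else
    echo "$daemon is NOT running"
  fi
done
```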
4.4 Verification
After starting HDFS and YARN, you can view the cluster status through these two URLs:
View HDFS: http://fanbinx1:50070/
View the ResourceManager: http://fanbinx1:8088/cluster/
You can also check the cluster status from the command line:
$ bin/hdfs dfsadmin -report
4.5 Alternatively, "sbin/start-all.sh" and "sbin/stop-all.sh" can be used to start/stop both the HDFS and YARN services at once.
5. Run the sample program
First submit the job:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'
View the results:
$ bin/hdfs dfs -get output output
$ cat output/*