Hadoop distributed mode in practice


1. Installation
Assume the Hadoop cluster runs on three machines named fanbinx1, fanbinx2, and fanbinx3, where fanbinx1 acts as the master node and fanbinx2 and fanbinx3 act as slave nodes.

Extract the Hadoop 2.5.1 installation package into the /opt/hadoop directory on each machine. For convenience, $HADOOP_HOME is used below to refer to /opt/hadoop.
Create the following three directories under that directory:

mkdir -p $HADOOP_HOME/dfs/name
mkdir -p $HADOOP_HOME/dfs/data
mkdir -p $HADOOP_HOME/temp
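
If password-free SSH from the master is already set up (see section 3 below), the same directories can be created on all three nodes in one pass. A minimal sketch, assuming each node uses the same /opt/hadoop layout:

# Create the NameNode, DataNode, and temp directories on every node
for node in fanbinx1 fanbinx2 fanbinx3; do
    ssh $node "mkdir -p /opt/hadoop/dfs/name /opt/hadoop/dfs/data /opt/hadoop/temp"
done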

2. Configuration
The following Hadoop configuration files and script files need to be modified:

$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves

2.1 $HADOOP_HOME/etc/hadoop/hadoop-env.sh: specify the JAVA_HOME environment variable

export JAVA_HOME=/opt/jdk7

2.2 $HADOOP_HOME/etc/hadoop/yarn-env.sh: specify the JAVA_HOME environment variable

export JAVA_HOME=/opt/jdk7

2.3 $HADOOP_HOME/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://fanbinx1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/temp</value>
    </property>
</configuration>

2.4 $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/dfs/data</value>
    </property>
</configuration>

2.5 $HADOOP_HOME/etc/hadoop/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

2.6 $HADOOP_HOME/etc/hadoop/yarn-site.xml

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>fanbinx1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>fanbinx1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>fanbinx1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>fanbinx1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>fanbinx1:8088</value>
    </property>
</configuration>

2.7 $HADOOP_HOME/etc/hadoop/slaves: this file defines the slave nodes

fanbinx2
fanbinx3

2.8 Finally, the configuration files need to be replicated to the other two slave nodes.
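
A minimal sketch of the copy step, assuming SSH access to the slave nodes (see section 3) and the same $HADOOP_HOME path on every machine:

# Copy the edited configuration directory from the master to each slave
for node in fanbinx2 fanbinx3; do
    scp $HADOOP_HOME/etc/hadoop/* $node:$HADOOP_HOME/etc/hadoop/
done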

3. Set up password-free SSH login for the user running Hadoop

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
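
Note that the two commands above only let the current machine SSH to itself. For start-dfs.sh and start-yarn.sh on the master to reach the slaves without a password, the master's public key also has to be appended to each slave's ~/.ssh/authorized_keys; a minimal sketch, assuming ssh-copy-id is available:

# Distribute the master's public key to the slave nodes
for node in fanbinx2 fanbinx3; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub $node
done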

4. Start the Hadoop cluster
4.1 First, format the NameNode

$ bin/hdfs namenode -format

4.2 Start HDFS
Run the following command on the master machine:

$ sbin/start-dfs.sh

Running "ps -ef | grep hadoop" on the master machine shows two Hadoop processes: NameNode and SecondaryNameNode.
Running "ps -ef | grep hadoop" on a slave machine shows one Hadoop process: DataNode.
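
As an alternative to grepping the ps output, the JDK's jps tool lists the running Java processes by class name (an optional check, not part of the original steps):

# On the master: expect NameNode and SecondaryNameNode
$ jps
# On a slave: expect DataNode
$ ssh fanbinx2 jps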

4.3 Start YARN
Run the following command on the master machine:

$ sbin/start-yarn.sh

Running "ps -ef | grep hadoop" on the master machine now shows three Hadoop processes: NameNode, SecondaryNameNode, and ResourceManager.
Running "ps -ef | grep hadoop" on a slave machine now shows two Hadoop processes: DataNode and NodeManager.

4.4 Verification
After HDFS and YARN are started, you can check the cluster status through these two web URLs:
View HDFS (NameNode UI): http://fanbinx1:50070/
View RM (ResourceManager UI): http://fanbinx1:8088/cluster/
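
If no browser is available on the cluster machines, a quick reachability check from the shell also works; a minimal sketch, assuming curl is installed:

# An HTTP 200 response means the daemon's web UI is up
$ curl -s -o /dev/null -w "%{http_code}\n" http://fanbinx1:50070/
$ curl -s -o /dev/null -w "%{http_code}\n" http://fanbinx1:8088/cluster/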

You can also use the following command to view the cluster status:

$ bin/hdfs dfsadmin -report

4.5 You can also use "sbin/start-all.sh" and "sbin/stop-all.sh" to start and stop the HDFS and YARN services together, instead of starting and stopping them separately.
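
For example (in Hadoop 2.x these two scripts are marked as deprecated in favor of start-dfs.sh and start-yarn.sh, but they still work):

# Start both HDFS and YARN daemons in one step (run on the master)
$ sbin/start-all.sh
# Stop all daemons
$ sbin/stop-all.sh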

5. Run the sample program
First, submit the job:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'

View Results

$ bin/hdfs dfs -get output output
$ cat output/*
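
Alternatively, the output can be read directly from HDFS without copying it to the local filesystem:

$ bin/hdfs dfs -cat output/*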

