1. Installation
Here we assume the Hadoop cluster runs on three machines named fanbinx1, fanbinx2, and fanbinx3, where fanbinx1 is the master node and fanbinx2 and fanbinx3 are slave nodes.
The Hadoop 2.5.1 installation package is installed into the /opt/hadoop directory on each machine; for convenience, we use $HADOOP_HOME below to refer to /opt/hadoop.
Create the following three directories under that directory:
$ mkdir -p $HADOOP_HOME/dfs/name
$ mkdir -p $HADOOP_HOME/dfs/data
$ mkdir -p $HADOOP_HOME/temp
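The three commands above can also be combined into one loop; a minimal sketch, assuming $HADOOP_HOME is already exported (defaulting to /opt/hadoop per the installation above):

```shell
# Create the NameNode, DataNode, and temp directories in one pass.
# HADOOP_HOME defaults to /opt/hadoop here; adjust if your install differs.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
for dir in dfs/name dfs/data temp; do
  mkdir -p "$HADOOP_HOME/$dir"
done
```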
2. Configure
The following Hadoop configuration files and script files need to be changed:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
2.1 $HADOOP_HOME/etc/hadoop/hadoop-env.sh: specify the JAVA_HOME environment variable
export JAVA_HOME=/opt/jdk7
2.2 $HADOOP_HOME/etc/hadoop/yarn-env.sh: specify the JAVA_HOME environment variable
export JAVA_HOME=/opt/jdk7
2.3 $HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://fanbinx1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/temp</value>
    </property>
</configuration>
2.4 $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/dfs/data</value>
    </property>
</configuration>
2.5 $HADOOP_HOME/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2.6 $HADOOP_HOME/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>fanbinx1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>fanbinx1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>fanbinx1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>fanbinx1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>fanbinx1:8088</value>
    </property>
</configuration>
2.7 $HADOOP_HOME/etc/hadoop/slaves: this file lists the slave nodes
fanbinx2
fanbinx3
2.8 Finally, copy these configuration files to the other two slave nodes.
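One way to copy them (a sketch, assuming SSH access to the slaves and the same /opt/hadoop layout on every node) is scp in a loop:

```shell
# Push the edited configuration directory from the master to each slave.
# Assumes SSH access to fanbinx2/fanbinx3 and /opt/hadoop on every node.
for node in fanbinx2 fanbinx3; do
  scp -r /opt/hadoop/etc/hadoop "$node":/opt/hadoop/etc/
done
```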
3. Set up passwordless SSH login for the user running Hadoop
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
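The cat command above only authorizes logins to the local machine; the master's public key also needs to reach each slave. A sketch using ssh-copy-id (part of OpenSSH), assuming password login to the slaves still works at this point:

```shell
# Append the master's public key to each slave's authorized_keys so the
# start scripts can log in to fanbinx2/fanbinx3 without a password.
for node in fanbinx2 fanbinx3; do
  ssh-copy-id -i ~/.ssh/id_dsa.pub "$node"
done
```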
4. Start the Hadoop cluster
4.1 First, format the NameNode
$ bin/hdfs namenode -format
4.2 Start HDFS
Run the following command on the master machine:
$ sbin/start-dfs.sh
Running "ps -ef | grep hadoop" on the master machine shows two Hadoop processes: NameNode and SecondaryNameNode.
Running "ps -ef | grep hadoop" on each slave machine shows one Hadoop process: DataNode.
4.3 Start YARN
Run the following command on the master machine:
$ sbin/start-yarn.sh
Running "ps -ef | grep hadoop" on the master machine now shows three Hadoop processes: NameNode, SecondaryNameNode, and ResourceManager.
Running "ps -ef | grep hadoop" on each slave machine shows two Hadoop processes: DataNode and NodeManager.
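As an alternative to grepping ps output, the JDK's jps tool lists running Java processes by class name. A small check sketch for the master node (daemon names as described above):

```shell
# Verify the expected daemons on the master via jps (ships with the JDK).
for daemon in NameNode SecondaryNameNode ResourceManager; do
  if jps | grep -q "$daemon"; then
    echo "$daemon is running"
  else
    echo "$daemon is NOT running"
  fi
done
```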
4.4 Verification
After starting HDFS and YARN, you can view the cluster status through these two URLs:
View HDFS: http://fanbinx1:50070/
View the ResourceManager: http://fanbinx1:8088/cluster/
You can also check the cluster status from the command line:
$ bin/hdfs dfsadmin -report
4.5 Alternatively, "sbin/start-all.sh" and "sbin/stop-all.sh" can be used to start/stop both the HDFS and YARN services at once.
5. Run the sample program
First submit the job:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'
View the results:
$ bin/hdfs dfs -get output output
$ cat output/*