Building a Hadoop cluster with three machines


Goal: one master node and two slave nodes.


I. Modify the hostname and configure the /etc/hosts file

The "/etc/hosts" file is used to configure the DNS server information that the host will use, which is the corresponding [HostName IP] for each host that is recorded in the LAN. When the user is in the network connection, first look for the file, look for the corresponding host name corresponding IP address. Add the following in the/etc/hosts file:


You can ping the hostnames to check that the machines can reach each other. This change must be made on all machines.
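For example, from master (assuming the hostnames above):

$ ping -c 3 slave01
$ ping -c 3 slave02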


II. Set up passwordless SSH login between the three machines

While Hadoop is running, the remote Hadoop daemons need to be managed: after Hadoop starts, the NameNode starts and stops the daemons on each DataNode through SSH (Secure Shell). These commands must run between nodes without prompting for a password, so SSH must be configured for passwordless public-key authentication. The NameNode can then log in to the DataNodes over SSH without a password to start their processes, and by the same principle the DataNodes can log in to the NameNode over SSH without a password.

Note that the local public key must also be added to authorized_keys, so that authentication via ssh localhost works without a password.
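A minimal sketch of the key setup, assuming the default OpenSSH paths and the hostnames used in this article; run it as the user that will run Hadoop and repeat on every node:

# Generate an RSA key pair with an empty passphrase
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Add the local public key to authorized_keys so that ssh localhost works
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys

# Distribute the public key to the other nodes (repeat for each host)
$ ssh-copy-id slave01
$ ssh-copy-id slave02

# Verify passwordless login
$ ssh localhost
$ ssh slave01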


III. Install the Java environment

The JDK is installed on all machines. Install it on the master server first, then repeat the same steps on the other servers. Installing the JDK and configuring the environment variables must be done as root.

Use java -version to verify that the installation succeeded. Note that the master and slave machines must have the same JDK version installed, or problems will occur.
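A sketch of one way to install it, assuming the JDK archive is named jdk-7u80-linux-x64.tar.gz (an assumption; use your actual download) and the /opt/java install path used throughout this article:

# Run as root
$ mkdir -p /opt/java
$ tar -zxvf jdk-7u80-linux-x64.tar.gz -C /opt/java
$ /opt/java/jdk1.7.0_80/bin/java -version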



IV. Install the Hadoop cluster

1. First install Hadoop on the master machine: simply unpack the downloaded tar.gz package and rename the directory to hadoop.
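For example, assuming hadoop-2.6.4.tar.gz has been downloaded to /opt (the /opt/hadoop-2.6.4 path is the one referenced by the environment variables and XML configuration below):

$ cd /opt
$ tar -zxvf hadoop-2.6.4.tar.gz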

2. Create a folder named tmp under the Hadoop directory, and add the Hadoop installation path to /etc/profile: edit /etc/profile, append the following lines to the end, and make them take effect (source /etc/profile):

Here are the JDK and HADOOP environment variable configurations:

# Java ENV
export JAVA_HOME=/opt/java/jdk1.7.0_80
export JRE_HOME=/opt/java/jdk1.7.0_80/jre
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

# Hadoop ENV
export HADOOP_HOME=/opt/hadoop-2.6.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
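With those lines appended, creating the tmp directory and applying the new environment looks roughly like this (paths as above):

$ mkdir -p /opt/hadoop-2.6.4/tmp
$ source /etc/profile
$ hadoop version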


3. Configure Hadoop. The configuration files are under the hadoop/etc/hadoop/ directory.

First, modify the Hadoop configuration on the master machine

(1) hadoop-env.sh

Add the following two lines of configuration:

export JAVA_HOME=/opt/java/jdk1.7.0_80
export HADOOP_PREFIX=/opt/hadoop-2.6.4

(2) core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.6.4/tmp</value>
    </property>
</configuration>

Note: the tmp directory needs to be created in advance.

(3) hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

This keeps three replicas of the data (each HDFS block is stored in three copies).

(4) mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

(5) yarn-env.sh

Add the JAVA_HOME configuration:

export JAVA_HOME=/opt/java/jdk1.7.0_80

(6) yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>

(7) slaves

master
slave01
slave02

Because master is listed in the slaves file, it serves both as the NameNode and as a DataNode.

4. Do the same configuration on slave01 and slave02 by copying the configured Hadoop folder from master directly to the slave machines.
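A minimal sketch of that copy, assuming the /opt/hadoop-2.6.4 path and the passwordless SSH configured earlier (the target user must be able to write to /opt on the slaves):

$ scp -r /opt/hadoop-2.6.4 slave01:/opt/
$ scp -r /opt/hadoop-2.6.4 slave02:/opt/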



V. Start the Hadoop cluster

1. Format the file system

Execute the following command on master:

$ hadoop/bin/hdfs namenode -format

In the console output after the command runs, the message "Exiting with status 0" indicates that formatting succeeded.


2. Start NameNode and DataNode

Execute start-dfs.sh on the master machine:
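Assuming HADOOP_HOME as configured in /etc/profile above:

$ $HADOOP_HOME/sbin/start-dfs.sh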

Use the jps command to view the Java processes on master:

Use the jps command to view the Java processes on slave01 and slave02 respectively:
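For reference, a rough sketch of the check; the exact listing and process IDs will vary with your setup:

# On master (with this configuration, expect NameNode, SecondaryNameNode, and DataNode)
$ jps
# On slave01 and slave02 (expect DataNode)
$ jps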

You can see that both NameNode and DataNode have started successfully.
3. View NameNode and DataNode information

Enter the address http://master:50070/ in a browser to view the NameNode information.

4. Start ResourceManager and NodeManager

Run start-yarn.sh on the master machine:
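Again assuming HADOOP_HOME from /etc/profile:

$ $HADOOP_HOME/sbin/start-yarn.sh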

Use jps to view the Java processes on master:

You can see that both ResourceManager and NodeManager on master have started successfully.

You can see that the NodeManager on slave01 has also started successfully.

Likewise, the NodeManager on slave02 has started successfully.

At this point, the entire Hadoop cluster has been started.




