Building a Hadoop cluster with three machines


Goal: one master node and two slave nodes.


I. Modify the hostname and configure the /etc/hosts file

The "/etc/hosts" file is used to configure the DNS server information that the host will use, which is the corresponding [HostName IP] for each host that is recorded in the LAN. When the user is in the network connection, first look for the file, look for the corresponding host name corresponding IP address. Add the following in the/etc/hosts file:


You can ping the hostnames to check that the machines can reach each other. This change must be made on all machines.
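For example, from master (assuming the hostnames above):

$ ping -c 3 slave01
$ ping -c 3 slave02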


II. Set up passwordless SSH login between the three machines

While Hadoop is running, the remote Hadoop daemons need to be managed: after Hadoop starts, the NameNode starts and stops the daemons on each DataNode through SSH (Secure Shell). These commands must run between nodes without prompting for a password, so SSH must be configured for passwordless public-key authentication. The NameNode can then log in to the DataNodes over SSH without a password to start their processes, and by the same principle the DataNodes can log in to the NameNode over SSH without a password.

Note that the local public key must also be added to authorized_keys, so that authentication via ssh localhost works without a password.
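A minimal sketch of the key setup, assuming the default OpenSSH paths and the hostnames used in this article; run it as the user that will run Hadoop and repeat on every node:

# Generate an RSA key pair with an empty passphrase
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Add the local public key to authorized_keys so that ssh localhost works
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys

# Distribute the public key to the other nodes (repeat for each host)
$ ssh-copy-id slave01
$ ssh-copy-id slave02

# Verify passwordless login
$ ssh localhost
$ ssh slave01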


III. Install the Java environment

The JDK is installed on all machines. Install it on the master server first, then repeat the same steps on the other servers. Installing the JDK and configuring the environment variables must be done as root.

Use java -version to verify that the installation succeeded. Note that the master and slave machines must have the same JDK version installed, or problems will occur.
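A sketch of one way to install it, assuming the JDK archive is named jdk-7u80-linux-x64.tar.gz (an assumption; use your actual download) and the /opt/java install path used throughout this article:

# Run as root
$ mkdir -p /opt/java
$ tar -zxvf jdk-7u80-linux-x64.tar.gz -C /opt/java
$ /opt/java/jdk1.7.0_80/bin/java -version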



IV. Install the Hadoop cluster

1. First install Hadoop on the master machine: simply unpack the downloaded tar.gz package and rename the directory to hadoop.
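For example, assuming hadoop-2.6.4.tar.gz has been downloaded to /opt (the /opt/hadoop-2.6.4 path is the one referenced by the environment variables and XML configuration below):

$ cd /opt
$ tar -zxvf hadoop-2.6.4.tar.gz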

2. Create a folder named tmp under the Hadoop directory, and add the Hadoop installation path to /etc/profile: edit /etc/profile, append the following lines to the end, and make them take effect (source /etc/profile):

Here are the JDK and HADOOP environment variable configurations:

# Java ENV
export JAVA_HOME=/opt/java/jdk1.7.0_80
export JRE_HOME=/opt/java/jdk1.7.0_80/jre
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

# Hadoop ENV
export HADOOP_HOME=/opt/hadoop-2.6.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
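With those lines appended, creating the tmp directory and applying the new environment looks roughly like this (paths as above):

$ mkdir -p /opt/hadoop-2.6.4/tmp
$ source /etc/profile
$ hadoop version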


3. Configure Hadoop. The configuration files are under the hadoop/etc/hadoop/ directory.

First, modify the Hadoop configuration on the master machine

(1) hadoop-env.sh

Add the following two lines of configuration:

export JAVA_HOME=/opt/java/jdk1.7.0_80
export HADOOP_PREFIX=/opt/hadoop-2.6.4

(2) core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.6.4/tmp</value>
    </property>
</configuration>

Note: the tmp directory needs to be created in advance.

(3) hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

This keeps three replicas of the data (each HDFS block is stored in three copies).

(4) mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

(5) yarn-env.sh

Add the JAVA_HOME configuration:

export JAVA_HOME=/opt/java/jdk1.7.0_80

(6) yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>

(7) slaves

master
slave01
slave02

Because master is listed in the slaves file, it serves both as the NameNode and as a DataNode.

4. Do the same configuration on slave01 and slave02 by copying the configured Hadoop folder from master directly to the slave machines.
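A minimal sketch of that copy, assuming the /opt/hadoop-2.6.4 path and the passwordless SSH configured earlier (the target user must be able to write to /opt on the slaves):

$ scp -r /opt/hadoop-2.6.4 slave01:/opt/
$ scp -r /opt/hadoop-2.6.4 slave02:/opt/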



V. Start the Hadoop cluster

1. Format the file system

Execute the following command on master:

$ hadoop/bin/hdfs namenode -format

In the console output after the command runs, the message "Exiting with status 0" indicates that formatting succeeded.


2. Start NameNode and DataNode

Execute start-dfs.sh on the master machine:
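Assuming HADOOP_HOME as configured in /etc/profile above:

$ $HADOOP_HOME/sbin/start-dfs.sh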

Use the jps command to view the Java processes on master:

Use the jps command to view the Java processes on slave01 and slave02 respectively:
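For reference, a rough sketch of the check; the exact listing and process IDs will vary with your setup:

# On master (with this configuration, expect NameNode, SecondaryNameNode, and DataNode)
$ jps
# On slave01 and slave02 (expect DataNode)
$ jps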

You can see that both NameNode and DataNode have started successfully.
3. View NameNode and DataNode information

Enter the address http://master:50070/ in a browser to view the NameNode information.

4. Start ResourceManager and NodeManager

Run start-yarn.sh on the master machine:
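Again assuming HADOOP_HOME from /etc/profile:

$ $HADOOP_HOME/sbin/start-yarn.sh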

Use jps to view the Java processes on master:

You can see that both ResourceManager and NodeManager on master have started successfully.

You can see that the NodeManager on slave01 has also started successfully.

Likewise, the NodeManager on slave02 has started successfully.

At this point, the entire Hadoop cluster has been started.




