Detailed tutorial on Hadoop 2.0 cluster configuration [verified on virtual machines]

Source: Internet
Author: User

1) Machine preparation

There are four physical machines in total, giving a four-node Hadoop cluster: one Master and three Slave nodes.

The nodes are connected to each other over a LAN and can ping one another at the following IP addresses:
192.168.216.131 hadoop1
192.168.216.132 hadoop2
192.168.216.133 hadoop3
192.168.216.134 hadoop4

The operating system is CentOS 6.2 64-bit.
The Master machine is configured with the NameNode and JobTracker roles and is responsible for managing distributed data and dispatching tasks;

The three Slave machines are configured with the DataNode and TaskTracker roles and are responsible for distributed data storage and task execution.

In practice there should also be a second Master machine kept as a standby, so that if the Master server goes down the backup can take over immediately.

Once you have gained some experience, you can add such a backup Master machine.


2) Create an account

Log on to every machine as root and create a hadoop user on each:
useradd hadoop
passwd hadoop

This generates a hadoop directory under /home/; its path is /home/hadoop.
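Since the hadoop user must exist on all four machines, the two commands above can be scripted from any box that can reach them. This is only a sketch: it prints the per-node commands rather than running them (passwordless SSH is not set up until step 5), and PASSWORD is a placeholder you would replace.

```shell
# Build the per-node commands and print them; chpasswd is one common way to
# set a password non-interactively. PASSWORD is a placeholder.
CMDS=$(for host in hadoop1 hadoop2 hadoop3 hadoop4; do
  echo "ssh root@$host 'useradd hadoop && echo hadoop:PASSWORD | chpasswd'"
done)
echo "$CMDS"
```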

Create related directories

Define the paths for storing code and tools:
mkdir -p /home/hadoop/source
mkdir -p /home/hadoop/tools

Define the paths where the data nodes store their data, under a directory with enough space:
mkdir -p /hadoop/hdfs
mkdir -p /hadoop/tmp
mkdir -p /hadoop/log
Set write permission:
chmod -R 777 /hadoop

Define the Java installation path:
mkdir -p /usr/java
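The whole directory layout above can be created in one pass. PREFIX here is a hypothetical knob, added so the layout can be rehearsed in a scratch directory first; set PREFIX="" when running as root on a real node.

```shell
# Create every directory from this section under an optional prefix.
# PREFIX is an assumption for safe rehearsal; use PREFIX="" for the real paths.
PREFIX="${PREFIX:-/tmp/hadoop-layout}"
for d in home/hadoop/source home/hadoop/tools hadoop/hdfs hadoop/tmp hadoop/log usr/java; do
  mkdir -p "$PREFIX/$d"
done
chmod -R 777 "$PREFIX/hadoop"   # write permission, as above
```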

 


3) Install the JDK

Download JDK 1.7 (x64) from the official website.

Unzip: # tar zxvf jdk-7u10-linux-x64.tar.gz

# mv jdk1.7.0_10 /usr/java/jdk1.7

Configure the environment variables: cd /etc, run vi profile, and add:

export JAVA_HOME=/usr/java/jdk1.7
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$PATH:$JAVA_HOME/bin

Run chmod +x profile to make the file executable
Run source profile so the configuration takes effect immediately:
# source /etc/profile
Run java -version to check whether the installation succeeded.

This step must be performed on all machines.
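The three export lines can also be appended in one heredoc instead of editing by hand. PROFILE defaults to a preview file here (an assumption added for safety); point it at /etc/profile on the real nodes.

```shell
# Append the Java environment settings. Quoting 'EOF' writes the $VARS
# literally instead of expanding them now. PROFILE defaults to a preview file.
PROFILE="${PROFILE:-/tmp/profile.preview}"
cat >> "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$PATH:$JAVA_HOME/bin
EOF
```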

 


4) Modify the host name

Modify the host name; all nodes are configured the same way.
1. Connect to the master node 192.168.216.131, edit the network file with vim /etc/sysconfig/network, and set HOSTNAME=hadoop1
2. Modify the hosts file: cd /etc, run vi hosts, and append:

192.168.216.131 hadoop1
192.168.216.132 hadoop2
192.168.216.133 hadoop3
192.168.216.134 hadoop4

3. Run hostname hadoop1.
4. Log out and reconnect; you will see that the host name has been modified.

The other nodes modify their host name and add the hosts entries in the same way; alternatively, the hosts file can be pushed out with scp.
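The hosts entries can be appended in one block rather than line by line (hadoop3's address is taken from the ssh verification later in this tutorial, 192.168.216.133). HOSTS_FILE defaults to a preview file here as a safety assumption; point it at /etc/hosts on the real nodes.

```shell
# Append the four cluster entries to the hosts file in one heredoc.
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts.preview}"
cat >> "$HOSTS_FILE" <<'EOF'
192.168.216.131 hadoop1
192.168.216.132 hadoop2
192.168.216.133 hadoop3
192.168.216.134 hadoop4
EOF
```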


5) Configure SSH login without a password

How password-less SSH works:
First, a key pair (a public key and a private key) is generated on hadoop1, and the public key is copied to all slaves (hadoop2-hadoop4).
Then, when the master connects to a slave through SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master.
The master decrypts the data with its private key and returns the decrypted number to the slave. Once the slave confirms the number is correct, the master is allowed to connect without entering a password.

Specific steps (perform them when logged on as both the root user and the hadoop user):
1. Run ssh-keygen -t rsa and press Enter through the prompts; inspect the generated password-less key pair with cd ~/.ssh and then ll
2. Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Modify permissions: chmod 600 ~/.ssh/authorized_keys
4. Make sure /etc/ssh/sshd_config contains the following:

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
If you had to modify the file, restart the SSH service for the change to take effect: service sshd restart

Copy the public key to each slave machine: scp ~/.ssh/id_rsa.pub 192.168.216.133:~/ then enter yes and the slave machine's password;
Create a .ssh folder on the slave machine: mkdir ~/.ssh then chmod 700 ~/.ssh (not needed if the folder already exists);
Append the key to the authorization file: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys then chmod 600 ~/.ssh/authorized_keys
Repeat these steps for each slave machine.
Verification: run ssh 192.168.216.133 on the master machine and observe that the prompt's host name changes from hadoop1 to hadoop3.

Finally, delete the copied id_rsa.pub on the slave: rm id_rsa.pub (this step is optional, but avoids conflicts with the slave's own public key file)
Configure hadoop1, hadoop2, hadoop3, and hadoop4 in the same way; every machine must be able to log on to every other without a password.
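The copy-and-append steps can be wrapped in a loop. This is only a sketch: DRY_RUN=1 (the default) just prints the commands, and master_key.pub is a hypothetical temporary name used so the master's key never clobbers the slave's own id_rsa.pub.

```shell
# Distribute the master's public key to each slave. With DRY_RUN=1 the
# commands are only printed; set DRY_RUN=0 to actually run scp/ssh.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }
for host in hadoop2 hadoop3 hadoop4; do
  run scp ~/.ssh/id_rsa.pub "$host:~/master_key.pub"
  run ssh "$host" "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat ~/master_key.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys && rm ~/master_key.pub"
done
```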

6) Download the source code

Hadoop version
The latest hadoop-2.0.0-alpha installation package is hadoop-2.0.0-alpha.tar.gz
Official download address: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Decompress it in the /usr/local directory:
tar zxvf hadoop-2.0.0-alpha.tar.gz

mv hadoop-2.0.0-alpha hadoop

 


Configuration changes

/etc/profile

Configure the environment variables: vim /etc/profile
Add:
export HADOOP_DEV_HOME=/usr/local/hadoop # hadoop installation directory
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPARED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop

7) Configuration files
Configure hadoop-env.sh

vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add at the end: export JAVA_HOME=/usr/java/jdk1.7

core-site.xml

Add the following properties inside the <configuration> node:

<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.216.131:9000</value>
</property>


Slave configuration

vim /usr/local/hadoop/etc/hadoop/slaves
Add the slave IP addresses:
192.168.216.132
192.168.216.133
192.168.216.134


Configure hdfs-site.xml

vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following nodes:

<property>
<name>dfs.replication</name>
<value>3</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/hdfs/name</value>
<final>true</final>
</property>

<property>
<name>dfs.federation.nameservice.id</name>
<value>ns1</value>
</property>

<property>
<name>dfs.namenode.backup.address.ns1</name>
<value>192.168.216.131:50100</value>
</property>

<property>
<name>dfs.namenode.backup.http-address.ns1</name>
<value>192.168.216.131:50105</value>
</property>

<property>
<name>dfs.federation.nameservices</name>
<value>ns1</value>
</property>

<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>192.168.216.131:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>192.168.216.131:9000</value>
</property>

<property>
<name>dfs.namenode.http-address.ns1</name>
<value>192.168.216.131:23001</value>
</property>

<property>
<name>dfs.namenode.http-address.ns2</name>
<value>192.168.216.131:13001</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/hdfs/data</value>
<final>true</final>
</property>

<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>192.168.216.131:23002</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address.ns2</name>
<value>192.168.216.131:23003</value>
</property>


Configure yarn-site.xml

Add the following nodes:

<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.216.131:18040</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.216.131:18030</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.216.131:18088</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.216.131:18025</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.216.131:18141</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Create a mapred-site.xml file under /usr/local/hadoop/etc/hadoop containing the following properties (inside a <configuration> element):

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapred.system.dir</name>
<value>file:/hadoop/mapred/system</value>
<final>true</final>
</property>

<property>
<name>mapred.local.dir</name>
<value>file:/hadoop/mapred/local</value>
<final>true</final>
</property>


Configure httpfs-site.xml

Add the httpfs options:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>192.168.216.131</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
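Before copying the edited configuration files to the other nodes, it is worth checking that each one is still well-formed XML. This sketch assumes python3 is available on the node (xmllint --noout works equally well); check_xml is a hypothetical helper name.

```shell
# check_xml prints "<file> OK" if the file parses as XML, "<file> BROKEN" otherwise.
check_xml() {
  python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$1" \
    >/dev/null 2>&1 && echo "$(basename "$1") OK" || echo "$(basename "$1") BROKEN"
}
for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml httpfs-site.xml; do
  check_xml "${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}/$f"
done
```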

 

Synchronize the code to the other machines

1. Synchronize the configuration and code
First, create the target directory on each slave machine:
mkdir -p /usr/local
Deploy the hadoop code and the configuration files modified under etc/hadoop, for example from hadoop1: # scp -r hadoop root@hadoop2:/usr/local/
2. Synchronize /etc/profile
3. Synchronize /etc/hosts
# scp -r /etc/profile root@hadoop2:/etc/profile
# scp -r /etc/hosts root@hadoop2:/etc/hosts

Repeat these commands for the other machines (hadoop3 and hadoop4)
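The per-machine scp commands can be looped over all slaves. As a sketch, DRY_RUN=1 (the default) prints the commands instead of executing them; set DRY_RUN=0 on the real master once passwordless SSH works.

```shell
# Push the hadoop tree, /etc/profile and /etc/hosts to every slave.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }
for host in hadoop2 hadoop3 hadoop4; do
  run scp -r /usr/local/hadoop "root@$host:/usr/local/"
  run scp /etc/profile "root@$host:/etc/profile"
  run scp /etc/hosts "root@$host:/etc/hosts"
done
```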


8) Hadoop startup

Format the cluster:

hadoop namenode -format -clusterid clustername

or simply:

hdfs namenode -format

Start HDFS:
start-dfs.sh

This enables the hadoop dfs service.
Start YARN

Enable the yarn resource management service:
start-yarn.sh

Start httpfs

Enable the httpfs service:
httpfs.sh start
This provides the HTTP RESTful interface service.


9) Test

Verify the installation

Verify HDFS

Run jps on each machine to check whether the processes have started.

[root@hadoop1 hadoop]# jps
7396 NameNode
24834 Bootstrap
7594 SecondaryNameNode
7681 ResourceManager
32261 Jps

[root@hadoop2 ~]# jps
8966 Jps
31822 DataNode
31935 NodeManager

The processes started normally.
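The jps inspection can be turned into a repeatable check. check_daemons is a hypothetical helper that reads jps output from stdin and reports each expected daemon:

```shell
# Usage on a node: jps | check_daemons NameNode SecondaryNameNode ResourceManager
# Prints "<daemon> up" or "<daemon> MISSING" for each expected daemon name.
check_daemons() {
  out=$(cat)
  for d in "$@"; do
    echo "$out" | grep -qw "$d" && echo "$d up" || echo "$d MISSING"
  done
}
```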

Verify that you can access HDFS:
hadoop fs -ls hdfs://192.168.216.131:9000/
hadoop fs -mkdir hdfs://192.168.216.131:9000/testfolder
hadoop fs -copyFromLocal ./xxxx hdfs://192.168.216.131:9000/testfolder
hadoop fs -ls hdfs://192.168.216.131:9000/testfolder

Check whether the preceding commands execute normally.


Verify map/reduce, for example by running one of the example jobs shipped with Hadoop.

