The "three-step" process of the Hadoop pseudo-distribution environment
First, JDK installation and environment variable configuration
1. First, test whether the JDK is already installed:
java -version
2. Check whether CentOS is 32-bit or 64-bit:
file /bin/ls
3. Switch to /usr/ and create the java/ directory:
cd /
ls
cd usr/
mkdir java
cd java/
ls
4. Try to upload the locally downloaded package; this shows the upload command is not installed:
rz
5. Install the rz and sz commands:
yum -y install lrzsz
6. Upload the locally downloaded jdk-7u79-linux-x64.tar.gz:
rz
7. Check whether the upload succeeded; jdk-7u79-linux-x64.tar.gz should now be listed:
ls
8. Extract the installation package jdk-7u79-linux-x64.tar.gz:
tar zxvf jdk-7u79-linux-x64.tar.gz
Extraction complete!
9. Configure the JDK environment variables:
cd /etc/profile.d
ls
vi java.sh
JAVA_HOME=/usr/java/jdk1.7.0_79
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/lib:$CLASSPATH
export JAVA_HOME PATH CLASSPATH
10. Use source to make the new JDK environment variable configuration file take effect:
source /etc/profile.d/java.sh
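As a quick optional check that the variables took effect (the exact version string depends on the JDK build actually installed):
java -version
echo $JAVA_HOME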
11. Configure the hosts file, adding the mapping between the hostname and the IP address for convenient access later.
12. First, run ifconfig to view the IP address.
13. Run vi /etc/hosts to modify the hosts file.
Then :wq to save and exit.
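The entry is a single line mapping the IP reported by ifconfig to the hostname; the address below is only a placeholder, substitute your own:
192.168.1.100 djt002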
14. Prepare a dedicated Hadoop user and group.
groupadd hadoop (creates the hadoop user group)
useradd -g hadoop hadoop (creates a new hadoop user and adds it to the hadoop group)
passwd hadoop (sets the password for the hadoop user)
15. Because Hadoop uses many ports, it is recommended to turn off the firewall to avoid unnecessary problems; in a production environment the corresponding ports can be secured individually instead. First, check the firewall status.
The firewall that ships with CentOS 6.5 is iptables (not firewalld).
service iptables status
16. Then turn off the firewall with a command. There are two ways of doing this:
chkconfig iptables off (permanent: the firewall will not come back after the system restarts)
service iptables stop (takes effect immediately, but the firewall will be restored after the system restarts)
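Optionally, the boot-time setting can be confirmed by listing the service's runlevel configuration:
chkconfig --list iptables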
Second, SSH configuration
1. Configure SSH passwordless authentication. First switch to the hadoop user you just created.
Since Hadoop requires passwordless login to the DataNode nodes, and since this single-node deployment makes the current node both the NameNode and a DataNode, passwordless SSH login must be configured now. Here is how:
su hadoop
cd
2. Create the .ssh directory and generate the key pair:
mkdir .ssh
ssh-keygen -t rsa
3. Switch to the .ssh directory and view the public and private keys:
cd .ssh
ls
4. Copy the public key into the authorized_keys file, then check that the copy succeeded:
cp id_rsa.pub authorized_keys
ls
5. View the contents of the authorized_keys file:
vi authorized_keys
6. Go back to /home/hadoop/ and set the permissions:
cd ..
chmod 700 .ssh (sets the .ssh folder's permissions to 700)
chmod 600 .ssh/* (gives the files inside the .ssh folder 600 permissions)
7. Switch to the root user and install the SSH client package:
su root
yum -y install openssh-clients
8. Switch back to /home/hadoop/ and test passwordless SSH access:
su hadoop
ssh djt002
yes
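Only the very first connection asks for the "yes" host-key confirmation; after that, the login should not prompt for a password. A quick extra check, assuming the djt002 hostname configured earlier:
exit (leave the session opened by the test login above)
ssh djt002 date (should print the date without asking for a password)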
Third, Hadoop environment configuration
Next, build the Hadoop pseudo-distributed environment.
Download Hadoop 2.2.0, unzip it, and place the files in the /usr/java/ directory. Note: before downloading, confirm whether your Linux system is 64-bit or 32-bit and download the corresponding version; if you download the wrong one, there will be many problems later.
1. First switch to /usr/java/, then switch to the root user:
cd /usr/java
su root
2. Install the wget command if it is not already installed:
yum -y install wget
3. Download hadoop-2.2.0-x64.tar.gz online with the wget command:
wget http://hadoop.f.dajiangtai.com/hadoop2.2/hadoop-2.2.0-x64.tar.gz
Alternatively, for step 3, use the rz command to upload hadoop-2.2.0-x64.tar.gz from Windows to /usr/java/.
4. View and unzip the archive:
ls
tar zxvf hadoop-2.2.0-x64.tar.gz
5. Rename the hadoop-2.2.0 directory to hadoop:
mv hadoop-2.2.0 hadoop
6. Give ownership of the newly renamed hadoop directory to the hadoop user:
chown -R hadoop:hadoop hadoop
7. Create the Hadoop data directories and give ownership of the entire data directory to the hadoop user:
mkdir -p /data/dfs/name
mkdir -p /data/dfs/data
mkdir -p /data/tmp
chown -R hadoop:hadoop /data
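To verify the ownership (optional):
ls -lR /data (every directory should show hadoop as both owner and group)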
8. Modify Hadoop's configuration files.
Switch to the hadoop user and into the hadoop directory:
su hadoop
cd hadoop
9. View the configuration files under etc/hadoop:
cd etc/
ls
cd hadoop/
ls
10. Modify the etc/hadoop/core-site.xml configuration file and add the following information:
vi core-site.xml
The following sets the address and port of HDFS (the Hadoop Distributed File System):
<property>
<name>fs.defaultFS</name>
<value>hdfs://djt002:9000</value>
</property>
The following configures the common directory where HDFS stores its data:
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data/tmp</value>
</property>
The following configuration is needed because of the security mechanism introduced since Hadoop 1.0: a job submitted from a client runs as the hadoop user regardless of who originally submitted it. To solve this, Hadoop provides the proxy-user (secure impersonation) feature, which allows a superuser to submit jobs or execute commands on behalf of other users, while to the outside the executor still appears to be the ordinary user.
The following setting allows any client host:
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
The following setting allows any user group:
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
The complete configuration is as follows:
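(Reconstructed from the four properties above; they all go inside the file's existing <configuration> element.)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://djt002:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>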
11. Modify the etc/hadoop/hdfs-site.xml configuration file and add the following information.
vi hdfs-site.xml
The following configures the NameNode file directory:
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/dfs/name</value>
<final>true</final>
</property>
The following configures the DataNode file directory:
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/dfs/data</value>
<final>true</final>
</property>
The following configures the block replication factor and HDFS permission checking:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
The complete configuration is as follows:
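(Again reconstructed from the properties above; they all go inside the file's <configuration> element.)
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/dfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>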
12. Modify the etc/hadoop/mapred-site.xml configuration file and add the following information.
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
The following configures MapReduce to run on YARN:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
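Since the copied template contains an empty <configuration> element, the finished mapred-site.xml should look roughly like this:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>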
13. Modify the etc/hadoop/yarn-site.xml configuration file and add the following information.
vi yarn-site.xml
To run MapReduce programs, the NodeManager must load the shuffle service at startup, so the following setting is required:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
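Likewise, the property goes inside yarn-site.xml's <configuration> element:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>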
14. Modify etc/hadoop/slaves and add the following information.
vi slaves
This is a pseudo-distributed single-node cluster, so the DataNode and the NameNode run on the same node.
Change the default entry to the hostname of this node: djt002
15. Set the Hadoop environment variables.
Switch to the root user and modify the /etc/profile file:
su root
vi /etc/profile
Note: it is not HADOOP_HOME=/usr/java/hadoop-2.2.0, because the directory was renamed earlier.
HADOOP_HOME=/usr/java/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
Summary: the JDK environment variables are placed in /etc/profile.d, while the Hadoop environment variables are placed in /etc/profile.
16. Make the configuration file take effect:
source /etc/profile
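To confirm that the Hadoop variables are picked up, an optional check:
echo $HADOOP_HOME
hadoop version (should report Hadoop 2.2.0)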
Fourth, test running Hadoop
1. Switch to the hadoop user and go back to the hadoop directory:
su hadoop
cd ..
cd ..
ls
2. Format the NameNode:
bin/hadoop namenode -format
3. Start the cluster:
sbin/start-all.sh
4. View the cluster processes:
jps
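If everything started correctly, jps should list roughly the following daemons in addition to Jps itself (process IDs will vary):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager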
5. On Windows, run Notepad as Administrator.
6. Open the local hosts file and add the same hostname-to-IP mapping as on Linux.
Then save and close.
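On Windows the hosts file lives at C:\Windows\System32\drivers\etc\hosts; the entry mirrors the Linux one (the IP is a placeholder):
192.168.1.100 djt002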
7. Finally, verify that Hadoop was installed successfully.
On Windows, you can open the web UI at http://djt002:50070 to view the status of the NameNode, the cluster, and the file system. This is the web page for HDFS.
http://djt002:50070
8. Create a new djt.txt file to test with the WordCount program that ships with Hadoop.
Edit the file, then save and exit.
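Any short text works for the word count; a hypothetical example:
vi djt.txt
hadoop dajiangtai
hadoop hadoop dajiangtai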
9. View the HDFS file directories and upload the test file:
hadoop fs -ls /
hadoop fs -mkdir /dajiangtai
hadoop fs -ls /
hadoop fs -put /usr/java/hadoop/djt.txt /dajiangtai
hadoop fs -ls /dajiangtai
10. Open the web page to watch the job's progress:
http://djt002:8088/cluster/apps
11. Run Hadoop's built-in WordCount program as a test:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /dajiangtai/djt.txt /dajiangtai/wordcount-out
To explain: /dajiangtai/djt.txt is the input path and /dajiangtai/wordcount-out is the output path.
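After the job finishes, the word counts can be read from the output directory (the part file name may differ slightly):
hadoop fs -ls /dajiangtai/wordcount-out
hadoop fs -cat /dajiangtai/wordcount-out/part-r-00000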
If all of the above is OK, then congratulations: your Hadoop single-node pseudo-distributed environment has been built successfully!