Hadoop cluster configuration, part one: installing the first node in fully distributed mode


This series of articles describes how to install and configure hadoop in fully distributed mode, along with some basic operations in that mode. The plan is to set up a single machine first and add more nodes later. This article only describes how to install and configure that first single node.

 

1. Install Namenode and JobTracker

 

This is the first and most critical node of the fully distributed cluster. I use an Ubuntu Linux 11.10 Server running in VMware; this article does not cover the Linux installation itself. By default there is a user named abc with sudo permission; the root password is random, and you can only temporarily gain root privileges through the sudo command. To be safe, the first thing to do after installing the system is to change the root password. With sudo passwd root, the system will not ask for the original password; just enter the new password twice. With the root password in hand, you will not be helpless if an operation goes wrong.

 

1.1 Install the JDK

 

There is supposedly a quick way to install the JDK, sudo apt-get install sun-java6-jdk, using Ubuntu's own packaging mechanism. I tried it and it did not succeed; as I remember, it said the package could not be found. I do not know whether the network was down or the package name was wrong, so I gave up and looked for another method.

Go to the Oracle website and find the latest JDK 1.6 release, JDK 1.6.0_31. The download steps are as follows:

Because this Ubuntu Linux installation is 32-bit, choose the 32-bit JDK. Click "Accept License Agreement", right-click "jdk-6u31-linux-i586.bin" and copy its link from the properties: http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-i586.bin. Then return to the virtual machine, log on as abc, and enter the command:

wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-i586.bin

This download takes some time. After it completes, there will be a jdk-6u31-linux-i586.bin file under /home/abc.

sudo mkdir /usr/lib/jvm

cd /usr/lib/jvm

sudo mkdir java

cd java

sudo cp /home/abc/jdk-6u31-linux-i586.bin .

sudo chmod 777 jdk-6u31-linux-i586.bin

sudo ./jdk-6u31-linux-i586.bin

Running the .bin starts the JDK installer; when it finishes, the JDK is installed under /usr/lib/jvm/java/jdk1.6.0_31.

sudo vi /etc/environment

Modify the file as follows:

Append :/usr/lib/jvm/java/jdk1.6.0_31/bin to the end of the PATH line. Note that the colon before /usr is required.

Add these two lines:

CLASSPATH=.:/usr/lib/jvm/java/jdk1.6.0_31/lib

JAVA_HOME=/usr/lib/jvm/java/jdk1.6.0_31

Save
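For reference, after these edits the relevant lines of /etc/environment should look roughly like the following; the system directories at the front of PATH are whatever your installation already contained, so only the appended JDK entries come from this step:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java/jdk1.6.0_31/bin"
CLASSPATH=.:/usr/lib/jvm/java/jdk1.6.0_31/lib
JAVA_HOME=/usr/lib/jvm/java/jdk1.6.0_31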

Note: In some cases the Linux system installs packages such as openjdk by default, which leads to multiple coexisting JVMs. In that case you also need the update-alternatives command to point the default JVM at the JDK directory you just installed.

On Ubuntu Linux 11.10 Server, however, no other JDK packages are installed by default (the java command could not be run before this installation), so the update-alternatives command is not needed here.
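If your system does have another JDK installed, the switch would look something like the following (a sketch only; the paths assume the install location used above):

sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java/jdk1.6.0_31/bin/java 1
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java/jdk1.6.0_31/bin/javac 1
sudo update-alternatives --config java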

sudo reboot

 

1.2 Create the hadoop user and hadoop group

 

After the system restarts, log on as the abc user.

sudo addgroup hadoop

sudo adduser --ingroup hadoop hadoop

Enter the new password twice, then press Enter through the remaining prompts (the information does not matter) until the command completes. The hadoop user is now created.

su

Enter the root password to switch to the root user.

Continue to enter the command:

chmod u+w /etc/sudoers

vi /etc/sudoers

Add the following line after the line root ALL=(ALL:ALL) ALL:

hadoop ALL=(ALL:ALL) ALL

This allows the hadoop user to run any command with sudo.

Save

chmod u-w /etc/sudoers

This changes the permissions on the sudoers file back to 440, i.e. read-only even for root. When you run the sudo command, Ubuntu Linux checks whether the file has mode 440; if it does not, sudo refuses to work, so you must restore the original 440.
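As a quick check (a suggestion, not part of the original steps), the permission column shown by ls should now read r--r-----, which is mode 440:

ls -l /etc/sudoers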

That completes the work as the root user. Type exit to leave the root shell.

Type exit again to log out of the abc user.

 

1.3 Configure an SSH key so that the hadoop user can log on without a password

 

Log on as the hadoop user you just created.

sudo apt-get install ssh

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Run ssh localhost to test whether SSH works. If no password is required, the setup is correct.
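If you are still prompted for a password, the usual culprit is the permissions on the key files; the following tightening is a generic OpenSSH fix rather than something from the original article:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys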

 

1.4 Check the host name and modify /etc/hostname and /etc/hosts

 

sudo vi /etc/hostname

Check whether the automatically assigned host name is appropriate. If not, change it to a meaningful name, such as namenode, and save.

ifconfig

Check the current IP address and record it.

sudo vi /etc/hosts

The two lines starting with 127 do not need to be touched;

add a line mapping the recorded IP address to the new host name, then save.

This is important; otherwise the reduce phase of JobTracker jobs may misbehave.
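For example, if ifconfig reported 192.168.1.100 (an address assumed here purely for illustration; use your own), /etc/hosts would end up looking roughly like this, with the two existing 127 lines left exactly as they were:

127.0.0.1       localhost
127.0.1.1       ubuntu
192.168.1.100   namenode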

 

1.5 Install the hadoop package

 

Go to http://hadoop.apache.org/common/releases.html and find a stable release. Pick 0.20.203.0, choose a mirror site, and download the package into the /home/hadoop/ directory.

Continue entering commands as the hadoop user:

sudo mkdir /usr/local/hadoop

sudo chown hadoop:hadoop /usr/local/hadoop

cp /home/hadoop/hadoop-0.20.203.0rc1.tar.gz /usr/local/hadoop

cd /usr/local/hadoop

tar zxvf hadoop-0.20.203.0rc1.tar.gz

cd hadoop-0.20.203.0/conf

vi hadoop-env.sh

Change the JAVA_HOME line to: export JAVA_HOME=/usr/lib/jvm/java/jdk1.6.0_31

vi core-site.xml

It is empty; change its content to:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>

vi hdfs-site.xml and add, inside the <configuration> element:

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

The dfs.replication property is the number of replicas kept for each block of data. A production environment cannot use 1; it must, of course, be greater than 1.
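For a production cluster the same property would simply carry a larger value; 3 is the conventional choice (shown here only as an illustration):

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>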

vi mapred-site.xml and change it to contain, inside the <configuration> element:

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>namenode:9001</value>
</property>

Note that the JobTracker and the NameNode use the same host here, that is, they run on the same machine; in a production environment the NameNode and JobTracker can be split onto two separate machines.

 

With all the configuration files changed, modify the PATH variable:

sudo vi /etc/environment

Append :/usr/local/hadoop/hadoop-0.20.203.0/bin to the end of the PATH line and save, so that the hadoop commands are available everywhere.
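Together with the JDK entry added earlier, the PATH line now looks roughly like this (the leading system directories are whatever your /etc/environment already contained):

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java/jdk1.6.0_31/bin:/usr/local/hadoop/hadoop-0.20.203.0/bin"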

Restart the machine: sudo reboot

 

1.6 Format HDFS

Log on as the hadoop user:

hadoop namenode -format

 

1.7 Start and verify the single-machine configuration

start-all.sh

This starts the single hadoop node.

Verification can be performed using:

jps

If everything is correct, the result looks like this:

3156 NameNode
2743 SecondaryNameNode
3447 Jps
2807 JobTracker
2909 TaskTracker
2638 DataNode

hadoop dfsadmin -report

This displays HDFS information.

 

Visiting http://namenode:50070/ displays HDFS information.

http://namenode:50030/ displays JobTracker information.

You can also use the common file commands to put files onto HDFS, for example:

hadoop fs -put test.txt /user/hadoop/test.txt
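To confirm the file really made it, you can read it straight back (assuming the same destination path as above):

hadoop fs -cat /user/hadoop/test.txt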

The above shows that HDFS is basically working. Next we verify that the JobTracker and TaskTracker are working by running the wordcount program from the hadoop examples.

cd /usr/local/hadoop/hadoop-0.20.203.0

hadoop fs -put conf input

This copies the conf directory to HDFS under the name input.
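An optional sanity check is to list the uploaded directory before starting the job:

hadoop fs -ls input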

hadoop jar hadoop-examples-0.20.203.0.jar wordcount input output

A successful run looks roughly like the output below: map climbs to 100% and reduce also reaches 100%.

12/03/05 07:52:09 INFO input.FileInputFormat: Total input paths to process: 15
12/03/05 07:52:09 INFO mapred.JobClient: Running job: job_201203050735_0001
12/03/05 07:52:10 INFO mapred.JobClient: map 0% reduce 0%
12/03/05 07:52:24 INFO mapred.JobClient: map 13% reduce 0%
12/03/05 07:52:25 INFO mapred.JobClient: map 26% reduce 0%
12/03/05 07:52:30 INFO mapred.JobClient: map 40% reduce 0%
12/03/05 07:52:31 INFO mapred.JobClient: map 53% reduce 0%
12/03/05 07:52:36 INFO mapred.JobClient: map 66% reduce 13%
12/03/05 07:52:37 INFO mapred.JobClient: map 80% reduce 13%
12/03/05 07:52:39 INFO mapred.JobClient: map 80% reduce 17%
12/03/05 07:52:42 INFO mapred.JobClient: map 100% reduce 17%
12/03/05 07:52:51 INFO mapred.JobClient: map 100% reduce 100%
12/03/05 07:52:56 INFO mapred.JobClient: Job complete: job_201203050735_0001
12/03/05 07:52:56 INFO mapred.JobClient: Counters: 26
12/03/05 07:52:56 INFO mapred.JobClient: Job Counters
12/03/05 07:52:56 INFO mapred.JobClient: Launched reduce tasks=1
12/03/05 07:52:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=68532
12/03/05 07:52:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/05 07:52:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/03/05 07:52:56 INFO mapred.JobClient: Rack-local map tasks=7
12/03/05 07:52:56 INFO mapred.JobClient: Launched map tasks=15
12/03/05 07:52:56 INFO mapred.JobClient: Data-local map tasks=8
12/03/05 07:52:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=25151
12/03/05 07:52:56 INFO mapred.JobClient: File Output Format Counters
12/03/05 07:52:56 INFO mapred.JobClient: Bytes Written=14249
12/03/05 07:52:56 INFO mapred.JobClient: FileSystemCounters
12/03/05 07:52:56 INFO mapred.JobClient: FILE_BYTES_READ=21493
12/03/05 07:52:56 INFO mapred.JobClient: HDFS_BYTES_READ=27707
12/03/05 07:52:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=384596
12/03/05 07:52:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=14249
12/03/05 07:52:56 INFO mapred.JobClient: File Input Format Counters
12/03/05 07:52:56 INFO mapred.JobClient: Bytes Read=25869
12/03/05 07:52:56 INFO mapred.JobClient: Map-Reduce Framework
12/03/05 07:52:56 INFO mapred.JobClient: Reduce input groups=754
12/03/05 07:52:56 INFO mapred.JobClient: Map output materialized bytes=21577
12/03/05 07:52:56 INFO mapred.JobClient: Combine output records=1047
12/03/05 07:52:56 INFO mapred.JobClient: Map input records=734
12/03/05 07:52:56 INFO mapred.JobClient: Reduce shuffle bytes=21577
12/03/05 07:52:56 INFO mapred.JobClient: Reduce output records=754
12/03/05 07:52:56 INFO mapred.JobClient: Spilled Records=2094
12/03/05 07:52:56 INFO mapred.JobClient: Map output bytes=34601
12/03/05 07:52:56 INFO mapred.JobClient: Combine input records=2526
12/03/05 07:52:56 INFO mapred.JobClient: Map output records=2526
12/03/05 07:52:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=1838
12/03/05 07:52:56 INFO mapred.JobClient: Reduce input records=1047

Finally, hadoop fs -get output /home/hadoop pulls the output directory down locally so you can view the results.
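You can also read the result directly from HDFS; the part file name below is the standard name produced by a single-reducer wordcount job, so adjust it if your run differs:

hadoop fs -cat output/part-r-00000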

 

1.8 Stop the single machine

stop-all.sh

 

1.9 Problems

Too many fetch-failures

When running the wordcount example, the reduce task cannot reach 100% and always gets stuck at 0%.

Analyzing the logs showed the "Too many fetch-failures" message. Searching online, some people suggested writing the IP address and host name into /etc/hosts.

I tried that, but it still did not work. After much effort I finally found the crux: the host name Ubuntu Linux had originally assigned was "192", and that host name had already been baked into the HDFS configuration. Even though I later changed the host name to something meaningful, things were still broken, because the xml file of each task under the logs directory still used the old host name; the new host name did not take effect. So where did the old host name survive? It turned out to be stored in the files of the HDFS file system itself. You therefore need to redo steps 1.6 and 1.7 (format HDFS again and restart). After doing that, the wordcount sample program ran successfully.
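In practice, "redoing 1.6 and 1.7" on a broken single node amounts to something like the following sketch (the tmp path matches the hadoop.tmp.dir configured earlier; adjust it if yours differs, and note that clearing it destroys all data currently in HDFS):

stop-all.sh
rm -rf /home/hadoop/tmp/*     # wipes the HDFS data under hadoop.tmp.dir so the re-format starts clean
hadoop namenode -format
start-all.sh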

 

I have answered questions from my colleagues in the QQ group:

Q: Running hadoop namenode -format reports that the main class cannot be found.

A: The CLASSPATH settings are incorrect.

 

That is all there is to installing and configuring a single machine, and this article ends here. The next article will discuss how to add new hadoop nodes so that multiple nodes form a fully distributed cluster.

If you found this article helpful, please recommend it. Thank you.
