Note: Because Hadoop's remote invocation uses RPC, the firewall must be shut down on every Linux machine in the cluster:
service iptables stop
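To keep the firewall from coming back after a reboot, the service can also be disabled at boot time; a minimal sketch, assuming a Red Hat style system with chkconfig (run as root on every machine):
chkconfig iptables off        # keep iptables from starting on subsequent boots
chkconfig --list iptables     # verify: every runlevel should now show "off"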
1. vi /etc/inittab
Change id:5:initdefault: to id:3:initdefault: so the system starts in text (character) mode.
2. IP configuration: /etc/sysconfig/network-scripts/
3. vi /etc/hosts and add the hostname of every machine in the cluster, as in the example below.
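For instance, with the master at 10.2.224.46 (the address used later in this guide) and a slave at a hypothetical 10.2.224.47, /etc/hosts on every machine could contain:
10.2.224.46   master
10.2.224.47   slave01    # hypothetical slave address, adjust to your network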
4. useradd hadoop: add a hadoop user
passwd hadoop: set a password for that user
5. For a file such as the following:
-rw-r--r-- 1 root root 42266180 Dec 10:08 hadoop-0.19.0.tar.gz
you can run the following commands:
chmod 777 hadoop-0.19.0.tar.gz: change the file permissions to the maximum (read, write, and execute for everyone)
chown hadoop:hadoop hadoop-0.19.0.tar.gz: change the owner and group of the file to hadoop
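Step 8 below assumes the distribution has been unpacked to /home/hadoop/hadoop-0.19.0; a minimal sketch of that step, run as the hadoop user:
cd /home/hadoop
tar -xzf hadoop-0.19.0.tar.gz    # produces /home/hadoop/hadoop-0.19.0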
6. Set up SSH authorization between the master and all slaves (operate as the hadoop user).
Run ssh-keygen -t rsa and press Enter at every prompt (accept the defaults and an empty passphrase).
cd ~/.ssh
cp id_rsa.pub authorized_keys
Copy the authorized_keys file from the master to every slave machine via scp, for example:
scp authorized_keys root@slave01:/home/hadoop/master_au_keys
Likewise, append the authorized_keys of each slave machine to the master's authorized_keys.
When ssh master or ssh slave01 works without a password, the setup is OK (see the sketch below).
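A minimal sketch of the remaining steps on each slave, assuming the key file was copied to /home/hadoop/master_au_keys as above (run as the hadoop user):
cat /home/hadoop/master_au_keys >> ~/.ssh/authorized_keys   # trust the master's key
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys                            # sshd rejects keys with loose permissions
Then, back on the master, ssh slave01 hostname should print the slave's hostname without asking for a password.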
7. Install JDK
Download the JDK installer jdk-6u11-linux-i586.bin from the Sun web site, copy it to the /usr directory of each machine, and install it as the root user.
As root:
cd /usr
chmod +x jdk-6u11-linux-i586.bin: add execute permission to the installer.
./jdk-6u11-linux-i586.bin: page through the license prompt with the space bar, then enter yes to start installing JDK 6.
After installation, rename the resulting directory to jdk6.
Note (on CentOS 5.2 the bundled JDK 1.4 can be left alone): Linux systems generally come with a JDK 1.4 preinstalled, which must be deleted.
rpm -qa | grep -i java lists all the installed RPM packages related to Java; remove them with
rpm -e <package name>, as in the hypothetical example below.
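A hypothetical example of the removal (the actual package names depend on the distribution, so list them first):
rpm -qa | grep -i java
# suppose the listing shows a package named java-1.4.2-gcj-compat (hypothetical name)
rpm -e java-1.4.2-gcj-compat --nodeps   # --nodeps may be needed if other packages depend on it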
Set the JDK environment variables. Since the JDK may be used by other system users, it is recommended to set them directly in /etc/profile:
export JAVA_HOME=/usr/jdk6
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
Run source /etc/profile to make the Java environment take effect.
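To verify, a quick check can be run in a new shell (assuming the JDK was installed into /usr/jdk6 as above):
source /etc/profile
echo $JAVA_HOME      # should print /usr/jdk6
java -version        # should report version 1.6.0_11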
8. Hadoop environment variable settings and configuration file changes
Add the JDK directory to conf/hadoop-env.sh:
export JAVA_HOME=/usr/jdk6
Add the NameNode machine name to conf/masters: master
Add the DataNode machine names to conf/slaves: slave01 ...
Add the Hadoop path to /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop-0.19.0
export PATH=$PATH:$HADOOP_HOME/bin
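As a quick sanity check after sourcing the profile (a sketch, assuming the paths above):
source /etc/profile
echo $HADOOP_HOME    # should print /home/hadoop/hadoop-0.19.0
which hadoop         # should resolve to /home/hadoop/hadoop-0.19.0/bin/hadoop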
Modify conf/hadoop-site.xml and add the following properties inside the <configuration> element:
<property>
  <name>fs.default.name</name>  <!-- your NameNode: machine name plus port -->
  <value>hdfs://10.2.224.46:54310/</value>
</property>
<property>
  <name>mapred.job.tracker</name>  <!-- your JobTracker: machine name plus port -->
  <value>hdfs://10.2.224.46:54311/</value>
</property>
<property>
  <name>dfs.replication</name>  <!-- number of copies each block is replicated; the default is 3 -->
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Hadoop's default temporary path; it is best to set this explicitly. If a newly added node, or a DataNode in other circumstances, inexplicably fails to start, delete the tmp directory named here. However, if this directory is removed on the NameNode machine, the NameNode format command has to be executed again. -->
  <value>/home/hadoop/tmp/</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/name/</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/data/</value>
</property>
<property>
  <name>mapred.child.java.opts</name>  <!-- Java virtual machine parameters for child tasks; adjust as needed -->
  <value>-Xmx512m</value>
</property>
<property>
  <name>dfs.block.size</name>
  <!-- block size in bytes; must be a multiple of 512, because CRC file-integrity checking uses 512 bytes as the smallest checksum unit -->
  <value>5120000</value>
</property>
The block size setting applies only to newly created files.
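The directories referenced above must be writable by the hadoop user; a minimal sketch, assuming the paths configured above, run on each node as the hadoop user:
mkdir -p /home/hadoop/tmp /home/hadoop/name /home/hadoop/data
The same hadoop-site.xml (and the rest of conf/) should normally be present on every node so all machines share the same configuration.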
———————–
Before starting, the NameNode must be formatted. Enter the ~/hadoopinstall/hadoop directory (the Hadoop installation directory, /home/hadoop/hadoop-0.19.0 in this setup) and execute the following command:
$ bin/hadoop namenode -format
Now Hadoop can be started. There are a number of startup scripts under bin/, which can be run according to your needs:
* start-all.sh starts all the Hadoop daemons, including the NameNode, DataNodes, JobTracker, and TaskTrackers.
* stop-all.sh stops all the Hadoop daemons.
* start-mapred.sh starts the Map/Reduce daemons, i.e. the JobTracker and TaskTrackers.
* stop-mapred.sh stops the Map/Reduce daemons.
* start-dfs.sh starts the Hadoop DFS daemons, i.e. the NameNode and DataNodes.
* stop-dfs.sh stops the DFS daemons.
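A typical first start, run on the master as the hadoop user (a sketch, assuming the layout used in this guide):
cd /home/hadoop/hadoop-0.19.0
bin/hadoop namenode -format   # only before the very first start
bin/start-all.sh
jps                           # the jps tool from the JDK lists the Java daemons running on this node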
————————–
Viewing and testing
bin/hadoop dfsadmin -report: view all the DataNode nodes
Browse the NameNode and JobTracker through the web interface:
* NameNode - http://10.0.0.88:50070
* JobTracker - http://10.0.0.88:50030
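As a quick end-to-end test, a small example job can be run (a sketch, assuming the daemons are up and that the examples jar name matches the 0.19.0 release):
cd /home/hadoop/hadoop-0.19.0
bin/hadoop fs -mkdir input
bin/hadoop fs -put conf/hadoop-site.xml input
bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
bin/hadoop fs -cat output/part-00000    # word counts for the uploaded file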