Hadoop stand-alone and fully distributed (cluster) installation on Linux

Hadoop is a free, open-source framework for distributed storage and processing of large data sets. Installing it on Linux is fairly smooth: writing a few configuration files is enough to get it started. I am a beginner myself, so I have written this up in some detail. For convenience I use three virtual machines running Ubuntu 12. The virtual machines' network connections use bridged mode, which makes debugging on the local area network easier. The single-machine and cluster installations differ very little: set up a single machine first, then add a few extra cluster configurations.

The first step: install the basic tools
Editor: Vim

The code is as follows:

sudo apt-get install vim

SSH server: OpenSSH. Installing an SSH server lets you use remote terminal tools (PuTTY, Xshell, etc.), which makes managing the virtual machines much easier.
The code is as follows:

sudo apt-get install openssh-server

The second step: some basic settings
It is best to give each virtual machine a fixed IP.

The code is as follows:

sudo vim /etc/network/interfaces
Add the following content:
iface eth0 inet static
address 192.168.0.211
gateway 192.168.0.222
netmask 255.255.255.0
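
For the interface to come up at boot and for the change to take effect, the stanza usually also needs an auto line, and the networking service has to be restarted; a minimal sketch for Ubuntu 12 (not from the original article):

# /etc/network/interfaces should also contain:
auto eth0
# then apply the change:
sudo /etc/init.d/networking restart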

Modify the machine's hostname. The name I use here is hadoopmaster; it will later act as the NameNode.
The code is as follows:

sudo vim /etc/hostname
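
The edit to /etc/hostname only takes effect after a reboot; to apply the new name to the running system right away, something like the following should work (a sketch, not from the original article):

sudo hostname hadoopmaster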

Modify /etc/hosts so machines can be addressed by name; this makes IP changes easier to handle and the hosts easier to remember and recognize.
The code is as follows:

sudo vim /etc/hosts
Add the following content:
192.168.0.211 hadoopmaster

The third step: add a user dedicated to Hadoop

The code is as follows:

sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop

Give the hadoop user sudo permissions.
The code is as follows:

sudo vim /etc/sudoers

Below the line
root ALL=(ALL:ALL) ALL
add the line
hadoop ALL=(ALL:ALL) ALL
Then switch to the hadoop user: su hadoop
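
As an aside, a safer way to make this edit is visudo, which checks the syntax before saving; a sketch, not from the original article:

sudo visudo
# inside the editor, add this line below the root entry:
#   hadoop ALL=(ALL:ALL) ALL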

The fourth step: extract and install the JDK, Hadoop, and Pig (installing Pig along the way)

The code is as follows:

sudo tar zxvf ./jdk-7-linux-i586.tar.gz -C /usr/local/jvm/
sudo tar zxvf ./hadoop-1.0.4.tar.gz -C /usr/local/hadoop
sudo tar zxvf ./pig-0.11.1.tar.gz -C /usr/local/pig

Rename the extracted directories so the final paths are:
The code is as follows:

JVM: /usr/local/jvm/jdk7
Hadoop: /usr/local/hadoop/hadoop (note: the Hadoop installation path must be the same on all nodes)
Pig: /usr/local/pig
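
The renames themselves are plain mv commands; a sketch, assuming the archives extracted to their default directory names (jdk1.7.0, hadoop-1.0.4, pig-0.11.1):

sudo mv /usr/local/jvm/jdk1.7.0 /usr/local/jvm/jdk7
sudo mv /usr/local/hadoop/hadoop-1.0.4 /usr/local/hadoop/hadoop
# for Pig, move the extracted contents up so they sit directly under /usr/local/pig
sudo mv /usr/local/pig/pig-0.11.1/* /usr/local/pig/ && sudo rmdir /usr/local/pig/pig-0.11.1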

Set the owner of each directory.
The code is as follows:

sudo chown -R hadoop:hadoop /usr/local/jvm/jdk7
sudo chown -R hadoop:hadoop /usr/local/hadoop/hadoop
sudo chown -R hadoop:hadoop /usr/local/pig

Set the environment variables: edit ~/.bashrc or ~/.profile and add
The code is as follows:

export JAVA_HOME=/usr/local/jvm/jdk7
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export HADOOP_INSTALL=/usr/local/hadoop/hadoop
export PATH=${HADOOP_INSTALL}/bin:$PATH
Run source ~/.profile to make the settings take effect.
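
To confirm the variables were picked up, a quick check (a sketch, not from the original article) is:

java -version       # should print the JDK 7 version
hadoop version      # should print Hadoop 1.0.4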

The fifth step: passwordless SSH login to this machine, i.e. ssh to localhost should not ask for a password

The code is as follows:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

If this does not work, fix the permissions:
The code is as follows:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

authorized_keys acts as a whitelist and id_rsa.pub is the public key: when authorized_keys already contains the requesting machine's public key, the SSH server lets it straight in, with no password!
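
To verify, ssh to the machine itself; it should log in without prompting for a password:

ssh localhost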

The sixth step: required Hadoop settings
All of the configuration files are in the hadoop/conf directory.
1. hadoop-env.sh: find the line #export JAVA_HOME, remove the leading # to uncomment it, and set the actual JDK path.
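With the JDK path used in this article, that line becomes:

export JAVA_HOME=/usr/local/jvm/jdk7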
2. core-site.xml

The code is as follows:

<property>
<name>fs.default.name</name>
<value>hdfs://hadoopmaster:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>

3. mapred-site.xml
The code is as follows:

<property>
<name>mapred.job.tracker</name>
<value>hadoopmaster:9001</value>
</property>

4. hdfs-site.xml
The code is as follows:

<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
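
If the hadoop user cannot create these directories on its own (for example because /usr/local/hadoop itself is still owned by root), it is safest to create them in advance and hand them to the hadoop user; a precautionary sketch, not from the original article:

sudo mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/datalog1 /usr/local/hadoop/datalog2 /usr/local/hadoop/data1 /usr/local/hadoop/data2
sudo chown -R hadoop:hadoop /usr/local/hadoop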

5. The masters file and the slaves file: on a single machine, just write localhost in both.

The seventh step: start Hadoop
Format the HDFS file system with Hadoop.

The code is as follows:

hadoop namenode -format

Run the Hadoop startup script. In a cluster, it is executed on the master, and Hadoop on the other slave nodes is started via SSH:
The code is as follows:

start-all.sh

Run the jps command. If it shows the five processes NameNode, SecondaryNameNode, TaskTracker, DataNode, and JobTracker, the launch was successful!

The eighth step: cluster configuration
Installation on all of the other machines is the same as the stand-alone setup above; only the additional cluster configuration below is added.
It is best to fully configure one machine first; the others can then be copied over directly with scp. Keeping the paths identical, including the Java path, works best.
Host list for this example (add these entries to /etc/hosts on every node): hadoopmaster, hadoopnode1, hadoopnode2.
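
A minimal sketch of the /etc/hosts entries, assuming example addresses 192.168.0.212 and 192.168.0.213 for the two slave nodes (only the master's 192.168.0.211 comes from the article):

192.168.0.211 hadoopmaster
192.168.0.212 hadoopnode1
192.168.0.213 hadoopnode2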


Set up SSH so that the master can log in to the other slaves without a password; this is mainly used to start the slaves.

The code is as follows:

Copy id_rsa.pub from hadoopmaster down to the child nodes:
scp ~/.ssh/id_rsa.pub hadoopnode1:/home/hadoop/.ssh/id_master
scp ~/.ssh/id_rsa.pub hadoopnode2:/home/hadoop/.ssh/id_master

Then, in the ~/.ssh/ directory on each child node, run:
cat ./id_master >> authorized_keys
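
Once the key is appended, logging in from the master should no longer ask for a password; a quick check from hadoopmaster:

ssh hadoopnode1
ssh hadoopnode2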


In the masters file, add the hostname that will act as the SecondaryNameNode (or NameNode), one per line.
For the cluster, write the master's name, e.g. hadoopmaster.
In the slaves file, add the hostnames that will act as slaves, one per line.
For the cluster, write the child nodes' names, e.g. hadoopnode1 and hadoopnode2.
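
Based on the host names used in this example, the two files on the master would simply contain:

conf/masters:
hadoopmaster

conf/slaves:
hadoopnode1
hadoopnode2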

Hadoop management
When Hadoop starts, it brings up a task management service and a file system management service, two Jetty-based web services, so you can monitor operation online through a browser.
The task management service runs on port 50030, e.g. http://127.0.0.1:50030; the file system management service runs on port 50070.

Parameter notes:
1. dfs.name.dir: the path on the NameNode's local file system where the namespace and transaction logs are persistently stored. When this value is a comma-separated list of directories, the name table data is replicated into all of them for redundancy.
2. dfs.data.dir: the path(s) on the DataNode's local file system where block data is stored, as a comma-separated list. When this value is a comma-separated list of directories, data is stored across all of the directories, which are typically on different devices.
3. dfs.replication: the number of copies to keep of each block of data. The default is 3; if this number is larger than the number of machines in the cluster, errors will occur.
