Basic software and hardware configuration:
x86 desktop running 64-bit Windows 7, with VirtualBox virtual machines (the desktop needs at least 4 GB of memory in order to run 3 VMs at once)
CentOS 6.4 operating system
hadoop-1.1.2.tar.gz
jdk-6u24-linux-i586.bin
1. Configuration under root
A) Modify the hostname: vi /etc/sysconfig/network
Set the hostname to master, slave1, and slave2 on the three machines respectively.
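For example, on the master node /etc/sysconfig/network would contain something like the following (the NETWORKING line is the usual CentOS default and is shown only for context):
NETWORKING=yes
HOSTNAME=master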
B) Map hostnames to IP addresses: vi /etc/hosts
192.168.8.100 master
192.168.8.101 slave1
192.168.8.102 slave2
C) Network debugging:
Use bridged networking for the virtual machines and configure the network interfaces accordingly.
After making changes, remember to run service network restart.
Make sure the three VMs can ping each other.
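A quick check from the master, for example:
ping -c 3 slave1
ping -c 3 slave2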
D) Disable the firewall.
View: service iptables status
Disable: service iptables stop
Check whether the firewall starts automatically:
chkconfig --list | grep iptables
Disable auto-start:
chkconfig iptables off
2. Configuration under the yao user
A) Create the user yao, set its password, and switch to that user
useradd yao
passwd yao    (then enter the password, e.g. 123456)
B) Create a public/private key pair on the master node.
ssh-keygen -t rsa
1) Copy id_rsa.pub into authorized_keys
cp id_rsa.pub authorized_keys
2) Copy the master's public key (id_rsa.pub) to /home on slave1
scp id_rsa.pub root@192.168.8.101:/home
3) Append the key copied from the master to the authorized_keys created on slave1, and do the same on slave2. In the end, every node's authorized_keys contains the public keys of all machines, as sketched below.
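A minimal sketch of the whole exchange, assuming the yao user on every node and the IP addresses above (the file name master_key.pub is only an illustrative choice):
On the master, as yao:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub yao@192.168.8.101:/home/yao/master_key.pub
On slave1 (and likewise slave2), as yao:
cat /home/yao/master_key.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Repeat the copy-and-append in the other directions so that every node ends up holding every public key.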
4) Copy hadoop to /home/yao/Documents/ on the corresponding host.
Configure the environment variables under root: vi /etc/profile
export HADOOP_HOME=/home/yao/Documents/hadoop
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=.:$PATH:$HADOOP_HOME/bin
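To apply the new variables in the current shell you can, for example, run:
source /etc/profile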
Note: su followed by a user name switches to that user.
5) Install the JDK. The package needs execute permission before it can be unpacked:
chmod u+x jdk-6u24-linux-i586.bin
Unpack the package (the .bin file extracts itself when run).
Configure the environment variables: vi /etc/profile
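For example, /etc/profile might gain lines like the following, assuming the JDK was unpacked and moved to /usr/local/jdk (the path referenced in hadoop-env.sh below):
export JAVA_HOME=/usr/local/jdk
export PATH=.:$PATH:$JAVA_HOME/bin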
6) Modify the configuration files in hadoop/conf (minimal examples of each are sketched below).
Modify core-site.xml
Modify hdfs-site.xml
Modify mapred-site.xml
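The exact values depend on your environment; the snippets below are only typical Hadoop 1.x settings for the cluster described here (the ports 9000 and 9001 and the tmp directory are assumptions).
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/yao/Documents/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>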
7) Modify the hadoop/conf/hadoop-env.sh file, which specifies the JDK path:
export JAVA_HOME=/usr/local/jdk
8) Modify hadoop/conf/masters and hadoop/conf/slaves with the virtual machine names so that Hadoop knows which node is the master and which are the datanodes:
masters: master
slaves: slave1 and slave2 (one name per line)
3. Copy hadoop
The Hadoop configuration on the master is now basically complete. Since the other nodes need the same configuration as the namenode, copy the hadoop directory from the master to slave1 and slave2.
Command:
scp -r ./hadoop yao@slave1:/home/yao/
scp -r ./hadoop yao@slave2:/home/yao/
After the copy is completed, run the following command in the hadoop directory on the master machine:
Format the namenode: bin/hadoop namenode -format
Then start the cluster:
bin/start-all.sh
On slave1, enter jps to check the running processes; entering jps on slave2 should show the same result.
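As a rough guide for Hadoop 1.x (assuming start-all.sh succeeded): jps on the master typically shows NameNode, SecondaryNameNode, and JobTracker, while jps on each slave shows DataNode and TaskTracker.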
Summary:
To configure a fully distributed hadoop cluster, perform the following steps:
1) Configure the hosts file
2) Create a Hadoop running account
3) Configure passwordless SSH
4) Download and unpack the Hadoop installation package
5) Configure the namenode and modify the site files
6) Configure hadoop-env.sh
7) Configure the masters and slaves files
8) Copy Hadoop to the other nodes
9) Format the namenode
10) Start Hadoop
11) Use jps to check whether the background processes started successfully
Note: none of this comes easily. From the installation stage onward, each step raises problems that have to be solved, and working through them is a good way to become familiar with the commands and with how Hadoop organizes its files.
Pseudo-distributed
Building a pseudo-distributed setup is much simpler because everything runs on a single node; of the steps above, only the following are needed:
1) Create a Hadoop running account
2) Configure passwordless SSH (on a single node, simply copying id_rsa.pub into authorized_keys is enough)
3) Download and unpack the Hadoop installation package
4) Download, unpack, and install the JDK
5) Modify the site files (a single-node example is sketched after this list)
6) Configure hadoop-env.sh
7) Format the namenode
8) Start Hadoop
9) Use jps to check whether the background processes started successfully
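The site files are the main difference from the cluster setup: everything points at the local machine. A minimal single-node variant of the earlier examples (port numbers again assumed) would be:
core-site.xml: fs.default.name = hdfs://localhost:9000
hdfs-site.xml: dfs.replication = 1
mapred-site.xml: mapred.job.tracker = localhost:9001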
OK, that covers the basic Hadoop build process; once you understand it, both the pseudo-distributed and the fully distributed setups are straightforward.