1) machine preparation:
There are four physical machines in total. The Hadoop cluster built on them has four nodes: one Master and three Slave nodes.
The nodes are connected to each other through a LAN and can ping one another at the following IP addresses:
192.168.216.131 hadoop1
192.168.216.132 hadoop2
192.168.216.133 hadoop3
192.168.216.134 hadoop4
The operating system is CentOS 6.2, 64-bit.
The Master machine runs the NameNode and JobTracker roles and is responsible for managing the distributed data and dispatching tasks;
the three Slave machines run the DataNode and TaskTracker roles and are responsible for distributed data storage and task execution.
Ideally there should also be a second Master machine on standby, so that if the Master server goes down the backup can take over immediately.
Once you have gained some experience with the cluster, you can add such a backup Master machine.
2) create an account
Log on to all machines as root and create a hadoop user on each:
useradd hadoop
passwd hadoop
This generates a hadoop directory under /home/; its path is /home/hadoop.
Create related directories.
Define the paths for storing code and tools:
mkdir -p /home/hadoop/source
mkdir -p /home/hadoop/tools
Define the storage path for the data nodes: choose a directory with enough space to hold the data. Here the data is stored under the /hadoop directory:
mkdir -p /hadoop/hdfs
mkdir -p /hadoop/tmp
mkdir -p /hadoop/log
Set write permission:
chmod -R 777 /hadoop
Define the Java installation path:
mkdir -p /usr/java
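The directory setup above can be done in one pass with a short script; this is a minimal sketch, assuming it is run as root (the paths are exactly the ones defined in this section):

```shell
#!/bin/sh
# Create the code/tool, data, and Java directories used in this tutorial.
mkdir -p /home/hadoop/source /home/hadoop/tools   # code and tools
mkdir -p /hadoop/hdfs /hadoop/tmp /hadoop/log     # data node storage
chmod -R 777 /hadoop                              # write permission
mkdir -p /usr/java                                # java install path
```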
3) install JDK
Download JDK 1.7 (x64) from the official website.
Unzip: # tar zvxf jdk-7u10-linux-x64.tar.gz
# mv jdk1.7.0_10 /usr/java/jdk1.7
Configure the environment variables: run cd /etc, then vi profile, and add:
export JAVA_HOME=/usr/java/jdk1.7
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$PATH:$JAVA_HOME/bin
Run chmod +x profile to make it executable.
Run source profile to make the configuration take effect immediately:
# source /etc/profile
Run java -version to check whether the installation succeeded.
This step must be performed on all machines.
4) modify the Host Name
Modify the host name; all nodes are configured the same way.
1. Connect to the master node 192.168.216.131, edit the network configuration with vim /etc/sysconfig/network, and set HOSTNAME=hadoop1.
2. Edit the hosts file: run cd /etc, then vi hosts, and append the following lines:
192.168.216.131 hadoop1
192.168.216.132 hadoop2
192.168.216.133 hadoop3
192.168.216.134 hadoop4
3. Run hostname hadoop1.
4. Exit and reconnect; you should see that the host name has been modified successfully.
Modify the host name and hosts entries on the other nodes in the same way. Alternatively, the hosts file can be copied over with scp.
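As a sketch of the scp alternative just mentioned, the function below prints the copy command for each node (the host names are the ones defined in the hosts file; replace echo with the command itself, or pipe the output to sh, to actually run it):

```shell
# Print the scp commands that would push /etc/hosts to every other node.
print_hosts_sync() {
  for host in hadoop2 hadoop3 hadoop4; do
    echo "scp /etc/hosts root@$host:/etc/hosts"
  done
}
print_hosts_sync
```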
5) Configure SSH login without a password
How password-less SSH works:
First, a key pair (a public key and a private key) is generated on hadoop1, and the public key is copied to all slaves (hadoop2-hadoop4).
Then, when the master connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master.
The master decrypts the data with its private key and returns the decrypted number to the slave. Once the slave confirms the number is correct, the master is allowed to connect without entering a password.
2. Specific steps (performed for both the root user and the hadoop user):
1. Run ssh-keygen -t rsa and press Enter at each prompt. To check the generated key pair, run cd ~/.ssh and then ll.
2. Append id_rsa.pub to the authorized keys. Run: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Modify permissions: chmod 600 ~/.ssh/authorized_keys
4. Make sure the following lines are present in /etc/ssh/sshd_config:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
If you had to modify the file, restart the SSH service to make the change take effect: service sshd restart
Copy the public key to all slave machines, e.g. scp ~/.ssh/id_rsa.pub 192.168.216.133:~/ , then enter yes and the slave machine's password.
On the slave machine, create a .ssh folder: mkdir ~/.ssh, then chmod 700 ~/.ssh (skip this if the folder already exists).
Append the key to the authorization file: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys, then chmod 600 ~/.ssh/authorized_keys
Repeat these steps for the remaining slaves.
Verification: on the master machine, run ssh 192.168.216.133; the host name in the prompt should change from hadoop1 to hadoop3.
Finally, delete the copied id_rsa.pub file on the slave: rm id_rsa.pub (this step is optional, but avoids conflicts with the slave's own public key file).
Configure hadoop1, hadoop2, hadoop3, and hadoop4 in the same way; every machine must allow password-less login for each user.
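Once all four machines are configured, the full mesh can be checked with a loop like the one below. This is a sketch that only prints the verification commands, since actually running them requires the live cluster; ssh -o BatchMode=yes fails instead of prompting, so any pair still requiring a password shows up as an error:

```shell
# Print a password-less-login check for every ordered pair of nodes.
print_ssh_checks() {
  for src in hadoop1 hadoop2 hadoop3 hadoop4; do
    for dst in hadoop1 hadoop2 hadoop3 hadoop4; do
      [ "$src" = "$dst" ] && continue
      echo "ssh $src \"ssh -o BatchMode=yes $dst hostname\""
    done
  done
}
print_ssh_checks
```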
6) download source code
HADOOP version:
The latest hadoop-2.0.0-alpha installation package is hadoop-2.0.0-alpha.tar.gz.
Official download address: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Decompress it in the /usr/local directory:
tar zxvf hadoop-2.0.0-alpha.tar.gz
mv hadoop-2.0.0-alpha hadoop
Source code configuration changes
/etc/profile
Configure the environment variables: vim /etc/profile
Add:
export HADOOP_DEV_HOME=/usr/local/hadoop # hadoop installation directory
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPARED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
7) configuration file
Configure hadoop-env.sh:
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add at the end: export JAVA_HOME=/usr/java/jdk1.7
core-site.xml
Add the following properties to the configuration node:
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.216.131:9000</value>
</property>
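For reference, these properties sit inside the file's <configuration> element; a complete core-site.xml would look like the sketch below (the two header lines are the standard Hadoop configuration-file boilerplate, not something this tutorial sets):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.216.131:9000</value>
  </property>
</configuration>
```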
Slave configuration
vim /usr/local/hadoop/etc/hadoop/slaves
Add the slave IP addresses:
192.168.216.131
192.168.216.132
192.168.216.133
Configure hdfs-site.xml
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following nodes:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.federation.nameservice.id</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.backup.address.ns1</name>
<value>192.168.216.131:50100</value>
</property>
<property>
<name>dfs.namenode.backup.http-address.ns1</name>
<value>192.168.216.131:50105</value>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>192.168.216.131:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>192.168.216.131:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>192.168.216.131:23001</value>
</property>
<property>
<name>dfs.namenode.http-address.ns2</name>
<value>192.168.216.131:13001</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>192.168.216.131:23002</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns2</name>
<value>192.168.216.131:23002</value>
</property>
Configure yarn-site.xml
Add the following nodes:
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.216.131:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.216.131:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.216.131:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.216.131:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.216.131:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Create a file mapred-site.xml in the etc/hadoop directory of the Hadoop installation with the following content:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>file:/hadoop/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>file:/hadoop/mapred/local</value>
<final>true</final>
</property>
Configure httpfs-site.xml
Add the httpfs options:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>192.168.216.131</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
Synchronize code to the other machines
1. Synchronize the configured code.
First, create the target directory on each slave machine:
mkdir -p /usr/local
Deploy the hadoop code and synchronize the modified configuration files under etc/hadoop, for example:
hadoop1 # scp -r hadoop root@hadoop2:/usr/local/
2. Synchronize /etc/profile
3. Synchronize /etc/hosts
# scp -r /etc/profile root@hadoop2:/etc/profile
# scp -r /etc/hosts root@hadoop2:/etc/hosts
Perform the same operations for the other machines.
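The three synchronization steps can be repeated for every slave with a loop. A minimal sketch that prints the commands for each node (host names as defined earlier; drop the echo, or pipe the output to sh, to execute them):

```shell
# Print the sync commands (code, profile, hosts) for each slave node.
print_sync_cmds() {
  for host in hadoop2 hadoop3 hadoop4; do
    echo "scp -r /usr/local/hadoop root@$host:/usr/local/"
    echo "scp /etc/profile root@$host:/etc/profile"
    echo "scp /etc/hosts root@$host:/etc/hosts"
  done
}
print_sync_cmds
```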
8) Hadoop startup
Format the cluster:
hadoop namenode -format -clusterid clustername
or:
hdfs namenode -format
Start HDFS:
start-dfs.sh
This enables the hadoop dfs service.
Start YARN
Enable the yarn resource management service:
start-yarn.sh
Start httpfs
Enable the httpfs service, which provides an HTTP RESTful interface to HDFS:
httpfs.sh start
9) test
Verify the installation results.
Verify HDFS:
Run jps on each machine to check whether the processes have started.
[root@hadoop1 hadoop]# jps
7396 NameNode
24834 Bootstrap
7594 SecondaryNameNode
7681 ResourceManager
32261 Jps
[root@hadoop2 ~]# jps
8966 Jps
31822 DataNode
31935 NodeManager
The processes started normally.
Verify that you can access HDFS:
hadoop fs -ls hdfs://192.168.216.131:9000/
hadoop fs -mkdir hdfs://192.168.216.131:9000/testfolder
hadoop fs -copyFromLocal ./xxxx hdfs://192.168.216.131:9000/testfolder
hadoop fs -ls hdfs://192.168.216.131:9000/testfolder
Check whether the commands above execute normally.
Verify map/reduce
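A common way to verify MapReduce is to run the wordcount example bundled with the release. The sketch below only prints the commands, since running them requires the live cluster; the example jar path inside the 2.0.0-alpha tarball is an assumption, so check share/hadoop/mapreduce/ in your unpacked release for the actual file name:

```shell
# Print the commands for a wordcount-based MapReduce check.
print_mr_check() {
  ns="hdfs://192.168.216.131:9000"
  # Assumed jar location; verify against your release.
  jar="/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-alpha.jar"
  echo "hadoop fs -mkdir $ns/wcin"
  echo "hadoop fs -copyFromLocal /etc/profile $ns/wcin"
  echo "hadoop jar $jar wordcount $ns/wcin $ns/wcout"
  echo "hadoop fs -cat $ns/wcout/part-r-00000"
}
print_mr_check
```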