Environment Building: Hadoop Cluster Setup
In the previous post we quickly set up the CentOS cluster environment. Next, we will start building the Hadoop cluster.
Lab Environment
Hadoop version: CDH 5.7.0
A note on the choice: we did not pick the official Apache release because the CDH distribution has already resolved the dependencies between the various components, and we will be using more components of the Hadoop family later. CDH is currently the most widely used distribution in domestic production environments.
The installation packages required for this environment can be obtained from my Baidu Cloud share:
Link: http://pan.baidu.com/s/1c24gbUK password: 8h1r
Before officially installing Hadoop, we need to configure passwordless SSH login across the cluster.
Configure the /etc/hosts file
Add:
192.168.1.61 hadoop000
192.168.1.62 hadoop001
192.168.1.63 hadoop002
[root@hadoop000 ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.61 hadoop000
192.168.1.62 hadoop001
192.168.1.63 hadoop002
Every machine must be added!
After that, the servers can ping each other by host name; a quick check is shown below.
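A quick sanity check (a hedged example; the host names come from the hosts file above, and the commands assume you run them from hadoop000):

[root@hadoop000 ~]# ping -c 2 hadoop001
[root@hadoop000 ~]# ping -c 2 hadoop002

If both hosts reply, name resolution is in place.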
Configure passwordless SSH login for the cluster
On each of the three machines, configure passwordless SSH login to the local machine.
ssh-keygen -t rsa
Generate the local key pair, pressing ENTER at each prompt.
[root@hadoop000 app]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
0b:9a:fb:86:9c:97:b8:6f:2a:d9:2e:6e:a2:3b:49:95 root@hadoop000
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|  .              |
|   E             |
|  . .   S        |
|   . o . .       |
|..  +o+ ..       |
|+  = *.=         |
|+*.+=Oo          |
+-----------------+
[root@hadoop000 app]#
Copy the public key into the authorized_keys file:
[root@hadoop000 ~]# cd .ssh/
[root@hadoop000 .ssh]# ls
id_rsa  id_rsa.pub
[root@hadoop000 .ssh]# touch authorized_keys
[root@hadoop000 .ssh]# cp id_rsa.pub authorized_keys
cp: overwrite 'authorized_keys'? yes
[root@hadoop000 .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub
[root@hadoop000 .ssh]# cat authorized_keys
ssh-rsa AAAA... (public key content omitted) ...= root@hadoop000
[root@hadoop000 .ssh]#
Every step here is important.
Now you can SSH to the local machine without entering a password (the very first connection is an exception, since you must confirm the host key):
[root@hadoop000 .ssh]# ssh hadoop000
The authenticity of host 'hadoop000 (192.168.1.61)' can't be established.
RSA key fingerprint is d7:1b:23:6b:0f:80:26:cd:da:9f:89:75:f6:4d:50:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop000,192.168.1.61' (RSA) to the list of known hosts.
Last login: Thu Nov 23 15:01:36 2017 from 192.168.1.9
[root@hadoop000 ~]# logout
Connection to hadoop000 closed.
[root@hadoop000 .ssh]# ssh hadoop000
Last login: Thu Nov 23 15:32:58 2017 from hadoop000
[root@hadoop000 ~]#
Perform the same operations on the other two servers; a sketch follows below.
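For completeness, those same steps on the other machines would look roughly like this (host names assumed from the hosts file above; run the block on hadoop001 and again on hadoop002):

ssh-keygen -t rsa                 # press ENTER at every prompt
cd ~/.ssh
cp id_rsa.pub authorized_keys
ssh hadoop001                     # on hadoop001: test the local passwordless login

On hadoop002 the final test would of course be ssh hadoop002.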
Configure passwordless SSH from the hadoop000 node to the other nodes
ssh-copy-id -i hadoop001
[root@hadoop000 .ssh]# ssh-copy-id -i hadoop001
The authenticity of host 'hadoop001 (192.168.1.62)' can't be established.
RSA key fingerprint is d3:ca:00:af:e5:40:0a:a6:9b:0d:a6:42:bc:22:48:66.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,192.168.1.62' (RSA) to the list of known hosts.
root@hadoop001's password:
Now try logging into the machine, with "ssh 'hadoop001'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@hadoop000 .ssh]#
Test:
[root@hadoop000 .ssh]# ssh hadoop001
Last login: Thu Nov 23 14:25:48 2017 from 192.168.1.9
[root@hadoop001 ~]# logout
Connection to hadoop001 closed.
[root@hadoop000 .ssh]#
Passwordless login works.
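The original output only shows hadoop001; presumably the same needs to be done for hadoop002 so that hadoop000 can reach every node without a password:

[root@hadoop000 .ssh]# ssh-copy-id -i hadoop002
[root@hadoop000 .ssh]# ssh hadoop002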
With that groundwork done, we can now officially start building the Hadoop environment.
Install Hadoop
1. First, obtain the software. Here we simply upload the local installation package; downloading it from the official website also works.
2. Extract the package into the app directory and rename the directory to hadoop; the rename command is shown after the extraction below.
[root@hadoop000 softwares]# ls
hadoop-2.6.0-cdh5.7.0.tar.gz  jdk-8u144-linux-x64.tar.gz
[root@hadoop000 softwares]# tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ../app/
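The rename itself is not shown in the original output; presumably something like the following mv was used, so that the path /root/app/hadoop used later in this post lines up:

[root@hadoop000 softwares]# mv ../app/hadoop-2.6.0-cdh5.7.0 ../app/hadoop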
3. Set system environment variables
[root@hadoop000 hadoop]# vim ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

export JAVA_HOME=/root/app/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/root/app/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export PATH=$HADOOP_HOME/sbin:$PATH
Verify:
[root@hadoop000 hadoop]# source ~/.bash_profile
[root@hadoop000 hadoop]# hadoop version
Hadoop 2.6.0-cdh5.7.0
Subversion http://github.com/cloudera/hadoop -r c00978c67b0d3fe9f3b896b5030741bd40bf541a
Compiled by jenkins on 2016-03-23T18:41Z
Compiled with protoc 2.5.0
From source with checksum ...
This command was run using /root/app/hadoop/share/hadoop/common/hadoop-common-2.6.0-cdh5.7.0.jar
[root@hadoop000 hadoop]#
If you see output like the above, everything is fine.
If you want to be extra sure, check YARN as well.
[root@hadoop000 hadoop]# yarn version
Hadoop 2.6.0-cdh5.7.0
Subversion http://github.com/cloudera/hadoop -r c00978c67b0d3fe9f3b896b5030741bd40bf541a
Compiled by jenkins on 2016-03-23T18:41Z
Compiled with protoc 2.5.0
From source with checksum ...
This command was run using /root/app/hadoop/share/hadoop/common/hadoop-common-2.6.0-cdh5.7.0.jar
[root@hadoop000 hadoop]#
Good, everything is in place.
4. Next, edit the configuration files.
The configuration files are under $HADOOP_HOME/etc/hadoop.
The main files to modify are: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, and hadoop-env.sh.
Step 4.1: hadoop-env.sh
Configure the JDK that Hadoop uses.
[root@hadoop000 hadoop]# vim hadoop-env.sh
Find the line "# The java implementation to use." and set the export below it to:
export JAVA_HOME=/root/app/jdk1.8.0_144
Save and exit
Step 4.2: core-site.xml
[root@hadoop000 hadoop]# vim core-site.xml
Add:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop000:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/app/hadoop/data/tmp</value>
</property>
Just save and exit
Step 4.3: hdfs-site.xml
Add:
<property>
    <name>dfs.name.dir</name>
    <value>/root/app/hadoop/data/namenode</value>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/root/app/hadoop/data/datanode</value>
</property>
<property>
    <name>dfs.tmp.dir</name>
    <value>/root/app/hadoop/data/dfstmp</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
Save and exit
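Optionally, you can pre-create the directories referenced above (a hedged extra step; HDFS will normally create them itself when the NameNode is formatted and the DataNodes start):

[root@hadoop000 hadoop]# mkdir -p /root/app/hadoop/data/{namenode,datanode,dfstmp,tmp}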
Step 4.4: mapred-site.xml
[root@hadoop000 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop000 hadoop]# vim mapred-site.xml
Add:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
Save and exit
Step 4.5: yarn-site.xml
Open yarn-site.xml and add:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop000</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
Save and exit
Step 4.6: slaves
Configure the slave nodes:
[root@hadoop000 hadoop]# vim slaves
hadoop000
hadoop001
hadoop002
Note: hadoop000 is both a master node and a slave node.
Set up the Hadoop environment on the other two servers
Here we can simply use the scp command to copy the entire hadoop000 setup to the other two machines.
[root@hadoop000 ~]# scp -r ./* root@hadoop001:/root
[root@hadoop000 ~]# scp -r ./* root@hadoop002:/root     # this will be slow, wait a moment
[root@hadoop000 ~]# scp -r ~/.bash_profile root@hadoop001:~/.bash_profile
.bash_profile                                100%  359     0.4KB/s
[root@hadoop000 ~]# scp -r ~/.bash_profile root@hadoop002:~/.bash_profile
.bash_profile                                100%  359     0.4KB/s
[root@hadoop000 ~]#
On hadoop001 and hadoop002, source the copied .bash_profile so the environment variables take effect.
[root@hadoop001 ~]# source ~/.bash_profile
[root@hadoop001 ~]#
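The same applies on hadoop002. To double-check that the copy worked, you can (optionally) verify the Hadoop installation on the other nodes as well:

[root@hadoop001 ~]# hadoop version
[root@hadoop002 ~]# source ~/.bash_profile
[root@hadoop002 ~]# hadoop version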
Next, we can start our cluster.
Start the HDFS Cluster
HDFS is Hadoop's distributed file system. In short, HDFS stores the data, and massive amounts of it!
1. Format the NameNode
[root@hadoop000 ~]# hdfs namenode -format
2. Start the Cluster
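The start command itself appeared as a screenshot in the original post; on Hadoop 2.x it is normally the following script (a sketch, assuming $HADOOP_HOME/sbin is on the PATH as configured above), which starts the NameNode and the DataNodes listed in the slaves file:

[root@hadoop000 ~]# start-dfs.sh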
Run the jps command to view the processes; an illustrative listing is shown below.
We can see that a NameNode and a DataNode are running on the Hadoop master node.
The DataNodes on the other two nodes are also running.
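The jps screenshots are not reproduced here; on the master node the listing looks roughly like this (the process IDs are made up for illustration, and a SecondaryNameNode may also appear depending on how HDFS was started):

[root@hadoop000 ~]# jps
2481 NameNode
2603 DataNode
2890 Jps

On hadoop001 and hadoop002 you would see a DataNode and Jps.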
At this point, we can also check on the cluster from a browser on a Windows machine:
Enter hadoop000:50070 in the browser (because I have added the mapping between the host names and IP addresses to the hosts file on my local C drive, the host name resolves; you can also simply use the IP address + port form).
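For reference, the mapping added on the Windows side would look like the entries below (C:\Windows\System32\drivers\etc\hosts is the usual location, assumed here; the entries are taken from the cluster hosts file above):

192.168.1.61 hadoop000
192.168.1.62 hadoop001
192.168.1.63 hadoop002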
Open the address:
This is the root directory of the cluster. There is nothing in it yet, so let's upload something.
Upload a test1.txt File
[root@hadoop000 ~]# cat test1.txt
Hello Hadoop
Hello BigData
Hello Tomorrow
[root@hadoop000 ~]# hdfs dfs -put test1.txt /
17/11/23 17:16:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@hadoop000 ~]#
Refresh the page and you will see the file.
To view the file contents, you can also fetch the file back out of HDFS; see the examples below.
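A few standard HDFS shell commands for this (the path / is where test1.txt was uploaded above):

[root@hadoop000 ~]# hdfs dfs -ls /                # list the root directory
[root@hadoop000 ~]# hdfs dfs -cat /test1.txt      # print the file contents
[root@hadoop000 ~]# hdfs dfs -get /test1.txt ./   # download the file locally

That is enough about HDFS for now; next up is YARN.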
Start the YARN Cluster
In a Hadoop cluster, YARN is responsible for managing and scheduling cluster resources.
Enter the following command:
[root@hadoop000 hadoop]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /root/app/hadoop/logs/yarn-root-resourcemanager-localhost.localdomain.out
hadoop002: starting nodemanager, logging to /root/app/hadoop/logs/yarn-root-nodemanager-localhost.localdomain.out
hadoop001: starting nodemanager, logging to /root/app/hadoop/logs/yarn-root-nodemanager-localhost.localdomain.out
hadoop000: starting nodemanager, logging to /root/app/hadoop/logs/yarn-root-nodemanager-localhost.localdomain.out
[root@hadoop000 hadoop]#
Similarly, check the processes with jps:
As you can see, the master node has two new processes, ResourceManager and NodeManager, and each of the other slave nodes has a NodeManager, the process that manages that node's resources. This means the startup succeeded. YARN also provides a web UI, on port 8088.
Enter hadoop000:8088 in the browser:
Pay attention to the content circled in the screenshot above.
We can start a simple job and test it.
[root@hadoop000 hadoop]# pwd
/root/app/hadoop/share/hadoop
[root@hadoop000 hadoop]# hadoop jar mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
The preceding command starts a MapReduce job that estimates the value of pi.
After a while, the results will be output.
At this time, if you view the cluster, you will see a job being executed:
Output result:
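The result appeared as a screenshot in the original post; the tail of the output looks roughly like this (the job duration is omitted, and the estimate of 4 matches what is described below):

Job Finished in ... seconds
Estimated value of Pi is 4.00000000000000000000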
The estimated value of pi comes out as 4, which has a rather large error (we only used 2 maps with 3 samples each), but it shows that a job runs through successfully.
At this point, the Hadoop cluster setup and testing are complete. It is not especially troublesome; you can go back and work through it carefully, and it will just take a little while. I have only just gotten started with big data, so if anything here is wrong, please leave a message. Happy learning!!