Environment Building-hadoop cluster building

Earlier, we quickly set up the CentOS cluster environment. Next, we will start building the Hadoop cluster.

Lab Environment
Hadoop version: CDH 5.7.0
A note on the choice: we did not pick the plain Apache release because the CDH distribution has already resolved the dependencies between the various components, and we will be using more components from the Hadoop family later on. CDH is currently the most widely used distribution in domestic production environments.

The installation package required by the environment can be obtained in my Baidu cloud sharing:
Link: http://pan.baidu.com/s/1c24gbUK password: 8h1r

Before officially installing Hadoop, we need to configure password-free SSH login across the cluster.

Configure the /etc/hosts file

Add:

192.168.1.61 hadoop000
192.168.1.62 hadoop001
192.168.1.63 hadoop002

[root@hadoop000 ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.61    hadoop000
192.168.1.62    hadoop001
192.168.1.63    hadoop002

This must be added on every machine!

After this, the servers can ping each other by host name.
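A quick check (these exact commands are an illustration, not part of the original steps):

[root@hadoop000 ~]# ping -c 3 hadoop001
[root@hadoop000 ~]# ping -c 3 hadoop002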

Start configuring cluster SSH password-free Login

On each of the three machines, configure SSH password-free login to the local machine.

ssh-keygen -t rsa

This generates the local key pair; just keep pressing ENTER at the prompts.

[root@hadoop000 app]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
0b:9a:fb:86:9c:97:b8:6f:2a:d9:2e:6e:a2:3b:49:95 root@hadoop000
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|    .            |
|   E             |
|  .   . S        |
| .   o . .       |
|.. +o+ ..        |
|+ = *.=          |
|+*.+=Oo          |
+-----------------+
[root@hadoop000 app]#


Copy the public key into the authorized_keys file:

[root@hadoop000 ~]# cd .ssh/
[root@hadoop000 .ssh]# ls
id_rsa  id_rsa.pub
[root@hadoop000 .ssh]# touch authorized_keys
[root@hadoop000 .ssh]# cp id_rsa.pub authorized_keys
cp: overwrite 'authorized_keys'? yes
[root@hadoop000 .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub
[root@hadoop000 .ssh]# cat authorized_keys
ssh-rsa AAAA... (public key content omitted) root@hadoop000
[root@hadoop000 .ssh]#

Here, each step is very important.

Now you can connect to the local machine over SSH without entering a password (except for the very first connection, which asks you to confirm the host key):

[root@hadoop000 .ssh]# ssh hadoop000
The authenticity of host 'hadoop000 (192.168.1.61)' can't be established.
RSA key fingerprint is d7:1b:23:6b:0f:80:26:cd:da:9f:89:75:f6:4d:50:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop000,192.168.1.61' (RSA) to the list of known hosts.
Last login: Thu Nov 23 15:01:36 2017 from 192.168.1.9
[root@hadoop000 ~]# logout
Connection to hadoop000 closed.
[root@hadoop000 .ssh]# ssh hadoop000
Last login: Thu Nov 23 15:32:58 2017 from hadoop000
[root@hadoop000 ~]#

Perform the same operation on the other two servers.

Configure hadoop000 node SSH password-free login to other nodes

ssh-copy-id -i hadoop001
[root@hadoop000 .ssh]# ssh-copy-id -i hadoop001
The authenticity of host 'hadoop001 (192.168.1.62)' can't be established.
RSA key fingerprint is d3:ca:00:af:e5:40:0a:a6:9b:0d:a6:42:bc:22:48:66.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,192.168.1.62' (RSA) to the list of known hosts.
root@hadoop001's password: 
Now try logging into the machine, with "ssh 'hadoop001'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@hadoop000 .ssh]#

Test:

[root@hadoop000 .ssh]# ssh hadoop001
Last login: Thu Nov 23 14:25:48 2017 from 192.168.1.9
[root@hadoop001 ~]# logout
Connection to hadoop001 closed.
[root@hadoop000 .ssh]#

Password-free login is successful.
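The same step presumably applies to hadoop002 as well (not shown above):

ssh-copy-id -i hadoop002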
Well, that covers the preparation. Now we will officially start building the Hadoop environment.

Install hadoop

1. First, we need the software. Here we simply upload the package from the local machine; of course, it is also possible to download it from the official website.

2. Decompress the package into the app directory and rename the resulting directory to hadoop.

[root@hadoop000 softwares]# ls
hadoop-2.6.0-cdh5.7.0.tar.gz  jdk-8u144-linux-x64.tar.gz
[root@hadoop000 softwares]# tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ../app/
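The rename itself is not shown; presumably it is done with something along these lines (the extracted directory name is assumed from the tarball name):

[root@hadoop000 softwares]# mv ../app/hadoop-2.6.0-cdh5.7.0 ../app/hadoop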

3. Set system environment variables

[root@hadoop000 hadoop]# vim ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

export JAVA_HOME=/root/app/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/root/app/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export PATH=$HADOOP_HOME/sbin:$PATH

Verify:

[root@hadoop000 hadoop]# source ~/.bash_profile
[root@hadoop000 hadoop]# hadoop version
Hadoop 2.6.0-cdh5.7.0
Subversion http://github.com/cloudera/hadoop -r c00978c67b0d3fe9f3b896b5030741bd40bf541a
Compiled by jenkins on 2016-03-23T18:41Z
Compiled with protoc 2.5.0
From source with checksum ...
This command was run using /root/app/hadoop/share/hadoop/common/hadoop-common-2.6.0-cdh5.7.0.jar
[root@hadoop000 hadoop]#

If you see output like the above, everything is fine.

If you want to be extra sure, check yarn as well.

[root@hadoop000 hadoop]# yarn version
Hadoop 2.6.0-cdh5.7.0
Subversion http://github.com/cloudera/hadoop -r c00978c67b0d3fe9f3b896b5030741bd40bf541a
Compiled by jenkins on 2016-03-23T18:41Z
Compiled with protoc 2.5.0
From source with checksum ...
This command was run using /root/app/hadoop/share/hadoop/common/hadoop-common-2.6.0-cdh5.7.0.jar
[root@hadoop000 hadoop]#

Yes, everything is in place.

4. Next, edit the configuration files.

The configuration files are under $HADOOP_HOME/etc/hadoop.

The main files to modify are: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, and hadoop-env.sh.

Step 4.1: hadoop-env.sh

Configure the JDK environment for Hadoop.

[[email protected] hadoop]# vim hadoop-env.sh

Find the line below the comment "# The java implementation to use." and change it to:

export JAVA_HOME=/root/app/jdk1.8.0_144

Save and exit.

Step 4.2: core-site.xml

[[email protected] hadoop]# vim core-site.xml

Add:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop000:8020</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/app/hadoop/data/tmp</value>
</property>

Just save and exit

Step 4.3: hdfs-site.xml

Add:

<property>
  <name>dfs.name.dir</name>
  <value>/root/app/hadoop/data/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/root/app/hadoop/data/datanode</value>
</property>
<property>
  <name>dfs.tmp.dir</name>
  <value>/root/app/hadoop/data/dfstmp</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

Save and exit

Step 4.4: mapred-site.xml

[root@hadoop000 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop000 hadoop]# vim mapred-site.xml

Add:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

Save and exit

Step 4.5: yarn-site.xml

Add:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop000</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

Save and exit

Step 4.6: slaves

Configure our slave (worker) nodes:

[root@hadoop000 hadoop]# vim slaves

hadoop000
hadoop001
hadoop002

Note: hadoop000 is both a master node and a slave node.

Build a hadoop environment on the other two servers

Here we can simply use the scp command to copy everything we configured on hadoop000 over to the other two machines.

[root@hadoop000 ~]# scp -r ./* root@hadoop001:/root
[root@hadoop000 ~]# scp -r ./* root@hadoop002:/root      # it will be slow, wait a moment
[root@hadoop000 ~]# scp -r ~/.bash_profile root@hadoop001:~/.bash_profile
.bash_profile                                100%  359     0.4KB/s   00:00
[root@hadoop000 ~]# scp -r ~/.bash_profile root@hadoop002:~/.bash_profile
.bash_profile                                100%  359     0.4KB/s   00:00
[root@hadoop000 ~]#

On hadoop001 and hadoop002, source the .bash_profile that was just copied over so the environment variables take effect.

[root@hadoop001 ~]# source ~/.bash_profile
[root@hadoop001 ~]#

Next, we can start our cluster.

Start the HDFS Cluster

HDFS is Hadoop's distributed file system. In short, HDFS is where the data is stored, and it is built for massive amounts of data!

1. Format namenode

[root@hadoop000 ~]# hdfs namenode -format

2. Start the Cluster
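Presumably the standard start script is used here (an assumption; the sbin directory was added to PATH earlier, so it can be called directly):

[root@hadoop000 ~]# start-dfs.sh     # brings up the HDFS daemons: the NameNode plus the DataNodes listed in slaves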

Run the jps command on each node to check the processes:

We can see that a NameNode and a DataNode are running on the Hadoop master node, and a DataNode is running on each of the other two nodes.
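As a convenience (an addition, not one of the original steps), the password-free SSH configured earlier lets you check all three machines from hadoop000 in one loop; the full path to jps is used because a non-interactive SSH session does not read ~/.bash_profile:

for h in hadoop000 hadoop001 hadoop002; do
  echo "==== $h ===="
  ssh $h /root/app/jdk1.8.0_144/bin/jps
done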

At this time, we can also view the cluster situation on a Windows machine through a browser:

Enter hadoop000:50070 in the browser. (This works because I added the mapping between the host names and their IP addresses to the Windows hosts file, C:\Windows\System32\drivers\etc\hosts; you can also simply use the IP address + port form.)
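For reference, the entries added on the Windows side simply mirror the ones on the servers:

192.168.1.61 hadoop000
192.168.1.62 hadoop001
192.168.1.63 hadoop002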

Browse the file system from the page above: it shows the root directory of the cluster, which has nothing in it yet. Let's upload something.

Upload a test1.txt File

[root@hadoop000 ~]# cat test1.txt 
Hello Hadoop
Hello BigData
Hello Tomorrow
[root@hadoop000 ~]# hdfs dfs -put test1.txt /
17/11/23 17:16:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@hadoop000 ~]#

Refresh the page and you can see the file.

You can view the file's content and also pull the file back down; a couple of commands for this are shown below. That is all for HDFS for now; next, let's look at YARN.
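For illustration (standard HDFS shell commands, not part of the original session):

hdfs dfs -ls /                # list the root directory of the cluster
hdfs dfs -cat /test1.txt      # print the file's content
hdfs dfs -get /test1.txt .    # download the file to the local working directory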

Start the yarn Cluster

In a hadoop cluster, yarn plays the role of managing and scheduling cluster resources.

Enter the following command:

[root@hadoop000 hadoop]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /root/app/hadoop/logs/yarn-root-resourcemanager-localhost.localdomain.out
hadoop002: starting nodemanager, logging to /root/app/hadoop/logs/yarn-root-nodemanager-localhost.localdomain.out
hadoop001: starting nodemanager, logging to /root/app/hadoop/logs/yarn-root-nodemanager-localhost.localdomain.out
hadoop000: starting nodemanager, logging to /root/app/hadoop/logs/yarn-root-nodemanager-localhost.localdomain.out
[root@hadoop000 hadoop]#

Similarly, check the processes with jps:


As you can see, the master node now has two more processes, ResourceManager and NodeManager, and each of the slave nodes has a NodeManager, which manages that node's resources. This means the startup was successful. YARN also provides a web UI, on port 8088.

Enter hadoop000:8088 in the browser:

Pay attention to the content circled above

We can start a simple job and test it.

[root@hadoop000 hadoop]# pwd
/root/app/hadoop/share/hadoop
[root@hadoop000 hadoop]# hadoop jar mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3

The preceding command starts a MapReduce job that estimates the value of pi (here with 2 map tasks and 3 samples per map).

After a while, the results will be output.

At this time, if you open the YARN web UI, you will see a job being executed:

Output result:

The estimated value of pi comes out as 4, so the error is rather large: with only 2 × 3 = 6 sample points, all of them happen to land inside the circle, and the estimate 4 × (points inside / total points) works out to exactly 4. Still, a job has been run successfully.

By now, we have completed the setup and testing of the Hadoop cluster. It is not especially troublesome, but it does take a while, so feel free to go back over the steps carefully. I have only just gotten started with big data, so if anything here is wrong, please leave a message. Happy learning!
