Building a Hadoop Environment on CentOS 7

Experimental Purpose:

Build a Hadoop platform across 5 hosts, in preparation for HBase later.
Experimental steps:
0x01 Hardware conditions:

5 CentOS 7 hosts with IP addresses x.x.x.46~50, named lk, node1, node2, node3, and node4 respectively.
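These hostnames are referenced later in the configuration files (e.g. hdfs://lk:9000), so every machine must be able to resolve them. A minimal sketch, assuming name resolution is done via /etc/hosts rather than DNS (substitute the real x.x.x addresses):

    # Append the cluster hostnames to /etc/hosts on every machine
    cat >> /etc/hosts <<'EOF'
    x.x.x.46  lk
    x.x.x.47  node1
    x.x.x.48  node2
    x.x.x.49  node3
    x.x.x.50  node4
    EOF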
All steps are performed as the root account by default; where we need to switch back to a normal user, I will point it out.
0x02 Material Preparation
Hadoop installation package
Official download address: http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz
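The link above points at the source tarball; what actually gets unpacked and used later in this walkthrough is the prebuilt binary release, so here is a download sketch (the archive URL is just one option; any mirror listed by closer.cgi works):

    # Download the Hadoop 2.7.3 binary release from the Apache archive
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz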

Java environment
Because CentOS 7 comes with a Java environment, you only need to configure JAVA_HOME. First enter the command [root@lk ~]# which java
It returns: /usr/bin/java. Then enter [root@lk ~]# ls -lrt /usr/bin/java
It returns: lrwxrwxrwx. 1 root root 22 April 2015 /usr/bin/java -> /etc/alternatives/java. Then enter: [root@lk ~]# ls -lrt /etc/alternatives/java
It returns: lrwxrwxrwx. 1 root root 74 April 2015 /etc/alternatives/java -> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79-2.5.5.1.el7_1.x86_64/jre/bin/java. Then edit: [root@lk ~]# vim /etc/profile
Add the following content:

    export PATH=$PATH:$HADOOP_HOME/bin
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79-2.5.5.1.el7_1.x86_64
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HADOOP_HOME/bin
To make it take effect: [root@lk ~]# source /etc/profile
You can then verify it: [root@lk ~]# echo $JAVA_HOME
It returns: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79-2.5.5.1.el7_1.x86_64
which indicates the configuration is in effect.

However, the above configuration turned out not to work for this experiment. The reason is that the OpenJDK directory layout differs from the regular JDK package layout, so Hadoop cannot find some paths in a few places.
The correct configuration is as follows: download a JDK from the official website: http://www.oracle.com/technetwork/java/javase/downloads/index.html and extract it to /usr/jdk-9.0.1. Configure the environment variables: vim /etc/profile
Add

export JAVA_HOME=/usr/jdk-9.0.1/
export CLASSPATH=$JAVA_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin
To make it take effect: source /etc/profile
0x03 Environment Construction
Unzip the downloaded Hadoop package in the /usr/hadoop-2.7.3 directory.
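For reference, one way to unpack the tarball so that it ends up in the layout used below (with HADOOP_HOME at /usr/hadoop-2.7.3/hadoop); the tarball location and the rename are assumptions:

    # Unpack the binary tarball under /usr/hadoop-2.7.3 and rename it to "hadoop"
    mkdir -p /usr/hadoop-2.7.3
    tar -xzf /root/hadoop-2.7.3.tar.gz -C /usr/hadoop-2.7.3/
    mv /usr/hadoop-2.7.3/hadoop-2.7.3 /usr/hadoop-2.7.3/hadoop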
After extracting (the Hadoop directory is /usr/hadoop-2.7.3/hadoop/), create the following folders:
mkdir  /usr/hadoop-2.7.3/tmp  
mkdir  /usr/hadoop-2.7.3/var  
mkdir  /usr/hadoop-2.7.3/dfs  
mkdir  /usr/hadoop-2.7.3/dfs/name  
mkdir  /usr/hadoop-2.7.3/dfs/data  
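Equivalently, the same folders can be created in one command (just a convenience; it produces exactly the layout above):

    # Create tmp, var, dfs/name and dfs/data in one go
    mkdir -p /usr/hadoop-2.7.3/{tmp,var,dfs/name,dfs/data}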
Set environment variables: vim /etc/profile and write:
export HADOOP_HOME=/usr/hadoop-2.7.3/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
(You may need to merge this with the Java PATH entry above.) To make it take effect: source /etc/profile
Hadoop Configuration
Enter the $HADOOP_HOME/etc/hadoop directory and configure hadoop-env.sh and the other files. The configuration files involved are as follows:
hadoop/etc/hadoop/hadoop-env.sh 
hadoop/etc/hadoop/yarn-env.sh 
hadoop/etc/hadoop/core-site.xml 
hadoop/etc/hadoop/hdfs-site.xml 
hadoop/etc/hadoop/mapred-site.xml 
hadoop/etc/hadoop/yarn-site.xml
hadoop/etc/hadoop/slaves

The specific changes are as follows:
Configure hadoop-env.sh

# The Java implementation to use.
#export JAVA_HOME=/home/graph/desktop/java
export JAVA_HOME=${JAVA_HOME}
Configure yarn-env.sh
Here the JAVA_HOME path must be written out explicitly; you cannot use $JAVA_HOME directly, you have to put in the concrete path.
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/jdk-9.0.1/
Configure core-site.xml
Add
  <configuration>
        <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop-2.7.3/tmp</value>
        <description>A base for other temporary directories.</description>
   </property>
   <property>
        <name>fs.default.name</name>
        <value>hdfs://lk:9000</value>
   </property>
</configuration>

Note: adjust the path and address for your setup.
Configure hdfs-site.xml
Add

<configuration>
<property>
   <name>dfs.name.dir</name>
   <value>/usr/hadoop-2.7.3/dfs/name</value>
   <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
   <name>dfs.data.dir</name>
   <value>/usr/hadoop-2.7.3/dfs/data</value>
   <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
      <name>dfs.permissions</name>
      <value>false</value>
      <description>Need not permissions</description>
</property>
</configuration>
  
Configure mapred-site.xml
Add
<configuration>
<property>
    <name>mapred.job.tracker</name>
    <value>lk:49001</value>
</property>
<property>
      <name>mapred.local.dir</name>
       <value>/usr/hadoop-2.7.3/var</value>
</property>
<property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
</property>
</configuration>
Configure yarn-site.xml
Add
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>10.113.10.46:8099</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>lk</value>
</property>
<property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>1537</value>
    <description>Available memory per node, in MB; default 8192 MB</description>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
</configuration>

Configure the slaves file: vim slaves

node1
node2
node3
node4

Configure all the other node machines: since lk is the NameNode and the remaining 4 machines are DataNodes, everything except the final slaves configuration is exactly the same, so you can copy it over wholesale with scp -r /usr/hadoop-2.7.3/ xxx@x.x.x.x:~/ (see the loop sketch below). Note that each machine also needs the Java environment configured.
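A small sketch of that copy step as a loop over the DataNodes (replace xxx with the target user; this assumes the hostnames resolve as set up earlier and that you can SSH to each node; the ~/ destination matches the original command above):

    # Copy the configured Hadoop tree to every DataNode
    for host in node1 node2 node3 node4; do
        scp -r /usr/hadoop-2.7.3/ xxx@"$host":~/
    done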

Finally, adjust the permissions. We did all the modifications as root; afterwards we need to switch back to normal user privileges, but by default a normal user cannot modify the contents of /usr, so change the ownership of the whole hadoop-2.7.3 folder.

The specific command is: sudo chown -R xxx:xxx hadoop-2.7.3/ and it must be run on every machine.
0x04 Hadoop Startup Test

The x.x.x.47~50 machines are DataNodes and require no further action; the following is performed on the NameNode:

Enter the bin directory and initialize: ./hadoop namenode -format

If no error is reported, the initialization succeeded.

Leave the bin directory, enter sbin, and execute: ./start-all.sh. If it is the first time, the system will ask yes/no; answer yes.
Then run jps to check. On the master node:

24816 ResourceManager
24387 NameNode
27619 Jps
24635 SecondaryNameNode

On the slave nodes:

27955 NodeManager
30564 Jps
27816 DataNode
If the output matches the above, the startup succeeded.
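As an optional extra check, you can ask the NameNode which DataNodes have registered (this assumes $HADOOP_HOME/bin is on the PATH as configured earlier):

    # Print cluster capacity and the list of live DataNodes
    hdfs dfsadmin -report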

Go to the web page x.x.x.46:8088 to view the worker nodes.
Go to the web page x.x.x.46:50070 to view the HDFS overview.
0x05 Error Tips: First, if jps on the other nodes shows no DataNode, there is a problem with the configuration; go to the logs folder on that child node and check Hadoop-qsb-datanode-xxx.log for the cause of the error. I ran into 2 errors here; if you hit the same problems, you can apply the fixes directly.
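A quick way to inspect such a log, assuming the default log location under the Hadoop directory (the exact file name depends on your username and hostname):

    # Show the last lines of the DataNode log on the problem node
    tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log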
The first is that the newly created folders must be empty after copying; if there is anything inside them, delete it manually.
Next is the permissions issue: if you encounter an error such as "unreachable", try re-running the command sudo chown -R xxx:xxx hadoop-2.7.3/. With everything up, let's run an example job, a grep search. The specific commands are as follows:

hadoop dfs -mkdir input     # create the input folder; some of you may need to add -p
hadoop dfs -put /usr/hadoop-2.7.3/hadoop/etc/hadoop/*.xml input     # put some files into it
hadoop dfs -ls input     # view the files
hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'     # filter out strings matching dfs*
hadoop dfs -ls output

The first run then reported this error:

17/11/17 09:10:24 INFO mapreduce.Job:  map 0% reduce 0%
17/11/17 09:10:35 INFO mapreduce.Job:  map 20% reduce 0%
17/11/17 09:10:36 INFO mapreduce.Job:  map 60% reduce 0%
17/11/17 09:10:38 INFO mapreduce.Job:  map 80% reduce 0%
17/11/17 09:10:43 INFO mapreduce.Job:  map 100% reduce 0%
17/11/17 09:10:48 INFO mapreduce.Job:  map 100% reduce 7%

17/11/17 09:13:40 INFO mapreduce.Job: Task Id : attempt_1510879394295_0002_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2

As you can see, the map phase was fine, but reduce got stuck at 7% and after a while threw the error above. Some searching revealed it is a memory problem, which fits the yarn-site.xml configuration I had at the time:

  <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Available memory per node, in MB; default 8192 MB</description>
   </property>
   <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
</property>

Right, my virtual machines are just too weak. I removed the first property, changed the 2048 in the second to 1537 (the minimum is 1536), and the job then ran successfully.
The output should look like this:

17/11/17 20:58:37 INFO mapreduce.Job:  map 0% reduce 0%
17/11/17 20:59:15 INFO mapreduce.Job:  map 67% reduce 0%
17/11/17 20:59:37 INFO mapreduce.Job:  map 100% reduce 0%
17/11/17 20:59:40 INFO mapreduce.Job:  map 100% reduce 100%
17/11/17 20:59:42 INFO mapreduce.Job: Job job_1510923270204_0002 completed successfully
17/11/17 20:59:43 INFO mapreduce.Job: Counters: 50 ...
Here are some statistics ...

The example now runs without problems. The output is as follows:

2       dfs.replication
1       dfsadmin
1       dfs.permissions
1       dfs.name.dir
1       dfs.data.dir
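For reference, this listing can be printed with something like the following (output file names such as part-r-00000 may vary):

    # Print the contents of the grep job's output directory
    hadoop dfs -cat output/*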
