Hadoop & Spark installation (Part 1)


Hardware environment:

hddcluster1 10.0.0.197 RHEL 7

hddcluster2 10.0.0.228 CentOS 7 (this node is the master)

hddcluster3 10.0.0.202 RHEL 7

hddcluster4 10.0.0.181 CentOS 7

Software Environment:

Turn off all firewalls (firewalld); example commands follow this list

openssh-clients

openssh-server

java-1.8.0-openjdk

java-1.8.0-openjdk-devel

hadoop-2.7.3.tar.gz
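
On CentOS 7 and RHEL 7 the default firewall is firewalld; a minimal way to stop and disable it on every node (assuming firewalld is the firewall in use, since the article does not name one) is:

sudo systemctl stop firewalld       # stop the firewall for the current session
sudo systemctl disable firewalld    # keep it from starting again on the next boot
sudo systemctl status firewalld     # confirm it is inactive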

Process:

    1. Select one machine as the Master

    2. Configure a hadoop user on the Master node, install an SSH server, and install the Java environment

    3. Install Hadoop on the Master node and complete its configuration

    4. Configure a hadoop user on the other Slave nodes, install an SSH server, and install the Java environment

    5. Copy the /usr/local/hadoop directory from the Master node to the other Slave nodes

    6. Start Hadoop on the Master node

# Node names and their corresponding IP addresses
[hadoop@hddcluster2 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.228  hddcluster2
10.0.0.197  hddcluster1
10.0.0.202  hddcluster3
10.0.0.181  hddcluster4
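
As an optional sanity check (not part of the original steps), you can verify from the master that every hostname resolves and is reachable:

for h in hddcluster1 hddcluster2 hddcluster3 hddcluster4; do
    ping -c 1 "$h"    # each node should answer one ICMP echo
done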
# Create a hadoop user (log in as root first)
su                              # switch to the root user
useradd -m hadoop -s /bin/bash  # create a new user named hadoop
passwd hadoop                   # set the hadoop user's password
visudo                          # below the line "root ALL=(ALL) ALL", add: hadoop ALL=(ALL) ALL
# Log in as the hadoop user, install SSH, and configure passwordless SSH login
[hadoop@hddcluster2 ~]$ rpm -qa | grep ssh
[hadoop@hddcluster2 ~]$ sudo yum install openssh-clients
[hadoop@hddcluster2 ~]$ sudo yum install openssh-server
[hadoop@hddcluster2 ~]$ cd ~/.ssh/                                   # if this directory does not exist, run "ssh localhost" first
[hadoop@hddcluster2 ~]$ ssh-keygen -t rsa                            # press Enter at each prompt
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost   # authorize the key locally
[hadoop@hddcluster2 ~]$ chmod 600 ./authorized_keys                  # fix the file permissions
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster1
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster3
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster4
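
As a quick check that the passwordless setup worked (an optional step added here for convenience), log in to each node from the master; no password prompt should appear:

[hadoop@hddcluster2 ~]$ ssh hddcluster1 hostname   # should print "hddcluster1" without asking for a password
[hadoop@hddcluster2 ~]$ ssh hddcluster3 hostname
[hadoop@hddcluster2 ~]$ ssh hddcluster4 hostname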
# Unpack the Hadoop archive to /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo tar -zxf hadoop-2.7.3.tar.gz -C /usr/local/
[hadoop@hddcluster2 ~]$ sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop
cd /usr/local/hadoop
./bin/hadoop version

# Install the Java environment
[hadoop@hddcluster2 ~]$ sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
[hadoop@hddcluster2 ~]$ rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/bin/javac
[hadoop@hddcluster2 ~]$ vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"

# Test the Java environment
source ~/.bashrc
java -version
$JAVA_HOME/bin/java -version    # should print the same version as running "java -version" directly
# Edit the Hadoop configuration files
[hadoop@hddcluster2 hadoop]$ pwd
/usr/local/hadoop/etc/hadoop

[hadoop@hddcluster2 hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hddcluster2:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/usr/local/hadoop/tmp</value>
                <description>A base for other temporary directories.</description>
        </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>hddcluster2:50090</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>hddcluster2:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>hddcluster2:19888</value>
        </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hddcluster2</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat slaves
hddcluster1
hddcluster2
hddcluster3
hddcluster4
$ cd /usr/local
$ sudo rm -r ./hadoop/tmp                    # delete the Hadoop temporary files
$ sudo rm -r ./hadoop/logs/*                 # delete the log files
$ tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress first, then copy to the slaves
$ cd ~
$ scp ./hadoop.master.tar.gz hddcluster1:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster3:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster4:/home/hadoop
On each Slave node, install the same software environment (openssh, Java) and ~/.bashrc settings as on the master, then unpack the archive:

sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop
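
The "software environment" part of that step simply mirrors what was already done on the master. Purely as a sketch (assuming the same user name, packages, and ~/.bashrc contents as on the master), the per-slave preparation before unpacking the archive amounts to roughly:

# as root on each slave node
useradd -m hadoop -s /bin/bash
passwd hadoop
visudo                              # add: hadoop ALL=(ALL) ALL
yum install openssh-clients openssh-server java-1.8.0-openjdk java-1.8.0-openjdk-devel

# as the hadoop user on each slave node
scp hadoop@hddcluster2:~/.bashrc ~/.bashrc && source ~/.bashrc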
[hadoop@hddcluster2 ~]$ hdfs namenode -format    # required only before the first start; afterwards this step is not needed

To start Hadoop, run the following commands on the Master node:

$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver

You can check the processes started on each node with the jps command. If everything is running correctly, the NameNode, ResourceManager, SecondaryNameNode and JobHistoryServer processes should be visible on the Master node. In addition, run hdfs dfsadmin -report on the Master node to check whether the DataNodes have started; if "Live datanodes" is not 0, the cluster started successfully.

[hadoop@hddcluster2 ~]$ hdfs dfsadmin -report
Configured Capacity: 2125104381952 (1.93 TB)
Present Capacity: 1975826509824 (1.80 TB)
DFS Remaining: 1975824982016 (1.80 TB)
DFS Used: 1527808 (1.46 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (4):

The state of the DataNodes and the NameNode can also be viewed through the web UI at http://hddcluster2:50070/. If startup fails, check the startup logs to find the cause.
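
For reference, a jps listing on the master would look roughly like the following (the PIDs are invented for illustration, not captured from the original cluster; because hddcluster2 is also listed in the slaves file, a DataNode and NodeManager appear on it as well):

[hadoop@hddcluster2 ~]$ jps
3200 NameNode
3310 DataNode
3358 SecondaryNameNode
3545 ResourceManager
3655 NodeManager
3901 JobHistoryServer
4012 Jps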
Running jps on the Slave nodes should show the DataNode and NodeManager processes.
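
Again purely as an illustration (PIDs invented), a slave's jps output would look something like:

[hadoop@hddcluster1 ~]$ jps
2801 DataNode
2914 NodeManager
3077 Jps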
To test a Hadoop distributed instance, first create a user directory on HDFS:

hdfs dfs -mkdir -p /user/hadoop

Then copy the configuration files from /usr/local/hadoop/etc/hadoop into the distributed file system as input files:

hdfs dfs -mkdir input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input

By looking at the DataNode status (the change in used space), you can confirm that the input files really were copied to the DataNodes. Then run the MapReduce job:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

Wait for the job to finish, then view the output.
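
A common way to look at the result once the job completes (the exact matches and counts depend on your configuration files, so no sample output is shown here) is:

hdfs dfs -cat output/*          # print the result directly from HDFS
hdfs dfs -get output ./output   # or fetch the whole output directory to the local file system
cat ./output/*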
Hadoop start commands:

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

Hadoop shutdown commands:

stop-dfs.sh
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver


PS: If one or two nodes in the cluster fail to start, first try deleting the Hadoop temporary files on those nodes:

cd /usr/local

sudo rm -r ./hadoop/tmp

sudo rm -r ./hadoop/logs/*

Then run

hdfs namenode -format

and start the cluster again.


This article references the following site; the installation was reproduced successfully:

http://www.powerxing.com/install-hadoop-cluster/

This article is from the "Zen Sword as" blog; please keep this source: http://yanconggod.blog.51cto.com/1351649/1884998

