Hadoop + Hive + MapReduce cluster installation and deployment

Environment preparation:


CentOS 5.5 x64 (3 machines)


10.129.8.52 (master)  ======>>  NameNode, SecondaryNameNode, JobTracker

10.129.8.76 (slave01) ======>>  DataNode, TaskTracker

10.129.8.33 (slave02) ======>>  DataNode, TaskTracker

The /etc/hosts file on each machine is as follows:


10.129.8.52 master

10.129.8.76 slave01

10.129.8.33 slave02


(i) Configure SSH trust (passwordless login) from the master to the slaves, and set it up between all nodes as well; a sketch follows below.
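A minimal sketch of the key exchange, run as the hadoop user (the user name is an assumption; adapt it to your environment):

# on the master, generate a key pair (accept the defaults)
ssh-keygen -t rsa

# copy the public key to every node, including the master itself
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave01
ssh-copy-id hadoop@slave02

# verify that passwordless login works
ssh slave01 hostname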


(ii) Install Java 1.6.0_24 from the self-extracting package jdk-6u24-linux-x64.bin; a sketch follows below.
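A rough installation sequence (the /usr/local/java target matches the JAVA_HOME used in the Hadoop configuration below; run as root, and adjust paths as needed):

chmod +x jdk-6u24-linux-x64.bin
./jdk-6u24-linux-x64.bin              # self-extracts to ./jdk1.6.0_24
mkdir -p /usr/local/java
mv jdk1.6.0_24 /usr/local/java/

# make JAVA_HOME available system-wide
echo 'export JAVA_HOME=/usr/local/java/jdk1.6.0_24' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile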


(iii) Install the Hadoop cluster


Download hadoop-1.2.0.tar.gz and extract it to the /home/hadoop/hadoop directory, for example as shown below.
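A sketch of the download and extraction (the mirror URL is an assumption; any Apache archive mirror carrying the 1.2.0 release works):

cd /home/hadoop
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0.tar.gz
tar -xzf hadoop-1.2.0.tar.gz
mv hadoop-1.2.0 hadoop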


Modify the corresponding configuration files (all under the conf directory of the Hadoop installation):


1 hadoop-env.sh: point Hadoop at the Java installation:

export JAVA_HOME=/usr/local/java/jdk1.6.0_24


2 core-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9010</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
</configuration>


3 mapred-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9011</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
</configuration>


4 hdfs-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<!--
  NameNode: primarily holds the file-system name table (image) and the file change log.
  SecondaryNameNode: a daemon that periodically fetches the file change log from the
    NameNode and merges it into a single image, making it easy for Hadoop to find the
    last checkpoint when it restarts after a crash. It will be replaced by BackupNameNode
    and NameNode clusters in subsequent releases.
  JobTracker: the task-scheduling daemon.
  TaskTracker: the task-execution process.
  DataNode: data storage node, often deployed on the same machines as the TaskTrackers.
-->

<configuration>
  <property>
    <!-- Directories for the name image files; if unspecified, defaults to the
         tmp directory configured in core-site.xml -->
    <name>dfs.name.dir</name>
    <value>/home/hadoop/filedata/name01,/home/hadoop/filedata/name02</value>
  </property>
  <property>
    <!-- Data storage directories; if unspecified, defaults to the
         tmp directory configured in core-site.xml -->
    <name>dfs.data.dir</name>
    <value>/home/hadoop/filedata/data01</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>


5 masters

master


6 slaves (if the master's host name is listed here, master will also act as a DataNode; if not, master will only be the NameNode, not a DataNode)


slave01


slave02
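The same Hadoop directory and configuration must be present on the slaves; one way to distribute it, using the SSH trust from step (i) (a sketch, assuming identical paths on every node):

scp -r /home/hadoop/hadoop hadoop@slave01:/home/hadoop/
scp -r /home/hadoop/hadoop hadoop@slave02:/home/hadoop/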


(iv) Create the corresponding directories (a sketch of the commands follows this list):


/home/hadoop/hadoop/tmp (hadoop.tmp.dir): Hadoop's default temporary path; it is best to configure this explicitly. If a DataNode inexplicably fails to start after a new node is added (or in other odd situations), delete this tmp directory on that node. Note, however, that if this directory is removed on the NameNode machine, the NameNode format command must be executed again.


/home/hadoop/filedata (dfs.name.dir): the local file-system path where the NameNode persistently stores the namespace and transaction log. When this value is a comma-separated list of directories, the name table is replicated to all of them for redundant backup. (The subdirectories underneath are created by the program itself.)


/home/hadoop/filedata/data01 (dfs.data.dir): a comma-separated list of local file-system paths where the DataNode stores block data. When this value is a list of directories, data is stored across all of them, typically on different devices. (Created by the program itself.)
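A sketch of the directory preparation (hadoop.tmp.dir should exist on every node before startup; the filedata subdirectories are created by Hadoop itself, as noted above):

# on every node
mkdir -p /home/hadoop/hadoop/tmp

# parent directory for dfs.name.dir (master) and dfs.data.dir (slaves)
mkdir -p /home/hadoop/filedata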


(v) Format HDFS on the NameNode


/home/hadoop/hadoop/bin/hadoop namenode -format


Check the output to make sure the distributed file system was formatted successfully.


When this is done, you can see the /home/hadoop/filedata/name01 and /home/hadoop/filedata/name02 directories on the master machine. Start Hadoop on the master node; the master will start the Hadoop daemons on all slave nodes as well.


(vi) /home/hadoop/hadoop/bin/start-all.sh (starts all services)

After it finishes, you can use jps to view all of the started services (the startup logs are in the logs directory of the Hadoop installation):


[hadoop@master ~]$ jps

16276 SecondaryNameNode

16374 JobTracker

16103 NameNode

19003 Jps


At this point you can check whether each DataNode has created its data directory. You can also verify with jps on the DataNodes, though on some of my machines the command works and on others it does not; the cause remains to be investigated. A cluster-wide check follows below.
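A more reliable cluster-wide check is the standard dfsadmin report, run from the master; it should list slave01 and slave02 as live DataNodes with their configured capacity:

/home/hadoop/hadoop/bin/hadoop dfsadmin -report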


(vii) Upload a file as a test


/home/hadoop/hadoop/bin/hadoop dfs -put x-forwarded-for-survey.beisen.com-10.22.1.35_d2013070* /home/iis_log/survey.beisen.com/20130705


(viii) View the uploaded files


/home/hadoop/hadoop/bin/hadoop dfs -ls /home/iis_log/survey.beisen.com/20130705
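To exercise MapReduce end to end, you can run the wordcount example that ships with the Hadoop 1.2.0 distribution against the uploaded logs (the output path below is an arbitrary choice and must not already exist):

cd /home/hadoop/hadoop
bin/hadoop jar hadoop-examples-1.2.0.jar wordcount \
    /home/iis_log/survey.beisen.com/20130705 \
    /home/iis_log/survey.beisen.com/20130705_wc

# inspect the result (the part file name may vary by API version)
bin/hadoop dfs -cat /home/iis_log/survey.beisen.com/20130705_wc/part-r-00000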