Two cyan
Email: [Email protected] Weibo: http://weibo.com/xtfggef
In the previous article I set up a single-node environment, but after installing it I still didn't feel it was enough, so today I continue with a fully distributed cluster installation. The software used is the same as in the previous single-node Hadoop installation, as follows:
- Ubuntu 14.10 64-bit Server Edition
- Hadoop 2.6.0
- JDK 1.7.0_71
- ssh
- rsync
Prepare the Environment
The environment is still VirtualBox + Ubuntu 14.10 64-bit, but this time with 3 nodes. Without further ado, let's start the preparation. I won't repeat the basic environment setup, including installing the JDK, ssh, and rsync; you can refer to the previous article.
Master 192.168.1.118 NameNode
Slave1 192.168.1.189 DataNode1
Slave2 192.168.1.116 DataNode2
Modify the hostname of each machine, and add the following entries at the end of the /etc/hosts file:
192.168.1.118 master
192.168.1.189 slave1
192.168.1.116 slave2
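To change the hostname itself on each machine, one option (a sketch for the master; repeat on each slave with its own name) is to edit /etc/hostname and apply it:

$ sudo sh -c 'echo master > /etc/hostname'   # persist the new hostname
$ sudo hostname master                        # apply it for the current session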
Configure passwordless SSH access from the NameNode to the DataNodes
Run the following two commands directly on the NameNode console:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Go to the home directory of the user on the NameNode, then enter the .ssh directory to view the generated files: authorized_keys, id_dsa, id_dsa.pub.
Distribute the authorized_keys file to the DataNode nodes:
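The exact copy command isn't shown in the original; one way to do it with scp (assuming the adam user and an existing ~/.ssh directory on each slave) is:

$ scp ~/.ssh/authorized_keys adam@slave1:~/.ssh/authorized_keys
$ scp ~/.ssh/authorized_keys adam@slave2:~/.ssh/authorized_keys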
Verify:
ssh 192.168.1.189
ssh 192.168.1.116
ssh slave1
ssh slave2
If you can log in without entering a password, everything is OK; otherwise, redo the configuration.
Installing Hadoop
1. Download the Hadoop 2.6.0 tar.gz file from the official website and extract it to the user's home directory: tar -zxvf hadoop-2.6.0.tar.gz.
2. Create a tmp folder inside the extracted hadoop-2.6.0 folder.
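For example (assuming Hadoop was extracted to /home/adam, matching the paths used in the configuration below):

$ mkdir /home/adam/hadoop-2.6.0/tmp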
3. Configure environment variables
Add the following configuration to the end of the /etc/profile file (on every machine):
# set Hadoop path
export HADOOP_HOME=/home/adam/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
Run . /etc/profile or source /etc/profile to make the configuration take effect, then run hadoop version to check the Hadoop version and verify that the environment variables are configured correctly.
4. Configure Hadoop. Go to the directory /home/adam/hadoop-2.6.0/etc/hadoop.
a>. Edit core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/adam/hadoop-2.6.0/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
b>. Configure the JAVA_HOME environment variable in hadoop-env.sh and yarn-env.sh as follows:
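The original snippet for this step is not preserved here; a minimal example, assuming the JDK is installed under /usr/lib/jvm/jdk1.7.0_71 (adjust the path to your own installation), would be to set the following line in both files:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71   # point this at your actual JDK directory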
c>. Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/adam/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/adam/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
d>. Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
e>. Edit yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
f>. Edit the slaves file and add the following two lines:
slave1
slave2
5. Copy the Hadoop folder to the other slave nodes.
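For example, using scp (assuming the same adam user and home directory layout on the slaves):

$ scp -r /home/adam/hadoop-2.6.0 adam@slave1:/home/adam/
$ scp -r /home/adam/hadoop-2.6.0 adam@slave2:/home/adam/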
Start Hadoop
1. Format the NameNode
adam@ubuntu:~/hadoop-2.6.0/bin$ ./hdfs namenode -format
15/01/14 19:29:58 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/60.191.124.254
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /home/adam/hadoop-2.6.0/etc/hadoop:/home/adam/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/home/adam/hadoop-2.6.0/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/adam/h ... jar:/home/adam/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.6.0.jar:/home/adam/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:/home/adam/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.6.0.jar:/home/adam/hadoop-2.6.0/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1; compiled by 'jenkins' on 2014-11-13T21:10Z
STARTUP_MSG:   java = 1.7.0_71
************************************************************/
15/01/14 19:29:58 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/01/14 19:29:58 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-3f81e813-604e-4d60-93b1-9794d7c7c079
15/01/14 19:30:10 INFO namenode.FSNamesystem: No KeyProvider found.
15/01/14 19:30:10 INFO namenode.FSNamesystem: fsLock is fair:true
15/01/14 19:30:10 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/01/14 19:30:10 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/01/14 19:30:10 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/01/14 19:30:10 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Jan 14 19:30:10
15/01/14 19:30:10 INFO util.GSet: Computing capacity for map BlocksMap
15/01/14 19:30:10 INFO util.GSet: VM type       = 64-bit
15/01/14 19:30:10 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
15/01/14 19:30:10 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/01/14 19:30:10 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/01/14 19:30:10 INFO blockmanagement.BlockManager: defaultReplication         = 1
15/01/14 19:30:10 INFO blockmanagement.BlockManager: maxReplication             = 512
15/01/14 19:30:10 INFO blockmanagement.BlockManager: minReplication             = 1
15/01/14 19:30:10 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/01/14 19:30:10 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/01/14 19:30:10 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/01/14 19:30:10 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/01/14 19:30:10 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/01/14 19:30:10 INFO namenode.FSNamesystem: fsOwner             = adam (auth:SIMPLE)
15/01/14 19:30:10 INFO namenode.FSNamesystem: supergroup          = supergroup
15/01/14 19:30:10 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/01/14 19:30:10 INFO namenode.FSNamesystem: Determined nameservice ID: hadoop-cluster
15/01/14 19:30:10 INFO namenode.FSNamesystem: HA Enabled: false
15/01/14 19:30:10 INFO namenode.FSNamesystem: Append Enabled: true
15/01/14 19:30:16 INFO util.GSet: Computing capacity for map INodeMap
15/01/14 19:30:16 INFO util.GSet: VM type       = 64-bit
15/01/14 19:30:16 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
15/01/14 19:30:16 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/01/14 19:30:16 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/01/14 19:30:16 INFO util.GSet: Computing capacity for map cachedBlocks
15/01/14 19:30:16 INFO util.GSet: VM type       = 64-bit
15/01/14 19:30:16 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
15/01/14 19:30:16 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/01/14 19:30:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/01/14 19:30:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/01/14 19:30:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/01/14 19:30:16 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/01/14 19:30:16 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/01/14 19:30:16 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/01/14 19:30:16 INFO util.GSet: VM type       = 64-bit
15/01/14 19:30:16 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
15/01/14 19:30:16 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/01/14 19:30:16 INFO namenode.NNConf: ACLs enabled? false
15/01/14 19:30:16 INFO namenode.NNConf: XAttrs enabled? true
15/01/14 19:30:16 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/01/14 19:30:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1507698623-60.191.124.254-1421235016468
15/01/14 19:30:16 INFO common.Storage: Storage directory /home/adam/hadoop-2.6.0/dfs/name has been successfully formatted.
15/01/14 19:30:17 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/01/14 19:30:17 INFO util.ExitUtil: Exiting with status 0
15/01/14 19:30:17 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/60.191.124.254
************************************************************/
2. Start Hadoop
In the hadoop-2.6.0/sbin directory, run start-all.sh, or run start-dfs.sh and start-yarn.sh separately.
3. Verify the installation
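The original does not list the verification commands; a few common checks (a sketch, the exact process lists vary) are:

$ jps                      # master should show NameNode, SecondaryNameNode and ResourceManager; slaves show DataNode and NodeManager
$ hdfs dfsadmin -report    # should report both DataNodes as live

You can also open http://master:50070 (HDFS) and http://master:8088 (YARN) in a browser to check the cluster status.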
With that, a fully distributed Hadoop cluster is installed. The steps are not many; interested readers can give it a try. If you have any questions, feel free to contact me:
Weibo: http://weibo.com/xtfggef
Email: [Email protected]
Java's Beauty [From Rookie to Expert Walkthrough]: Fully Distributed Installation of Hadoop on Linux