Hadoop Environment Setup 2: Hadoop Installation and Operating Environment

Source: Internet
Author: User
Tags: hadoop, fs

1 Operating modes:

Standalone mode: Standalone mode is the default mode for Hadoop. When the Hadoop package is first decompressed, Hadoop knows nothing about the hardware environment and conservatively chooses a minimal configuration. In this default mode, all three XML configuration files are empty. When the configuration files are empty, Hadoop runs entirely locally. Because there is no need to interact with other nodes, standalone mode does not use HDFS and does not load any of the Hadoop daemons. This mode is mainly used for developing and debugging the application logic of MapReduce programs.

Pseudo-distributed mode: Pseudo-distributed mode runs Hadoop on a single-node "cluster" where all daemons run on the same machine. This mode adds code debugging on top of standalone mode, allowing you to check memory usage, HDFS input and output, and interactions between daemons.

Fully distributed mode: The Hadoop daemons run on a cluster of machines.

2 Download Hadoop:

After you get the installation package:
Create a hadoop folder under the home/user directory for easy management of Hadoop.
Change the owner of the folder: sudo chown -R user:usergroup hadoop
With this step done, you can also manage the folder from the Ubuntu desktop interface.
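The two steps above can be sketched as a short shell sequence. The archive name hadoop-1.2.1.tar.gz is an assumed example; substitute whatever version you downloaded (the privileged steps are shown commented out):

```shell
# Create a home for Hadoop under the user's home directory.
mkdir -p "$HOME/hadoop"

# Unpack the downloaded archive into it (archive name is an assumed example):
# tar -xzf hadoop-1.2.1.tar.gz -C "$HOME/hadoop" --strip-components=1

# Take ownership of the tree so it can also be managed from the desktop:
# sudo chown -R "$USER:$USER" "$HOME/hadoop"
```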

3 Configuration process:

1 Modify the configuration file:

~/hadoop/conf$ gedit hadoop-env.sh

Add the following:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64  (depends on your machine's Java installation path)
export HADOOP_HOME=/home/five/hadoop
export PATH=$PATH:/home/five/hadoop/bin

To make it take effect: source hadoop-env.sh

Tip: you can of course add the second and third lines to your user environment variables so that Hadoop commands can be executed quickly from any path.
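As the tip suggests, those two exports can also go into the user's shell profile so the hadoop command works from any directory. A minimal sketch (the path assumes an install under the user's home directory, as above; adjust to your own):

```shell
# Make the hadoop command available in the current shell session.
# The install location is an assumption from this post's example setup.
export HADOOP_HOME="$HOME/hadoop"
export PATH="$PATH:$HADOOP_HOME/bin"

# To persist across sessions, append the same two lines to ~/.bashrc, e.g.:
# echo 'export HADOOP_HOME=$HOME/hadoop' >> ~/.bashrc
# echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc
```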

2 Modify the configuration files:

File 1:---------------------------

The contents of the conf/core-site.xml file are modified to the following (if the sections below are not added, running the 1.2.1 example reports problems):

<configuration>
  <!-- Global properties -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/five/hadoop/tmp</value>
  </property>
  <!-- File system properties -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

File 2:---------------------------

The conf/hdfs-site.xml file content is modified to the following (the replication default is 3; if it is not changed and there are fewer than three DataNodes, errors will be reported):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

File 3:---------------------------

This is the configuration file for MapReduce in Hadoop, which configures the address and port of the JobTracker. The conf/mapred-site.xml file content is modified to the following:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

3 Format the Hadoop file system; enter at the command line:

bin/hadoop namenode -format

It displays: 15/03/16 15:43:46 INFO common.Storage: Storage directory /home/five/hadoop/hdfs/name has been successfully formatted.

If the above command fails, execute it under the Hadoop directory, or add the Hadoop bin directory to your environment variables and then run: hadoop namenode -format

4 Start Hadoop; enter at the command line:

bin/start-all.sh
./start-all.sh
(Use bin/start-all.sh, or use bin/start-dfs.sh and bin/start-mapred.sh separately.)

5 Check the daemons.

Wait about ten seconds (checking too early may show errors), then enter at the command line: jps
The output should look like:

4574 TaskTracker
4630 Jps
2865 JobTracker
4237 DataNode
4394 SecondaryNameNode
4092 NameNode

6 Verify that Hadoop is installed successfully: enter the following URLs in a browser; if they open correctly, the installation succeeded.

http://localhost:50030 (the MapReduce web page)
http://localhost:50070 (the HDFS web page)

Error message:

1 http://localhost:50070 (the second URL) cannot be opened:

View the logs:
2015-03-16 16:43:20,701 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode:
java.net.UnknownHostException: Invalid hostname for server: master

Open core-site.xml and check whether the host is correct; in my case I had written it wrong.

2 The jps command shows no DataNode: this may be caused by formatting the NameNode multiple times.

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs

1. Enter the current directory under each DataNode's dfs.data.dir and modify the VERSION file:

#Fri Nov 15:00:17 CST 2012
namespaceID=246015542
storageID=DS-2085496284-192.168.1.244-50010-1353654017403
cTime=0
storageType=DATA_NODE
layoutVersion=-32

There is a namespaceID line; change it to the value given in the error message:

NameNode namespaceID = 971169702

i.e., make the namespaceIDs the same. Then restart, and the DataNodes will all start normally.
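The VERSION edit described above can be scripted. A hedged sketch on a throwaway copy of the file; the IDs are just the examples from this post, and on a real node the file lives under dfs.data.dir/current/VERSION and the DataNode must be stopped first:

```shell
# Demo on a throwaway copy of a DataNode VERSION file; on a real node the
# file is at <dfs.data.dir>/current/VERSION (stop the DataNode before editing).
mkdir -p /tmp/dn-demo/current
cat > /tmp/dn-demo/current/VERSION <<'EOF'
namespaceID=246015542
storageID=DS-2085496284-192.168.1.244-50010-1353654017403
cTime=0
storageType=DATA_NODE
layoutVersion=-32
EOF

# Replace the stale DataNode namespaceID with the NameNode's value
# (971169702 in this post's error message).
sed -i 's/^namespaceID=.*/namespaceID=971169702/' /tmp/dn-demo/current/VERSION
```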

4 Run the example:

(1) First create two input files, file01 and file02, on the local disk:
$ echo "Hello World Bye" > file01
$ echo "Hello Hadoop Goodbye Hadoop" > file02
For example, mine are in /home/five/input: file01 and file02.

(2) Create an input directory in HDFS: $ hadoop fs -mkdir input
(3) Copy file01 and file02 into HDFS:
$ hadoop fs -copyFromLocal /home/five/input/file0* input
(4) Execute WordCount:
$ hadoop jar hadoop/hadoop-examples-1.2.1.jar wordcount input hadoop/output

hadoop jar hadoop-examples-1.0.4.jar (the jar package of the examples) wordcount (the name of the program) /user/hadoop/test-examples/world-count (the input path) /user/hadoop/test-examples/world-count-result (the output path)

Errors:
1: The jar needs to be the corresponding version in the installation directory: hadoop-examples-1.2.1.jar
2: If prompted that creating the tmp folder failed: delete the hadoop.tmp.dir property in conf/core-site.xml

(5) When it finishes, review the results:
$ hadoop fs -cat hadoop/output/part-r-00000

While the job runs, output like the following is displayed:
...
15/03/17 15:16:24 INFO mapred.JobClient:  map 0% reduce 0%
15/03/17 15:16:32 INFO mapred.JobClient:  map 100% reduce 0%
15/03/17 15:16:40 INFO mapred.JobClient:  map 100% reduce 33%
15/03/17 15:16:41 INFO mapred.JobClient:  map 100% reduce 100%
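To preview what the example computes without a running cluster, the same word count can be sketched in plain shell. This is not the Hadoop job itself, just an equivalent pipeline over the two input files created in step (1) (recreated here under /tmp):

```shell
# Recreate the two input files from step (1).
printf 'Hello World Bye\n' > /tmp/file01
printf 'Hello Hadoop Goodbye Hadoop\n' > /tmp/file02

# Plain-shell equivalent of the WordCount job: split into words, count each,
# and print in the part-r-00000 style of "word<TAB>count".
cat /tmp/file01 /tmp/file02 | tr -s ' ' '\n' | sort | uniq -c \
  | awk '{print $2 "\t" $1}'
# Prints:
# Bye     1
# Goodbye 1
# Hadoop  2
# Hello   2
# World   1
```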

5 References:

http://www.linuxidc.com/Linux/2013-06/86106p2.htm
http://blog.csdn.net/wyswlp/article/details/10564847
http://www.cnblogs.com/forfuture1978/category/300670.html
http://www.linuxidc.com/Linux/2012-01/50880p2.htm

