Building a Hadoop Development Environment


I heard about Hadoop a long time ago, but my projects never gave me much contact with it. Today I finally decided to spend a day setting up a basic development environment, summarized as follows.

I. Software preparation

JDK, Hadoop package, Eclipse software package (Linux edition)

II. Installing Java

See http://blog.csdn.net/tonytfjing/article/details/42167599
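
For quick reference, here is a minimal sketch of the JDK setup assumed by the rest of this article (the tarball name jdk-7u71-linux-x64.tar.gz is my assumption; the JAVA_HOME path matches the one used in section 3.4):

mkdir -p /usr/java                                   // target directory for the JDK (assumed layout)
tar -zxvf jdk-7u71-linux-x64.tar.gz -C /usr/java     // unpacks to /usr/java/jdk1.7.0_71
vi /etc/profile                                      // then add the two lines below
export JAVA_HOME=/usr/java/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin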

III. Installing Hadoop (single-machine pseudo-distributed)

3.1 Creating a Hadoop user

Create a dedicated user for Hadoop, as follows:

groupadd hadoopgroup             // Create the Hadoop user group
useradd -g hadoopgroup hadoop    // Add the user hadoop and put it in the hadoopgroup group
passwd hadoop                    // Set a password for the hadoop user (the password is also hadoop)

3.2 Installing Hadoop

Upload the Hadoop installation package to the Linux system with an FTP tool, then unpack it:

tar -zxvf hadoop-1.2.1.tar.gz
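
Later sections refer to the installation directory as /hadoop1.2.1 and run everything as the hadoop user, so (an assumption on my part, not spelled out in the original) move and re-own the unpacked directory accordingly:

mv hadoop-1.2.1 /hadoop1.2.1                 // match the path used in the configuration files below
chown -R hadoop:hadoopgroup /hadoop1.2.1     // give the hadoop user ownership of the installation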

3.3 Configuring SSH

Hadoop starts the daemons on each host in its slave list through SSH (Secure Shell, a protocol for secure remote access). By default SSH asks for a password at login, so for the nodes to log in to and access each other without interaction, you need to configure passwordless SSH. Specifically:

ssh-keygen -t rsa        // Generate a key pair (RSA encryption)
Press Enter at every prompt; with the default options the key pair is saved to ~/.ssh/id_rsa.

cd .ssh/
cp id_rsa.pub authorized_keys    // enter .ssh and put id_rsa.pub into the authorization file (authorized_keys)
ssh localhost                    // test passwordless login to the local machine
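If ssh localhost still asks for a password, many sshd configurations insist on strict permissions for the key files; the following extra step (not part of the original write-up) usually fixes it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
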
3.4 Configuring the Hadoop environment

Switch to the Hadoop installation path, open the conf/hadoop-env.sh file with vi, and add the following line at the end of the file:

export JAVA_HOME=/usr/java/jdk1.7.0_71
Also add the bin directory of the Hadoop installation to the system PATH variable; otherwise you cannot use the hadoop command directly (you could, of course, go into the bin directory of the Hadoop installation folder every time and run ./hadoop).

vi /etc/profile
export HADOOP_HOME=/hadoop1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
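
To make the new variables take effect in the current shell without logging out again (a standard step the original skips):

source /etc/profile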
Hadoop 1.2.1 has three main configuration files, easy to recognize from their names: conf/core-site.xml (global configuration), conf/hdfs-site.xml (HDFS configuration), and conf/mapred-site.xml (MapReduce configuration). Below is my configuration; each property name is fixed and easy to understand, while the value should match your own paths.

conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop1.2.1/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.5.227:9000</value>
    <!-- It is better to write the IP address here, because this address will be used later when programming -->
  </property>
</configuration>

conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.5.227:9001</value>
  </property>
  <property>
    <name>mapred.cluster.local.dir</name>
    <value>/hadoop1.2.1/mapred/local</value>
  </property>
  <property>
    <name>mapred.jobtracker.system.dir</name>
    <value>/hadoop1.2.1/mapred/system</value>
  </property>
</configuration>
conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop1.2.1/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop1.2.1/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3.5 Formatting the HDFS file system

After modifying the files above, go to the Hadoop installation directory:
cd /hadoop1.2.1
Before using Hadoop for the first time, you need to format the distributed file system HDFS, so execute the following command:
bin/hadoop namenode -format
3.6 Starting the Hadoop environment

Start the daemons:
bin/start-all.sh

After a successful start, at least five new processes will be running on this machine: NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode.

Check how Hadoop started:

jps

8165 Jps
27621 NameNode
2982 oc4j.jar
28001 JobTracker
2454 Bootstrap
28142 TaskTracker
27764 DataNode
27907 SecondaryNameNode

If you see output similar to the above, Hadoop has started normally, congratulations! Now that Hadoop is installed, it's time to use it.
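
When you are done working, the daemons can be stopped with the matching script shipped in the same bin directory (not covered in the original steps):

bin/stop-all.sh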

IV. Preparing the Eclipse development environment for Hadoop

4.1 Installing Eclipse and compiling the plugin
cp eclipse-sdk-3.4-linux-gtk.tar.gz /opt            // Copy the Eclipse installation package to the /opt directory
cd /opt                                             // Switch to the /opt directory
tar -zxvf eclipse-java-luna-sr1-linux-gtk.tar.gz    // Extract the Eclipse package
For compiling and using the Eclipse plugin, see http://f.dataguru.cn/thread-288619-1-1.html
4.2 Configuring Map/Reduce Locations

Restart Eclipse and configure the Hadoop installation directory.


If the plugin is installed successfully, open Window --> Preferences and you will find a Hadoop Map/Reduce option; configure the Hadoop installation directory there. Then switch to the Map/Reduce perspective.


Create a new location.


Configure it as follows: the host and port are the addresses and ports I configured in core-site.xml and mapred-site.xml respectively (the DFS Master at 192.168.5.227:9000 and the Map/Reduce Master at 192.168.5.227:9001). Exit when the configuration is complete. Expand DFS Locations layer by layer; if you can see the folders, the configuration is correct. If you see a "connection refused" error, check the configuration.

4.3 The first Map/Reduce project

Create a new Map/Reduce project and copy WordCount.java from the Hadoop examples into the new project. First write a data input file, as follows:
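
The original post showed the input file only as a screenshot, so as a placeholder, a small /word.txt (the path used by the copy command below) could be created with arbitrary sample words like this:

cat > /word.txt << EOF
hello hadoop
hello world
hello mapreduce
EOF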


Create the/tmp/workcount directory on HDFs with the command of Hadoop, with the following command:
hadoop fs -mkdir /tmp/wordcount
Copy the newly created word.txt to HDFS with the copyFromLocal command as follows:
hadoop fs -copyFromLocal /word.txt /tmp/wordcount/word.txt
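
An optional check (not in the original steps) to confirm the upload:

hadoop fs -ls /tmp/wordcount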

Now go back to Eclipse and reconnect in Map/Reduce Locations; you will see the newly uploaded file there.


4.4 Running the first Map/Reduce project

In WordCount.java, right-click --> Run As --> Run Configurations and configure it as follows.
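
For reference, with the addresses used in this article the two program arguments would look roughly like this (the exact values depend on your own setup):

hdfs://192.168.5.227:9000/tmp/wordcount/word.txt     // input file
hdfs://192.168.5.227:9000/tmp/wordcount/out          // output directory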


The two configuration parameters are easy to understand: the input file and the output directory. Click Run to run the program. After the run finishes, check the results: reconnect in Map/Reduce Locations (or run hadoop fs -ls /tmp/wordcount/out) and you will find a new output folder containing two files.

View the part-r-00000 file to see the results of the run:

hadoop fs -cat /tmp/wordcount/out/part-r-00000


Well, the basic development environment is now set up; time for everyone to go and experiment!

