Building a Hadoop Development Environment


I heard about Hadoop a long time ago, but my projects never gave me much contact with it. Today I finally decided to spend a day setting up a basic development environment, summarized as follows.

I. Software preparation

JDK, Hadoop package, Eclipse software package (Linux edition)

II. Installing Java

See http://blog.csdn.net/tonytfjing/article/details/42167599
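
For quick reference, here is a minimal sketch of the JDK setup assumed by the rest of this article (the tarball name jdk-7u71-linux-x64.tar.gz is my assumption; the JAVA_HOME path matches the one used in section 3.4):

mkdir -p /usr/java                                   // target directory for the JDK (assumed layout)
tar -zxvf jdk-7u71-linux-x64.tar.gz -C /usr/java     // unpacks to /usr/java/jdk1.7.0_71
vi /etc/profile                                      // then add the two lines below
export JAVA_HOME=/usr/java/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin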

III. Installing Hadoop (single-machine pseudo-distributed)

3.1 Creating a Hadoop user

Create a dedicated user for Hadoop, as follows:

groupadd hadoopgroup             // Create the Hadoop user group
useradd -g hadoopgroup hadoop    // Add the user hadoop and put it in the hadoopgroup group
passwd hadoop                    // Set a password for the hadoop user (the password is also hadoop)

3.2 Installing Hadoop

Upload the Hadoop installation package to the Linux system with an FTP tool, then unpack it:

tar -zxvf hadoop-1.2.1.tar.gz
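
Later sections refer to the installation directory as /hadoop1.2.1 and run everything as the hadoop user, so (an assumption on my part, not spelled out in the original) move and re-own the unpacked directory accordingly:

mv hadoop-1.2.1 /hadoop1.2.1                 // match the path used in the configuration files below
chown -R hadoop:hadoopgroup /hadoop1.2.1     // give the hadoop user ownership of the installation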

3.3 Configuring SSH

Hadoop starts the daemons on each host in its slave list through SSH (Secure Shell, a protocol for secure remote access). By default SSH asks for a password at login, so for the nodes to log in to and access each other without interaction, you need to configure passwordless SSH. Specifically:

ssh-keygen -t rsa        // Generate a key pair (RSA encryption)
Press Enter at every prompt; with the default options the key pair is saved to ~/.ssh/id_rsa.

cd .ssh/
cp id_rsa.pub authorized_keys    // enter .ssh and put id_rsa.pub into the authorization file (authorized_keys)
ssh localhost                    // test passwordless login to the local machine
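If ssh localhost still asks for a password, many sshd configurations insist on strict permissions for the key files; the following extra step (not part of the original write-up) usually fixes it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
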
3.4 Configuring the Hadoop environment

Switch to the Hadoop installation path, open the conf/hadoop-env.sh file with vi, and add the following line at the end of the file:

export JAVA_HOME=/usr/java/jdk1.7.0_71
Also add the bin directory of the Hadoop installation to the system PATH variable; otherwise you cannot use the hadoop command directly (you could, of course, go into the bin directory of the Hadoop installation folder every time and run ./hadoop).

vi /etc/profile
export HADOOP_HOME=/hadoop1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
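
To make the new variables take effect in the current shell without logging out again (a standard step the original skips):

source /etc/profile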
Hadoop 1.2.1 has three main configuration files, easy to recognize from their names: conf/core-site.xml (global configuration), conf/hdfs-site.xml (HDFS configuration), and conf/mapred-site.xml (MapReduce configuration). Below is my configuration; each property name is fixed and easy to understand, while the value should match your own paths.

conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop1.2.1/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.5.227:9000</value>
    <!-- It is better to write the IP address here, because this address will be used later when programming -->
  </property>
</configuration>

conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.5.227:9001</value>
  </property>
  <property>
    <name>mapred.cluster.local.dir</name>
    <value>/hadoop1.2.1/mapred/local</value>
  </property>
  <property>
    <name>mapred.jobtracker.system.dir</name>
    <value>/hadoop1.2.1/mapred/system</value>
  </property>
</configuration>
conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop1.2.1/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop1.2.1/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3.5 Formatting the HDFS file system

After modifying the files above, go to the Hadoop installation directory:
cd /hadoop1.2.1
Before using Hadoop for the first time, you need to format the distributed file system HDFS, so execute the following command:
bin/hadoop namenode -format
3.6 Starting the Hadoop environment

Start the daemons:
bin/start-all.sh

After a successful start, at least five new processes will be running on this machine: NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode.

Check how Hadoop started:

jps

8165 Jps
27621 NameNode
2982 oc4j.jar
28001 JobTracker
2454 Bootstrap
28142 TaskTracker
27764 DataNode
27907 SecondaryNameNode

If you see output similar to the above, Hadoop has started normally, congratulations! Now that Hadoop is installed, it's time to use it.
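
When you are done working, the daemons can be stopped with the matching script shipped in the same bin directory (not covered in the original steps):

bin/stop-all.sh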

IV. Preparing the Eclipse development environment for Hadoop

4.1 Installing Eclipse and compiling the plugin
cp eclipse-sdk-3.4-linux-gtk.tar.gz /opt            // Copy the Eclipse installation package to the /opt directory
cd /opt                                             // Switch to the /opt directory
tar -zxvf eclipse-java-luna-sr1-linux-gtk.tar.gz    // Extract the Eclipse package
For compiling and using the Eclipse plugin, see http://f.dataguru.cn/thread-288619-1-1.html
4.2 Configuring Map/Reduce Locations

Restart Eclipse and configure the Hadoop installation directory.


If the plugin is installed successfully, open Window --> Preferences and you will find a Hadoop Map/Reduce option; configure the Hadoop installation directory there. Then switch to the Map/Reduce perspective.


Create a new location.


Configure it as follows: the host and port are the addresses and ports I configured in core-site.xml and mapred-site.xml respectively (the DFS Master at 192.168.5.227:9000 and the Map/Reduce Master at 192.168.5.227:9001). Exit when the configuration is complete. Expand DFS Locations layer by layer; if you can see the folders, the configuration is correct. If you see a "connection refused" error, check the configuration.

4.3 The first Map/Reduce project

Create a new Map/Reduce project and copy WordCount.java from the Hadoop examples into the new project. First write a data input file, as follows:
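
The original post showed the input file only as a screenshot, so as a placeholder, a small /word.txt (the path used by the copy command below) could be created with arbitrary sample words like this:

cat > /word.txt << EOF
hello hadoop
hello world
hello mapreduce
EOF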


Create the/tmp/workcount directory on HDFs with the command of Hadoop, with the following command:
hadoop fs -mkdir /tmp/wordcount
Copy the newly created word.txt to HDFS with the copyFromLocal command as follows:
hadoop fs -copyFromLocal /word.txt /tmp/wordcount/word.txt
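
An optional check (not in the original steps) to confirm the upload:

hadoop fs -ls /tmp/wordcount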

Now go back to Eclipse and reconnect in Map/Reduce Locations; you will see the newly uploaded file there.


4.4 Running the first Map/Reduce project

In WordCount.java, right-click --> Run As --> Run Configurations and configure it as follows.
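
For reference, with the addresses used in this article the two program arguments would look roughly like this (the exact values depend on your own setup):

hdfs://192.168.5.227:9000/tmp/wordcount/word.txt     // input file
hdfs://192.168.5.227:9000/tmp/wordcount/out          // output directory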


The two configuration parameters are easy to understand: the input file and the output directory. Click Run to run the program. After the run finishes, check the results: reconnect in Map/Reduce Locations (or run hadoop fs -ls /tmp/wordcount/out) and you will find a new output folder containing two files.

View the part-r-00000 file to see the results of the run:

hadoop fs -cat /tmp/wordcount/out/part-r-00000


Well, the basic development environment is now set up; time for everyone to go and experiment!

