How do I install Hadoop under CentOS and connect it to Eclipse?



I had planned to learn Hadoop for a long time, and recently it finally made it onto my agenda. Setting up Hadoop under CentOS took quite a while, and the frustration along the way could fill a thousand-word tale of tears: I was burned repeatedly by pitfalls in online tutorials, and rescued by the kind support of a senior "big brother" in my department and of my internship company. Today I can finally sit down and talk about how to install Hadoop under CentOS and connect it to Eclipse.

First, let's go over the software you need to prepare:

VMware Workstation;

CentOS-6.0-i386-bin-DVD;

eclipse-jee-luna-SR1-win32;

hadoop-0.20.2;

jdk-6u27-linux-i586;

(Hadoop is demanding about version combinations, so it is best not to swap versions casually; everything listed here is a stable release that is easy to download online.)

The tutorial is divided into 5 parts: 1) install the VMware virtual machine under Windows, create a new virtual machine, and install the CentOS system; 2) set up passwordless SSH login under CentOS; 3) install the JDK under CentOS and configure its environment variables; 4) install Hadoop under CentOS and edit its configuration files; 5) install the JDK and Eclipse under Windows, and connect Eclipse to Hadoop on CentOS. Every one of these 5 parts matters, especially the 4th. Let's walk through what to do at each step.

Step 0: Before anything else, please create a new regular user under Windows with the user name hadoop; all of our software will be installed under this user. The name really should be hadoop, because it has to match the user name used in many later steps, and hadoop is the easiest to remember.

1) Install the VMware virtual machine under Windows, create a new virtual machine, and install the CentOS system;

First, download VMware Workstation and install it. This step is the same as installing any ordinary software under Windows, and even a complete beginner can handle it, so I will save the space here for the important steps below ~

Then, create a new virtual machine from the VMware home screen, as shown:


Click Next until you reach the step where you choose the system image; select the CentOS image and click Next. You will then be asked for a Linux user name. This is important: it is best to enter hadoop, because this name will be used many times later!




Keep clicking Next until you reach the virtual machine's memory setting; 1024 MB is recommended, as shown. The next step is choosing the virtual machine's network type; "Use network address translation (NAT)" is recommended, as shown. I originally chose automatic bridging here and spent a whole night on the resulting errors... that is a night of my youth I am never getting back ~ ~



After the "next" step, almost all using its recommended settings, we can create a new CentOS, wait a few minutes and then get into the CentOS interface. See that a touch of science and technology blue, have you moved it ~ ~ Hahaha, you did go the first step!

2) Set up passwordless SSH login under CentOS;

Right-click on the desktop and choose Open in Terminal; this is the Linux terminal. I hope the reader has some Linux basics, which makes it faster to get started. But if not, it doesn't matter: this is a beginner-oriented tutorial.




2.1. First, enter su at the Linux command line; when prompted for a password, enter your own. You now have the highest privileges under the Linux system: root.

2.2. Before setting up passwordless SSH login, there is one particularly important thing to point out first: turn off SELinux. This is because CentOS will otherwise automatically prevent you from modifying the SSH service; we have to turn off SELinux and reboot for the change to take effect. Here is how:

Modify the /etc/selinux/config file

Change SELINUX=enforcing to SELINUX=disabled

Restart the machine

(Note: to modify a file under Linux, the vi command opens the file in an editing window; press i to enter INSERT mode, and when your changes are done press ESC to leave insert mode, then type :wq! to save and exit ~ Thanks here to Brother Bubble: I fumbled with this for half a day without success until he pointed me the right way ~ ~)
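If you would rather not edit the file by hand, a minimal non-interactive sketch of the same change (run as root, and assuming the stock CentOS 6 config file) is:

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   # flip the one setting
reboot                                                                # required for it to take effect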

2.3. At the Linux command line, enter ssh-keygen -t rsa, then just press Enter at every prompt.

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/zhangtao/.ssh/id_rsa): // key save location; press Enter to keep the default
Created directory '/home/zhangtao/.ssh'.
Enter passphrase (empty for no passphrase): // key passphrase; press Enter for an empty passphrase
Enter same passphrase again: // confirm the passphrase set in the previous step

Then go into /root/.ssh/ and you will see two files: id_rsa.pub and id_rsa.

Then execute cp id_rsa.pub authorized_keys.

Then verify with ssh localhost. The first time you need to type yes to confirm; after that it is no longer required.

For example, because I had already verified once, I no longer need to type yes; if it is your first verification, you will.
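Putting step 2 together, a minimal sketch of the whole passwordless-login setup (run as root, matching the paths above) looks like this:

ssh-keygen -t rsa              # press Enter at every prompt
cd /root/.ssh
cp id_rsa.pub authorized_keys  # authorize our own public key
ssh localhost                  # type yes at the first connection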



At this point, the passwordless SSH login setup is complete!

3) Install the JDK under CentOS and configure environment variables;

This step splits into two parts: installing the JDK, then configuring its environment variables.

3.1. First step: log in as root and create a new directory with mkdir /usr/program. Download the JDK install package jdk-6u13-linux-i586.bin, copy it into /usr/program, enter the directory with cd, and execute ./jdk-6u13-linux-i586.bin. When the command finishes, the installation is done and the folder /jdk1.6.0_13 has been generated inside the directory; the JDK is now successfully installed at /usr/program/jdk1.6.0_13.
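As a compact sketch of that step (I use the jdk-6u27 package from the download list above; substitute whatever file you actually downloaded, and note the chmod is just a precaution in case the installer is not yet executable):

mkdir /usr/program
cp jdk-6u27-linux-i586.bin /usr/program
cd /usr/program
chmod +x jdk-6u27-linux-i586.bin   # make the self-extracting installer executable
./jdk-6u27-linux-i586.bin          # unpacks into /usr/program/jdk1.6.0_27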

3.2. Log in as root and execute the command vi /etc/profile, then add the following content to configure the environment variables. (Note: /etc/profile is an important file; the Hadoop configuration later will use it as well.)

# set java environment
export JAVA_HOME=/usr/program/jdk1.6.0_27
export JRE_HOME=/usr/program/jdk1.6.0_27/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

After adding the above content in the vi editor, save and exit, then execute the following commands to make the configuration take effect!

chmod +x /etc/profile   # add execute permission
source /etc/profile     # make the configuration take effect!

After the configuration is complete, enter java -version at the command line, and the information for the installed JDK will appear.
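If everything is wired up correctly, the output should look roughly like the following sketch (the exact version and build strings depend on the JDK you installed):

$ java -version
java version "1.6.0_27"
...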




At this point, the JDK installation and environment variable configuration have succeeded ~

4) Install Hadoop under CentOS and configure the files;

4.1. Before installing Hadoop, find out the IP address of your CentOS machine: enter ifconfig in the terminal to see it. For example, mine is 192.168.154.129.




4.2. Download hadoop-0.20.2.tar.gz and copy it to the /usr/local/hadoop directory, then unpack it in that directory, which generates the folder /hadoop-0.20.2 (that is, Hadoop ends up installed in the /usr/local/hadoop/hadoop-0.20.2 folder).

The command is as follows: tar -zxvf hadoop-0.20.2.tar.gz. Unpacking completes the installation in one step!
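In full, a sketch of step 4.2 (assuming the tarball was downloaded to the current directory):

mkdir -p /usr/local/hadoop
cp hadoop-0.20.2.tar.gz /usr/local/hadoop
cd /usr/local/hadoop
tar -zxvf hadoop-0.20.2.tar.gz   # creates /usr/local/hadoop/hadoop-0.20.2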

4.3. Configure the environment variables for Hadoop first

Command "vi/etc/profile"

# set hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH

Command: source /etc/profile to make the newly configured file take effect!

Enter /usr/local/hadoop/hadoop-0.20.2/conf to configure the Hadoop configuration files.

4.4. Configuring the hadoop-env.sh file

Command to open the file: vi hadoop-env.sh

Add:

# set java environment
export JAVA_HOME=/usr/program/jdk1.6.0_27

After editing, save and exit (reminder: type :wq!). In fact, a closer look shows that the hadoop-env.sh file already has a JAVA_HOME line of its own; we just need to remove the leading comment # and correct the home path. As shown in the following:
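For reference, the relevant lines before and after the edit look roughly like this (the commented-out path is the placeholder shipped with hadoop-0.20.2, as best I recall):

# before (shipped default, commented out):
#   export JAVA_HOME=/usr/lib/j2sdk1.5-sun
# after (uncommented, pointing at our JDK):
export JAVA_HOME=/usr/program/jdk1.6.0_27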


4.5. Configuring core-site.xml

[root@localhost conf]# vi core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.154.129:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-0.20.2/hadooptmp</value>
  </property>
</configuration>

(Note: the hdfs:// value must use your own CentOS machine's IP address; that is why I told you to run ifconfig first. Some tutorials put localhost here, which is not correct: Eclipse will not be able to connect later!! That one cost me a night... )

As shown in the following:

Description: the Hadoop distributed file system has two important directory structures: the place where the namenode stores the namespace, and the place where the datanode stores its data blocks, plus a few other storage locations. These storage locations are all rooted at the hadoop.tmp.dir directory: the namenode's namespace is stored at ${hadoop.tmp.dir}/dfs/name, and the datanode's data blocks at ${hadoop.tmp.dir}/dfs/data. So after hadoop.tmp.dir is set, the other important directories all sit beneath it; it is effectively a root directory. I set it to /usr/local/hadoop/hadoop-0.20.2/hadooptmp, and of course this directory must exist.
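Since the directory must exist before Hadoop starts, create it now; one line does it:

mkdir -p /usr/local/hadoop/hadoop-0.20.2/hadooptmp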

4.6. Configuring hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

"Hdfs-site.xml" 15L, 331C



(Note: dfs.replication is 1 because what we are configuring here is single-machine pseudo-distributed mode, with only one machine ~ and dfs.permissions is set to false so that users have permission ~)

4.7. Configuring mapred-site.xml

[root@localhost conf]# vi mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.154.129:9001</value>
  </property>
</configuration>

As shown:


4.8. The masters and slaves files (the default content of these two files is usually as follows, so no reconfiguration is needed)

[root@localhost conf]# vi masters
192.168.154.129

[root@localhost conf]# vi slaves
192.168.154.129


Note: because in pseudo-distributed mode the namenode acting as master is the same server as the datanode acting as slave, the IP in both configuration files is the same.

4.9. Host name and IP resolution configuration (this step is very important!!!)

First [root@localhost ~]# vi /etc/hosts,

then [root@localhost ~]# vi /etc/hostname,

finally [root@localhost ~]# vi /etc/sysconfig/network.

Note: the configuration in these three places must be consistent for Hadoop to work properly! The host name configuration is very important!
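As a sketch: if your machine's host name were, say, hadoop-master (a hypothetical name; use your own), the three files would carry matching entries along these lines, with the IP from step 4.1:

192.168.154.129   hadoop-master    # line added to /etc/hosts
hadoop-master                      # content of /etc/hostname
HOSTNAME=hadoop-master             # line in /etc/sysconfig/network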





4.10. Starting Hadoop

Enter the /usr/local/hadoop/hadoop-0.20.2/bin directory and type hadoop namenode -format to format the namenode.

Then start all the Hadoop processes by entering start-all.sh:


To verify whether Hadoop has come up, enter jps:

If TaskTracker, JobTracker, DataNode, and NameNode (circled in red) are all up, your Hadoop installation is a success!
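Beyond jps, a quick way to confirm that HDFS itself is answering is to ask it for a directory listing and a datanode report (paths as configured above):

cd /usr/local/hadoop/hadoop-0.20.2
bin/hadoop fs -ls /          # list the HDFS root directory
bin/hadoop dfsadmin -report  # show live datanodes and capacity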

Description: 1. The SecondaryNameNode is a backup for the NameNode; it also preserves the namespace and the file-to-block mapping. It is recommended to run it on a separate machine, so that if the master dies, the namespace and the file-to-block mapping data can be retrieved from the machine hosting the SecondaryNameNode and used to restore the NameNode.

2. After startup, a data directory is generated in the dfs folder under /usr/local/hadoop/hadoop-0.20.2/hadooptmp; it holds the block data on the datanode. Because I am using a single machine, name and data sit on the same host; in a cluster, the machine hosting the namenode would have only the name folder, and the datanodes only the data folder.

5) Install the JDK and Eclipse under Windows, and connect Eclipse to Hadoop on CentOS;

Installing the JDK under Windows is simple: just download the Windows JDK installer and run it. Eclipse needs no installer; extract the archive and it is ready to use. Now let's talk about how to connect Eclipse to Hadoop.

5.1. First, shut down the firewall in Linux;

Shut down the Linux firewall before connecting, or the Eclipse project view will forever show "Listing folder content ..." and the connection will never come up. Here's how to turn off the firewall:

Enter the command chkconfig iptables off, then reboot.

After rebooting, enter the command /etc/init.d/iptables status to check that the firewall is off. (It should display "iptables: Firewall is not running.")
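One caveat: chkconfig iptables off only disables the firewall for subsequent boots. To stop it immediately as well, the standard CentOS 6 service command works:

service iptables stop     # stop the firewall right now
chkconfig iptables off    # keep it off across reboots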

5.2. Plug-in installation and Eclipse parameter configuration

Download the plugin hadoop-eclipse-plugin-0.20.3-SNAPSHOT, put it in the plugins folder under Eclipse, and restart Eclipse; you will then find DFS Locations in the Project Explorer pane. As shown:



A new Hadoop Map/Reduce option appears under Window - Preferences; select it, then point it at the root directory of the Hadoop you downloaded.

Open the Map/Reduce Locations view and you will find a yellow elephant icon in the locations area below. Right-click in the blank area of the view: New Hadoop location ...

Configure the parameters on the General tab as follows:


Click Finish, close Eclipse, and restart it; you will find a purple elephant in the locations area. Right-click on it and configure the Advanced parameters. (Note that this routine of repeatedly closing and restarting Eclipse must be followed strictly, otherwise some parameters will be missing from the Advanced parameters page.)

Setting the parameters on the Advanced parameters page is the most time-consuming part. A total of 3 parameters need to be modified, some of which could not be found on the page at first:

First parameter: hadoop.tmp.dir. The default is /tmp/hadoop-{user.name}; because we set hadoop.tmp.dir in core-site.xml to /usr/local/hadoop/hadoop-0.20.2/hadooptmp, change it to that value here as well, and the other properties based on this directory will change automatically;

Second parameter: dfs.replication. The default here is 3; because we set it to 1 in hdfs-site.xml, set it to 1 here as well;

Third parameter: hadoop.job.ugi. Fill in: hadoop,Tardis. Before the comma is the Hadoop user to connect as; after the comma, write Tardis.


(A note here: generally speaking, the first parameter, hadoop.tmp.dir, shows up easily; follow the earlier steps, restart Eclipse, and you can find it directly on the Advanced parameters page and modify it. The other two parameters are much harder to make appear. For the hadoop.job.ugi parameter in particular, the Linux user name and the Windows user name must be the same, otherwise it never shows up. To this day I do not know why these two parameters sometimes cannot be found; all you can do is close and restart Eclipse a few more times and try again. The online tutorials never cover this situation.)

5.3. The project directory will now show Hadoop's HDFS file tree

With the settings above in place, we will find in the project view that the all-important HDFS directory of Hadoop is now displayed. As shown:

At this point, Hadoop and Eclipse are connected successfully. This tutorial is complete; next time we will talk about how to get the WordCount program running on Hadoop from Eclipse.






