Opening: Hadoop is a powerful parallel software development framework that allows tasks to be processed in parallel on a distributed cluster to improve execution efficiency. However, it also has shortcomings: coding and debugging Hadoop programs is difficult, which raises the entry threshold for developers and makes development hard. To address this, Hadoop developers created the Hadoop Eclipse plug-in, which embeds the Hadoop development environment directly into Eclipse, providing a graphical interface for development and reducing the difficulty of programming.
I. A Godsend Plug-in: Hadoop Eclipse
Hadoop Eclipse is a plug-in for the Hadoop development environment; you need to configure information about Hadoop before installing it. When a user creates a Hadoop program, the plug-in automatically imports the jar files of the Hadoop programming interface, so the user can code, debug, and run Hadoop programs in the plug-in's graphical interface, and can also use it to view a program's real-time status, error messages, and run results. In addition, the user can manage and view HDFS through the plug-in.
All in all, the Hadoop Eclipse plugin is simple to install, easy to use, and powerful. It removes much of the difficulty of Hadoop programming for developers and is a great helper for getting started with Hadoop development!
II. Hadoop Eclipse Development Configuration
2.1 Getting the Hadoop Eclipse plug-in
(1) For convenience, we can simply search for it on Baidu. My Hadoop version here is 1.1.2, so I only needed to search for hadoop-eclipse-plugin-1.1.2.jar; the plugin can be downloaded from the link below.
URL: http://download.csdn.net/download/azx321/7330363
(2) Place the downloaded plug-in jar file into Eclipse's plugins directory and restart Eclipse.
(3) After restarting Eclipse, click the button for adding a view to add the Hadoop Eclipse plug-in's view: first select the Other option; in the dialog box that pops up, as shown, select the Map/Reduce option and click OK.
(4) Once added, Eclipse shows a Map/Reduce view button; we can click it to enter the Map/Reduce working directory view:
2.2 Basic configuration of the Hadoop Eclipse plugin
(1) Setting up the installation directory for Hadoop
Select Window→Preferences in Eclipse; a dialog box pops up with a Hadoop Map/Reduce option on its left side. Click this option, then set the Hadoop installation directory on the right.
(2) Set up cluster information for Hadoop
To establish a connection with the Hadoop cluster, right-click in the Map/Reduce Locations view and select the New Hadoop location option from the pop-up menu;
In the dialog box that pops up, fill in the information for connecting to the Hadoop cluster, as shown:
The red areas shown are the ones we need to pay attention to and fill in correctly.
PS: Location Name: this can be anything; I filled in the hostname of my Hadoop master node;
In the Map/Reduce Master box:
Host: the cluster machine where the JobTracker runs; here it is 192.168.80.100
Port: the JobTracker's port; 9001 is used here (the default port number)
These two parameters are the IP and port from the mapred.job.tracker property in mapred-site.xml;
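For reference, the corresponding entry in mapred-site.xml looks like the following (the address and port are the ones used in this walkthrough; substitute your own JobTracker host):

```xml
<!-- mapred-site.xml: JobTracker address mirrored by the Map/Reduce Master fields -->
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.80.100:9001</value>
</property>
```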
In the DFS Master box:
Host: the cluster machine where the NameNode runs; since I am running in pseudo-distributed mode, it is also 192.168.80.100
Port: the NameNode's port; 9000 is used here (the default port number)
These two parameters are the IP and port from the fs.default.name property in core-site.xml.
(The Use M/R master host check box: if it is selected, Host defaults to the same value as in the Map/Reduce Master box; if not, you can enter it yourself. Here the JobTracker and NameNode are on the same machine, so the values are the same and the box is ticked.)
User name: the username used to connect to Hadoop; I use the root user here;
Next, click the hadoop.tmp.dir option in the Advanced Parameters tab and change it to the address set in your Hadoop cluster; in my cluster this is /usr/local/hadoop/tmp. Then click the Finish button (this parameter is configured in core-site.xml).
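Taken together, the core-site.xml entries that these dialog fields mirror look like the following (the values shown are the ones used in this walkthrough):

```xml
<!-- core-site.xml: NameNode address (DFS Master fields) and temp directory -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.80.100:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/tmp</value>
</property>
```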
PS: Most of the properties in the Advanced Parameters tab are filled in automatically; they simply display some of the configuration properties from Hadoop's core XML configuration files.
After the configuration is complete, return to Eclipse. Under Map/Reduce Locations there is now one more connection, hadoop-master; this is the newly created Map/Reduce Location named hadoop-master, as shown:
2.3 Viewing HDFS
(1) Select the hadoop-master option under DFS Locations on the left side of Eclipse to display the file structure in HDFS;
(2) Right-click the TestDir folder here to select and view a specified file, as shown:
III. Running the WordCount Program under Eclipse
3.1 Creating a Map/Reduce project
Select the File→New→Other command, locate Map/Reduce Project, and select it, as shown:
Enter the name of the Map/Reduce project, WordCount, and click the Finish button to complete it, as shown:
3.2 Creating the WordCount class
Create a new WordCount class here and enter the following code:
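The original post collapses the code behind a "View Code" widget, so the author's exact listing is not reproduced here. It is in all likelihood a variant of the stock WordCount example that ships with Hadoop; a sketch written against the Hadoop 1.x org.apache.hadoop.mapreduce API is shown below. It requires the Hadoop jars on the classpath, which the Map/Reduce project wizard adds automatically.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into whitespace-separated tokens
    // and emits a (word, 1) pair for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count"); // Hadoop 1.x constructor
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output HDFS paths are passed as program arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

When run, the program takes two arguments, an HDFS input directory and an output directory; the output directory must not already exist, or the job will fail.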
3.3 Running the WordCount program
Select WordCount, right-click, and choose Run As→Run on Hadoop, as shown:
The results of the run are as follows:
3.4 Viewing the run results in HDFS
Open the part-r-00000 file under the output folder; it holds the execution result of the WordCount program, as shown:
Resources
(1) Wanchunmei, Xie Zhenglan, "Hadoop Application Development Practical Explanation (Revised version)": http://item.jd.com/11508248.html
(2) Cybercode, "Eclipse Hadoop Development Environment Configuration": http://blog.csdn.net/cybercode/article/details/7084603
Original link: http://edisonchou.cnblogs.com/
Hadoop Learning Notes 6: Hadoop Eclipse Plugin Usage