I. Reference Book: Hadoop: The Definitive Guide, 2nd Edition (Chinese translation)
II. Hadoop Environment Installation
1. Install Sun JDK 1.6
1) For now I am only building a Hadoop environment on a single server (CentOS 5.5), so I first uninstall the Java that is already installed.
Uninstall command: yum -y remove java
2) Download Sun JDK 1.6 from: http://download.oracle.com/otn-pub/java/jdk/6u33-b04/jdk-6u33-linux-x64-rpm.bin
3) Install Java (go to the directory containing the JDK installation file)
Make the bin file executable: chmod a+x *
Install: sudo ./jdk-6u33-linux-x64-rpm.bin
(If you install as a regular user, you need to add a line to the /etc/sudoers file so that the current user can run commands with root privileges. The specific steps are as follows:
A. su root
B. chmod u+w /etc/sudoers
C. vim /etc/sudoers
D. Below the line "root ALL=(ALL) ALL", add a line "username ALL=(ALL) ALL" (where username is the user you want to grant sudo rights to). Save and exit.
E. chmod u-w /etc/sudoers
)
4) Set JAVA_HOME
Edit the .bashrc file in the user's home directory and set JAVA_HOME; the command is: export JAVA_HOME=/usr
2. Install Hadoop
1) Download the corresponding Hadoop release from http://hadoop.apache.org/common/releases.html#download (I downloaded version 1.0.3)
2) Decompress the file
Command: tar -xzf hadoop-1.0.3.tar.gz
3) Test whether Hadoop is installed correctly (go to the Hadoop installation directory and execute the following commands in sequence)
A. mkdir input
B. cp conf/*.xml input
C. bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
D. cat output/* (output of "1 dfsadmin" indicates that Hadoop is installed successfully)
4) Set environment variables
export HADOOP_HOME=/home/username/hadoop/hadoop-1.0.3
export PATH=$PATH:$HADOOP_HOME/bin
export CLASSPATH=.:$HADOOP_HOME/hadoop-core-1.0.3.jar:$HADOOP_HOME/lib:$CLASSPATH
III. Simple MapReduce Example
Starting from scratch, I followed pages 20-23 of the book to run the simple MaxTemperature example (or see http://answers.oreilly.com/topic/455-get-started-analyzing-data-with-hadoop/), but it never worked. Entering the following at the command line:
% export HADOOP_CLASSPATH=build/classes
% hadoop MaxTemperature input/ncdc/sample.txt output
This produced a ClassNotFound-style error; after I made some changes, an IOException was thrown instead. After searching online for a long time, I finally found a workable solution.
1. Reference
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html
http://blog.endlesscode.com/2010/06/16/simple-demo-of-mapreduce-in-java/
2. Main Steps
mkdir maxtemperature
javac -d maxtemperature MaxTemperature.java
jar cvf maxtemperature.jar -C maxtemperature/ .
hadoop jar maxtemperature.jar MaxTemperature sample.txt output
Note:
Copy the code of the map and reduce classes into MaxTemperature.java as nested classes, add the static modifier to them, and then run the javac command. If an Iterator error is reported, add the corresponding imports as follows (a sketch of the full merged file is given after these imports):
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
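For reference, here is a minimal sketch of what the merged MaxTemperature.java can look like with the old (org.apache.hadoop.mapred) API, which is the API these steps end up compiling against. The record-parsing logic follows the book's MaxTemperature example; the nested class names MapClass and Reduce are my own choice, so adjust them to whatever you actually used:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MaxTemperature {

    // Mapper is declared static so Hadoop can instantiate it without an
    // enclosing MaxTemperature instance (the "static modifier" noted above).
    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') { // parseInt does not like leading plus signs
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                output.collect(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    // Reducer is also static; the old API hands values over as a java.util.Iterator.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int maxValue = Integer.MIN_VALUE;
            while (values.hasNext()) {
                maxValue = Math.max(maxValue, values.next().get());
            }
            output.collect(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        JobConf conf = new JobConf(MaxTemperature.class);
        conf.setJobName("Max temperature");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(Reduce.class);
        JobClient.runJob(conf);
    }
}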
IV. Thoughts
The main difficulty in setting up the Hadoop environment for the first time today was that I ran into errors while following the book step by step, and I could not tell whether the book was out of date or I had made mistakes; on top of that, I am not familiar with Java, which cost me several hours. In the end I found a correct approach and successfully ran a simple MapReduce example (standalone mode). Overall, I have taken the first step and feel a small sense of accomplishment. I hope to use this summer to study Hadoop in depth. Keep it up~
V. Supplement
Page 25 of Hadoop: The Definitive Guide (Chinese, 2nd edition) mentions that starting with release 0.20.0, Hadoop introduced a new API that is not compatible with the previous one; previous applications must be rewritten for the new API to take effect. This explains the strange ClassNotFound-like errors reported when using the old API.
Here are some of the notable differences between the new API and the old API (taken from the book); a small sketch written against the new API follows the list:
1. The new API favors abstract classes over interfaces, since they are easier to evolve. For example, a method (with a default implementation) can be added to an abstract class without breaking existing implementations of the class. In the new API, Mapper and Reducer are abstract classes.
2. The new API is in the org.apache.hadoop.mapreduce package (and subpackages). The old API is in org.apache.hadoop.mapred.
3. The new API makes extensive use of context objects, which allow user code to communicate with the MapReduce system. For example, MapContext essentially unifies the roles of JobConf, OutputCollector, and Reporter.
4. The new API supports both a "push" and a "pull" style of iteration. In both the old and the new API, key-value record pairs are pushed to the mapper, but in addition, the new API allows records to be pulled from within the map() method; the same goes for the reducer. A useful example of the pull style is processing records in batches, rather than one by one.
5. Configuration has been unified in the new API. The old API has a special JobConf object for job configuration, which is an extension of Hadoop's common Configuration object (used for configuring daemons; see Section 5.1). In the new API, this distinction is dropped, so job configuration is done through a Configuration.
6. Job control is performed through the Job class in the new API, rather than through JobClient, which no longer exists in the new API.
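To make these differences concrete, here is a minimal sketch of the MaxTemperature job rewritten against the new org.apache.hadoop.mapreduce API. This is the general pattern from the book, not code I ran for this post, so treat it as illustrative only; the class names are again my own choice:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiMaxTemperature {

    // In the new API, Mapper is an abstract class (point 1) in the mapreduce
    // package (point 2); output is written through the Context object (point 3).
    public static class MapClass
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature = Integer.parseInt(
                    line.substring(line.charAt(87) == '+' ? 88 : 87, 92));
            String quality = line.substring(92, 93);
            if (airTemperature != 9999 && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    // Reducer is also abstract; values arrive as an Iterable instead of an Iterator.
    public static class Reduce
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                max = Math.max(max, value.get());
            }
            context.write(key, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        // A plain Configuration replaces JobConf (point 5), and the Job class
        // replaces JobClient for job control (point 6).
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Max temperature (new API)");
        job.setJarByClass(NewApiMaxTemperature.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}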