I. Reference Book: Hadoop: The Definitive Guide, 2nd Edition (Chinese translation)
II. Hadoop Environment Installation
1. Install Sun JDK 1.6
1) For now I am only building a Hadoop environment on a single server (CentOS 5.5), so I first uninstall the Java that is already installed.
Uninstall command: yum -y remove java
2) Download Sun JDK 1.6 from: http://download.oracle.com/otn-pub/java/jdk/6u33-b04/jdk-6u33-linux-x64-rpm.bin
3) Install Java (go to the directory containing the JDK installation file)
Make the bin file executable: chmod a+x *
Install: sudo ./jdk-6u33-linux-x64-rpm.bin
(If you install as a regular user, you need to add a line to the /etc/sudoers file so that the current user can run commands with root privileges. The specific steps are as follows:
A. su root
B. chmod u+w /etc/sudoers
C. vim /etc/sudoers
D. Below the line "root ALL=(ALL) ALL", add a line "username ALL=(ALL) ALL" (where username is the user you want to grant sudo rights to). Save and exit.
E. chmod u-w /etc/sudoers
)
4) Set JAVA_HOME
Edit the .bashrc file in the user's home directory and set JAVA_HOME; the command is: export JAVA_HOME=/usr
2. Install Hadoop
1) Download the corresponding Hadoop release from http://hadoop.apache.org/common/releases.html#download (I downloaded version 1.0.3)
2) Decompress the file
Command: tar -xzf hadoop-1.0.3.tar.gz
3) Test whether Hadoop is installed correctly (go to the Hadoop installation directory and execute the following commands in sequence)
A. mkdir input
B. cp conf/*.xml input
C. bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
D. cat output/* (output of "1 dfsadmin" indicates that Hadoop is installed successfully)
4) Set environment variables
export HADOOP_HOME=/home/username/hadoop/hadoop-1.0.3
export PATH=$PATH:$HADOOP_HOME/bin
export CLASSPATH=.:$HADOOP_HOME/hadoop-core-1.0.3.jar:$HADOOP_HOME/lib:$CLASSPATH
III. Simple MapReduce Example
Starting from scratch, I followed pages 20-23 of the book to run the simple MaxTemperature example (or see http://answers.oreilly.com/topic/455-get-started-analyzing-data-with-hadoop/), but it never worked. Entering the following at the command line:
% export HADOOP_CLASSPATH=build/classes
% hadoop MaxTemperature input/ncdc/sample.txt output
This produced a ClassNotFound-style error; after I made some changes, an IOException was thrown instead. After searching online for a long time, I finally found a workable solution.
1. Reference
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html
http://blog.endlesscode.com/2010/06/16/simple-demo-of-mapreduce-in-java/
2. Main Steps
mkdir maxtemperature
javac -d maxtemperature MaxTemperature.java
jar cvf maxtemperature.jar -C maxtemperature/ .
hadoop jar maxtemperature.jar MaxTemperature sample.txt output
Note:
Copy the code of the map and reduce classes into MaxTemperature.java as nested classes, add the static modifier to them, and then run the javac command. If an Iterator error is reported, add the corresponding imports as follows (a sketch of the full merged file is given after these imports):
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
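For reference, here is a minimal sketch of what the merged MaxTemperature.java can look like with the old (org.apache.hadoop.mapred) API, which is the API these steps end up compiling against. The record-parsing logic follows the book's MaxTemperature example; the nested class names MapClass and Reduce are my own choice, so adjust them to whatever you actually used:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MaxTemperature {

    // Mapper is declared static so Hadoop can instantiate it without an
    // enclosing MaxTemperature instance (the "static modifier" noted above).
    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') { // parseInt does not like leading plus signs
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                output.collect(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    // Reducer is also static; the old API hands values over as a java.util.Iterator.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int maxValue = Integer.MIN_VALUE;
            while (values.hasNext()) {
                maxValue = Math.max(maxValue, values.next().get());
            }
            output.collect(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        JobConf conf = new JobConf(MaxTemperature.class);
        conf.setJobName("Max temperature");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(Reduce.class);
        JobClient.runJob(conf);
    }
}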
IV. Thoughts
The main difficulty in setting up the Hadoop environment for the first time today was that I ran into errors while following the book step by step, and I could not tell whether the book was out of date or I had made mistakes; on top of that, I am not familiar with Java, which cost me several hours. In the end I found a correct approach and successfully ran a simple MapReduce example (standalone mode). Overall, I have taken the first step and feel a small sense of accomplishment. I hope to use this summer to study Hadoop in depth. Keep it up~
V. Supplement
Page 25 of Hadoop: The Definitive Guide (Chinese, 2nd edition) mentions that starting with release 0.20.0, Hadoop introduced a new API that is not compatible with the previous one; previous applications must be rewritten for the new API to take effect. This explains the strange ClassNotFound-like errors reported when using the old API.
Here are some of the notable differences between the new API and the old API (taken from the book); a small sketch written against the new API follows the list:
1. The new API favors abstract classes over interfaces, since they are easier to evolve. For example, a method (with a default implementation) can be added to an abstract class without breaking existing implementations of the class. In the new API, Mapper and Reducer are abstract classes.
2. The new API is in the org.apache.hadoop.mapreduce package (and subpackages). The old API is in org.apache.hadoop.mapred.
3. The new API makes extensive use of context objects, which allow user code to communicate with the MapReduce system. For example, MapContext essentially unifies the roles of JobConf, OutputCollector, and Reporter.
4. The new API supports both a "push" and a "pull" style of iteration. In both the old and the new API, key-value record pairs are pushed to the mapper, but in addition, the new API allows records to be pulled from within the map() method; the same goes for the reducer. A useful example of the pull style is processing records in batches, rather than one by one.
5. Configuration has been unified in the new API. The old API has a special JobConf object for job configuration, which is an extension of Hadoop's common Configuration object (used for configuring daemons; see Section 5.1). In the new API, this distinction is dropped, so job configuration is done through a Configuration.
6. Job control is performed through the Job class in the new API, rather than through JobClient, which no longer exists in the new API.
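To make these differences concrete, here is a minimal sketch of the MaxTemperature job rewritten against the new org.apache.hadoop.mapreduce API. This is the general pattern from the book, not code I ran for this post, so treat it as illustrative only; the class names are again my own choice:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiMaxTemperature {

    // In the new API, Mapper is an abstract class (point 1) in the mapreduce
    // package (point 2); output is written through the Context object (point 3).
    public static class MapClass
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature = Integer.parseInt(
                    line.substring(line.charAt(87) == '+' ? 88 : 87, 92));
            String quality = line.substring(92, 93);
            if (airTemperature != 9999 && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    // Reducer is also abstract; values arrive as an Iterable instead of an Iterator.
    public static class Reduce
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                max = Math.max(max, value.get());
            }
            context.write(key, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        // A plain Configuration replaces JobConf (point 5), and the Job class
        // replaces JobClient for job control (point 6).
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Max temperature (new API)");
        job.setJarByClass(NewApiMaxTemperature.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}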