There are few learning materials for Hadoop 2 — only a handful of documents on the official website. If you want to study Hadoop 2 in depth, you cannot rely on the web documents alone; you also need to read the source code and, by continuously debugging and tracing it, learn how Hadoop actually operates.
1. Installing CentOS
I am using CentOS 6.5: choose CentOS-6.5-i386.iso to download. It is about 4 GB, so the download takes a while. In fact, any 6.x version will do; it does not have to be 6.5.
I'm using a VMware virtual machine with 2 GB of memory and 20 GB of disk space. With less memory the system will be slow, and with less disk you may run out of space during compilation. This is not the minimum configuration; adjust it to your own machine. Also, make sure the Linux guest has network access.
A variety of software is needed below. I copied all the downloaded packages to the /usr/local directory, and the commands that follow are executed from /usr/local. Please pay close attention to the paths as you read.
2. Installing the JDK
Hadoop is written in Java, and the JDK must be installed to compile Hadoop.
Download the JDK from Oracle's website at http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html; select jdk-7u45-linux-i586.tar.gz.
Execute the following command to extract the JDK:
tar -zxvf jdk-7u45-linux-i586.tar.gz
A folder jdk1.7.0_45 is generated; next, set the environment variables.
Execute the command vi /etc/profile and add the JDK settings to the configuration file.
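The original article shows the edited file only as a screenshot. Assuming the JDK was extracted to /usr/local/jdk1.7.0_45 as above, the additions would look roughly like this (the exact variable names are my sketch, not taken from the original):

```shell
# Lines appended to /etc/profile for the JDK
# (path assumes the archive was extracted under /usr/local)
export JAVA_HOME=/usr/local/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
```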
After saving and exiting the file, execute the following commands:
source /etc/profile
java -version
If the Java version information is displayed, the configuration is correct.
3. Install Maven
The Hadoop source code is organized and built with Maven, which is needed to download the build dependencies. Download it from the Maven website at http://maven.apache.org/download.cgi; choose apache-maven-3.0.5-bin.tar.gz, not the 3.1 release.
Execute the following command to extract Maven:
tar -zxvf apache-maven-3.0.5-bin.tar.gz
A folder apache-maven-3.0.5 is generated; next, set the environment variables.
Execute the command vi /etc/profile and add the Maven settings to the configuration file.
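Again the original shows only a screenshot. Assuming Maven was extracted to /usr/local/apache-maven-3.0.5, the additions would be along these lines (variable names are my sketch):

```shell
# Lines appended to /etc/profile for Maven
# (path assumes the archive was extracted under /usr/local)
export MAVEN_HOME=/usr/local/apache-maven-3.0.5
export PATH=$MAVEN_HOME/bin:$PATH
```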
After saving and exiting the file, execute the following commands:
source /etc/profile
mvn -version
If the Maven version information is displayed, the configuration is correct.
4. Install FindBugs (optional step)
FindBugs is used when generating the documentation. If you do not need to build the documentation, you can skip this step. Download FindBugs from the project site at http://sourceforge.jp/projects/sfnet_findbugs/releases/; select findbugs-3.0.0-dev-20131204-e3cbbd5.tar.gz.
Execute the following command to extract FindBugs:
tar -zxvf findbugs-3.0.0-dev-20131204-e3cbbd5.tar.gz
A folder findbugs-3.0.0-dev-20131204-e3cbbd5 is generated; next, set the environment variables.
Execute the command vi /etc/profile and add the FindBugs settings to the configuration file.
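As before, the edited file appears only as a screenshot in the original. Assuming the extraction path from the previous step, a plausible sketch of the additions is:

```shell
# Lines appended to /etc/profile for FindBugs
# (path assumes the archive was extracted under /usr/local)
export FINDBUGS_HOME=/usr/local/findbugs-3.0.0-dev-20131204-e3cbbd5
export PATH=$FINDBUGS_HOME/bin:$PATH
```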
After saving and exiting the file, execute the following commands:
source /etc/profile
findbugs -version
If the FindBugs version information is displayed, the configuration is correct.
5. Installing protoc
Hadoop uses Protocol Buffers for communication. Download protoc from the project site at https://code.google.com/p/protobuf/downloads/list; select protobuf-2.5.0.tar.gz.
Compiling and installing protoc requires a few tools first; execute the following commands in order:
yum install gcc
yum install gcc-c++
yum install make
If the operating system is CentOS 6.5, gcc and make are already installed; on other versions they may not be. While the commands run, you will be prompted to enter "y" several times.
Then execute the following command to extract protobuf:
tar -zxvf protobuf-2.5.0.tar.gz
A folder protobuf-2.5.0 is generated. Execute the following commands to compile and install protobuf:
cd protobuf-2.5.0
./configure --prefix=/usr/local/protoc/
make && make install
As long as no errors are reported, the installation succeeded.
After it finishes, the compiled files are under the /usr/local/protoc/ directory. Now set the environment variables.
Execute the command vi /etc/profile and add the protoc settings to the configuration file.
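The screenshot is again not reproduced here. Since protoc was installed with --prefix=/usr/local/protoc/, the addition would plausibly be a single PATH entry (my sketch, not the original screenshot):

```shell
# Line appended to /etc/profile for protoc
# (prefix matches the --prefix passed to ./configure above)
export PATH=/usr/local/protoc/bin:$PATH
```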
After saving and exiting the file, execute the following commands:
source /etc/profile
protoc --version
If the protoc version information is displayed, the configuration is correct.
6. Install other dependencies
Execute the following commands in order:
yum install cmake
yum install openssl-devel
yum install ncurses-devel
Installation is complete.
7. Compiling the Hadoop 2.2 Source code
Download the 2.2 stable release from the Hadoop website at http://apache.fayea.com/hadoop/common/stable2/; download hadoop-2.2.0-src.tar.gz.
Execute the following command to extract the Hadoop source:
tar -zxvf hadoop-2.2.0-src.tar.gz
A folder hadoop-2.2.0-src is generated. There is a bug in the source code that must be fixed here: edit the file pom.xml in the directory /usr/local/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth by executing the following command:
gedit pom.xml
Below line 55, add the following:
    <dependency>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <scope>test</scope>
    </dependency>
Save and exit.
The bug above is described at https://issues.apache.org/jira/browse/HADOOP-10110 and is fixed in Hadoop 3, which is still a long way off for us.
Okay, now go into the directory /usr/local/hadoop-2.2.0-src and execute the command:
mvn package -DskipTests -Pdist,native,docs
If you skipped step 4, remove docs from the command above so that no documentation is generated.
This command downloads the dependent JARs from the Internet and compiles the Hadoop source code. It takes a long time, so you can go have a meal in the meantime.
After a long wait, you should see results like the following:
[INFO] Apache Hadoop Main ......... ............... SUCCESS [6.936s]
[INFO] Apache Hadoop Project POM ......... ......... SUCCESS [4.928s]
[INFO] Apache Hadoop Annotations ......... .......... SUCCESS [9.399s]
[INFO] Apache Hadoop assemblies ......... .......... SUCCESS [0.871s]
[INFO] Apache Hadoop Project Dist POM ......... ..... SUCCESS [7.981s]
[INFO] Apache Hadoop Maven Plugins ........ ......... SUCCESS [8.965s]
[INFO] Apache Hadoop Auth ......... ............... SUCCESS [39.748s]
[INFO] Apache Hadoop Auth Examples ........ ......... SUCCESS [11.081s]
[INFO] Apache Hadoop Common ......... ............. SUCCESS [10:41.466s]
[INFO] Apache Hadoop NFS ......... ................ SUCCESS [26.346s]
[INFO] Apache Hadoop Common Project ......... ....... SUCCESS [0.061s]
[INFO] Apache Hadoop HDFS ......... ............... SUCCESS [12:49.368s]
[INFO] Apache Hadoop Httpfs ......... ............. SUCCESS [41.896s]
[INFO] Apache Hadoop HDFS bookkeeper Journal .... ..... SUCCESS [41.043s]
[INFO] Apache Hadoop Hdfs-nfs ......... ............ SUCCESS [9.650s]
[INFO] Apache Hadoop HDFS Project ......... ......... SUCCESS [0.051s]
[INFO] Hadoop-yarn ..... ....... ................... SUCCESS [1:22.693s]
[INFO] Hadoop-yarn-api .......... ................. SUCCESS [1:20.262s]
[INFO] Hadoop-yarn-common ......... ............... SUCCESS [1:30.530s]
[INFO] hadoop-yarn-server ......... ............... SUCCESS [0.177s]
[INFO] Hadoop-yarn-server-common ......... ......... SUCCESS [15.781s]
[INFO] Hadoop-yarn-server-nodemanager ......... ..... SUCCESS [40.800s]
[INFO] hadoop-yarn-server-web-proxy ......... ....... SUCCESS [6.099s]
[INFO] Hadoop-yarn-server-resourcemanager ....... ..... SUCCESS [37.639s]
[INFO] hadoop-yarn-server-tests ......... .......... SUCCESS [4.516s]
[INFO] hadoop-yarn-client ......... ............... SUCCESS [25.594s]
[INFO] hadoop-yarn-applications ......... .......... SUCCESS [0.286s]
[INFO] Hadoop-yarn-applications-distributedshell ..... SUCCESS [10.143s]
[INFO] hadoop-mapreduce-client ......... ........... SUCCESS [0.119s]
[INFO] Hadoop-mapreduce-client-core ......... ....... SUCCESS [55.812s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [8.749s]
[INFO] hadoop-yarn-site .......... ................ SUCCESS [0.524s]
[INFO] hadoop-yarn-project ......... .............. SUCCESS [16.641s]
[INFO] Hadoop-mapreduce-client-common ......... ..... SUCCESS [40.796s]
[INFO] hadoop-mapreduce-client-shuffle ......... ..... SUCCESS [7.628s]
[INFO] Hadoop-mapreduce-client-app ........ ......... SUCCESS [24.066s]
[INFO] hadoop-mapreduce-client-hs ......... ......... SUCCESS [13.243s]
[INFO] hadoop-mapreduce-client-jobclient ....... ..... SUCCESS [16.670s]
[INFO] hadoop-mapreduce-client-hs-plugins ....... ..... SUCCESS [3.787s]
[INFO] Apache Hadoop MapReduce Examples ........ ..... SUCCESS [17.012s]
[INFO] hadoop-mapreduce .......... ................ SUCCESS [6.459s]
[INFO] Apache Hadoop MapReduce streaming ....... ..... SUCCESS [12.149s]
[INFO] Apache Hadoop distributed Copy ......... ..... SUCCESS [15.968s]
[INFO] Apache Hadoop Archives ......... ............ SUCCESS [5.851s]
[INFO] Apache Hadoop rumen ......... .............. SUCCESS [18.364s]
[INFO] Apache Hadoop gridmix ......... ............. SUCCESS [14.943s]
[INFO] Apache Hadoop Data Join ......... ........... SUCCESS [9.648s]
[INFO] Apache Hadoop Extras ......... ............. SUCCESS [5.763s]
[INFO] Apache Hadoop Pipes ......... .............. SUCCESS [16.289s]
[INFO] Apache Hadoop Tools Dist ......... .......... SUCCESS [3.261s]
[INFO] Apache Hadoop Tools ......... .............. SUCCESS [0.043s]
[INFO] Apache Hadoop distribution ......... ......... SUCCESS [56.188s]
[INFO] Apache Hadoop Client ......... ............. SUCCESS [10.910s]
[INFO] Apache Hadoop mini-cluster ......... ......... SUCCESS [0.321s]
[INFO]------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO]------------------------------------------------------------------------
[INFO] Total time:40:00.444s
[INFO] Finished At:thu Dec 12:42:24 CST 2013
[INFO] Final memory:109m/362m
[INFO]------------------------------------------------------------------------
Well, the compilation is done.
The compiled distribution is under /usr/local/hadoop-2.2.0-src/hadoop-dist/target.
Compiling Apache Hadoop2.2.0 Source code