After a brief look at the Apache open-source project, the next step is to get a first real look at Hadoop. Go straight to the Hadoop website, the official channel for learning Hadoop; the following is excerpted from the official site:
What is Apache Hadoop? The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The project includes these modules:
Hadoop Common: the common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.
Hadoop YARN: a framework for job scheduling and cluster resource management.
Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
The official site describes each module well enough that no further explanation is needed here. Two branches of the Hadoop project are currently active, 2.6.x and 2.7.x; the latest release at the time of writing is 2.7.2, which can be downloaded from the official site.
On the homepage, click "Learn about" to reach the documentation page; you can also open the hadoop-2.7.2/share/doc/hadoop/index.html file included in the binary package.
Hadoop can be deployed in three modes: local (standalone) mode, pseudo-distributed mode, and fully-distributed mode. We start with the simplest, the standalone version, which has three characteristics: it runs on the local file system, it runs in a single Java process, and it is convenient for debugging programs.
Hadoop supports both GNU/Linux and Windows; this article does not explore the Windows platform, and I am using Red Hat Enterprise Linux 6.3. For a learning virtual machine I generally install every package that can be installed, so as not to create extra trouble for myself later.
Hadoop 2.7.2 standalone installation process:
1. Determine the operating system version
[root@localhost ~]# lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.3 (Santiago)
Release:        6.3
Codename:       Santiago
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
2. Determine the required Java version and download it
Hadoop 2.7.2 requires Java 1.7 or later. The latest available JDK release at the time of writing is 1.8.0_92, which can be downloaded from the Oracle site.
3. Check for an existing JDK on Linux and remove it
[root@localhost ~]# rpm -qa | grep 'openjdk'
java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
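To confirm that the bundled OpenJDK packages are really gone, re-run the query; it should now print nothing:

[root@localhost ~]# rpm -qa | grep 'openjdk'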
4. Install JDK 1.8.0_92
[root@localhost local]# rpm -ivh jdk-8u92-linux-x64.rpm
Preparing...                ########################################### [100%]
   1:jdk1.8.0_92            ########################################### [100%]
Unpacking JAR files...
        tools.jar...
        plugin.jar...
        javaws.jar...
        deploy.jar...
        rt.jar...
        jsse.jar...
        charsets.jar...
        localedata.jar...
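The Oracle JDK RPM typically installs under /usr/java, which is the prefix assumed by the /etc/profile entries in the next step; a quick listing confirms the location (the default and latest symlinks are usually created by the RPM):

[root@localhost local]# ls /usr/java
default  jdk1.8.0_92  latest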
5. Modify the /etc/profile file, adding the following lines at the end of the file
[root@localhost etc]# vi /etc/profile

JAVA_HOME=/usr/java/jdk1.8.0_92
JRE_HOME=/usr/java/jdk1.8.0_92/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
6. Make the modified environment variables take effect
[root@localhost etc]# source /etc/profile
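To verify that the new JDK is the one now found on the PATH, check the version; the exact build numbers shown here are illustrative and may differ on your system:

[root@localhost etc]# java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)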
7. Add a hadoop group, add a hadoop user, and set the hadoop user's password
[root@localhost ~]# groupadd hadoop
[root@localhost ~]# useradd -m -g hadoop hadoop
[root@localhost ~]# passwd hadoop
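A quick sanity check that the user and group exist (the uid/gid numbers below are illustrative and will vary by system):

[root@localhost ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)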
8. Switch to the hadoop user and extract the Hadoop installation package into the hadoop user's home directory
[hadoop@localhost ~]$ tar zxvf hadoop-2.7.2.tar.gz
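After extraction, the hadoop-2.7.2 directory should show the usual top-level layout: bin and sbin hold the executables, etc/hadoop the configuration files, and share the jars and documentation:

[hadoop@localhost ~]$ ls hadoop-2.7.2
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share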
9. Set environment variables for the hadoop user by adding the following two lines at the end of the file
[hadoop@localhost ~]$ vi .bash_profile

export HADOOP_COMMON_HOME=~/hadoop-2.7.2
export PATH=$PATH:~/hadoop-2.7.2/bin:~/hadoop-2.7.2/sbin
10. Make the environment variables take effect
[hadoop@localhost ~]$ source .bash_profile
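At this point the hadoop command should resolve; printing the version is a cheap way to verify the setup before running anything (the first line is shown here, followed on a real system by build details):

[hadoop@localhost ~]$ hadoop version
Hadoop 2.7.2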
11. Run the test job, which counts the occurrences of strings matching a regular expression in the files under the input directory
[hadoop@localhost ~]$ mkdir input
[hadoop@localhost ~]$ cp ./hadoop-2.7.2/etc/hadoop/*.xml input
[hadoop@localhost ~]$ hadoop jar ./hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'der[a-z.]+'
16/03/11 11:11:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/11 11:11:40 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/03/11 11:11:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/03/11 11:11:40 INFO input.FileInputFormat: Total input paths to process : 8
16/03/11 11:11:40 INFO mapreduce.JobSubmitter: number of splits:8
......
16/03/11 11:14:35 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=1159876
                FILE: Number of bytes written=2227372
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=8
                Map output records=8
                Map output bytes=228
                Map output materialized bytes=250
                Input split bytes=116
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=250
                Reduce input records=8
                Reduce output records=8
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=59
                Total committed heap usage (bytes)=265175040
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=390
        File Output Format Counters
                Bytes Written=192
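For context, the grep example actually submits two MapReduce jobs in sequence: the first extracts and counts matches of the regular expression, and the second sorts those counts in descending order. On success the output directory typically contains an empty _SUCCESS marker plus one part file holding the results:

[hadoop@localhost ~]$ ls output
part-r-00000  _SUCCESS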
12. View the analysis results; the content displays normally
[hadoop@localhost ~]$ cat output/*
2       der.
1       der.zookeeper.path
1       der.zookeeper.kerberos.principal
1       der.zookeeper.kerberos.keytab
1       der.zookeeper.connection.string
1       der.zookeeper.auth.type
1       der.uri
1       der.password
13. If you run the test again, you must remove the output directory first, because MapReduce refuses to write into an output directory that already exists.
[hadoop@localhost ~]$ rm -rf output/
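Alternatively, each rerun can write to a fresh directory instead of deleting the old one; output2 below is just an illustrative name:

[hadoop@localhost ~]$ hadoop jar ./hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output2 'der[a-z.]+'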
This article is from the "Shen Jinqun" blog. Please do not reprint.
Big Data: From Getting Started to XX (III)