Tags: big data, hadoop 2.7.2, standalone, local (standalone) mode, installation
After a quick survey of the Apache open-source projects, the next step is to take a first look at Hadoop itself. Go straight to the official Hadoop website — that is the authoritative place to learn Hadoop. The following is quoted from the site:
> What Is Apache Hadoop? The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The project includes these modules:
>
> - Hadoop Common: The common utilities that support the other Hadoop modules.
> - Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
> - Hadoop YARN: A framework for job scheduling and cluster resource management.
> - Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
The official description of each module is concise and to the point; no elaboration is needed. At present two Hadoop release lines are being developed in parallel, 2.6.x and 2.7.x. The latest release is 2.7.2, which can be downloaded from the official download page.
Click "Learn about" on the official home page to reach the documentation, or open the file .\hadoop-2.7.2\share\doc\hadoop\index.html inside the binary distribution.
Hadoop can be deployed in three modes: Local (Standalone) Mode, Pseudo-Distributed Mode, and Fully-Distributed Mode. We start with the simplest, standalone mode, which has three characteristics: it runs against the local filesystem, it runs in a single Java process, and it is convenient for debugging programs.
Hadoop supports two families of operating systems, GNU/Linux and Windows. I will not explore the Windows platform here; I am using Red Hat Enterprise Linux 6.3. For a virtual machine used for learning, I generally check every installable package during OS setup, so as not to create trouble for myself later.
Hadoop 2.7.2 standalone installation procedure:
1. Confirm the operating-system version
```
[root@localhost ~]# lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.3 (Santiago)
Release:        6.3
Codename:       Santiago
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
```
2. Choose and download a Java version
Hadoop 2.7.2 requires Java 1.7 or later. The current available JDK release is 1.8.0_92, which can be downloaded from the JDK download page.
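Before installing, it is worth confirming that whatever JDK you end up with meets the 1.7 minimum. A minimal sketch of the check — the version string below is hard-coded for illustration; on a real machine it would come from `java -version 2>&1`:

```shell
# Hedged sketch: parse a "java -version"-style string and compare the minor
# number of the old 1.x scheme against Hadoop 2.7.2's minimum of Java 7.
# The sample string is hard-coded; on a real host use:
#   ver=$(java -version 2>&1 | head -1)
ver='java version "1.8.0_92"'
minor=$(echo "$ver" | sed -n 's/.*"1\.\([0-9]*\)\..*/\1/p')
if [ "$minor" -ge 7 ]; then
  echo "Java 1.$minor is new enough for Hadoop 2.7.2"
else
  echo "Java 1.$minor is too old; install 1.7 or later"
fi
```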
3. Check for a JDK already installed on Linux, and remove it
```
[root@localhost ~]# rpm -qa | grep 'openjdk'
java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
```
4. Install JDK 1.8.0_92
```
[root@localhost local]# rpm -ivh jdk-8u92-linux-x64.rpm
Preparing...                ########################################### [100%]
   1:jdk1.8.0_92            ########################################### [100%]
Unpacking JAR files...
        tools.jar...
        plugin.jar...
        javaws.jar...
        deploy.jar...
        rt.jar...
        jsse.jar...
        charsets.jar...
        localedata.jar...
```
5. Edit /etc/profile and append the following lines at the end of the file
```
[root@localhost etc]# vi /etc/profile

JAVA_HOME=/usr/java/jdk1.8.0_92
JRE_HOME=/usr/java/jdk1.8.0_92/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
```
6. Make the modified environment variables take effect
```
[root@localhost etc]# source /etc/profile
```
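A quick way to confirm the new variables took effect is to test that JAVA_HOME is set and its bin directory made it onto PATH. A sketch, with the exports repeated here only so the snippet is self-contained — after `source /etc/profile` they would already be in the environment:

```shell
# Sketch: verify JAVA_HOME is set and its bin directory is on PATH.
# These exports mirror the /etc/profile lines above (assumed JDK path).
export JAVA_HOME=/usr/java/jdk1.8.0_92
export PATH="$PATH:$JAVA_HOME/bin"
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JAVA_HOME/bin is on PATH" ;;
  *)                    echo "PATH was not updated" ;;
esac
```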
7. Add a hadoop group, add a hadoop user, and set the hadoop user's password
```
[root@localhost ~]# groupadd hadoop
[root@localhost ~]# useradd -m -g hadoop hadoop
[root@localhost ~]# passwd hadoop
```
8. Unpack the Hadoop tarball into the hadoop user's home directory
```
[hadoop@localhost ~]$ tar xzvf hadoop-2.7.2.tar.gz
```
9. As the hadoop user, set environment variables by appending the following two lines at the end of the file
```
[hadoop@localhost ~]$ vi .bash_profile

export HADOOP_COMMON_HOME=~/hadoop-2.7.2
export PATH=$PATH:~/hadoop-2.7.2/bin:~/hadoop-2.7.2/sbin
```
10. Make the changed environment variables take effect
```
[hadoop@localhost ~]$ source .bash_profile
```
11. Run a test job: count how many times strings matching a regular expression occur in the files under the input directory
```
[hadoop@localhost ~]$ mkdir input
[hadoop@localhost ~]$ cp ./hadoop-2.7.2/etc/hadoop/*.xml input
[hadoop@localhost ~]$ hadoop jar ./hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'der[a-z.]+'
16/03/11 11:11:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/11 11:11:40 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/03/11 11:11:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/03/11 11:11:40 INFO input.FileInputFormat: Total input paths to process : 8
16/03/11 11:11:40 INFO mapreduce.JobSubmitter: number of splits:8
......
......
16/03/11 11:14:35 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=1159876
                FILE: Number of bytes written=2227372
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=8
                Map output records=8
                Map output bytes=228
                Map output materialized bytes=250
                Input split bytes=116
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=250
                Reduce input records=8
                Reduce output records=8
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=59
                Total committed heap usage (bytes)=265175040
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=390
        File Output Format Counters
                Bytes Written=192
```
12. View the results, which look correct
```
[hadoop@localhost ~]$ cat output/*
2       der.
1       der.zookeeper.path
1       der.zookeeper.kerberos.principal
1       der.zookeeper.kerberos.keytab
1       der.zookeeper.connection.string
1       der.zookeeper.auth.type
1       der.uri
1       der.password
```
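The `grep` example behaves much like a distributed `grep -o`: the map tasks extract every substring matching the regular expression, and the reduce side sums the counts per match. A rough single-machine analogue, assuming GNU grep is available — the sample input lines are invented for illustration, not taken from the real config files:

```shell
# Rough local analogue of the example job (sample input invented for
# illustration): extract every match of 'der[a-z.]+' and count duplicates,
# most frequent first, just like the job's output above.
printf 'dfs.provider.uri\nha.zookeeper.provider.uri\n' > /tmp/sample.txt
grep -oE 'der[a-z.]+' /tmp/sample.txt | sort | uniq -c | sort -rn
```

This also shows why `der.uri` and friends appear in the results: the regex matches the tail of longer property names such as `...provider.uri`.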
13. To run the test again, remove the output directory first
```
[hadoop@localhost ~]$ rm -rf output/
```
This article is from the "沈進群" blog. Reposting is not permitted.
Big Data: From Beginner to XX (Part 3)