The Hadoop WordCount program is the classic entry-level Hadoop example. Given a set of input files (file1, file2, ...), it counts how many times each word appears across them.
Here we set up and run the program on a single machine; the test system is Mac OS.
1. Download the Hadoop package from: http://www.apache.org/dyn/closer.cgi/hadoop/common/
2. Install it in any directory; here we extract the package into /usr/local.
3. Configure environment variables:
vi /etc/profile
Add the following content.
# export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
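Before restarting anything, you can check that the variables expand as intended by applying the exports directly in a shell. A minimal sketch (the HADOOP_HOME path matches the install above; adjust it to yours):

```shell
# Minimal sketch: apply the exports in the current shell and verify them
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
echo "$HADOOP_MAPRED_HOME"   # prints /usr/local/hadoop-2.2.0
```

In a real session, `source /etc/profile` applies the edited file to the current shell instead.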
4. Create the namenode and datanode directories (choose the paths as needed):
mkdir -p /usr/local/hadoop/mnode/namenode
mkdir -p /usr/local/hadoop/mnode/datanode
5. Configure the Hadoop files (xml):
A) Enter the Hadoop configuration directory: cd /usr/local/hadoop/etc/hadoop
B) Change hadoop-env.sh
Specify JAVA_HOME: export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
C) Change yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
D) Change core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
E) Change hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/mnode/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/mnode/datanode</value>
</property>
F) Change mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
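One caveat for all four files above: Hadoop only reads `<property>` entries that sit inside the file's single `<configuration>` root element, which the snippets omit for brevity. As an illustration, the complete mapred-site.xml would look like:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```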
6. Format the NameNode:
hadoop namenode -format
7. Start Hadoop:
start-all.sh
8. View all Java processes:
jps
You should see something like the following:
2234 Jps
1989 ResourceManager
2023 NodeManager
1856 DataNode
2060 JobHistoryServer
1793 NameNode
2049 SecondaryNameNode
9. Now you can view the Hadoop running status:
View the NameNode: http://localhost:50070/
View the ResourceManager: http://localhost:8088/cluster
10. Create a folder to store the raw data (any path will do):
mkdir /Users/apple/hadoop/tmp
In this directory, create two files: file1 containing "hello world" and file2 containing "hello hadoop", and save them.
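The same input files can also be created from the shell. A sketch, using a temporary directory here rather than the fixed /Users/apple path so it works anywhere:

```shell
# Create a scratch directory and the two sample input files
DIR=$(mktemp -d)
echo "hello world"  > "$DIR/file1"
echo "hello hadoop" > "$DIR/file2"
# Show the combined input that WordCount will count over
cat "$DIR/file1" "$DIR/file2"   # prints the two lines back
```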
11. Copy the data to HDFS:
hadoop dfs -copyFromLocal /Users/apple/hadoop/tmp /in
(or: hadoop fs -put /Users/apple/hadoop/tmp /in)
View the input directory: hadoop fs -ls /
12. Run the job:
First, go to the examples directory: cd /usr/local/hadoop/share/hadoop/mapreduce
Then run: hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /in /out
13. View the program's result:
hadoop fs -cat /out/part-r-00000
hadoop 1
hello 2
world 1
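The result can be sanity-checked without Hadoop at all: WordCount's map (split lines into words), shuffle (group identical words), and reduce (count each group) phases correspond to this plain Unix pipeline over the same input:

```shell
# map: tr emits one word per line; shuffle: sort groups identical words;
# reduce: uniq -c counts each group
printf 'hello world\nhello hadoop\n' | tr -s ' ' '\n' | sort | uniq -c
```

The counts agree with the WordCount result above (hadoop 1, hello 2, world 1), though `uniq -c` prints the count before the word.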