The Hadoop Streaming framework allows programs written in any programming language to run as Hadoop MapReduce jobs, which makes it easy to migrate existing programs to the Hadoop platform and is a good illustration of Hadoop's extensibility. Below we implement Hadoop WordCount in C++, PHP, and Python.
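The contract between Hadoop Streaming and an external program is simple: the mapper reads raw text lines on stdin and writes key/value pairs separated by a tab on stdout; the framework sorts the pairs by key and feeds the sorted stream to the reducer's stdin. The following is a minimal local simulation of that data flow in Python (illustrative only, not Hadoop API code; the function names are our own):

```python
# Local simulation of the Streaming data flow: mapper -> sort -> reducer.
# Any language can play either role as long as it reads stdin and
# writes "key<TAB>value" lines to stdout.

def mapper(lines):
    # Emit "word<TAB>1" for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(sorted_pairs):
    # Sorted input guarantees identical keys arrive adjacent.
    counts = {}
    for pair in sorted_pairs:
        word, count = pair.split("\t")
        counts[word] = counts.get(word, 0) + int(count)
    for word in sorted(counts):
        yield "%s\t%d" % (word, counts[word])

text = ["hadoop streaming", "hadoop wordcount"]
for line in reducer(sorted(mapper(text))):
    print(line)
```

On the cluster, Hadoop performs the `sorted()` step itself between the map and reduce phases; the external programs only ever see lines on stdin and stdout.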
Example One: Implementing WordCount in C++
Code implementation:
1) The mapper for the C++ WordCount implementation, saved as Mapper.cpp. The detailed code is as follows:
#include <iostream>
#include <string>
using namespace std;

int main() {
    string key;
    string value = "1";
    // Emit "word<TAB>1" for every whitespace-separated token on stdin.
    while (cin >> key) {
        cout << key << "\t" << value << endl;
    }
    return 0;
}
2) The reducer for the C++ WordCount implementation, saved as Reducer.cpp. The detailed code is as follows:
#include <iostream>
#include <string>
#include <map>
using namespace std;

int main() {
    string key;
    string value;
    map<string, int> word2count;
    map<string, int>::iterator it;
    // Each input line is "word<TAB>1"; count one occurrence per pair.
    while (cin >> key) {
        cin >> value;
        it = word2count.find(key);
        if (it != word2count.end()) {
            (it->second)++;
        } else {
            word2count.insert(make_pair(key, 1));
        }
    }
    for (it = word2count.begin(); it != word2count.end(); ++it) {
        cout << it->first << "\t" << it->second << endl;
    }
    return 0;
}
Steps to test and run the C++ WordCount implementation
1) Install the C++ compiler online
In a Linux environment, if the C++ compiler is not installed, install it online:
yum -y install gcc-c++
2) Compile the C++ files into executables
Compile the C++ programs into executables with the following commands before running them:
g++ -o mapper Mapper.cpp
g++ -o reducer Reducer.cpp
3) Local testing
Before running the C++ WordCount on the cluster, first test it locally on Linux; once it debugs successfully you can be confident the program will run correctly on the cluster. The test command is:
cat djt.txt | ./mapper | sort | ./reducer
4) Running on the cluster
Switch to the Hadoop installation directory and submit the C++ WordCount job to count words:
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "./mapper" \
    -reducer "./reducer" \
    -file mapper \
    -file reducer \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out
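Because `-D mapred.reduce.tasks=2` runs two reducers, the output directory will contain two part files, and each key is routed to exactly one reducer by hashing. As a hedged sketch of how Hadoop's default HashPartitioner decides this (it uses Java's String.hashCode, which we mimic here; the function names are our own, for illustration only):

```python
# Sketch of how a streaming job routes keys to reducers.
# Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE)
# % numReduceTasks; we replicate Java's String.hashCode() to show why every
# occurrence of a word lands in the same part file.

def java_string_hashcode(s):
    # Java's String.hashCode() with 32-bit signed overflow semantics.
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret as a signed 32-bit integer.
    return h - 0x100000000 if h >= 0x80000000 else h

def partition(key, num_reducers):
    # Mask off the sign bit before taking the modulus, as Hadoop does.
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_reducers

for w in ["hadoop", "streaming", "wordcount", "hadoop"]:
    print("%s -> reducer %d" % (w, partition(w, 2)))
```

The same word always hashes to the same reducer, which is what makes per-key aggregation in the reducer correct even with multiple reduce tasks.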
If you get the expected results, the C++ WordCount implementation is working.
Example Two: Implementing WordCount in PHP
Code implementation:
1) The mapper for the PHP WordCount implementation, saved as wc_mapper.php. The detailed code is as follows:
#!/usr/bin/php
<?php
error_reporting(E_ALL ^ E_NOTICE);
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    // Split the line on non-word characters, dropping empty pieces.
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        echo $word, chr(9), "1", PHP_EOL;
    }
}
?>
2) The reducer for the PHP WordCount implementation, saved as wc_reducer.php. The detailed code is as follows:
#!/usr/bin/php
<?php
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    // Each input line is "word<TAB>count".
    list($word, $count) = explode(chr(9), $line);
    $count = intval($count);
    $word2count[$word] += $count;
}
foreach ($word2count as $word => $count) {
    echo $word, chr(9), $count, PHP_EOL;
}
?>
Steps to test and run the PHP WordCount implementation
1) Install PHP online
In a Linux environment, if PHP is not installed, install the PHP environment online:
yum -y install php
2) Local testing
Before running the PHP WordCount on the cluster, first test it locally on Linux; once it debugs successfully you can be confident the program will run correctly on the cluster. The test command is:
cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php
3) Running on the cluster
Switch to the Hadoop installation directory and submit the PHP WordCount job to count words:
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "php wc_mapper.php" \
    -reducer "php wc_reducer.php" \
    -file wc_mapper.php \
    -file wc_reducer.php \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out
If you get the expected results, the PHP WordCount implementation is working.
Example Three: Implementing WordCount in Python
Code implementation:
1) The mapper for the Python WordCount implementation, saved as mapper.py. The detailed code is as follows:
#!/usr/bin/env python
import sys

# Emit "word<TAB>1" for every whitespace-separated token on stdin.
for line in sys.stdin:
    line = line.strip()
    words = filter(lambda word: word, line.split())
    for word in words:
        print '%s\t%s' % (word, 1)
2) The reducer for the Python WordCount implementation, saved as reducer.py. The detailed code is as follows:
#!/usr/bin/env python
from operator import itemgetter
import sys

word2count = {}
for line in sys.stdin:
    line = line.strip()
    # Each input line is "word<TAB>count".
    word, count = line.split('\t', 1)
    try:
        count = int(count)
        word2count[word] = word2count.get(word, 0) + count
    except ValueError:
        # Ignore lines whose count is not a number.
        pass

sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word, count in sorted_word2count:
    print '%s\t%s' % (word, count)
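Because Streaming hands the reducer its input already sorted by key, the dictionary in the reducer above is not strictly necessary: a reducer can aggregate one key at a time in constant memory, which matters for large vocabularies. A hedged alternative sketch (not part of the original tutorial; the function name is our own):

```python
# Alternative reducer sketch that exploits the Streaming sort guarantee:
# equal keys arrive on adjacent lines, so only the current key's running
# total needs to be held in memory.

def streaming_reduce(lines):
    current_word, current_count = None, 0
    for line in lines:
        word, count = line.strip().split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield "%s\t%d" % (current_word, current_count)
            current_word, current_count = word, int(count)
    # Flush the last key.
    if current_word is not None:
        yield "%s\t%d" % (current_word, current_count)

# Demo on a pre-sorted sample; in a real job this would iterate sys.stdin.
for out in streaming_reduce(["hadoop\t1", "hadoop\t1", "streaming\t1"]):
    print(out)
```

The output keys also come out in sorted order for free, so the explicit `sorted()` pass in the dictionary-based reducer is no longer needed.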
Steps to test and run the Python WordCount implementation
1) Install Python online
In a Linux environment, if Python is not installed, install the Python environment online:
yum -y install python27
2) Local testing
Before running the Python WordCount on the cluster, first test it locally on Linux; once it debugs successfully you can be confident the program will run correctly on the cluster. The test command is:
cat djt.txt | python mapper.py | sort | python reducer.py
3) Running on the cluster
Switch to the Hadoop installation directory and submit the Python WordCount job to count words:
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -file mapper.py \
    -file reducer.py \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out
If you get the expected results, the Python WordCount implementation is working.