Big Data: Hadoop Streaming Programming in Practice with C++, PHP, and Python

Last Update:2018-04-02 Source: Internet

Author: User


The Streaming framework allows programs written in any programming language to be used in Hadoop MapReduce, which makes it easier to migrate existing programs to the Hadoop platform. In that sense, Hadoop is highly extensible. Below we implement the Hadoop WordCount in C++, PHP, and Python.

Practice 1: Implementing WordCount in C++

Code implementation:

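1) The mapper for the C++ WordCount is saved as mapper.cpp. The full code is as follows:

#include <iostream>
#include <string>

using namespace std;

int main() {
    string key;
    string value = "1";
    // Read whitespace-separated words from standard input.
    while (cin >> key) {
        // Emit "word<TAB>1" for every word.
        cout << key << "\t" << value << endl;
    }
    return 0;
}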

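2) The reducer for the C++ WordCount is saved as reducer.cpp. The full code is as follows:

#include <iostream>
#include <map>
#include <string>

using namespace std;

int main() {
    string key;
    string value;
    map<string, int> word2count;
    map<string, int>::iterator it;
    // Each input record is "word<TAB>1"; read the word, then its value.
    while (cin >> key) {
        cin >> value;
        it = word2count.find(key);
        if (it != word2count.end()) {
            // Word seen before: add one (the mapper always emits 1).
            (it->second)++;
        } else {
            word2count.insert(make_pair(key, 1));
        }
    }
    // Print every word with its total count.
    for (it = word2count.begin(); it != word2count.end(); ++it) {
        cout << it->first << "\t" << it->second << endl;
    }
    return 0;
}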

Steps to test and run the C++ WordCount

1) Install the C++ compiler online

In a Linux environment, if a C++ compiler is not installed, install it online:

yum -y install gcc-c++

2) Compile the C++ files into executables

The C++ programs must be compiled into executable files before they can run:

g++ -o mapper mapper.cpp

g++ -o reducer reducer.cpp

3) Local testing

Before running the C++ version of WordCount on the cluster, test and debug it locally on Linux to make sure the program will run correctly on the cluster. The test command is as follows:

cat djt.txt | ./mapper | sort | ./reducer
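The local pipeline mimics what Hadoop Streaming does on the cluster: map each input line, sort the intermediate records by key, then reduce. As a rough illustration only, the following Python sketch runs the same three stages in-process. The names `map_words`, `reduce_counts`, and `run_local` are made up for this example and are not part of Hadoop.

```python
from itertools import groupby


def map_words(lines):
    """Map stage: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reduce stage: sum the counts of consecutive pairs that share a key."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Chain map -> sort -> reduce, like `cat file | mapper | sort | reducer`."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    # Hypothetical sample input standing in for djt.txt.
    sample = ["hadoop streaming", "hadoop wordcount"]
    for word, total in run_local(sample):
        print("%s\t%d" % (word, total))
```

The sort step matters: the reduce stage in this sketch only merges neighbouring records, so identical words must be adjacent before it runs.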

4) Running on the cluster

Switch to the Hadoop installation directory and submit the C++ version of the WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "./mapper" \
    -reducer "./reducer" \
    -file mapper \
    -file reducer \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out
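When the same job is submitted repeatedly with different scripts, it can help to assemble the argument list in code. The sketch below is one possible way to do that in Python; `build_streaming_command` is a hypothetical helper, and it uses only the options that appear in the command above. It keeps `-D` ahead of the streaming options, because generic Hadoop options have to come first.

```python
def build_streaming_command(jar, mapper, reducer, files, input_path,
                            output_path, reduce_tasks=2):
    """Return the argument list for a Hadoop Streaming job."""
    # Generic options (-D ...) go before the streaming-specific options.
    command = ["hadoop", "jar", jar,
               "-D", "mapred.reduce.tasks=%d" % reduce_tasks,
               "-mapper", mapper,
               "-reducer", reducer]
    # Ship each local file to the cluster nodes with its own -file option.
    for path in files:
        command += ["-file", path]
    command += ["-input", input_path, "-output", output_path]
    return command


if __name__ == "__main__":
    cmd = build_streaming_command(
        jar="/usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar",
        mapper="./mapper",
        reducer="./reducer",
        files=["mapper", "reducer"],
        input_path="/dajiangtai/djt.txt",
        output_path="/dajiangtai/out",
    )
    # Only print the command here; on a machine with Hadoop installed it
    # could be passed to subprocess.check_call(cmd) instead.
    print(" ".join(cmd))
```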

If the job produces the expected results, WordCount has been implemented successfully in C++.

Practice 2: Implementing WordCount in PHP

Code implementation:

1) The mapper for the PHP WordCount is saved as wc_mapper.php. The full code is as follows:

#!/usr/bin/php
<?php
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
// Read standard input line by line.
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    // Split on non-word characters and drop empty pieces.
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Emit "word<TAB>1"; chr(9) is the tab character.
        echo $word, chr(9), "1", PHP_EOL;
    }
}
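One detail worth noticing is that the three implementations do not tokenize identically: the C++ and Python mappers split on whitespace, while the PHP mapper is meant to split on non-word characters, so punctuation is treated differently. The Python sketch below illustrates the difference for plain ASCII text. The function names are hypothetical, and the regular-expression variant is only an approximation of the PHP `preg_split` call.

```python
import re


def split_on_whitespace(line):
    """Tokenize the way `cin >> key` or `line.split()` does."""
    return line.split()


def split_on_non_word(line):
    """Tokenize on non-word characters and drop empty pieces."""
    return [token for token in re.split(r"\W", line) if token]


if __name__ == "__main__":
    text = "Hello, Hadoop world!"
    print(split_on_whitespace(text))  # punctuation stays attached to the words
    print(split_on_non_word(text))    # punctuation is stripped
```

With the whitespace split, `Hello,` and `Hello` would be counted as different words, so the C++ and Python versions can report different totals than the PHP version on the same input.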

2) The reducer for the PHP WordCount is saved as wc_reducer.php. The full code is as follows:

#!/usr/bin/php
<?php
// Suppress notices, such as the one raised the first time a word is added.
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    // Each record is "word<TAB>count".
    list($word, $count) = explode(chr(9), $line);
    $count = intval($count);
    $word2count[$word] += $count;
}
// Print every word with its total count.
foreach ($word2count as $word => $count) {
    echo $word, chr(9), $count, PHP_EOL;
}

Steps to test and run the PHP WordCount

1) Install PHP online

In a Linux environment, if PHP is not installed, install the PHP environment online:

yum -y install php

2) Local testing

Before running the PHP version of WordCount on the cluster, test and debug it locally on Linux to make sure the program will run correctly on the cluster. The test command is as follows:

cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php

3) Running on the cluster

Switch to the Hadoop installation directory and submit the PHP version of the WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "php wc_mapper.php" \
    -reducer "php wc_reducer.php" \
    -file wc_mapper.php \
    -file wc_reducer.php \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out

If the job produces the expected results, WordCount has been implemented successfully in PHP.
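The jobs in this article set mapred.reduce.tasks=2, so the intermediate records are divided between two reducers. By default Hadoop does this with hash partitioning: the key's hash modulo the number of reduce tasks decides which reducer receives the record, which guarantees that all records for one word reach the same reducer. The Python sketch below illustrates the idea only. `partition` and `assign` are hypothetical names, and `zlib.crc32` is a stand-in for the hash, not the function Hadoop actually uses.

```python
import zlib


def partition(key, num_reducers=2):
    """Pick a reducer index for a key: hash(key) mod number of reducers."""
    # crc32 is just a deterministic stand-in; Hadoop hashes keys differently.
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def assign(pairs, num_reducers=2):
    """Group (word, count) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for word, count in pairs:
        buckets[partition(word, num_reducers)].append((word, count))
    return buckets


if __name__ == "__main__":
    pairs = [("hadoop", 1), ("php", 1), ("hadoop", 1), ("python", 1)]
    for index, bucket in enumerate(assign(pairs)):
        print("reducer %d: %s" % (index, bucket))
```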

Practice 3: Implementing WordCount in Python

Code implementation:

1) The mapper for the Python WordCount is saved as mapper.py. The full code is as follows:

#!/usr/java/hadoop/env python
import sys

word2count = {}
# Read standard input line by line.
for line in sys.stdin:
    line = line.strip()
    # Split on whitespace and drop empty strings.
    words = filter(lambda word: word, line.split())
    for word in words:
        # Emit "word<TAB>1" for every word.
        print('%s\t%s' % (word, 1))

2) The reducer for the Python WordCount is saved as reducer.py. The full code is as follows:

#!/usr/java/hadoop/env python
from operator import itemgetter
import sys

word2count = {}
for line in sys.stdin:
    line = line.strip()
    try:
        # Each record is "word<TAB>count"; malformed lines raise ValueError.
        word, count = line.split()
        count = int(count)
        word2count[word] = word2count.get(word, 0) + count
    except ValueError:
        pass

# Sort by word and print every word with its total count.
sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word, count in sorted_word2count:
    print('%s\t%s' % (word, count))
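The reducer above keeps every word in a dictionary until the input ends. Because Hadoop Streaming delivers the reducer's input sorted by key (and the local test pipes through `sort`), an alternative is to hold only a running total for the current word and emit it when the word changes. The following is a minimal sketch of that variant, not the article's code; `reduce_sorted` is a hypothetical name.

```python
def reduce_sorted(lines):
    """Yield (word, total) from 'word<TAB>count' lines sorted by word."""
    current_word = None
    current_total = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, count = parts
        try:
            count = int(count)
        except ValueError:
            continue  # skip lines whose count is not a number
        if word != current_word:
            # The key changed: the previous word is complete, so emit it.
            if current_word is not None:
                yield current_word, current_total
            current_word, current_total = word, 0
        current_total += count
    # Emit the last word once the input is exhausted.
    if current_word is not None:
        yield current_word, current_total


if __name__ == "__main__":
    # Hypothetical sorted mapper output.
    sample = ["hadoop\t1\n", "hadoop\t1\n", "python\t1\n"]
    for word, total in reduce_sorted(sample):
        print("%s\t%d" % (word, total))
```

To use it as a streaming reducer, `sys.stdin` would be passed in place of the sample list. Memory use then stays constant no matter how many distinct words the input contains.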

Steps to test and run the Python WordCount

1) Install Python online

In a Linux environment, if Python is not installed, install the Python environment online:

yum -y install python27

2) Local testing

Before running the Python version of WordCount on the cluster, test and debug it locally on Linux to make sure the program will run correctly on the cluster. The test command is as follows:

cat djt.txt | python mapper.py | sort | python reducer.py

3) Running on the cluster

Switch to the Hadoop installation directory and submit the Python version of the WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -file mapper.py \
    -file reducer.py \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out

If the job produces the expected results, WordCount has been implemented successfully in Python.
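With two reduce tasks, the output directory holds one result file per reducer (typically named like part-00000 and part-00001), each covering a different subset of the words. To compare the cluster result with the local test, those files can be merged into a single table. The Python sketch below shows one way to do that on lines that have already been read. `merge_counts` is a hypothetical helper, and fetching the files from HDFS is left out.

```python
def merge_counts(part_outputs):
    """Merge 'word<TAB>count' lines from several reducer outputs into a dict."""
    totals = {}
    for lines in part_outputs:
        for line in lines:
            word, _, count = line.rstrip("\n").partition("\t")
            if not word or not count:
                continue  # skip blank or malformed lines
            totals[word] = totals.get(word, 0) + int(count)
    return totals


if __name__ == "__main__":
    # Hypothetical contents of the two reducer output files.
    part0 = ["hadoop\t2\n", "php\t1\n"]
    part1 = ["python\t3\n"]
    for word, total in sorted(merge_counts([part0, part1]).items()):
        print("%s\t%d" % (word, total))
```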


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
