Big Data in Practice: Hadoop Streaming Programming with C++, PHP, and Python

Source: Internet
Author: User

The Streaming framework lets programs written in any programming language run as Hadoop MapReduce jobs, which makes it easier to migrate existing programs to the Hadoop platform. Any executable that reads records from standard input and writes tab-separated key/value lines to standard output can serve as a mapper or reducer, so Hadoop is highly extensible in this respect. Below, we implement WordCount on Hadoop in C++, PHP, and Python.
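To make the data flow concrete, here is a minimal Python sketch of what a streaming WordCount job does: a map stage emits one record per word, the records are sorted by key, and a reduce stage sums consecutive records with the same key. The function names are illustrative and not part of any Hadoop API; on a real cluster the framework performs the sort between the two stages.

```python
from itertools import groupby


def map_words(lines):
    """Map stage: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_counts(sorted_records):
    """Reduce stage: sum the values of consecutive records that share a key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(int(value) for _, value in group))


def run_wordcount(lines):
    """Chain map -> sort -> reduce, mirroring `mapper | sort | reducer`."""
    return list(reduce_counts(sorted(map_words(lines))))
```

Calling run_wordcount on a list of text lines returns the "word<TAB>count" records in key order, which is the same shape of output the programs in this article produce.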

 Practice 1: Implementing WordCount in C++

Code implementation:

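1) The mapper for the C++ WordCount. The file is named mapper.cpp, and the full code is as follows:

#include <iostream>
#include <string>

using namespace std;

int main() {
    string key;
    string value = "1";
    // Read whitespace-separated words from standard input.
    while (cin >> key) {
        // Emit one "word<TAB>1" record per word.
        cout << key << "\t" << value << endl;
    }
    return 0;
}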

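2) The reducer for the C++ WordCount. The file is named reducer.cpp, and the full code is as follows:

#include <iostream>
#include <map>
#include <string>
#include <utility>

using namespace std;

int main() {
    string key;
    string value;
    map<string, int> word2count;
    map<string, int>::iterator it;
    // Each input record is "word<TAB>1"; read the word, then its value.
    while (cin >> key) {
        cin >> value;
        it = word2count.find(key);
        if (it != word2count.end()) {
            // The mapper always emits 1, so each record adds one occurrence.
            (it->second)++;
        } else {
            word2count.insert(make_pair(key, 1));
        }
    }
    // Emit the final "word<TAB>count" records.
    for (it = word2count.begin(); it != word2count.end(); ++it) {
        cout << it->first << "\t" << it->second << endl;
    }
    return 0;
}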

Steps to test and run the C++ WordCount

1) Install the C++ compiler online

In a Linux environment, if the C++ compiler is not installed, install it online:

yum -y install gcc-c++

2) Compile the C++ files into executables

The C++ programs must be compiled into executables before they can run. Use the following commands:

g++ -o mapper mapper.cpp
g++ -o reducer reducer.cpp

3) Local testing

Before running the C++ WordCount on the cluster, test it locally on Linux. Debugging it locally first makes it much more likely that the program will run correctly on the cluster. The test command is as follows:

cat djt.txt | ./mapper | sort | ./reducer
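One way to check the result of the local test is to compare it with an independent count of the same input. The sketch below assumes the reducer prints one "word<TAB>count" line per word and that words are separated by whitespace, as `cin >> key` does; the function names are hypothetical helpers, not part of Hadoop.

```python
from collections import Counter


def expected_counts(text):
    """Count whitespace-separated words, matching how `cin >> key` tokenizes."""
    return Counter(text.split())


def parse_wordcount_output(output):
    """Parse 'word<TAB>count' lines (the assumed reducer output) into a dict."""
    counts = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # Skip blank lines.
        word, count = line.split("\t")
        counts[word] = int(count)
    return counts


def output_matches(text, output):
    """Return True when the pipeline output agrees with the reference count."""
    return parse_wordcount_output(output) == dict(expected_counts(text))
```

Here `text` would be the contents of djt.txt and `output` the captured output of the local pipeline.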

4) Running on the cluster

Switch to the Hadoop installation directory and submit the C++ WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "./mapper" \
    -reducer "./reducer" \
    -file mapper \
    -file reducer \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out

If the job produces the expected word counts, WordCount has been implemented successfully in C++.
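The cluster commands for all three languages differ only in the mapper, the reducer, and the files shipped with the job, so the argument list can be assembled programmatically. The following is a minimal sketch using only the options that appear in this article (-D, -mapper, -reducer, -file, -input, -output); the function name and its parameters are hypothetical.

```python
def build_streaming_command(jar, mapper, reducer, files, input_path, output_path,
                            reduce_tasks=2):
    """Build the argument list for a Hadoop Streaming job submission."""
    # Generic options such as -D must come before the streaming options.
    command = ["hadoop", "jar", jar, "-D", "mapred.reduce.tasks=%d" % reduce_tasks]
    command += ["-mapper", mapper, "-reducer", reducer]
    # Ship each local program file to the cluster with the job.
    for path in files:
        command += ["-file", path]
    command += ["-input", input_path, "-output", output_path]
    return command
```

On a machine where the hadoop command is available, the returned list could be passed to subprocess.call to submit the job.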

  Practice 2: Implementing WordCount in PHP

Code implementation:

1) The mapper for the PHP WordCount. The file is named wc_mapper.php, and the full code is as follows:

#!/usr/bin/php
<?php
error_reporting(E_ALL ^ E_NOTICE);
// Read the input line by line from standard input.
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    // Split on non-word characters and drop empty tokens.
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Emit one "word<TAB>1" record per word.
        echo $word, chr(9), "1", PHP_EOL;
    }
}
?>
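Note that the PHP mapper tokenizes differently from the C++ and Python mappers: assuming its pattern is the non-word class `\W`, punctuation is stripped from words, whereas splitting on whitespace keeps it attached. The Python sketch below contrasts the two approaches; the function names are illustrative, and `re.ASCII` is used as an approximation of PHP's byte-oriented matching.

```python
import re


def split_on_whitespace(line):
    """Tokenize on whitespace, as `cin >> key` and str.split() do."""
    return line.split()


def split_on_non_word(line):
    """Tokenize on non-word characters and drop empty tokens."""
    # Roughly mirrors preg_split with the PREG_SPLIT_NO_EMPTY flag.
    return [token for token in re.split(r"\W", line, flags=re.ASCII) if token]
```

As a result, the same input file can yield different counts depending on which mapper is used, for example "world!" and "world" are separate keys under whitespace splitting.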

2) The reducer for the PHP WordCount. The file is named wc_reducer.php, and the full code is as follows:

#!/usr/bin/php
<?php
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
// Each input line is a "word<TAB>count" record.
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    list($word, $count) = explode(chr(9), $line);
    $count = intval($count);
    // Initialize the counter the first time a word is seen.
    if (!isset($word2count[$word])) {
        $word2count[$word] = 0;
    }
    $word2count[$word] += $count;
}
// Emit the final "word<TAB>count" records.
foreach ($word2count as $word => $count) {
    echo $word, chr(9), $count, PHP_EOL;
}
?>

Steps to test and run the PHP WordCount

1) Install PHP online

In a Linux environment, if PHP is not installed, install it online:

yum -y install php

2) Local testing

Before running the PHP WordCount on the cluster, test it locally on Linux. Debugging it locally first makes it much more likely that the program will run correctly on the cluster. The test command is as follows:

cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php

3) Running on the cluster

Switch to the Hadoop installation directory and submit the PHP WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "php wc_mapper.php" \
    -reducer "php wc_reducer.php" \
    -file wc_mapper.php \
    -file wc_reducer.php \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out

If the job produces the expected word counts, WordCount has been implemented successfully in PHP.
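With mapred.reduce.tasks=2, the mapper output is divided between two reducers by key, so every record for a given word reaches the same reducer. This is why a reducer can safely keep its own per-word totals. The Python sketch below illustrates the idea only: it uses zlib.crc32 as a stand-in hash, which is an assumption for illustration and not the hash function Hadoop actually uses, and both function names are hypothetical.

```python
import zlib


def partition_for(key, num_reduce_tasks=2):
    """Map a key to a reducer index. crc32 is a stand-in hash, not Hadoop's."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def partition_records(records, num_reduce_tasks=2):
    """Group 'word<TAB>1' records by the reducer that would receive them."""
    partitions = [[] for _ in range(num_reduce_tasks)]
    for record in records:
        word = record.split("\t", 1)[0]
        partitions[partition_for(word, num_reduce_tasks)].append(record)
    return partitions
```

Because the partition depends only on the key, no word is ever split across two reducers.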

 Practice 3: Implementing WordCount in Python

Code implementation:

1) The mapper for the Python WordCount. The file is named mapper.py, and the full code is as follows:

#!/usr/bin/env python
import sys

# Read the input line by line from standard input.
for line in sys.stdin:
    line = line.strip()
    # Split on whitespace and drop empty tokens.
    words = filter(lambda word: word, line.split())
    for word in words:
        # Emit one "word<TAB>1" record per word.
        print('%s\t%s' % (word, 1))

2) The reducer for the Python WordCount. The file is named reducer.py, and the full code is as follows:

#!/usr/bin/env python
from operator import itemgetter
import sys

word2count = {}

# Each input line is a "word<TAB>count" record.
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split()
        count = int(count)
        word2count[word] = word2count.get(word, 0) + count
    except ValueError:
        # Skip lines that are malformed or whose count is not a number.
        pass

# Sort by word and emit the final "word<TAB>count" records.
sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word, count in sorted_word2count:
    print('%s\t%s' % (word, count))

Steps to test and run the Python WordCount

1) Install Python online

In a Linux environment, if Python is not installed, install it online:

yum -y install python27

2) Local testing

Before running the Python WordCount on the cluster, test it locally on Linux. Debugging it locally first makes it much more likely that the program will run correctly on the cluster. The test command is as follows:

cat djt.txt | python mapper.py | sort | python reducer.py

3) Running on the cluster

Switch to the Hadoop installation directory and submit the Python WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapred.reduce.tasks=2 \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -file mapper.py \
    -file reducer.py \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out

If the job produces the expected word counts, WordCount has been implemented successfully in Python.
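Because the job runs with two reduce tasks, the output directory contains one part file per reducer rather than a single result file. A minimal sketch for combining them is shown below; it assumes the output directory has already been copied to the local file system (for example with hadoop fs -get), that the part files are named with a "part-" prefix, and that each line is a "word<TAB>count" record. The function name is hypothetical.

```python
import glob
import os


def merge_part_files(output_dir):
    """Combine 'word<TAB>count' records from every part-* file into one dict."""
    counts = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path) as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue  # Skip blank lines.
                word, count = line.split("\t")
                # Summing is safe even if a word appears in only one file.
                counts[word] = counts.get(word, 0) + int(count)
    return counts
```

The merged dictionary can then be compared with the output of the local test to confirm that the cluster run gives the same totals.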
