Hadoop Streaming is a utility that ships with Hadoop and allows users to write MapReduce programs in languages other than Java: a job can be run simply by supplying a mapper and a reducer executable that read from standard input and write to standard output.
For details, see the official Hadoop Streaming documentation.
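Any such executable can serve as the mapper or reducer; the official documentation illustrates the contract with standard Unix tools, roughly like so (paths and jar version may differ on your installation):
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc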
1. As an example, implement wordcount by writing the mapper and reducer in C++.
The mapper.cpp code is as follows:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string key;
    const int value = 1;
    // emit "word 1" for every whitespace-separated word on stdin
    while (cin >> key)
    {
        cout << key << " " << value << endl;
    }
    return 0;
}
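Once the mapper is compiled (step 2 below), it can be sanity-checked locally; the input line here is only illustrative:
# echo "hello world hello" | ./mapper
hello 1
world 1
hello 1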
The reducer.cpp code is as follows:
#include <iostream>
#include <string>
#include <map>
using namespace std;
int main()
{
    string key;
    int value;
    map<string, int> result;
    map<string, int>::iterator it;
    // read "word count" pairs from stdin and accumulate per-word totals
    while (cin >> key)
    {
        cin >> value;
        it = result.find(key);
        if (it != result.end())
        {
            (it->second)++;  // the mapper always emits 1, so incrementing is enough
        }
        else
        {
            result[key] = value;
        }
    }
    // print the final counts, one "word count" pair per line
    for (it = result.begin(); it != result.end(); ++it)
    {
        cout << it->first << " " << it->second << endl;
    }
    return 0;
}
2. Compile the mapper and reducer executables with the following commands:
# g++ mapper.cpp -o mapper
# g++ reducer.cpp -o reducer
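Before submitting the job, the whole pipeline can be simulated locally; the sort between the two stages mimics the shuffle phase, which delivers the mapper output to the reducer grouped by key (the input line is only illustrative):
# echo "hello world hello" | ./mapper | sort | ./reducer
hello 2
world 1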
3. Create a script runjob.sh as follows:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -mapper mapper \
    -reducer reducer \
    -input /test/input/a.txt \
    -output /test/output/test3 \
    -file mapper \
    -file reducer
-input is the path of the job's input file in HDFS.
-output is the HDFS directory where the job's results are stored.
-file specifies the local paths of the mapper and reducer executables so they are shipped to every task node; if -file is omitted, the -mapper and -reducer options may fail because the binaries do not exist on the nodes.
You can also set MapReduce job parameters with -jobconf, such as the number of map and reduce tasks; see the official Hadoop Streaming documentation for the full list.
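For example, a sketch of setting the number of reduce tasks (mapred.reduce.tasks is the relevant property name in Hadoop 1.x; the value 2 is only illustrative):
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -jobconf mapred.reduce.tasks=2 \
    -mapper mapper \
    -reducer reducer \
    -input /test/input/a.txt \
    -output /test/output/test3 \
    -file mapper \
    -file reducer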
4. Execute the script with # sh runjob.sh and the MapReduce job should run to completion.
The result is identical to that produced by the wordcount example in hadoop-examples-1.1.2.jar.
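To inspect the output directly (part-00000 is the default name of a single reducer's output file):
# $HADOOP_HOME/bin/hadoop fs -cat /test/output/test3/part-00000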