Using Hadoop Streaming to write MapReduce programs in C++


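Hadoop Streaming is a utility that ships with Hadoop and lets users write MapReduce programs in languages other than Java. A job is defined simply by supplying a mapper and a reducer, each of which can be any executable that reads its input from standard input and writes its output to standard output.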

For details, see the official Hadoop Streaming documentation.
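
Streaming passes data to these executables as lines of text. By default, the part of a line before the first tab character is the key and the rest of the line after that tab is the value; a line with no tab is treated as a key with a null (empty) value. The helper below, split_record, is a hypothetical minimal sketch of that default rule in C++; it is not part of Hadoop's API.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not a Hadoop API): split one line of Streaming text into
// (key, value) using the default rule. The key is everything before the first
// tab and the value is everything after it; a line with no tab is all key.
std::pair<std::string, std::string> split_record(const std::string& line)
{
    const std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos) {
        return std::make_pair(line, std::string());  // no tab: empty value
    }
    return std::make_pair(line.substr(0, tab), line.substr(tab + 1));
}
```

For example, the mapper output line "hello\t1" splits into the key "hello" and the value "1", and a line such as "a\tb\tc" splits only at the first tab, giving the key "a" and the value "b\tc".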

1. The following example implements wordcount, with the mapper and reducer written in C++.

The mapper.cpp code is as follows:

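#include <iostream>
#include <string>
using namespace std;

int main()
{
	string key;
	const int value = 1;

	// Read whitespace-separated words from standard input and emit one
	// "word<TAB>1" line per word. Hadoop Streaming treats the text before
	// the first tab as the key and the rest as the value.
	while (cin >> key)
	{
		cout << key << "\t" << value << endl;
	}

	return 0;
}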
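
The mapper's loop can also be moved into a function so that it can be exercised locally from a string before any job is submitted. The sketch below assumes hypothetical names (map_words, map_text) that are not in the original program; it produces the same "word<TAB>1" lines, but writes '\n' instead of std::endl, because std::endl flushes the stream after every line, which can be slow on large inputs.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical refactoring of mapper.cpp: the read/emit loop lives in a
// function so it can be driven by std::cin in the real job or by a string
// stream in a local test. '\n' is used instead of std::endl, which would
// flush the output after every line.
void map_words(std::istream& in, std::ostream& out)
{
    std::string word;
    while (in >> word) {              // split input on any whitespace
        out << word << '\t' << 1 << '\n';
    }
}

// Hypothetical test helper: run the mapper over an in-memory string.
std::string map_text(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    map_words(in, out);
    return out.str();
}
```

In mapper.cpp, main would then reduce to calling map_words(std::cin, std::cout) and returning 0.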

The reducer.cpp code is as follows:

#include <iostream>
#include <string>
#include <map>
using namespace std;

int main()
{
	string key;
	int value;
	map<string, int> result;
	map<string, int>::iterator it;

	// Each input line is "word<TAB>count"; operator>> skips the tab.
	while (cin >> key >> value)
	{
		it = result.find(key);
		if (it != result.end())
		{
			it->second += value;   // add the incoming count instead of assuming it is 1
		}
		else
		{
			result[key] = value;
		}
	}

	// std::map keeps its keys sorted, so the output is in key order.
	for (it = result.begin(); it != result.end(); ++it)
	{
		cout << it->first << "\t" << it->second << endl;
	}

	return 0;
}
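
Because Hadoop sorts the map output by key before the reduce phase, all lines for the same word reach the reducer next to each other. A reducer can exploit that and keep only the current key and its running total instead of holding every word in a std::map. The sketch below is an alternative to the reducer above, not the original author's code; reduce_sorted and reduce_text are hypothetical names.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical alternative reducer: sums the counts for each key, relying on
// the reducer's input being sorted by key so that equal keys are adjacent.
// Only the current key and its running total are kept in memory.
void reduce_sorted(std::istream& in, std::ostream& out)
{
    std::string key, current;
    long long value = 0, total = 0;
    bool have_current = false;

    while (in >> key >> value) {
        if (have_current && key != current) {
            out << current << '\t' << total << '\n';  // key changed: emit previous total
            total = 0;
        }
        current = key;
        total += value;
        have_current = true;
    }
    if (have_current) {
        out << current << '\t' << total << '\n';      // emit the last key
    }
}

// Hypothetical test helper: run the reducer over an in-memory string.
std::string reduce_text(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    reduce_sorted(in, out);
    return out.str();
}
```

The trade-off is that this version is correct only when its input is sorted. Hadoop guarantees that for the reduce phase, but a local test must sort the mapper output first.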

2. Compile the mapper and reducer executables with the following commands:

# g++ mapper.cpp -o mapper
# g++ reducer.cpp -o reducer
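
Before submitting the job, the binaries can be checked locally by reproducing the sort that Hadoop performs between the map and reduce phases, for example with a shell pipeline such as cat a.txt | ./mapper | sort | ./reducer. The same map, sort, reduce sequence can also be sketched in memory. The function below, simulate_wordcount, is a hypothetical illustration that does not use Hadoop at all; sorting whole "word\t1" lines stands in for Hadoop's sort by key.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory imitation of the streaming wordcount job.
std::string simulate_wordcount(const std::string& text)
{
    // Map: one "word\t1" line per whitespace-separated word.
    std::istringstream words(text);
    std::vector<std::string> lines;
    std::string word;
    while (words >> word) {
        lines.push_back(word + "\t1");
    }

    // Shuffle/sort stand-in: sorting whole lines brings equal keys together.
    std::sort(lines.begin(), lines.end());

    // Reduce: sum the values of adjacent lines that share a key.
    std::ostringstream out;
    std::string current;
    long long total = 0;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string::size_type tab = lines[i].find('\t');
        const std::string key = lines[i].substr(0, tab);
        const long long value = std::stoll(lines[i].substr(tab + 1));
        if (i > 0 && key != current) {
            out << current << '\t' << total << '\n';
            total = 0;
        }
        current = key;
        total += value;
    }
    if (!lines.empty()) {
        out << current << '\t' << total << '\n';
    }
    return out.str();
}
```

For the input "the cat the", this returns "cat\t1\nthe\t2\n".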

3. Create a script named runjob.sh with the following contents:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
-mapper mapper \
-reducer reducer \
-input /test/input/a.txt \
-output /test/output/test3 \
-file mapper \
-file reducer
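-input is the location of the job's input file in HDFS.

-output is the HDFS directory where the job's results are stored. It must not already exist; otherwise the job fails.

-file ships the named local file (here, the compiled mapper and reducer) with the job so that it is available in each task's working directory. Without -file, the -mapper and -reducer commands may refer to executables that do not exist on the cluster nodes, and the tasks will fail.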

Job parameters, such as the number of map and reduce tasks, can also be set with -jobconf (for example, -jobconf mapred.reduce.tasks=2). See the official Hadoop Streaming documentation for the full list.
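
When more than one reduce task is requested, Hadoop's default partitioner assigns each key to a reducer by hashing the key modulo the number of reduce tasks. Every occurrence of a word therefore reaches the same reducer, and each reducer writes its own part file. The sketch below only illustrates the idea: choose_partition is a hypothetical name, and std::hash is a stand-in, since Hadoop computes its own hash and the actual assignments will differ.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical illustration of hash partitioning: a key goes to reduce task
// hash(key) % num_reducers, so equal keys always land on the same reducer.
// std::hash is a stand-in for Hadoop's own hash. num_reducers must be > 0.
std::size_t choose_partition(const std::string& key, std::size_t num_reducers)
{
    return std::hash<std::string>()(key) % num_reducers;
}
```

With a single reducer every key maps to partition 0, which is why a one-reducer job produces a single, fully sorted output file.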

4. Run the script:

# sh runjob.sh

The MapReduce job should run to completion.

The result is the same as that produced by the wordcount example in hadoop-examples-1.1.2.jar.
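
To check that claim on larger inputs, the two outputs can be compared as collections of word counts rather than line by line, because line order and the split into part files may differ. Both the reducer above and the bundled wordcount example write "word<TAB>count" lines. The functions below, read_counts and same_counts, are a hypothetical sketch that assumes that format.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: read "word<TAB>count" lines into a map. Counts are
// added with += so that several concatenated part files are merged.
std::map<std::string, long long> read_counts(std::istream& in)
{
    std::map<std::string, long long> counts;
    std::string word;
    long long count = 0;
    while (in >> word >> count) {
        counts[word] += count;
    }
    return counts;
}

// Hypothetical helper: true when both outputs contain exactly the same counts,
// regardless of line order.
bool same_counts(const std::string& a, const std::string& b)
{
    std::istringstream in_a(a), in_b(b);
    return read_counts(in_a) == read_counts(in_b);
}
```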


