Integrating MongoDB with Hadoop for statistical computation


MongoDB can do some simple statistical work on its own, using its built-in JavaScript-based MapReduce framework or the aggregation framework introduced in MongoDB 2.2. In addition, MongoDB provides an interface to external computation tools: the mongo-hadoop connector, which is the subject of this article. The article is based on a post from the official MongoDB blog.

Schematic diagram

MongoDB and Hadoop are combined so that MongoDB serves as both the data source and the store for the results, while the actual computation is done in Hadoop.

This setup lets us write MapReduce functions in Python, Ruby, or JavaScript instead of Java.
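The division of labour described above can be sketched in plain Ruby without a cluster. In this minimal in-memory illustration, `input_docs` stands in for the source collection, the grouping step stands in for the shuffle that Hadoop performs between map and reduce, and the returned array stands in for the output collection. The helper `run_job` and the sample documents are hypothetical, chosen for illustration only; they are not part of mongo-hadoop.

```ruby
# In-memory sketch of the data flow:
# source documents -> map -> group by key -> reduce -> result documents.
# `run_job` is a hypothetical helper, not a mongo-hadoop API.
def run_job(input_docs, mapper, reducer)
  # Map phase: each input document yields one intermediate document with an :_id key.
  mapped = input_docs.map { |doc| mapper.call(doc) }

  # Shuffle phase (handled by Hadoop in a real job): group intermediate documents by key.
  grouped = mapped.group_by { |doc| doc[:_id] }

  # Reduce phase: one output document per key.
  grouped.map { |key, values| reducer.call(key, values) }
end

# Hypothetical sample data standing in for a MongoDB collection.
input_docs = [
  { 'user' => { 'time_zone' => 'London' } },
  { 'user' => { 'time_zone' => 'Tokyo' } },
  { 'user' => { 'time_zone' => 'London' } }
]

mapper  = ->(doc) { { :_id => doc['user']['time_zone'], :count => 1 } }
reducer = ->(key, values) { { :_id => key, :count => values.sum { |v| v[:count] } } }

output_docs = run_job(input_docs, mapper, reducer)
output_docs.each { |doc| puts doc.inspect }
```

In a real job the map and reduce steps run as separate processes on the cluster, so they can only communicate through the documents they emit.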

Example

First prepare the Hadoop environment and install the mongo-hadoop connector. The data is then processed in the following steps.

1. Data preparation

Import raw data from the Twitter API into MongoDB. The -u option of curl takes credentials in the form user:password, which are left blank here.

curl https://stream.twitter.com/1/statuses/sample.json -u: | mongoimport -d twitter -c in
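By default mongoimport reads one JSON document per line, which is the shape the pipe above feeds it. The following is a minimal sketch of inspecting such input before importing it. The sample lines are hypothetical and heavily trimmed, and it is an assumption that the stream can contain records without a user field (for example deletion notices).

```ruby
require 'json'

# Hypothetical sample of newline-delimited JSON, one document per line.
# Real stream records carry many more fields; these are trimmed for illustration.
sample_lines = <<~JSONL
  {"text": "hello", "user": {"time_zone": "London"}}
  {"delete": {"status": {"id": 1}}}
  {"text": "hi", "user": {"time_zone": null}}
JSONL

# Parse each line into a Hash, much as each line becomes one document on import.
documents = sample_lines.each_line.map { |line| JSON.parse(line) }

# Split the documents by whether they carry a 'user' field at all.
with_user, without_user = documents.partition { |doc| doc.key?('user') }

puts "#{with_user.size} documents with a user, #{without_user.size} without"
```

A check like this shows early whether later steps need to cope with documents that lack the fields they read.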

2. Map function

Write a map function and save it in the file mapper.rb.

#!/usr/bin/env ruby
require 'mongo-hadoop'

# Emit one document per tweet, keyed by the user's time zone.
MongoHadoop.map do |document|
  { :_id => document['user']['time_zone'], :count => 1 }
end
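The map logic can be tried locally before it is submitted to Hadoop. The sketch below is a stand-alone variant that does not load mongo-hadoop; `map_document` is a hypothetical helper name, and the sample documents are invented. It uses `Hash#dig`, so a document without a user yields a nil key instead of raising an error, which is one possible way to handle such records and not necessarily what the original author intended.

```ruby
# Stand-alone version of the map logic; `map_document` is a hypothetical helper name.
def map_document(document)
  # dig returns nil instead of raising when 'user' is missing or nil.
  time_zone = document.dig('user', 'time_zone')
  { :_id => time_zone, :count => 1 }
end

# Hypothetical sample documents.
with_zone    = { 'user' => { 'time_zone' => 'Tokyo' } }
without_user = { 'delete' => { 'status' => { 'id' => 1 } } }

p map_document(with_zone)
p map_document(without_user)
```

With this variant, all documents that lack a time zone end up grouped under the nil key, so they can be ignored or inspected in the output.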

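3. Reduce function

Then write the reduce function and save it in the file reducer.rb.

#!/usr/bin/env ruby
require 'mongo-hadoop'

MongoHadoop.reduce do |key, values|
  count = sum = 0
  values.each do |value|
    count += 1
    # Reads the 'num' field of each value; note that the mapper above emits 'count'.
    sum += value['num']
  end
  # Integer operands give integer division, so the average is truncated.
  { :_id => key, :average => sum / count }
end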
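The averaging step can also be checked in isolation. The sketch below is a stand-alone variant that does not load mongo-hadoop; `reduce_average` is a hypothetical helper name and the intermediate values are invented. It assumes each value carries an integer 'num' field, as the reducer above does, and uses `fdiv` so that the average is not truncated by integer division.

```ruby
# Stand-alone version of the averaging logic; `reduce_average` is a hypothetical helper name.
def reduce_average(key, values)
  count = values.size
  sum = values.sum { |value| value['num'] }
  # fdiv performs floating-point division; Integer#/ would truncate (7 / 2 is 3).
  { :_id => key, :average => sum.fdiv(count) }
end

# Hypothetical intermediate values for one key.
values = [{ 'num' => 3 }, { 'num' => 4 }]
result = reduce_average('London', values)
p result
```

Whether a truncated or a fractional average is wanted depends on the use case; the point here is only that the two forms of division differ.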

4. Run the script

Create a run script with the following content. It applies the map and reduce functions above to the data imported in the first step.

hadoop jar mongo-hadoop-streaming-assembly*.jar \
  -mapper mapper.rb \
  -reducer reducer.rb \
  -inputURI mongodb://127.0.0.1/twitter.in \
  -outputURI mongodb://127.0.0.1/twitter.out
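If the job is launched from a Ruby script instead of a shell script, the same command can be assembled as an argument list. This is only a sketch: `streaming_command` is a hypothetical helper, the flag spellings -inputURI and -outputURI are assumed to match the mongo-hadoop streaming documentation, and the jar is looked up in the current directory.

```ruby
require 'shellwords'

# Builds the argument list for the streaming job; `streaming_command` is a hypothetical helper.
def streaming_command(jar:, mapper:, reducer:, input_uri:, output_uri:)
  [
    'hadoop', 'jar', jar,
    '-mapper', mapper,
    '-reducer', reducer,
    '-inputURI', input_uri,
    '-outputURI', output_uri
  ]
end

# Resolve the jar glob here, because no shell expands it when arguments are passed as an array.
# Falls back to the unexpanded pattern if no matching jar is in the current directory.
jar_pattern = 'mongo-hadoop-streaming-assembly*.jar'
jar = Dir.glob(jar_pattern).first || jar_pattern

command = streaming_command(
  jar: jar,
  mapper: 'mapper.rb',
  reducer: 'reducer.rb',
  input_uri: 'mongodb://127.0.0.1/twitter.in',
  output_uri: 'mongodb://127.0.0.1/twitter.out'
)

# Print the command for review; system(*command) would actually launch the job.
puts command.shelljoin
```

Printing the command first makes it easy to confirm the collection names before any data is written to twitter.out.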