MongoDB can do some simple analytical work on its own: it has a built-in JavaScript-based MapReduce framework, and MongoDB 2.2 introduced the new aggregation framework. Beyond that, MongoDB also provides an interface to external processing tools, which is the role of the mongo-hadoop connector described in this article. The article is based on a post from the official MongoDB blog.
Schematic diagram
MongoDB and Hadoop combine in a straightforward way: MongoDB serves as both the data source and the destination for results, while the actual computation runs in Hadoop.
This setup also lets us write the MapReduce functions in Python, Ruby, or JavaScript rather than Java.
Example
First prepare a Hadoop environment and install Hadoop and the mongo-hadoop connector. The data is then processed as follows.
1. Data preparation
Import raw data from the Twitter streaming API into MongoDB:

curl https://stream.twitter.com/1/statuses/sample.json -u <user>:<password> | mongoimport -d twitter -c in
2. Map function
Write the map function and save it in the file mapper.rb:

#!/usr/bin/env ruby
require 'mongo-hadoop'

MongoHadoop.map do |document|
  { :_id => document['user']['time_zone'], :count => 1 }
end
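To see what the map step emits, here is a plain-Ruby sketch that applies the same transformation outside of mongo-hadoop. The `docs` array is a hypothetical stand-in for tweet documents as mongoimport would store them; only the nested 'user'/'time_zone' fields the mapper actually reads are shown:

```ruby
# Hypothetical tweet documents (only the fields the mapper reads).
docs = [
  { 'user' => { 'time_zone' => 'Eastern Time (US & Canada)' } },
  { 'user' => { 'time_zone' => 'London' } },
  { 'user' => { 'time_zone' => 'London' } }
]

# The same transformation the mapper block performs:
# one {:_id, :count} pair per tweet, keyed by time zone.
emitted = docs.map do |document|
  { :_id => document['user']['time_zone'], :count => 1 }
end

emitted.each { |pair| p pair }
```

Each tweet becomes one key/value pair, so repeated time zones show up as repeated keys for the shuffle phase to group.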
3. Reduce function
Then write the reduce function, saved in the file reducer.rb:

#!/usr/bin/env ruby
require 'mongo-hadoop'

MongoHadoop.reduce do |key, values|
  count = sum = 0
  values.each do |value|
    count += 1
    sum += value['count']
  end
  { :_id => key, :average => sum / count }
end
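The reduce step can likewise be checked outside Hadoop. A minimal sketch, assuming the grouped values for one hypothetical key are the {:count => 1} pairs the mapper emitted:

```ruby
key = 'London'  # hypothetical time-zone key from the shuffle phase
values = [{ 'count' => 1 }, { 'count' => 1 }, { 'count' => 1 }]

# Same logic as the reducer block: tally the values and average them.
count = sum = 0
values.each do |value|
  count += 1
  sum += value['count']
end

result = { :_id => key, :average => sum / count }
p result
```

With every value equal to 1 the average is 1; note that `sum / count` is integer division in Ruby, which is fine here since the counts are integers.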
4. Run the script
Create a run script with the following contents; it uses the map and reduce functions above to process the data imported in the first step.

hadoop jar mongo-hadoop-streaming-assembly*.jar \
  -mapper mapper.rb -reducer reducer.rb \
  -inputURI mongodb://127.0.0.1/twitter.in \
  -outputURI mongodb://127.0.0.1/twitter.out
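End to end, the streaming job is equivalent to grouping the mapper's output by _id and reducing each group, with the results written to the twitter.out collection. A self-contained Ruby sketch of that flow over hypothetical documents (no Hadoop or MongoDB required):

```ruby
# Hypothetical input documents, as in the twitter.in collection.
docs = [
  { 'user' => { 'time_zone' => 'London' } },
  { 'user' => { 'time_zone' => 'London' } },
  { 'user' => { 'time_zone' => 'Tokyo' } }
]

# Map: one {:_id, :count} pair per tweet.
mapped = docs.map { |d| { :_id => d['user']['time_zone'], :count => 1 } }

# Shuffle: group the pairs by key, as Hadoop does between map and reduce.
grouped = mapped.group_by { |pair| pair[:_id] }

# Reduce: average the counts for each key.
out = grouped.map do |key, pairs|
  counts = pairs.map { |p| p[:count] }
  { :_id => key, :average => counts.sum / counts.size }
end

out.each { |doc| p doc }
```

The real job does the same thing at scale, reading from and writing back to MongoDB via the input and output URIs.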