Recently I have been considering using Hadoop MapReduce to analyze data stored in MongoDB. After piecing together several demos found online, I finally got one running. The process is described below.
Environment
- Ubuntu 14.04 64bit
- Hadoop 2.6.4
- MongoDB 2.4.9
- Java 1.8
- mongo-hadoop-core-1.5.2.jar
- mongo-java-driver-3.0.4.jar
Download mongo-hadoop-core-1.5.2.jar and mongo-java-driver-3.0.4.jar and make them available on the Hadoop classpath.
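One way to put the two jars on the classpath is to copy them into Hadoop's common lib directory. This is a sketch; the `HADOOP_HOME` location and the download paths are assumptions for a typical Hadoop 2.6.x install, so adjust them for your setup:

```shell
# Assumed install location -- adjust for your environment.
export HADOOP_HOME=/usr/local/hadoop

# Jars placed in share/hadoop/common/lib/ are picked up by the
# Hadoop daemons and by `hadoop jar` job submissions.
cp mongo-hadoop-core-1.5.2.jar  "$HADOOP_HOME/share/hadoop/common/lib/"
cp mongo-java-driver-3.0.4.jar "$HADOOP_HOME/share/hadoop/common/lib/"
```

Restart the Hadoop daemons after copying so they pick up the new jars.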
Data
- Sample Data
```
> db.in.find()
{ "_id" : ObjectId("5758db95ab12e17a067fbb6f"), "x" : "hello world" }
{ "_id" : ObjectId("5758db95ab12e17a067fbb70"), "x" : "nice to meet you" }
{ "_id" : ObjectId("5758db95ab12e17a067fbb71"), "x" : "good to see you" }
{ "_id" : ObjectId("5758db95ab12e17a067fbb72"), "x" : "world war 2" }
{ "_id" : ObjectId("5758db95ab12e17a067fbb73"), "x" : "see you again" }
{ "_id" : ObjectId("5758db95ab12e17a067fbb74"), "x" : "bye bye" }
```
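If you want to recreate this input collection yourself, a mongo shell snippet along these lines would do it (the `testmr` database and `in` collection names match the `mongo.input.uri` used by the job later):

```javascript
// mongo shell: create the sample input collection for the job.
use testmr
db.in.insert([
  { x: "hello world" },
  { x: "nice to meet you" },
  { x: "good to see you" },
  { x: "world war 2" },
  { x: "see you again" },
  { x: "bye bye" }
])
```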
- The final result
```
> db.out.find()
{ "_id" : "2", "value" : 1 }
{ "_id" : "again", "value" : 1 }
{ "_id" : "bye", "value" : 2 }
{ "_id" : "good", "value" : 1 }
{ "_id" : "hello", "value" : 1 }
{ "_id" : "meet", "value" : 1 }
{ "_id" : "nice", "value" : 1 }
{ "_id" : "see", "value" : 2 }
{ "_id" : "to", "value" : 2 }
{ "_id" : "war", "value" : 1 }
{ "_id" : "world", "value" : 2 }
{ "_id" : "you", "value" : 3 }
```
- The goal is to count how often each word appears across all documents, storing each word as the `_id` and its frequency as the `value` in MongoDB.
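Stripped of the Hadoop machinery, the counting logic is simple. As a plain-Java sketch (the class name `WordCountSketch` is hypothetical, not part of the job below), it amounts to tokenizing each document's `x` field and summing per token:

```java
import java.util.*;

public class WordCountSketch {
    // Split each document's text on whitespace and sum occurrences per
    // token -- the same logic the mapper/reducer pair implements below.
    public static Map<String, Integer> count(List<String> docs) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String doc : docs) {
            StringTokenizer itr = new StringTokenizer(doc);
            while (itr.hasMoreTokens()) {
                freq.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("hello world", "see you", "see you again");
        System.out.println(count(docs));
    }
}
```

In the MapReduce version, the mapper emits the `(token, 1)` pairs and the reducer performs the summing, so the same computation scales across many input splits.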
Hadoop MapReduce Code
- MapReduce Code
```java
import java.util.*;
import java.io.*;

import org.bson.*;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, BSONObject, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, BSONObject value, Context context)
                throws IOException, InterruptedException {
            System.out.println("key: " + key);
            System.out.println("value: " + value);
            // Tokenize the "x" field of each MongoDB document.
            StringTokenizer itr = new StringTokenizer(value.get("x").toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read from testmr.in and write results to testmr.out.
        conf.set("mongo.input.uri", "mongodb://localhost/testmr.in");
        conf.set("mongo.output.uri", "mongodb://localhost/testmr.out");
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
- Compile
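The exact compile commands were not preserved here, but a sketch for Hadoop 2.6.x might look like the following; the jar file names and the use of `hadoop classpath` to supply the Hadoop dependencies are assumptions for this environment:

```shell
# Compile against Hadoop plus the two MongoDB connector jars,
# then package the resulting classes into a runnable job jar.
javac -cp "$(hadoop classpath):mongo-hadoop-core-1.5.2.jar:mongo-java-driver-3.0.4.jar" WordCount.java
jar cf wordcount.jar WordCount*.class
```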
- Run
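Submitting the job might then look like this (the jar name `wordcount.jar` is an assumption carried over from the compile step; no HDFS paths are needed because the input and output MongoDB URIs are set inside `main()`):

```shell
# Run the job; input/output collections come from mongo.input.uri
# and mongo.output.uri configured in the driver code.
hadoop jar wordcount.jar WordCount
```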
- View Results
```
$ mongo
MongoDB shell version: 2.4.9
connecting to: test
> use testmr;
switched to db testmr
> db.out.find({})
{ "_id" : "2", "value" : 1 }
{ "_id" : "again", "value" : 1 }
{ "_id" : "bye", "value" : 2 }
{ "_id" : "good", "value" : 1 }
{ "_id" : "hello", "value" : 1 }
{ "_id" : "meet", "value" : 1 }
{ "_id" : "nice", "value" : 1 }
{ "_id" : "see", "value" : 2 }
{ "_id" : "to", "value" : 2 }
{ "_id" : "war", "value" : 1 }
{ "_id" : "world", "value" : 2 }
{ "_id" : "you", "value" : 3 }
>
```
The above is a simple example. Next, I plan to use Hadoop MapReduce to process more complex data in MongoDB. Stay tuned, and if you have any questions, ask in the comments ^_^
References and documentation
- The elephant in MongoDB + Hadoop
- http://chenhua-1984.iteye.com/blog/2162576
- http://api.mongodb.com/java/2.12/com/mongodb/MongoURI.html
- http://stackoverflow.com/questions/27020075/mongo-hadoop-connector-issue
Analyzing MongoDB Data Using Hadoop MapReduce (1)