Analyzing MongoDB Data using Hadoop MapReduce (1)


I recently looked into using Hadoop MapReduce to analyze data stored in MongoDB. I pieced together a working demo from examples found online; the process is described below.

Environment

    • Ubuntu 14.04 64bit
    • Hadoop 2.6.4
    • MongoDB 2.4.9
    • Java 1.8
    • mongo-hadoop-core-1.5.2.jar
    • mongo-java-driver-3.0.4.jar

Downloading and configuring mongo-hadoop-core-1.5.2.jar and mongo-java-driver-3.0.4.jar

    • Compile mongo-hadoop-core-1.5.2.jar:
      $ git clone https://github.com/mongodb/mongo-hadoop
      $ cd mongo-hadoop
      $ ./gradlew jar
      • Compilation takes a while; once it succeeds, mongo-hadoop-core-1.5.2.jar can be found under core/build/libs
    • Download mongo-java-driver-3.0.4.jar:
    • http://central.maven.org/maven2/org/mongodb/mongo-java-driver/3.0.4/
      mongo-java-driver-3.0.4.jar
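For the job to find the connector classes at runtime, both jars need to be on Hadoop's classpath. One common approach is to copy them into a lib directory Hadoop already scans; this is only a sketch, and the exact paths depend on your installation layout:

```shell
# Assumed paths -- adjust to your own installation.
# Copies both jars into a directory that is already on Hadoop's runtime classpath.
cp mongo-hadoop/core/build/libs/mongo-hadoop-core-1.5.2.jar \
   "$HADOOP_HOME/share/hadoop/common/lib/"
cp mongo-java-driver-3.0.4.jar \
   "$HADOOP_HOME/share/hadoop/common/lib/"
```

Restart Hadoop after copying so the daemons pick up the new jars.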

Data

  • Sample data
  • > db.in.find()
    { "_id" : ObjectId("5758db95ab12e17a067fbb6f"), "x" : "Hello World" }
    { "_id" : ObjectId("5758db95ab12e17a067fbb70"), "x" : "Nice to Meet You" }
    { "_id" : ObjectId("5758db95ab12e17a067fbb71"), "x" : "Good to See You" }
    { "_id" : ObjectId("5758db95ab12e17a067fbb72"), "x" : "World War 2" }
    { "_id" : ObjectId("5758db95ab12e17a067fbb73"), "x" : "See You Again" }
    { "_id" : ObjectId("5758db95ab12e17a067fbb74"), "x" : "Bye Bye" }
  • The final result
  • > db.out.find()
    { "_id" : "2", "value" : 1 }
    { "_id" : "Again", "value" : 1 }
    { "_id" : "Bye", "value" : 2 }
    { "_id" : "Good", "value" : 1 }
    { "_id" : "Hello", "value" : 1 }
    { "_id" : "Meet", "value" : 1 }
    { "_id" : "Nice", "value" : 1 }
    { "_id" : "See", "value" : 2 }
    { "_id" : "to", "value" : 2 }
    { "_id" : "War", "value" : 1 }
    { "_id" : "World", "value" : 2 }
    { "_id" : "You", "value" : 3 }
  • The goal is to count how often each word appears across the documents, storing each word as the key and its frequency as the value in MongoDB
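The counting logic itself can be sketched in plain Java, independent of Hadoop (the class name `WordCountSketch` is mine, for illustration only); tokenization is whitespace-delimited and case-sensitive, which is why "2" and "War" count as words:

```java
import java.util.*;

public class WordCountSketch {
    // Hypothetical helper mirroring the mapper/reducer logic:
    // split each document's "x" field on whitespace and tally each token.
    static Map<String, Integer> count(List<String> docs) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String doc : docs) {
            StringTokenizer itr = new StringTokenizer(doc);
            while (itr.hasMoreTokens()) {
                freq.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Hello World", "Nice to Meet You", "Good to See You",
            "World War 2", "See You Again", "Bye Bye");
        // Produces the same counts as the result collection shown above.
        System.out.println(count(docs));
    }
}
```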

Hadoop MapReduce Code

  • Mapreduce Code
    import java.util.*;
    import java.io.*;

    import org.bson.*;

    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.*;

    public class WordCount {

        public static class TokenizerMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, BSONObject value, Context context)
                    throws IOException, InterruptedException {
                System.out.println("key: " + key);
                System.out.println("value: " + value);
                StringTokenizer itr = new StringTokenizer(value.get("x").toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mongo.input.uri", "mongodb://localhost/testmr.in");
            conf.set("mongo.output.uri", "mongodb://localhost/testmr.out");
            @SuppressWarnings("deprecation")
            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setInputFormatClass(MongoInputFormat.class);
            job.setOutputFormatClass(MongoOutputFormat.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
      • Note: set mongo.input.uri and mongo.output.uri
        conf.set("mongo.input.uri", "mongodb://localhost/testmr.in");
        conf.set("mongo.output.uri", "mongodb://localhost/testmr.out");
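The URI format is mongodb://host[:port]/database.collection. If only a subset of documents should be processed, the mongo-hadoop connector also supports a mongo.input.query key that takes a JSON-encoded query document; this is a sketch to be verified against the connector version you built (the query shown is an example of mine, not from the original demo):

```java
// Sketch: inside main(), after creating conf.
// mongo.input.query restricts the input to documents matching a MongoDB query.
conf.set("mongo.input.uri", "mongodb://localhost:27017/testmr.in");
conf.set("mongo.output.uri", "mongodb://localhost:27017/testmr.out");
conf.set("mongo.input.query", "{\"x\": {\"$exists\": true}}");
```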
  • Compile
    • Compile the class
      $ hadoop com.sun.tools.javac.Main WordCount.java -Xlint:deprecation
    • Package the jar
      $ jar cf wc.jar WordCount*.class
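The compile step only works if the two MongoDB jars are visible to the compiler as well. A minimal sketch, assuming both jars sit in the current directory (adjust paths to wherever you placed them):

```shell
# Assumed jar locations -- adjust as needed.
# HADOOP_CLASSPATH makes the MongoDB classes visible to the compiler
# invoked through the `hadoop` wrapper.
export HADOOP_CLASSPATH="$(pwd)/mongo-hadoop-core-1.5.2.jar:$(pwd)/mongo-java-driver-3.0.4.jar"
hadoop com.sun.tools.javac.Main WordCount.java -Xlint:deprecation
jar cf wc.jar WordCount*.class
```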
  • Run
    • Start Hadoop first; the MapReduce job cannot run without it
      $ start-all.sh
    • Run the program
      $ hadoop jar wc.jar WordCount
  • View the results
  • $ mongo
    MongoDB shell version: 2.4.9
    connecting to: test
    > use testmr;
    switched to db testmr
    > db.out.find({})
    { "_id" : "2", "value" : 1 }
    { "_id" : "Again", "value" : 1 }
    { "_id" : "Bye", "value" : 2 }
    { "_id" : "Good", "value" : 1 }
    { "_id" : "Hello", "value" : 1 }
    { "_id" : "Meet", "value" : 1 }
    { "_id" : "Nice", "value" : 1 }
    { "_id" : "See", "value" : 2 }
    { "_id" : "to", "value" : 2 }
    { "_id" : "War", "value" : 1 }
    { "_id" : "World", "value" : 2 }
    { "_id" : "You", "value" : 3 }
    >

The above is a simple example. Next, I plan to use Hadoop MapReduce to process more complex data in MongoDB. Stay tuned, and if you have any questions, please ask in the comments ^_^

