10gen has just released version 1.0 of the MongoDB Hadoop Connector, middleware that connects MongoDB and Hadoop so that MongoDB users can easily take advantage of Hadoop's distributed computing capabilities.
The connector's main workflow is to let Hadoop read raw data from MongoDB; once the Hadoop computation completes, the results are written back into MongoDB. The instance the raw data is read from and the instance the results are written to may be the same MongoDB deployment or different ones. The main goal is to let MongoDB users take advantage of Hadoop's functionality directly and conveniently.
Currently, the MongoDB Hadoop Connector integrates with several components of the Hadoop ecosystem, and more comprehensive and convenient integrations will follow based on user feedback. The current integrations are:
- You can use Pig to write data to MongoDB.
- You can use Flume, a distributed log-collection system, to import raw log data into MongoDB.
- With Hadoop Streaming, you can write MapReduce functions in Python.
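To illustrate the last point, here is a minimal sketch of the Hadoop Streaming model in Python: the mapper and reducer are ordinary functions that read text records and emit tab-separated key/value lines, which is the protocol Hadoop Streaming uses over stdin/stdout. This is a generic word-count example, not code from the connector itself; the connector additionally lets streaming jobs read and write MongoDB/BSON data.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word, as Hadoop Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t1" % word


def reducer(lines):
    """Sum the counts for each word; input arrives sorted by key after the shuffle."""
    parsed = (line.strip().split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


if __name__ == "__main__":
    # In a real job, the mapper and reducer run as separate scripts, each
    # reading from stdin and writing to stdout; here we chain them locally,
    # with sorted() standing in for Hadoop's shuffle/sort phase.
    sample = ["mongodb hadoop mongodb", "hadoop streaming"]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```

In a real deployment these two functions would be split into a mapper script and a reducer script and passed to the `hadoop-streaming` jar via its `-mapper` and `-reducer` options.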
The MongoDB Hadoop Connector currently supports MongoDB versions 2.0 and later (the 1.8.x series is also supported).
The project is, of course, open source; it is published as mongo-hadoop.
Message Source: blog.10gen.com