MongoDB: Map-Reduce
Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. For map-reduce operations, MongoDB provides the mapReduce command.
Consider the following map-reduce operation:
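For illustration, here is a minimal sketch of such an operation in the mongo shell; the orders collection, the cust_id, amount, and status fields, and the order_totals output collection are assumptions for this example, not names from the original text:

    // Map: called once per input document; emits key-value pairs.
    var mapFunction = function () {
        emit(this.cust_id, this.amount);
    };

    // Reduce: condenses all values emitted for one key into a single value.
    var reduceFunction = function (key, values) {
        return Array.sum(values);
    };

    db.orders.mapReduce(
        mapFunction,
        reduceFunction,
        {
            query: { status: "A" },   // only documents matching the query enter the map stage
            out: "order_totals"       // write the results to a collection
        }
    );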
In this map-reduce operation, MongoDB applies the map phase to each input document (that is, each document in the collection that matches the query condition). The map function emits key-value pairs. For keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data. MongoDB then stores the results in a collection. Optionally, the output of the reduce function can be passed to a finalize function to further condense or process the results of the aggregation.
In MongoDB, all map-reduce functions are JavaScript code and run within the mongod process. A map-reduce operation takes the documents of a collection as input and can apply arbitrary sorting and limits before the map stage begins. mapReduce can return the results of the operation as a document, or it can write the results to collections. The input and the output collections may be sharded.
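As a sketch of the pre-map sorting and limiting mentioned above, reusing the mapFunction and reduceFunction from the earlier example (the field names remain assumptions):

    db.orders.mapReduce(mapFunction, reduceFunction, {
        sort: { cust_id: 1 },   // applied before the map stage; the sort key should be indexed
        limit: 1000,            // restrict how many input documents are processed
        out: "order_totals"
    });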
Note:
For most aggregation operations, the aggregation pipeline provides better performance and a more consistent interface. However, map-reduce operations offer some flexibility that is not available in the aggregation pipeline.
Map-Reduce JavaScript Functions
In MongoDB, map-reduce operations use custom JavaScript functions to map, or associate, values to a key. If a key has multiple values mapped to it, the operation reduces the values for the key to a single object.
Custom JavaScript functions give map-reduce its flexibility. For example, when processing a document, the map function can emit more than one key-value pair, or none at all. Map-reduce operations can also use a custom JavaScript finalize function to modify the results at the end of the map and reduce stages.
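A sketch of that flexibility, assuming a hypothetical items array inside each order document; the finalize function illustrates the end-of-operation modification mentioned above:

    // Map: may emit several key-value pairs per document, or none at all.
    var mapItems = function () {
        if (!this.items) return;              // no emission for documents without items
        this.items.forEach(function (item) {
            emit(item.sku, item.qty);         // one key-value pair per array element
        });
    };

    var reduceItems = function (key, values) {
        return Array.sum(values);
    };

    // Finalize: modifies the reduced value before it is stored or returned.
    var finalizeItems = function (key, reducedValue) {
        return { totalQty: reducedValue };
    };

    db.orders.mapReduce(mapItems, reduceItems, {
        out: "sku_totals",
        finalize: finalizeItems
    });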
Map-Reduce Behavior
In MongoDB, a map-reduce operation can write its results to a collection or return them inline. If you write the map-reduce output to a collection, you can perform subsequent map-reduce operations on the same input collection and merge with, replace, or re-reduce against the previous results.
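A sketch of these output modes, again reusing the earlier functions and the illustrative order_totals collection:

    // Re-run the operation and combine its output with the existing collection.
    db.orders.mapReduce(mapFunction, reduceFunction, {
        out: { merge: "order_totals" }   // keep existing keys, overwrite matching ones
    });
    // Other modes: { replace: "order_totals" } discards the previous results entirely,
    // and { reduce: "order_totals" } re-reduces new results against the existing ones.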
When the results of a map-reduce operation are returned inline, the result documents must be within the BSON document size limit, which is currently 16 MB.
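Returning the results inline rather than writing to a collection, with the 16 MB limit in mind, might look like this (same assumed collection and functions):

    var result = db.orders.mapReduce(mapFunction, reduceFunction, {
        out: { inline: 1 }       // the complete result set must fit in one 16 MB BSON document
    });
    printjson(result.results);   // in the mongo shell, the documents are in the "results" array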
MongoDB supports map-reduce operations on sharded collections, and map-reduce operations can also output their results to sharded collections.
Who knows about MongoDB's mapReduce?
Map: it can be understood as selecting the data to work on; in SQL terms, it is roughly the part filtered by the WHERE condition;
Reduce: it can be understood as computing the fields to be displayed;
Because mapReduce is difficult for beginners to understand, we recommend starting with the simpler group method (a minimal sketch follows below);
In addition, the performance of MapReduce is low. Unless you are running background statistics jobs, do not use MapReduce as a data access method queried directly by the front end.
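The "group method" mentioned above presumably refers to simpler grouping, for example through the aggregation pipeline's $group stage; a minimal sketch with the same assumed orders collection:

    db.orders.aggregate([
        { $match: { status: "A" } },               // filter, roughly the "where" part
        { $group: { _id: "$cust_id",               // grouping key
                    total: { $sum: "$amount" } } } // aggregated field to display
    ]);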
Where is MongoDB applicable?
Mongo is applicable to the following scenarios:
◆ Website data: Mongo is ideal for real-time insertion, update, and query, as well as the replication and high scalability required for real-time website data storage.
◆ Cache: because of its high performance, Mongo is also suitable as a cache layer in an information infrastructure. After a system restart, the persistent cache layer built on Mongo can keep the underlying data source from being overloaded.
◆ Large-sized, low-value data: storing some kinds of data in a traditional relational database can be expensive, so programmers have often chosen plain files for such storage instead.
◆ High scalability: Mongo is ideal for databases consisting of dozens or hundreds of servers. The Mongo roadmap contains built-in support for the MapReduce engine.
◆ Used for storage of objects and JSON data: The BSON data format of Mongo is very suitable for storing and querying document-based data.
Naturally, there are some restrictions on the use of MongoDB; for example, it is not suitable for:
◆ Highly transactional systems: such as banking or accounting systems. Traditional relational databases are still more suitable for applications that require a large number of atomic complex transactions.
◆ Traditional business intelligence applications: a BI database targeted at a specific problem produces highly optimized queries. For such applications, a data warehouse may be a more appropriate choice.