Implement Aggregate functions in MongoDB

Source: Internet
Author: User
The NoSQL database used in this article is MongoDB, an open-source document database system written in C++. It provides an efficient document-oriented storage structure.

As the data produced by organizations grows explosively, from GB to TB and from TB to PB, traditional databases can no longer manage it through vertical scaling alone. The cost of traditional data storage and processing methods rises significantly as the data volume increases. This has led many organizations to look for an economical solution, such as a NoSQL database, that provides the required data storage and processing capabilities, scalability, and cost efficiency. NoSQL databases do not use SQL as their query language. They come in different types, such as document stores, key-value stores, graph databases, and object databases.

The NoSQL database used in this article is MongoDB, an open-source document database system written in C++. It provides an efficient document-oriented storage structure and supports processing stored documents through MapReduce programs, which can be used for data aggregation. It is highly scalable and supports automatic partitioning. Its data is stored in BSON (Binary JSON) format, and its storage structure supports dynamic schemas and dynamic queries. Unlike SQL queries in an RDBMS, queries in the Mongo query language are expressed in JSON.

MongoDB provides built-in aggregation commands for common operations such as count, distinct, and group. In this article, however, more advanced aggregate functions, such as sum, average, max, min, variance, and standard deviation, are implemented through MapReduce.

This article describes how to use MapReduce to implement common aggregate functions, such as sum, average, max, min, variance, and standard deviation. Typical applications of aggregation include business reports on sales data, such as grouping data by region to calculate total sales volume, and financial reports.
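As a reference point for the last two functions named above, here is a minimal JavaScript sketch of what variance and standard deviation compute over a plain array of numbers. It assumes the population form (dividing by N); the article does not say which form it intends. The function names and the sample values are hypothetical.

```javascript
// Population variance: the mean of the squared deviations from the mean.
// Assumption: divides by N (population form), not N - 1 (sample form).
function variance(values) {
  var n = values.length;
  var mean = values.reduce(function (a, b) { return a + b; }, 0) / n;
  var sumSq = values.reduce(function (acc, v) {
    return acc + (v - mean) * (v - mean);
  }, 0);
  return sumSq / n;
}

// Standard deviation: the square root of the variance.
function standardDeviation(values) {
  return Math.sqrt(variance(values));
}

// Made-up sample values, for illustration only.
var amounts = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(variance(amounts), standardDeviation(amounts));
```

For these sample values the mean is 5, the variance is 4, and the standard deviation is 2.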

We will start from the installation of the software required for the example application in this article.

Software Installation

First, install and set up the MongoDB service on the local machine.

  • Download MongoDB from the Mongo website and extract it to a local directory, such as C:\Mongo.
  • Create a data directory inside that folder, for example C:\Mongo\Data.
  • If the data files are stored elsewhere, you must add the --dbpath parameter to the command line when using mongod.exe to start MongoDB.
  • Start the service.
  • MongoDB provides two executables: mongod.exe starts the database server, and mongo.exe starts the command-line interface, which can be used for management operations. Both executables are located in the Mongo\bin directory.
  • Go to the bin directory of the Mongo installation directory, for example C:\> cd Mongo\bin.
  • There are two ways to start the server:

    mongod.exe --dbpath C:\Mongo\data

    or

    mongod.exe --config mongodb.config

    Here mongodb.config is the configuration file under the Mongo\bin directory. You must specify the location of the data directory (for example, dbpath = C:\Mongo\Data) in this configuration file.
  • Connect to MongoDB. At this point, the mongod background service has been started, and you can check it on port 27017. Now that MongoDB is running, let's look at its aggregate functions.

Implement Aggregate functions

    In relational databases, we can execute SQL statements that apply predefined aggregate functions, such as SUM(), COUNT(), MAX(), and MIN(), to numeric fields. In MongoDB, as described in this article, aggregation and batch processing are implemented with MapReduce, which plays a role similar to the GROUP BY clause used for aggregation in SQL. The next section describes SQL-based aggregation in relational databases and the corresponding aggregation through the MapReduce facility provided by MongoDB.

    To discuss this topic, we use the Sales table shown below, which is stored in MongoDB in denormalized form.

    Sales table

    #    Column name          Data type
    1    OrderId              INTEGER
    2    OrderDate            STRING
    3    Quantity             INTEGER
    4    SalesAmt             DOUBLE
    5    Profit               DOUBLE
    6    CustomerName         STRING
    7    City                 STRING
    8    State                STRING
    9    ZipCode              STRING
    10   Region               STRING
    11   ProductId            INTEGER
    12   ProductCategory      STRING
    13   ProductSubCategory   STRING
    14   ProductName          STRING
    15   ShipDate             STRING

    Implementation based on SQL and MapReduce

    We provide a set of sample queries that use aggregate functions, filter conditions, and grouping clauses, together with their equivalent MapReduce implementations; that is, the MongoDB equivalent of GROUP BY in SQL. This is very useful for performing aggregation operations on documents stored in MongoDB. One limitation of this approach is that the aggregate functions (such as SUM, AVG, MIN, and MAX) must be implemented by hand in the mapper and reducer functions.
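To make the GROUP BY analogy concrete, the following is a minimal sketch of how a query like SELECT Region, SUM(SalesAmt) FROM Sales GROUP BY Region might map onto a mapper and a reducer. The sample documents are made up, and emit and runMapReduce here are hypothetical in-memory stand-ins (not MongoDB APIs) that imitate the map/reduce loop so the snippet can run outside the database.

```javascript
// Hypothetical documents shaped like the Sales table; the values are made up.
var sales = [
  { OrderId: 1, Region: "East", SalesAmt: 100.0 },
  { OrderId: 2, Region: "West", SalesAmt: 250.5 },
  { OrderId: 3, Region: "East", SalesAmt: 50.0 }
];

// Mapper: runs once per document, with `this` bound to that document.
// It emits the grouping key (Region) and the value to aggregate (SalesAmt).
function mapFn() {
  emit(this.Region, this.SalesAmt);
}

// Reducer: receives one key and the values emitted for it, and returns their sum.
function reduceFn(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

// In-memory stand-ins (assumptions for illustration, not MongoDB APIs).
// MongoDB supplies its own emit() when it runs the mapper on the server.
var emitted = {};
function emit(key, value) {
  if (!emitted[key]) emitted[key] = [];
  emitted[key].push(value);
}
function runMapReduce(docs, mapper, reducer) {
  emitted = {};
  docs.forEach(function (doc) { mapper.call(doc); });
  var result = {};
  Object.keys(emitted).forEach(function (key) {
    result[key] = reducer(key, emitted[key]);
  });
  return result;
}

var totals = runMapReduce(sales, mapFn, reduceFn);
console.log(totals); // East totals 150, West totals 250.5
```

In the legacy mongo shell, the same mapper and reducer could be passed to the collection's mapReduce method, for example db.sales.mapReduce(mapFn, reduceFn, { out: { inline: 1 } }), where sales is a hypothetical collection name.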

    MongoDB does not support user-defined functions (UDFs). However, it lets you create and save JavaScript functions with the db.system.js.save command, and these saved functions can be reused in MapReduce jobs. The following table shows the implementation of some common aggregate functions. Later, we will discuss the use of these functions in MapReduce tasks.

    Aggregate function / JavaScript function

    SUM

    db.system.js.save({_id: "Sum", value: function (key, values) { var total = 0; for (var i = 0; i < values.length; i++) total += values[i]; return total; }});

    AVERAGE

    db.system.js.save({_id: "Avg", value: function (key, values) { var total = Sum(key, values); var mean = total / values.length; return mean; }});

    MAX

    db.system.js.save({_id: "Max", value: function (key, values) { var maxValue = values[0]; for (var i = 1; i ...

    MIN

    db.system.js.save({_id: "Min", value: function (key, values) { var minValue = values[0]; for (var i = 1; i ...
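The Max and Min entries are cut off in the source after the start of the loop. The sketch below shows one plausible way to complete them, following the pattern of the Sum function; the loop bodies are an assumption, not the original author's code. The functions are written as plain JavaScript so they can be tried outside MongoDB, and they are registered with db.system.js.save only when a db object exists, as in the legacy mongo shell.

```javascript
// Assumed completion of the truncated "Max" helper:
// returns the largest element of values.
function Max(key, values) {
  var maxValue = values[0];
  for (var i = 1; i < values.length; i++) {
    if (values[i] > maxValue) maxValue = values[i];
  }
  return maxValue;
}

// Assumed completion of the truncated "Min" helper:
// returns the smallest element of values.
function Min(key, values) {
  var minValue = values[0];
  for (var i = 1; i < values.length; i++) {
    if (values[i] < minValue) minValue = values[i];
  }
  return minValue;
}

// Register the helpers only where `db` is defined (the legacy mongo shell);
// elsewhere this block is skipped and the functions remain plain JavaScript.
if (typeof db !== "undefined") {
  db.system.js.save({ _id: "Max", value: Max });
  db.system.js.save({ _id: "Min", value: Min });
}

// Made-up values, for illustration only.
console.log(Max("East", [120.5, 80, 310]), Min("East", [120.5, 80, 310]));
```

Both helpers keep the (key, values) signature used by the other entries in the table, so they can be called from a reducer in the same way as Sum.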
