Implement Aggregate functions in MongoDB

Last Update:2018-07-06 Source: Internet

Author: User



As the data that organizations produce grows explosively, from gigabytes to terabytes and from terabytes to petabytes, traditional databases can no longer manage it through vertical scaling alone, and the cost of traditional storage and processing methods rises sharply with data volume. This has led many organizations to look for an economical alternative, such as a NoSQL database, which offers the required storage and processing capacity, scalability, and cost efficiency. NoSQL databases do not use SQL as their query language. They come in several types, including document stores, key-value stores, graph databases, and object databases.

The NoSQL database used in this article is MongoDB, an open-source document database written in C++. It provides an efficient document-oriented storage structure and supports processing stored documents through MapReduce programs, which can be used for data aggregation. It is highly scalable and supports automatic partitioning (sharding). Data is stored in BSON (Binary JSON) format, and the storage structure supports dynamic schemas and dynamic queries. Unlike SQL queries in an RDBMS, MongoDB queries are expressed as JSON.
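To make the idea of a JSON query concrete, here is a minimal sketch in plain JavaScript that runs without a MongoDB server. The sample documents and the matches helper are hypothetical and exist only for illustration; the helper handles just equality and the $gt operator, which is far less than MongoDB's real query engine supports.

```javascript
// Hypothetical sales documents; field names follow the article, values are made up.
const docs = [
  { OrderId: 1, Region: "West", SalesAmt: 250 },
  { OrderId: 2, Region: "East", SalesAmt: 80 },
  { OrderId: 3, Region: "West", SalesAmt: 40 },
];

// A query written as a JSON-like document. Roughly equivalent to:
//   SELECT * FROM Sales WHERE Region = 'West' AND SalesAmt > 100
const query = { Region: "West", SalesAmt: { $gt: 100 } };

// Illustrative matcher (an assumption, not MongoDB's implementation):
// every field in the query must match; only equality and $gt are handled.
function matches(doc, q) {
  return Object.keys(q).every((field) => {
    const cond = q[field];
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[field] > cond.$gt;
    }
    return doc[field] === cond;
  });
}

const result = docs.filter((d) => matches(d, query));
console.log(result.map((d) => d.OrderId));
```

In the mongo shell, the same query document would be passed to a collection's find method instead of a hand-written matcher.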

MongoDB provides built-in aggregation functions such as count, distinct, and group. In this article, more advanced aggregate functions, such as sum, average, max, min, variance, and standard deviation, are implemented through MapReduce.

This article describes how to use MapReduce to implement these common aggregate functions. Typical applications of aggregation include business reports on sales data, for example grouping data by region to calculate total sales, and financial reports.

We start by installing the software required for the example application in this article.

Software Installation

First, install and set up the MongoDB service on the local machine.

Download MongoDB from the MongoDB website and decompress it to a local directory, such as C:\Mongo

Create a data directory in that folder, for example C:\Mongo\data

If the data files are stored elsewhere, you must add the --dbpath parameter to the command line when starting MongoDB with mongod.exe.

Start the service

MongoDB provides two executables: mongod.exe starts the database service, and mongo.exe, started afterwards, opens the command-line interface used for management operations. Both executables are located in the Mongo\bin directory.

Go to the bin directory of the Mongo installation directory, for example, C:\> cd Mongo\bin.

There are two startup methods:

mongod.exe --dbpath C:\Mongo\data

or

mongod.exe --config mongodb.config

mongodb.config is the configuration file under the Mongo\bin directory. You must specify the location of the data directory (for example, dbpath = C:\Mongo\data) in this configuration file.

Connect to MongoDB. At this step, the mongod background service has been started. You can verify this through port 27017, on which the service listens by default. With MongoDB running, let's look at its aggregate functions.

Implement Aggregate functions

In relational databases, we can execute SQL statements containing predefined aggregate functions on numeric fields, such as SUM(), COUNT(), MAX(), and MIN(). In MongoDB, however, the MapReduce function is used to implement aggregation and batch processing; it plays a role similar to the GROUP BY clause in SQL. The next section describes SQL-based aggregation in relational databases and the corresponding aggregation through MapReduce in MongoDB.
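The correspondence between GROUP BY and MapReduce can be sketched in plain JavaScript, without a MongoDB server. The runMapReduce helper below is a hypothetical in-memory stand-in, not MongoDB's implementation, and the sample values are made up. One deliberate difference: inside MongoDB, emit is a global function available to the map function, whereas this sketch passes it in as an argument.

```javascript
// Hypothetical in-memory stand-in for the map -> group by key -> reduce flow.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  // emit collects every value under its key, like rows falling into GROUP BY buckets.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in MongoDB, the current document is available as `this` inside map.
  docs.forEach((doc) => mapFn.call(doc, emit));
  const out = {};
  for (const [key, values] of groups) out[key] = reduceFn(key, values);
  return out;
}

// Made-up rows using column names from the article's Sales table.
const sales = [
  { Region: "West", SalesAmt: 250 },
  { Region: "East", SalesAmt: 80 },
  { Region: "West", SalesAmt: 40 },
];

// Roughly: SELECT Region, SUM(SalesAmt) FROM Sales GROUP BY Region
const totals = runMapReduce(
  sales,
  function (emit) { emit(this.Region, this.SalesAmt); },    // map: choose key and value
  function (key, values) {                                  // reduce: aggregate per key
    return values.reduce((total, v) => total + v, 0);
  }
);

console.log(totals); // { West: 290, East: 80 }
```

The map function decides what the grouping key is, and the reduce function plays the part of the aggregate function applied to each group.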

To discuss this topic, consider the Sales table shown below, which is presented in the denormalized form used by MongoDB.

Sales table

Column name           Data type
OrderId               INTEGER
OrderDate             STRING
Quantity              INTEGER
SalesAmt              DOUBLE
Profit                DOUBLE
CustomerName          STRING
City                  STRING
State                 STRING
ZipCode               STRING
Region                STRING
ProductId             INTEGER
ProductCategory       STRING
ProductSubCategory    STRING
ProductName           STRING
ShipDate              STRING
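As an illustration of this denormalized layout, a single sales record could be written as one document like the sketch below. The field names and types come from the table; every value is invented for the example, and the date format is an assumption, since the article only says that dates are stored as strings.

```javascript
// One hypothetical Sales document. Order, customer, and product details
// all live in the same document instead of being split across joined tables.
const saleDoc = {
  OrderId: 1001,                    // INTEGER
  OrderDate: "2012-03-15",          // STRING (date format is an assumption)
  Quantity: 3,                      // INTEGER
  SalesAmt: 149.97,                 // DOUBLE
  Profit: 42.5,                     // DOUBLE
  CustomerName: "Example Customer", // STRING
  City: "Springfield",              // STRING
  State: "Illinois",                // STRING
  ZipCode: "62701",                 // STRING, so leading zeros would survive
  Region: "Central",                // STRING
  ProductId: 501,                   // INTEGER
  ProductCategory: "Office Supplies",
  ProductSubCategory: "Paper",
  ProductName: "Example Copy Paper",
  ShipDate: "2012-03-18",           // STRING
};

// Quick sanity check that the numeric columns really hold numbers.
const numericFields = ["OrderId", "Quantity", "SalesAmt", "Profit", "ProductId"];
const allNumeric = numericFields.every((f) => typeof saleDoc[f] === "number");
console.log(Object.keys(saleDoc).length, allNumeric);
```

Because everything needed for a report sits in one document, a map function can read fields such as Region and SalesAmt directly without any join.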

Implementation based on SQL and MapReduce

We provide a sample set of queries that use aggregate functions, filter conditions, and grouping clauses, together with their equivalent MapReduce implementations; that is, MongoDB's equivalent of GROUP BY in SQL. This is very useful for performing aggregation operations on documents stored in MongoDB. One limitation of this approach is that aggregate functions (such as SUM, AVG, MIN, and MAX) must be implemented by customizing the mapper and reducer functions.
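One detail matters when customizing reducers: MongoDB may call the reduce function more than once for the same key, feeding earlier reduce output back in as input. A sum survives this unchanged, but an average does not, so a common approach is to carry a running sum and count through reduce and divide only in a finalize step. The sketch below simulates this in plain JavaScript; the data, the filter, and the names reduceAvgParts and finalizeAvg are hypothetical.

```javascript
// Made-up rows using column names from the article's Sales table.
const orders = [
  { Region: "West", ProductCategory: "Furniture", SalesAmt: 100 },
  { Region: "West", ProductCategory: "Furniture", SalesAmt: 300 },
  { Region: "West", ProductCategory: "Technology", SalesAmt: 999 },
  { Region: "East", ProductCategory: "Furniture", SalesAmt: 50 },
  { Region: "West", ProductCategory: "Furniture", SalesAmt: 200 },
];

// Roughly: SELECT Region, AVG(SalesAmt) FROM Sales
//          WHERE ProductCategory = 'Furniture' GROUP BY Region
const filtered = orders.filter((d) => d.ProductCategory === "Furniture");

// Map step: emit a {sum, count} pair per document, grouped by Region.
const emitted = {};
filtered.forEach((d) => {
  (emitted[d.Region] = emitted[d.Region] || []).push({ sum: d.SalesAmt, count: 1 });
});

// Reduce step: output has the same shape as its input, so it can be re-reduced.
function reduceAvgParts(key, values) {
  return values.reduce(
    (acc, v) => ({ sum: acc.sum + v.sum, count: acc.count + v.count }),
    { sum: 0, count: 0 }
  );
}

// Finalize step: divide once, after all reducing is done.
function finalizeAvg(key, reduced) {
  return reduced.sum / reduced.count;
}

// Simulate a re-reduce: reduce the first two West values, then combine
// that partial result with the remaining value.
const partial = reduceAvgParts("West", emitted.West.slice(0, 2));
const westTotal = reduceAvgParts("West", [partial, ...emitted.West.slice(2)]);

const avgWest = finalizeAvg("West", westTotal);
const avgEast = finalizeAvg("East", reduceAvgParts("East", emitted.East));
console.log(avgWest, avgEast); // 200 50
```

The Technology row is excluded by the filter, so the West average is taken over 100, 300, and 200 only.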

MongoDB does not support user-defined functions (UDFs). However, it allows you to create and save JavaScript functions with the db.system.js.save command, and these functions can be reused in MapReduce. The following table shows implementations of some common aggregate functions. Later, we will discuss the use of these functions in MapReduce tasks.

Aggregate functions and their JavaScript functions

SUM

db.system.js.save({
  _id: "Sum",
  value: function (key, values) {
    // Add up every value passed in for this key.
    var total = 0;
    for (var i = 0; i < values.length; i++)
      total += values[i];
    return total;
  }
});

AVERAGE

db.system.js.save({
  _id: "Avg",
  value: function (key, values) {
    // Reuse the stored Sum function, then divide by the number of values.
    var total = Sum(key, values);
    var mean = total / values.length;
    return mean;
  }
});

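MAX

db.system.js.save({
  _id: "Max",
  value: function (key, values) {
    // Start from the first value and keep the largest one seen.
    var maxValue = values[0];
    for (var i = 1; i < values.length; i++) {
      if (values[i] > maxValue) maxValue = values[i];
    }
    return maxValue;
  }
});

MIN

db.system.js.save({
  _id: "Min",
  value: function (key, values) {
    // Start from the first value and keep the smallest one seen.
    var minValue = values[0];
    for (var i = 1; i < values.length; i++) {
      if (values[i] < minValue) minValue = values[i];
    }
    return minValue;
  }
});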
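Variance and standard deviation, mentioned at the start of the article, could follow the same (key, values) pattern. The sketch below is an assumption rather than the article's own code: the names Variance and StdDev are hypothetical, it computes the population variance (dividing by n, not n - 1), and it defines everything as plain JavaScript functions so it runs without a MongoDB server.

```javascript
// Plain-function versions of the Sum and Avg helpers listed above.
function Sum(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

function Avg(key, values) {
  return Sum(key, values) / values.length;
}

// Hypothetical helper: population variance, the mean of squared deviations.
function Variance(key, values) {
  var mean = Avg(key, values);
  var squaredDiffs = 0;
  for (var i = 0; i < values.length; i++) {
    var diff = values[i] - mean;
    squaredDiffs += diff * diff;
  }
  return squaredDiffs / values.length;
}

// Hypothetical helper: standard deviation is the square root of the variance.
function StdDev(key, values) {
  return Math.sqrt(Variance(key, values));
}

// Made-up sample values for one grouping key.
var amounts = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(Avg("West", amounts), Variance("West", amounts), StdDev("West", amounts));
// 5 4 2
```

These helpers need the complete list of values for a key, so they suit a final step over all values rather than a reduce function whose partial results might be reduced again.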
