MongoDB and MySQL insert performance test "Go"

Source: Internet
Author: User
Tags mongodb query mongodb version mysql insert

A brief introduction to 1.1 MongoDB

In today's database market, MySQL is undoubtedly a place to occupy. As an open-source relational database , MySQL is widely used in the background of major web sites, and assumes the important role of information storage. In 2009, Oracle acquired Sun, and MySQL became Oracle's products.

MongoDB is a file-oriented database management system, belonging to a database called NoSQL , 10gen Company's Open source products, its internal data storage methods and traditional relational data are very different.

NoSQL's full name is not only SQL, but also can understand the non-relational database, is a new type of revolutionary database design, but it is not designed to replace the traditional relational database, they represent different database design ideas.

Although the company behind MongoDB is not as powerful as Oracle, it is now being used in all walks of life. MongoDB is currently the most widely used NoSQL database product.

Storage features of 1.2 MongoDB

In a traditional relational database , data is stored in a form as a medium, with each form having vertical columns and horizontal rows. In MySQL, for example, if the user wants to index the student's number, the information stored in the database will be as shown in the name and address information:

Id

Name

Address

10001

Alice

A1

10002

Bob

A2

10003

Cara

A3

10004

David

A4

10005

Eve

A5

Indicates that there are 5 entries in the database, which record the names and address information of students with the 10001–10005 number.

If the user wants to re-deposit the appropriate information into the MONGODB database, the information in the database should look like this:

"_id": "10001",

"Name": "Alice",

"Address": "A1",

"_id": "10002",

"Name": "Bob",

"Address": "A2",

"_id": "10003",

"Name": "Cara",

"Address": "A3",

"_id": "10004",

"Name": "David",

"Address": "A4",

"_id": "10005",

"Name": "Eve",

"Address": "A5",

Thus, compared with MYSQL,MONGODB, the data can be stored in a visual document way. It is much like the JSON format defined in JavaScript, but when the data is stored, the MONGODB database adds serialized operations to the document, and finally the disk is actually a format called BSON , the Binary-json.

You may not have a particularly intuitive sense of the difference between data stores in two databases. Let's look at another set of data that MongoDB stores:

"_id": "10001",

"Score": {

"Maths": 71,

"中文版": 62,

}

"_id": "10002",

"Score": {

"Maths": 81,

"Chemistry": 74,

"Sport": {

"Basketball": 67,

"Badminton": 76,

},

}

The above data indicate the course score information for student number 10001 and 100,022 students. If you want to put the same data into MySQL data, it will take a lot of effort. In a relational database, the number of columns is generally fixed beforehand, and the columns can be identified by column names. If you want to deposit the above data, we can take the following methods:

Id

Maths

中文版

Chemistry

Basketball

Badminton

10001

71

62

Null

Null

Null

10002

81

Null

74

67

76

or the following:

Id

Course

Score

Course

Score

Course

Score

Course

Score

10001

Maths

71

中文版

62

Null

Null

Null

Null

10002

Maths

81

Chemistry

74

Basketball

67

Badminton

76

Both of the above two storage methods, no matter which choice, can not be very intuitive to present two students of the various subjects and the relationship between the various disciplines, the use of storage space is not satisfactory, and scalability is not good enough. Of course, to solve these problems, we can also use multiple forms to store students ' scores, but this also complicates the content in the database.

1.2 MongoDB Application Scenario

On the other hand, for developers, this distinctive feature of MongoDB stands out when the specific format of the data is not clearly defined by the business requirements or the initial phase of the project. Compared with the traditional relational database, it is very easy to be extended , which also brings great convenience to write code.

However, MongoDB's support for transactional relationships between data is weak , and MongoDB is not suitable for this type of application if the business is demanding more.

In addition, MongoDB appears late, but also has some very distinct features. Like what:

1. It comes with a Distributed file system called Girdfs , which provides a great convenience for MONGODB deployment. And like MySQL this relatively early database, although there are many different sub-table deployment of the scheme, but this is not as direct as the official support of MongoDB is convenient.

2. In addition, MongoDB internal also built a support for the Map-reduce computing Framework , although this support from a functional perspective is relatively simple, equivalent to the GroupBy function of MySQL extension version, but also for the statistics of the data brought convenience.

3. MongoDB loads the data in the database into memory as a file map after it is started. If the memory resources are quite rich, this will greatly improve the query speed of the database, after all, memory I/O efficiency is much higher than disk.

However, as a fresh transaction, MongoDB also has a lot of shortcomings. It provides developers with the convenience of the case, but in the operation is faced with a number of problems , such as:

1. there is no mature operational experience than MYSQL,MONGODB, which needs to be explored continuously.

2. data storage in MongoDB is quite arbitrary and does not have MySQL defined at the beginning. For OPS, they may not know the data format of the data inside the database, which also brings trouble to the operation of the database.

2. Test Purpose

MongoDB and MySQL as two different types of database, when stored in more and more records, the efficiency of its insertion will be affected, is the object of this experiment.

Here, we set the data storage size of this experiment database at 100 million .

3. Test conditions

Machine configuration: Cpu:intel (R) Xeon (r) CPU e5-2620 @ 2.00GHz

Memory: 65954040 KB

(Keywords: 12 core cpu,64g memory, give me how good)

Operating system: Linux version 2.6.32_1-8-0-0 (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC))

MongoDB version: 2.2.3, no optimization configuration, stand-alone test

MySQL version: 5.1.49, no optimization configuration, stand-alone test

Test language: Python 2.7.1

Database Interface Driver:

Mongodb:pymongo 2.1.1

Mysql:mysqldb 1.2.3

4. Concept Popularization

In the data stored in the database, there is a special key value called the primary key , which is used to uniquely identify a record in the table. In other words, a table cannot have more than one primary key, and the primary key cannot be a null value.

There is a definition of the primary key for both MongoDB and MySQL.

For MongoDB, the primary key is called "_id", and MongoDB automatically generates a randomly assigned value for the data when it is generated, if the user does not actively assign it a primary key.

In MySQL, the designation of the primary key is defined by specifying PRIMARY key when MySQL inserts the data. When a primary key is not specified, the other tool, the index, is the equivalent of the function that replaces the primary key. Indexes can be empty or duplicate, and another index that does not allow duplication is called a unique index. If neither a primary key nor an index is specified, MySQL automatically creates one for the data.

5. Test method

1. Develop a field template for a database table entry that inserts data into the database as a baseline.

2. Automatically generate 100 million data to be tested in memory. The format of the data is not listed here, it contains about 45 fields, one of the key fields is the MD5 value of 1–100,000,000 , they are different from each other, the data of the other fields are written dead. The size of each piece of data is about 1 K.

Remember that the method of this test is to generate 100 million data in memory before performing the insert operation. Fortunately, the memory of the test machine is large enough to save so much data.

3. Insert data into the database in the following four modes, and when each 1000 data is inserted, the time of the moment is written to a fixed file:

A) specify _ID as the MD5 value of 1–100,000,000 in MongoDB and insert the data;

b) The _id value is not specified in MongoDB and the MD5 value of 1–100,000,000 is treated as a normal field insert;

c) in MySQL with 1–100,000,000 MD5 value of primary KEY, the data is inserted;

d) primary KEY is not specifiedin MySQL and the MD5 value of 1–100,000,000 is considered a normal field insert.

4. Analyze the insertion performance of MySQL and MongoDB based on the generated time file records.

5. Further, on the basis of the above four kinds of databases, test the reading performance of the database separately.

6. Test process

After you've written your test script, run it and get up and look at the results in one sleep. The process was long, but the result was gratifying :D

7. Test Results 7.1 Average insertion time per piece of data

Let's take a picture first, and feel it intuitively:

The data on the graph is the average time required for each insertion of 1000 data, in seconds. Remember, it's every 1000, not every piece of data.

Summarize:

1. Average insertion rate of the database:MongoDB does not specify _id Insert > MySQL do not specify primary key Insert > MySQL specify PRIMARY key Insert > MongoDB specify _id Insert .

2. MongoDB has a large difference in speed when specifying _id and not specifying _id inserts, while MySQL is much smaller.

Analysis:

1. When specifying a _id or primary key, both databases will process the index value at insert and look for the same key value in the database, which slows the insertion rate.

2. In MongoDB, the specified index insertion is much slower than not specified, because the _id value of each data in MongoDB is unique . When inserting data without specifying _ID, its _id is generated by the system's automatic calculation. MongoDB ensures that the generated _id are unique through computer eigenvalues, time, process IDs, and random numbers. when specifying _id inserts, MongoDB will need to check that this _id is not available for every piece of data, and when the number of data bars in the database is too many, the query overhead of this step slows down the entire database insertion speed.

3. MongoDB makes full use of system memory as a cache , which is a very good feature. The memory of our test machine is 64G, and when it is plugged in, MongoDB will try to persist the data to the hard disk after the memory is not written in the data. This is also the reason why MongoDB's efficiency is far ahead when not specifying _id insertion. However, when specifying _id inserts, when the amount of data in a large memory is not enough, MongoDB needs to read the disk information into memory to check the weight, so that its insertion efficiency is slow.

4. MySQL is a very stable database, no matter whether you specify a primary key or do not specify a primary key insertion, its efficiency is not too much.

7.2 Insert Stability Analysis

Insert stability refers to the rate at which each insertion of a certain amount of data is inserted as the amount of data increases.

In this test, we set the scale of this indicator at 10w, that is, the data displayed in each insert 10w data, in this period of time each second can insert how many pieces of data .

Show four images first:

1. MONGODB specifies _id insert:

2. MongoDB does not specify _id insert:

3. mysql specifies primary key insertion:

4. mysql does not specify primary key insertion:

Summarize:

1. The overall insertion speed is similar to the previous statistic:MongoDB does not specify _id Insert > MySQL does not specify primary key Insert > MySQL specify PRIMARY key Insert > MongoDB specify _id Insert .

2. It can be seen that when the primary key is inserted into the data, MySQL and MongoDB at different levels of data, the data inserted every second every time there will be a fluctuation, in the chart shows a regular Burr phenomenon . When inserting data is not specified, in most cases the insertion rate is averaged, but as the data in the database increases, the efficiency of the insertion decreases instantaneously in a certain period of time and then becomes stable.

3. On the whole, the rate fluctuation of MongoDB is more serious than that of MySQL and the variance is larger.

4. When MongoDB is inserted in the specified _id, the insertion efficiency decreases significantly when the inserted data becomes much more. In the other three types of insert tests, the rate at which they were inserted, from start to finish, was fixed to a standard in most cases.

Analysis:

1. Glitch is because, when there is too much data to be inserted,MongoDB needs to write the data in memory into the hard disk, and MySQL needs to be re-divided into tables . These operations occur automatically every time the data in the database reaches a certain magnitude, so there is a noticeable glitch at intervals.

2. MongoDB is still a new thing after all, its stability has not been applied for many years of MySQL excellent.

3. MongoDB 's performance drop is severe when it is inserted in the specified _id .

7.3 Simple test of MySQL and MongoDB read performance

This is an additional test, and it is not completely tested, but it is a good indication of some of the problems.

Test method:

First in the 1–100, The 100 million numbers, respectively, randomly take 1w, 5w, 10w, 20w, 50w a different number, and then calculate its MD5 value , and save.

As for why the highest only to 50w this size, this is because I randomly generated 100w a different number of times, the script ran all night did not run out, it is estimated that I generated the algorithm is too bad to write. I don't want to do it again, let's take 50w as the limit.

In the above two databases with primary key inserted, the MD5 source generated by the above step is queried for input. Similarly, 1000 data per query is written to the current system time in the log file.

Test results:

The horizontal axis of the following three graphs is the time required for each query of 1000 data, in s; ordinate is the size of the query, divided into 1w, 5w,10w, 20w, 50w five levels.

This picture is a detailed comparison, can you see the difference between MySQL and MongoDB ...

Summarize:

1. When the size of the data read is small,MongoDB query speed is really a ride dust , throw away MySQL good far away.

2. When the data volume of the query is increasing, the query speed of MySQL is declining steadily, while the query speed of MongoDB is somewhat ups and downs.

Analysis:

1. If MySQL is not optimized for query, its query speed should not be compared with MongoDB. MongoDB can take full advantage of the system's memory resources , our test machine memory is 64GB, the larger the memory MongoDB query faster, after all, disk and memory I/O efficiency is not a magnitude.

2. The data of the query in this experiment is also randomly generated, so the probability that the data to be queried exists in the memory cache of MongoDB is very small. At query time, MongoDB needs to interact with the disk multiple times to find the data in memory, so its query rate depends on the number of times it interacts . There is a possibility that, although the number of data to be queried is large, this randomly generated data is taken out of the disk by MongoDB in less time. As a result, the average speed of its queries is faster. Thus, MongoDB's query speed fluctuation is also in a reasonable range.

3. There is no doubt about the stability of MySQL.

8. Test Summary 8.1 Test conclusion

1. Compared to MySQL, theMongoDB database is more suitable for those task models that read the job more heavily . MongoDB can take full advantage of the machine's memory resources. MongoDB's query efficiency can be much faster if the machine's memory resources are plentiful.

2. When inserting data with "_id", the efficiency of MongoDB insertion is not high. If you want to take full advantage of MongoDB performance, it is recommended to adopt the Insert method without "_id", and then index the related fields to query .

8.2 Testing issues that require further attention

The read test for MongoDB is poorly thought out, although this is just an extra test. In this test, it is necessary to randomly generate a large number of data to be tested, but it is not necessary to generate a large number of disparate data. It is this, the size of my reading test is limited to 50w, no further analysis.

Advantages of 8.3 MongoDB

1. MongoDB is suitable for requirements models that are ambiguous in database format or that are frequently changed in database formats, and are very friendly to developers .

2. MongoDB official comes with a distributed file system that can be easily deployed on the server cluster . MongoDB has a shard concept, is convenient for the server shard use. The insertion performance of each additional shard,mongodb will also grow in close multiples, and the disk capacity can be easily expanded.

3. MongoDB also comes with support for the Map-reduce Computing Framework , which is also handy for data statistics.

Other aspects of the advantages are still in the excavation, I have just contacted this soon.

8.4 MongoDB Defects

1. transaction relations support is weak . This is also a common flaw for all NoSQL databases, but NoSQL is not designed for transactional relationships, and the application is still in demand.

2. There is some lack of stability , as can be seen from the above tests.

3. MongoDB on the one hand in the convenience of developers at the same time, on the other hand on the operation and maintenance personnel have put forward a considerable number of requirements. the industry does not have mature mongodb operation and maintenance experience, MONGODB data storage format is also very casual, and so on, and so on the test of operation and maintenance personnel.

MongoDB and MySQL insert performance test "Go"

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.