Use YCSB to test MongoDB's performance


MongoDB's database-level lock

MongoDB is currently the most popular NoSQL database. Developers favor it for its natural document-oriented data structure, flexible data model, and easy-to-use horizontal scaling. However, MongoDB is not without weaknesses. For example, its database-level lock is a performance bottleneck that people often complain about. Put simply, every write operation on a database must first acquire that database's single mutex lock, so writes to the same database are serialized. This sounds terrible, but a write operation holds the lock only for the moment in which the in-memory data is updated, so each write lock is typically held for only a tiny fraction of a second. Because of this, the database-level lock does not have a significant impact on performance in most real applications.

In a few scenarios with extremely high write concurrency, the database-level lock can become a bottleneck. You can observe this through the DB Lock % metric in MongoDB MMS monitoring (or in the mongostat command-line output). Generally, if DB Lock % stays above 70-80%, the database is considered saturated. How can this problem be solved?

Solution 1: Sharding
This is MongoDB's standard answer, provided you have enough hardware resources. Sharding is the ultimate solution to most performance bottlenecks.

Solution 2: Split the data across multiple databases
This is a very effective workaround. The approach is to divide your data among several databases and implement routing in the application's data access layer so that each read or write goes to the right database. For example, in a population census database you could create a separate database for each province, so that 31 databases together form one large logical database. However, this approach is not always usable. If you need to query and sort across the full data set, for example, it may be difficult or impossible to merge the results from multiple databases.
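
The routing step in the data access layer can be as small as a function that maps a record to a database name. The sketch below is only an illustration of the census example: the `census_<province>` naming scheme and the `province` field are assumptions, not something the article prescribes, and the actual driver calls are left out.

```python
# Hypothetical routing helper for the "one database per province" example.
# The naming scheme census_<province> is an assumption made for illustration.

def database_for(province: str) -> str:
    """Return the name of the database that holds data for this province."""
    # Normalize so that "Guangdong" and " guangdong " map to the same database.
    key = province.strip().lower().replace(" ", "_")
    if not key:
        raise ValueError("province must not be empty")
    return f"census_{key}"


def group_by_database(records):
    """Group records (dicts with a 'province' field) by target database."""
    routed = {}
    for record in records:
        routed.setdefault(database_for(record["province"]), []).append(record)
    return routed


if __name__ == "__main__":
    sample = [
        {"name": "A", "province": "Guangdong"},
        {"name": "B", "province": "Sichuan"},
        {"name": "C", "province": "guangdong"},
    ]
    for db_name, docs in group_by_database(sample).items():
        # With a MongoDB driver, each batch would be written to this database.
        print(db_name, len(docs))
```

Because every write for one province goes to one database, writes for different provinces no longer compete for the same database-level lock.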

Solution 3: Wait
MongoDB 2.8 is coming soon. Its biggest change is replacing the database-level lock with a document-level lock, which is expected to greatly reduce the performance problems caused by database-level locking.

Solution 4: Micro-sharding
Micro-sharding means using MongoDB's sharding technology, but running several or all of the shard mongod processes on the same server (a physical machine or a virtual machine). Because of the database-level lock, and because a single MongoDB instance does not make very good use of multi-core CPUs, micro-sharding is a useful performance tuning technique when the following conditions are met:
1) The server has a multi-core CPU (4, 8, or more cores)
2) The server has not hit an I/O bottleneck
3) There is enough memory to hold the hot data (no frequent page faults)

In this article, we run some performance tests to see how much micro-sharding improves performance.

YCSB Performance Testing Tool

Before starting the tests, I would like to take a moment to introduce YCSB. When I see developers or DBAs run performance tests, they often use very simple tools as clients for highly concurrent insert or read tests. MongoDB is a high-performance database that, properly tuned, can handle tens of thousands of operations per second. If the client code is simple and crude, or even single-threaded, the bottleneck in the test will be the client itself rather than the server. Choosing an efficient client is therefore an important first step in a good performance test.

YCSB, the Yahoo! Cloud Serving Benchmark, is a tool developed by Yahoo for benchmarking next-generation databases. The goal was a standard tool for measuring the performance of different databases. YCSB includes many optimizations to improve client performance, such as using raw byte arrays as the data type to reduce the time spent creating and converting data objects. Major features of YCSB:
* Supports the common database operations: insert, update, delete, and read.
* Multithreading: YCSB is implemented in Java and has good multithreading support.
* Flexible scenario files: you can specify test scenarios through parameters, such as 100% inserts, or 50% reads and 50% writes.
* Request distribution: supports uniform (random) requests, zipfian (a small part of the data receives most of the requests), latest (the most recently inserted data is requested most), and other distributions.
* Extensibility: you can modify or extend YCSB's behavior by extending Workload.
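
To get a feel for how different a zipfian request distribution is from a uniform one, the following sketch computes the share of requests that land on the most popular 10% of keys. It is not YCSB's generator: the weights simply follow 1 / rank**s, and the key count (1000) and the skew value (0.99) are assumptions chosen for illustration.

```python
# Compare how much traffic the "hottest" keys receive under a uniform and a
# Zipf-like request distribution. Weights follow 1 / rank**exponent.

def hot_share(num_keys: int, hot_fraction: float, exponent: float) -> float:
    """Fraction of requests that hit the most popular hot_fraction of keys."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    hot_keys = int(num_keys * hot_fraction)
    return sum(weights[:hot_keys]) / sum(weights)


if __name__ == "__main__":
    uniform = hot_share(1000, 0.10, 0.0)   # exponent 0 gives equal weights
    zipfian = hot_share(1000, 0.10, 0.99)  # 0.99 is an assumed skew value
    print(f"uniform: {uniform:.2f}, zipf-like: {zipfian:.2f}")
```

Under the uniform distribution the top 10% of keys receive 10% of the requests, while under the skewed distribution they receive well over half. This matters for benchmarks because a skewed workload keeps a small hot set in memory.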

Install YCSB

Because YCSB itself carries a heavy workload, it is generally recommended to deploy it on a separate machine, preferably with a 4-8 core CPU and 8 GB of memory or more. The network between YCSB and the database server should provide at least 1 Gbit/s of bandwidth, preferably 10 Gbit/s.
* Install JDK 1.7
* Download the YCSB build that includes the MongoDB driver

* Extract the archive
* Go to the ycsb directory and run the following command (a local MongoDB instance must be listening on port 27017):
./bin/ycsb run mongodb -P workloads/workloada
* If YCSB runs, the installation was successful.
You can also use Git to pull the source and compile it yourself; this requires the JDK and Maven. The GitHub repository is https://github.com/achille/YCSB, and the build and installation instructions are at https://github.com/achille/YCSB/tree/master/mongodb

YCSB scenario files

To test different scenarios with YCSB, you only need to supply different scenario files. YCSB generates the corresponding client requests from the properties in the scenario file. In this test, we use the following scenarios:
Scenario S1: 100% insert, used to load the test data
Scenario S2: write-heavy, 90% update and 10% read
Scenario S3: mixed read/write, 65% read, 25% insert, 10% update
Scenario S4: read-heavy, 90% read, 10% insert and update
Scenario S5: 100% read

The content of scenario file S2 is as follows:
recordcount=5000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.1
updateproportion=0.9
scanproportion=0
insertproportion=0
requestdistribution=uniform
insertorder=hashed
fieldlength=250
fieldcount=8
mongodb.url=mongodb://192.168.1.2:27017
mongodb.writeConcern=acknowledged
threadcount=32

Notes:
* The test data consists of 5 million documents (recordcount)
* Each document is about 2 KB (fieldlength x fieldcount). The total data size is about 10 GB, plus indexes
* The MongoDB URL is 192.168.1.2:27017
* MongoDB's write concern (mongodb.writeConcern) is acknowledged
* The number of threads is 32 (threadcount)
* Document insertion order: hashed/random (insertorder)
* Update: 90% (0.9)
* Read: 10% (0.1)
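
The size figures in these notes follow directly from three properties of the scenario file. The sketch below reads them from a properties-style string and does the arithmetic; the helper names are made up for this example, and the result counts only field payload, so BSON overhead, field names, and indexes are not included.

```python
# Estimate the payload size implied by a YCSB scenario file.
# parse_properties and payload_bytes are illustrative helper names.

SCENARIO = """
recordcount=5000000
fieldlength=250
fieldcount=8
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def payload_bytes(props: dict):
    """Return (bytes per document, total bytes) for the field payload only."""
    per_doc = int(props["fieldlength"]) * int(props["fieldcount"])
    return per_doc, per_doc * int(props["recordcount"])


if __name__ == "__main__":
    per_doc, total = payload_bytes(parse_properties(SCENARIO))
    print(f"{per_doc} bytes per document, {total / 1e9:.1f} GB in total")
```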

Download all the scenario files (S1-S5) and extract them into the ycsb directory created above.

MongoDB Configuration

This test is performed on AWS virtual hosts. The following are the server configurations:
* OS: Amazon Linux (similar to CentOS)
* CPU: 8 vCPU
* RAM: 30 GB
* Storage: 160 GB SSD
* Journal: 25 GB EBS with 1000 PIOPS
* Log: 10 GB EBS with 250 IOPS
* MongoDB: 2.6.0
* Readahead: 32

Notes:
MongoDB's data files, journal, and system log are on three separate storage volumes. This is a common optimization that keeps log writes from affecting data disk I/O. In addition, the server's readahead setting is changed to the recommended value of 32.

Standalone Benchmark Test

Before testing micro-sharding, we need to measure the maximum performance of a single standalone instance. Start the target MongoDB server, connect to it, and drop the ycsb database if it already exists:
# mongo
> use ycsb
> db.dropDatabase()

Scenario S1: data insertion
Next, run YCSB. Go to the ycsb directory and run the following command (make sure the scenario files S1, S2, S3, S4, and S5 are in the current directory):
./bin/ycsb load mongodb -P S1 -s

If everything is working, YCSB prints the current status every 10 seconds, including the number of operations per second and the average response time. For example:
Loading workload...
Starting test.
0 sec: 0 operations;
mongo connection created with localhost:27017/ycsb
10 sec: 67169 operations; 7002.16 current ops/sec; [INSERT AverageLatency(us)=4546.87]
20 sec: 151295 operations; 7909.24 current ops/sec; [INSERT AverageLatency(us)=3920.9]
30 sec: 223663 operations; 7235.35 current ops/sec; [INSERT AverageLatency(us)=4422.63]

While the test is running, you can use mongostat (or, better, MMS) to monitor MongoDB's real-time metrics and check that they are consistent with what YCSB reports.

When the run finishes, YCSB prints a summary similar to the following:
[OVERALL], RunTime(ms), 687134.0
[OVERALL], Throughput(ops/sec), 7295.168457372555
...
[INSERT], Operations, 5000000
[INSERT], AverageLatency(us), 4509.1105768
[INSERT], MinLatency(us), 126
[INSERT], MaxLatency(us), 3738063
[INSERT], 95thPercentileLatency(ms), 10
[INSERT], 99thPercentileLatency(ms), 37
[INSERT], Return=0, 5000000
...

This output tells us that inserting 5 million records took 687 seconds, with an average throughput of 7,295 operations per second and an average response time of 4.5 ms. Note that these numbers are not a general reference for MongoDB performance. If your environment, the size of the inserted documents, or the number of indexes differs in any way, the results will be very different. The numbers serve only as the baseline for comparing standalone and micro-sharded performance in this test.
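
When you repeat the same scenario many times, it helps to collect these summary lines programmatically rather than reading them by eye. The following is a minimal sketch that assumes the three-column "[SECTION], metric, value" layout shown above; `parse_summary` is a name invented for this example and is not part of YCSB.

```python
# Parse YCSB summary lines of the form "[SECTION], metric, value".

SAMPLE = """\
[OVERALL], RunTime(ms), 687134.0
[OVERALL], Throughput(ops/sec), 7295.168457372555
...
[INSERT], Operations, 5000000
[INSERT], AverageLatency(us), 4509.1105768
"""


def parse_summary(text: str) -> dict:
    """Return {section: {metric: value}} for every numeric summary line."""
    results = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        # Skip status lines, "..." and anything without exactly three columns.
        if len(parts) != 3 or not parts[0].startswith("["):
            continue
        try:
            value = float(parts[2])
        except ValueError:
            continue
        results.setdefault(parts[0].strip("[]"), {})[parts[1]] = value
    return results


if __name__ == "__main__":
    summary = parse_summary(SAMPLE)
    latency_ms = summary["INSERT"]["AverageLatency(us)"] / 1000
    print(f"throughput: {summary['OVERALL']['Throughput(ops/sec)']:.0f} ops/sec")
    print(f"average insert latency: {latency_ms:.1f} ms")
```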

For MongoDB, pay particular attention to the page faults, network, and DB Lock % metrics reported by mongostat or MMS. If your network is 1 Gbit/s and the network traffic reported by mongostat is close to that limit, your network is essentially saturated. A bandwidth of 1 Gbit/s corresponds to a transfer rate of roughly 125 MB/s. In my test, network in stayed at 14-15 MB/s, which is consistent with the throughput and document size (7,300 x 2 KB).
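
The consistency check between throughput and network traffic is simple arithmetic. The sketch below assumes decimal units (1 KB = 1000 bytes, 1 Gbit/s = 10^9 bits per second) and ignores protocol overhead, so treat the result as a rough lower bound rather than an exact figure.

```python
# Estimate inbound network traffic from throughput and document size, and
# compare it with the capacity of the link.

def network_load(ops_per_sec: int, doc_bytes: int, link_bits_per_sec: int):
    """Return (payload bytes per second, fraction of the link in use)."""
    bytes_per_sec = ops_per_sec * doc_bytes
    link_bytes_per_sec = link_bits_per_sec / 8
    return bytes_per_sec, bytes_per_sec / link_bytes_per_sec


if __name__ == "__main__":
    # About 7,300 inserts per second of 2 KB documents over a 1 Gbit/s link.
    rate, utilization = network_load(7300, 2000, 1_000_000_000)
    print(f"{rate / 1e6:.1f} MB/s, {utilization:.0%} of the link")
```

At this rate the insert test uses only a small part of a 1 Gbit/s link, so the network is not the bottleneck here.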

To find the ideal number of client threads, I repeated the same operation several times, changing the threadcount value in the scenario file each time. The results show that maximum throughput is reached at about 30 threads, and adding more threads does not improve performance. That is why threadcount in my scenario file is set to 32.
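
Instead of editing the scenario file for every run, the thread count can also be overridden on the command line with YCSB's -threads option. The sketch below only builds the command lines for such a sweep and does not execute them; the thread counts and the `sweep_commands` helper are assumptions for illustration, and the database has to be reset between load runs.

```python
# Build the YCSB command lines for a thread-count sweep.
# Nothing is executed here; pass each list to subprocess.run to run it.

def sweep_commands(phase: str, scenario: str, thread_counts, ycsb: str = "./bin/ycsb"):
    """Return one argv list per thread count for the given phase and scenario."""
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return [
        [ycsb, phase, "mongodb", "-P", scenario, "-s", "-threads", str(count)]
        for count in thread_counts
    ]


if __name__ == "__main__":
    for argv in sweep_commands("load", "S1", [8, 16, 24, 32, 48, 64]):
        print(" ".join(argv))
```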

We now have 5 million test documents in the database and can test the other scenarios. Note that the first parameter of the ycsb command is the test phase. The previous step was the data import, so the first parameter was "load". Now that the data is imported, the next steps are in the run phase, so the first parameter is "run".

Scenario S2: write-heavy (90% update)
Command:
./bin/ycsb run mongodb -P S2 -s
Result
...
[OVERALL], Throughput (ops/sec), 12102.2928384723

Scenario S3: mixed read/write (65% read)
Command:
./bin/ycsb run mongodb -P S3 -s
Result
...
[OVERALL], Throughput (ops/sec), 15982.39239483840

Scenario S4: read-heavy (90% read)
Command:
./bin/ycsb run mongodb -P S4 -s
Result
...
[OVERALL], Throughput (ops/sec), 19102.39099223948

Scenario S5: 100% read
Command:
./bin/ycsb run mongodb -P S5 -s
Result
...
[OVERALL], Throughput (ops/sec), 49020.29394020022

Micro-sharding test

We now have the standalone performance numbers for all five scenarios. Next, we can measure performance with micro-sharding, using different numbers of micro-shards.

First, stop the standalone MongoDB instance.
Next, we create a sharded cluster. Let me recommend a very convenient set of MongoDB tools here: mtools https://github.com/rueckstiess/mtools

mtools is a collection of MongoDB-related tools. One of them, mlaunch, lets us create a replica set or sharded cluster on a single machine with very little effort.

Install mtools (Python and a Python package manager, pip or easy_install, are required):
# pip install mtools

Or

# easy_install mtools

Then create a new directory and create a micro-sharded cluster in it:
# mkdir shard2
# cd shard2
# mlaunch --sharded 2 --single

This command creates four processes on the same machine:
* One mongos on port 27017.
* One config server mongod on port 27020.
* Two shard server mongod processes on ports 27018 and 27019.

These four processes form a cluster with two shards. Note that although the sharded cluster is now set up, all data will at this point go to only one of the shards, called the primary shard. For MongoDB to distribute data across the shards, you must explicitly enable sharding for the database and the collection:
# mongo
mongos> sh.enableSharding("ycsb")
{ "ok" : 1 }
mongos> sh.shardCollection("ycsb.usertable", {_id: "hashed"})
{ "collectionsharded" : "ycsb.usertable", "ok" : 1 }

These two commands enable sharding for the "ycsb" database and for the "usertable" collection in it. When you enable sharding for a collection, you also have to specify the shard key. Here {_id: "hashed"} means that the hash of the _id field is used as the shard key. A hashed shard key suits write-heavy scenarios because it spreads write operations evenly across the shards.
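
Why a hashed key spreads writes evenly can be shown with a small simulation. This is a simplified stand-in, not MongoDB's implementation: MongoDB assigns ranges of hash values (chunks) to shards, whereas the sketch takes an MD5-based hash modulo the number of shards. The key format "user<N>" is also an assumption.

```python
# Simulate how hashing sequential keys spreads them across shards.
# Simplification: hash modulo shard count instead of MongoDB's chunk ranges.

import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(key, num_shards) for key in keys)


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(10000)]
    for shards in (2, 4, 8):
        counts = distribution(keys, shards)
        print(shards, "shards:", [counts[i] for i in range(shards)])
```

Even though the keys are generated in sequence, each shard receives a nearly equal share, so no single mongod (and no single database-level lock) absorbs all the writes.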

Next, run the five scenarios in sequence and collect the results (note the first parameter of ycsb):
./bin/ycsb load mongodb -P S1 -s
./bin/ycsb run mongodb -P S2 -s
./bin/ycsb run mongodb -P S3 -s
./bin/ycsb run mongodb -P S4 -s
./bin/ycsb run mongodb -P S5 -s

After testing, shut down the entire cluster with the following command:
# mlaunch stop

Similarly, you can create separate directories for micro-sharded clusters with four, six, and eight shards and repeat the five scenarios. All test results are as follows:

Conclusion

From the table above, we can draw the following conclusions:

* Under the right conditions, micro-sharding can significantly increase MongoDB's throughput.

* Micro-sharding does not help read-only workloads.

* Micro-sharding gives the best improvement for mixed read/write scenarios (which are also the most common in practice): 275%

* With six micro-shards the server is essentially saturated, and adding more shards brings no significant improvement. This number may differ in your environment.
