These tests are reproduced from the Internet for reference only. Original source: http://www.mongoing.com/benchmark_3_0
Two classes of machine were used in this test.
- All tests were run on a single machine with a single instance.
- Machine A (cache 12GB, memory > data):
- Data: {_id: default, name: "Edison", num: random number}
- Engine: WiredTiger
- Index: the num field is indexed in addition to the default _id index.
OS: CentOS 6.5, 64-bit
CPU: 8-core E5-2407 2.4GHz
RAM: 16GB
Disk: 300GB SATA ×2, RAID 1
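The record shape used in this test can be sketched in Python. This is a minimal, illustrative generator; the pymongo calls are shown only as comments because they require a running mongod, and the database/collection names are invented for the example.

```python
import random

def make_docs(n, num_range=10**9):
    # Documents match the shape used in the test:
    # _id is left to the driver default, name is fixed, num is a random number.
    return [{"name": "Edison", "num": random.randrange(num_range)} for _ in range(n)]

docs = make_docs(1000)

# With a running mongod, the load phase would look roughly like:
#   from pymongo import MongoClient
#   coll = MongoClient()["bench"]["test"]   # hypothetical db/collection names
#   coll.create_index("num")                # secondary index on num, as in the test
#   coll.insert_many(docs)
```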
- Insert:

| Concurrent connections | op/s |
| --- | --- |
| 6 | 15K |
| 10 | 25K |
| – | 40K |
| – | 50K |
| – | 50K |
- Update (table with 200 million documents):

| Concurrent connections | op/s |
| --- | --- |
| 6 | 18K |
| 10 | 25K |
| – | 32K |
| – | 38K |
| – | 42K |
- Query (table with 200 million documents):

| Concurrent connections | op/s |
| --- | --- |
| 6 | 23K |
| 10 | 42K |
| – | 50K |
| – | 50K |
| – | 50K |
- TPS (table with 200 million documents):

| Concurrent connections | query/s | insert/s |
| --- | --- | --- |
| 6 | 15K | 15K |
| – | 20K | 20K |
| – | 21K | 21K |
| – | 23K | 23K |
| – | 23K | 23K |
- Machine B (cache 12GB, memory > data):
- Data: {_id: default, name: "Edison", num: random number}
- Engine: WiredTiger
- Index: the num field is indexed in addition to the default _id index.
OS: CentOS 6.5, 64-bit
CPU: 24-core E5-2403 1.8GHz
RAM: 64GB
Disk: 300GB SSD, RAID 10
- Insert:

| Concurrent connections | op/s |
| --- | --- |
| 3 | 23K |
| 4 | 50K |
| 6 | 55K |
| 8 | 65K |
| – | 75K |
| – | 85K |
| – | 95K |
| – | 100K |
| – | 110K |
| – | 150K |
| – | 164K |
- Update (table with 200 million documents):

| Concurrent connections | op/s |
| --- | --- |
| 3 | 14K |
| 6 | 23K |
| – | 44K |
| – | 63K |
| – | 93K |
| – | 130K |
| – | 140K |
| – | 140K |
| – | 150K |
- Query (table with 200 million documents):

| Concurrent connections | op/s |
| --- | --- |
| 3 | 10K |
| 6 | 41K |
| – | 84K |
| – | 120K |
| – | 140K |
| – | 180K |
| – | 190K |
| – | 193K |
| – | 195K |
- TPS (table with 200 million documents):

| Concurrent connections | query/s | insert/s |
| --- | --- | --- |
| 3 | 10K | 10K |
| 6 | 31K | 31K |
| – | 44K | 44K |
| – | 60K | 60K |
| – | 75K | 75K |
| – | 75K | 75K |
The following tests examine machine B as the data grows beyond memory, down to the point where only the index (or not even the index) fits.
All data fits in memory:
- Query (table with 300 million documents):

| Concurrent connections | op/s |
| --- | --- |
| 3 | 20K |
| 5 | 40K |
| 8 | 58K |
| – | 80K |
| – | 90K |
| – | 130K |
| – | 170K |
| – | 180K |
| – | 195K |
Only the index fits in memory:
- Cache:data = 4:10
- Query (table with 300 million documents):

| Concurrent connections | op/s |
| --- | --- |
| 3 | 8K |
| 5 | 10K |
| 8 | 16K |
| – | 20K |
| – | 25K |
| – | 32K |
| – | 40K |
| – | 48K |
| – | 57K |
Not even the index fits entirely in memory:
- Cache:data = 1:10
- Query (table with 300 million documents):

| Concurrent connections | op/s |
| --- | --- |
| 3 | 3.4K |
| 5 | 4.5K |
| 8 | 9.3K |
| – | 11K |
| – | 14K |
| – | 20K |
| – | 24K |
| – | 25K |
| – | 34K |
Index entirely in memory: cache:data = 4:10
Index not entirely in memory: cache:data = 1:10
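The three working-set regimes above can be expressed as simple arithmetic. The sketch below classifies a deployment by comparing cache size against data and index size; the data and index sizes are invented for illustration, only the cache:data ratios come from the test.

```python
def regime(cache_gb, data_gb, index_gb):
    """Classify which working-set regime a deployment falls into."""
    if cache_gb >= data_gb + index_gb:
        return "all data in cache"
    if cache_gb >= index_gb:
        return "index fits, data does not"
    return "not even the index fits"

# Illustrative numbers only: a 12GB cache against the two ratios in the test.
# cache:data = 4:10 -> 12GB cache, 30GB data (index assumed ~6GB)
print(regime(12, 30, 6))    # index fits, data does not
# cache:data = 1:10 -> 12GB cache, 120GB data (index assumed ~24GB)
print(regime(12, 120, 24))  # not even the index fits
```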
- When memory is large enough, the CPU is the bottleneck: the better the machine's CPU, the better MongoDB performs.
MongoDB Official Report:
http://www.mongoing.com/archives/862
Objective:
The main focus of MongoDB 3.0 is to improve performance, especially write performance and utilization of hardware resources. To demonstrate the results we've achieved in 3.0 and how to apply these new improvements, we'll publish a series of blogs to compare the performance of MongoDB 2.6 and 3.0.
As with all benchmarks, these results do not necessarily apply to your scenario. MongoDB's usage scenarios are diverse, and it is critical to use performance tests that reflect your application's needs and the hardware you plan to deploy on. Because of this, no "standard" benchmark can tell you which technology is best for your application. Only your requirements, your data, and your infrastructure can tell you what you need to know.
To help us measure performance, we have worked with the community to run hundreds of different tests. These tests reflect the wide variety of applications users build and the evolving environments in which they deploy them.
YCSB is used by some organizations as a tool for performance testing several different databases. YCSB is quite limited and does not necessarily tell you everything you want to know about your application's performance. It is nonetheless popular, and users of MongoDB and other database systems are familiar with it. In this article we compare YCSB results for MongoDB 2.6 and 3.0.
Throughput:
In the YCSB tests, MongoDB 3.0 delivers roughly 7× the throughput of MongoDB 2.6 in the multi-threaded, batch-insert scenario. We expected this scenario to show the biggest improvement, since it is 100% writes and WiredTiger's document-level concurrency control helps most in multi-threaded workloads on multi-core servers.
The second test compared the two systems under a 95% read / 5% update workload. Here WiredTiger achieves roughly 4× the throughput. This improvement is less dramatic than in the pure-insert scenario because writes make up only 5% of all operations. In MongoDB 2.6, concurrency control is at the database level and writes block reads, reducing overall concurrency. This test shows that the finer-grained concurrency control of MongoDB 3.0 significantly increases total throughput.
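The effect of lock granularity can be illustrated with a toy Python sketch (not MongoDB code): with one global lock, every write serializes the whole store, roughly like MongoDB 2.6's database-level locking; with per-document locks, only writers to the same document contend, loosely analogous to WiredTiger's document-level concurrency control.

```python
import threading

class CoarseStore:
    """One lock for the whole 'database' (database-level locking, simplified)."""
    def __init__(self):
        self.lock = threading.Lock()
        self.docs = {}

    def update(self, key, val):
        with self.lock:          # blocks every other reader and writer
            self.docs[key] = val

class FineStore:
    """One lock per document (document-level locking, simplified)."""
    def __init__(self, keys):
        self.locks = {k: threading.Lock() for k in keys}
        self.docs = {k: None for k in keys}

    def update(self, key, val):
        with self.locks[key]:    # only blocks writers to the same document
            self.docs[key] = val

def hammer(store, keys, n=200):
    # One writer thread per document, each performing n updates.
    threads = [threading.Thread(target=lambda k=k: [store.update(k, i) for i in range(n)])
               for k in keys]
    for t in threads: t.start()
    for t in threads: t.join()

keys = [f"doc{i}" for i in range(8)]
fine = FineStore(keys)
hammer(fine, keys)
```

Both stores produce the same final state; the difference is that FineStore's writers to different documents never serialize against each other.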
Finally, for a balanced read/write workload, MongoDB 3.0 shows roughly 6× the throughput. This is better than the 4× improvement seen at 95% reads because there are more write operations here.
Response latency:
Monitoring throughput alone is not enough in a performance test; we must also consider the response latency of operations. For many operations, the average response latency is not the best metric. Developers who want a consistently good, low-latency experience care more about the worst-performing operations in the system. High-latency operations are typically measured at the 95th and 99th percentiles: the 95th percentile is the value that is worse (slower) than 95% of all other operations. (Even this can understate the problem: since a single web request may issue hundreds of database operations, most users are likely to experience some of these 99th-percentile latencies.)
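The percentile measure described above can be computed with a few lines of Python; this sketch uses the nearest-rank method on a synthetic set of latencies (the sample values are invented for illustration).

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value that p% of samples fall at or below."""
    s = sorted(samples)
    rank = max(1, -(-len(s) * p // 100))  # ceil(p/100 * n) as a 1-based rank
    return s[int(rank) - 1]

# Synthetic latencies in ms: mostly fast, with a slow tail.
lat = [1] * 950 + [5] * 40 + [50] * 10
print(percentile(lat, 95))  # 1  -> 95% of operations took 1 ms or less
print(percentile(lat, 99))  # 5  -> but the slowest 1% are much worse
```

Note how the 99th percentile exposes the slow tail that an average over the same samples would largely hide.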
For read response latency we see very little difference between MongoDB 2.6 and 3.0; both hold steady at 1ms or less. There is, however, a significant difference in update latency.
Here we compare the 95th and 99th percentile update latencies under the read-intensive workload. Update latency in MongoDB 3.0 has improved dramatically, with nearly 90% reductions at both the 95th and 99th percentiles. This matters: higher throughput should not come at the cost of longer latencies, which would ultimately degrade the application's user experience.
Under the balanced workload, update latency also stays low. At the 95th percentile, MongoDB 3.0's update latency is almost 90% lower than MongoDB 2.6's, and 80% lower at the 99th percentile. These improvements give users a better experience and more predictable performance.
We believe that these tests for concurrency and response latency demonstrate a significant improvement in the write performance of MongoDB.
Small changes have a big impact:
In the next blog we will describe some small changes that can have a big impact on MongoDB performance. As a preview, let's look at a factor that people often overlook.
Provide sufficient client capacity:
The default YCSB configuration uses only one thread. With a single thread, your results will be poor no matter which database you test. Do not use single-threaded benchmarks unless your application really runs a single thread. A single-threaded test measures only response time and cannot exercise concurrency, and capacity planning must account for both.
Most databases are designed for multi-threaded clients. Find the best thread count by increasing it gradually: add threads until throughput stops rising or response time starts to climb.
Consider running multiple client servers for YCSB. A single client may not generate enough load to test the capacity of the system. Unfortunately, YCSB does not make multiple clients easy: you have to coordinate starting and stopping the clients manually, and merge the per-client results yourself.
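A small driver script can take some of the pain out of coordinating multiple clients. The sketch below builds a standard YCSB command line per client and merges per-client summaries; the host names, paths, and the shape of the result dictionaries are assumptions for the example, and the commands are only constructed, not executed.

```python
def ycsb_cmd(host, threads, workload="workloads/workloada", phase="run"):
    """Build a YCSB command line for one client (illustrative paths/hosts)."""
    return ["bin/ycsb", phase, "mongodb", "-s",
            "-P", workload,
            "-p", f"mongodb.url=mongodb://{host}:27017/ycsb",
            "-threads", str(threads)]

clients = ["mongos-1", "mongos-2"]        # one YCSB client per mongos (hypothetical hosts)
cmds = [ycsb_cmd(h, threads=32) for h in clients]

# Each command would be launched simultaneously, e.g. with subprocess.Popen,
# and the per-client summaries merged afterwards:
def merge(results):
    """Combine per-client summaries: throughputs add; for latency, take the
    worst client's percentile as a conservative approximation."""
    return {"ops_per_sec": sum(r["ops_per_sec"] for r in results),
            "p99_ms": max(r["p99_ms"] for r in results)}

print(merge([{"ops_per_sec": 60000, "p99_ms": 1.2},
             {"ops_per_sec": 58000, "p99_ms": 1.5}]))
```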
When using shards, run one mongos per one or two shards, and one YCSB client per mongos. Too many client processes can overwhelm the system, driving up response latency and eventually saturating the CPU, so sometimes you need to throttle the client request rate.
Finding the right balance between response latency and throughput is an important part of any performance tuning. By monitoring both while increasing the thread count across a series of tests, you can map the relationship between latency and throughput and determine the optimal thread count for a given workload.
Based on these test results we observed two things:
1. The 99th percentile of all operations stays below 1ms up to 16 threads; beyond 16 threads, response latency begins to rise.
2. Throughput keeps growing as the thread count increases from 1 to 64; beyond 64 threads, adding threads no longer increases throughput while latency keeps rising.
Based on this, the optimal thread count for this application is between 16 and 64, depending on whether we prefer lower latency or higher throughput. At 64 threads the latency still looks good: 99th-percentile reads are under 1ms and 99th-percentile writes under 4ms, while throughput exceeds 130,000 ops/s.
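The selection rule described above can be automated: from a thread sweep, pick the largest thread count that stays under a latency budget and still improved throughput over the previous step. The sweep numbers below are invented to mirror the shape of the article's findings (sub-millisecond up to 16 threads, throughput flattening past 64), not the actual measurements.

```python
def best_threads(sweep, p99_budget_ms, min_gain=1.05):
    """sweep: list of (threads, ops_per_sec, p99_ms), in ascending thread count.
    Returns the largest thread count that stays under the latency budget and
    still improved throughput by at least `min_gain` over the previous step."""
    best = sweep[0][0]
    for (_, prev_ops, _), (t, ops, p99) in zip(sweep, sweep[1:]):
        if p99 <= p99_budget_ms and ops >= prev_ops * min_gain:
            best = t
    return best

# Illustrative sweep: (threads, ops/s, p99 latency in ms)
sweep = [(8, 40000, 0.8), (16, 70000, 0.9), (32, 110000, 2.0),
         (64, 135000, 3.8), (128, 136000, 9.0)]
print(best_threads(sweep, p99_budget_ms=4.0))  # 64
```

Tightening the latency budget shifts the answer toward the low end of the range, which is the latency/throughput trade-off the article describes.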
YCSB Test Configuration
We tested a number of different configurations to determine the best balance between concurrency and response latency.
In these tests we used 30 million documents and 30 million operations. Each document contains one 100-byte field (151 bytes in total), and read operations select records using a Zipfian distribution. We determined the optimal thread count by gradually increasing the number of threads until the 95th and 99th percentile latencies began to rise and throughput stopped increasing; the results are based on that optimal thread count.
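A Zipfian distribution means a few records are read far more often than the rest. The sketch below is a simplified sampler (CDF plus binary search) using an exponent of 0.99 as in YCSB's default Zipfian generator; it is not YCSB's actual implementation, which uses a more efficient rejection method.

```python
import bisect
import random

def zipfian_sampler(n, theta=0.99, seed=42):
    """Return a function that samples record ids 0..n-1 with Zipfian skew:
    P(i) proportional to 1/(i+1)**theta."""
    weights = [1.0 / (i + 1) ** theta for i in range(n)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    rng = random.Random(seed)
    # Invert the CDF with binary search; clamp guards against float round-off.
    return lambda: min(bisect.bisect_left(cdf, rng.random()), n - 1)

sample = zipfian_sampler(10000)
hits = [sample() for _ in range(100000)]
hot = sum(1 for h in hits if h < 100)
print(hot / len(hits))  # fraction of reads landing on the hottest 1% of records
```

With this skew, roughly half of all reads land on the hottest 1% of records, which is why cache size matters so much in the memory-bound tests above.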
All tests used a replica set with journaling enabled, and the servers were configured according to our best practices. We always recommend using replica sets in production.
The YCSB client runs on a dedicated server. Each replica set member also runs on a dedicated server. All servers are SoftLayer bare metal with the following specifications:
1. CPU: 2× 10-core Xeon E5-2690 v2, 3.00GHz (Ivy Bridge), 2× 25MB cache
2. RAM: 128GB registered DDR3-1333
3. Storage: 2× 960GB SSD, SATA disk controller
4. Network: 10 Gbps
5. OS: Ubuntu 14.10 (64-bit)
6. MongoDB versions: MongoDB 2.6.7; MongoDB 3.0.1
MongoDB 3.0 WT Engine Performance Test (reprint)