This article provides details and benchmark test results for Mysql queries per second, explaining my earlier conversations on Mysql Connect.
Review the improvement history of MySQL/InnoDB. You can easily find out. MySQL 5.6 has never been faster than read-only. It is easy to understand and has good expansion in read-only (RO. It is also expected to reach a high level in read + write (RW. (Especially when reading data is the main task of the database)
However. We are also very happy with RO's performance in MySQL 5.6. in MySQL 5.7, we mainly focus on read + write (RW, because big data processing has not yet met our expectations. But RW depends on RO. It can speed up again. Through continuous improvements, the InnoDB team strongly promoted the optimization of the performance of the 5.7 version per second.
Next we will explain in order
In fact, there are two ways to control internal links of read-only workloads in MySQL:
- Use a single table: MDL, trx_sys, and lock_sys (InnoDB)
- Multi-table: trx_sys and lock_sys (mainly InnoDB)
Any quick single-table range testing workload is mainly locked due to the MDL link. Multiple tables are limited by the internal components of InnoDB. (different tables are protected by different MDL locks. In this case, the connection bottleneck in MDL is reduced ). However, it also depends on the workload. A more read-only measurement will be better in MySQL5.6 (such as Sysbench OLTP_RO ), at the same time, queries with less workload and faster speed (such as Sysbench Point-Selects (using a foreign key to retrieve a record) will make all links difficult and can only be measured in 16-core-HT, the 32-core performance is poor .. however, any workload such as the Point-Select test will work with all the internal components of MySQL, so that you can see the maximum performance possible (start with the SQL parser, terminate and obtain the row value ).. in your given MySQL version and given HW configuration, this may also reach the maximum SQL query/QPS per second (QPS) rate.
In Mysql 0.25 million, we obtained the best result of queries per second, which is also the best result obtained by using SQL statements in Mysql/InnoDb.
Of course, only when the 'read-only transaction' function is used can this speed be achieved (New Function on Mysql5.6); in addition, AUTOCOMMIT = 1, otherwise, the CPU will be easily wasted on starting and committing transactions, and you will actually lose the overall system performance.
Therefore, the first improvement introduced in Mysql5.7 is the 'automatic discovery of read-only transactions '(in fact, every InnoDb Transaction is considered read-only until there is a DML declaration outside of this) function ---, this greatly simplifies the read-only transaction function and saves users and developers time. They do not have to manage whether to use the read-only transaction function. However, with this feature, you still cannot reach the potential optimal query rate per second for Mysql, because the CPU time is still wasted in the process of starting and ending transactions.
At the same time, Percona uses different solutions to solve the problem of "transaction list" Management (TRX-list) and slow trx_sys mutex link in InnoDB. Percona's solution performs well when processing high Point-Selects loads with transactions, but MySQL5.7 performs normally (but I won't publish 5.7 of the results because its code is not public )... therefore, at least I can make some comparisons:
Observed results:
- Use the same Roint-Select-TRX read-only test (with transactions) in the eight tables in MySQL5.6, Percona 5.5, and MySQL5.7 (with results of 2013.5 months)
- At the same time, you can also see that in the same 16-core-HT configuration, we are still far away from the results of the peak 0.25 million/s.
- MySQL5.6 extended the link time in trx_sys mutex access, and the number of requests per second since 64 users will decrease.
- Percona5.5 can maintain the load for a long period of time, and the number of requests per second starts to decrease only when the number of users is 512.
- When MySQL5.7 has been maintained for a period of time, the number of requests per second is still not reduced (you cannot see the number of concurrent users in this batch )...
However, it is obvious that if MySQL is used to obtain the maximum potential query rate per second, transactions should be avoided.
Let's take a look at this is our maximum query rate per second in May 2013.
Test on the same 1.8 tables, but no use of MySQL5.6:
Observation:
- The test above is to keep MySQL5.6 always executed on 16 cores, then 16-core-HT, 32-core, 32-core-HT.
- As you can see, the maximum query rate per second is even higher than expected-in MySQL, It is 0.275 million per second.
- The maximum result is 16-core-HT.
- However, the results on 32-core are not as good as those on 16-core-HT (due to competition interruptions, in the same kernel, the configuration with 2 CPU threads can better manage thread competition-so the real concurrency is still stored on 16 threads rather than 32 cores)
The same test on MySQL5.7 seems quite different, because the lock_sys mutex link time in 5.7 is already very low, and the Code related to trx_sys mutex is also changed for the first time:
Observed results:
- First, you can see that 5.7 has better performance than 5.6 in the same 16-core-HT configuration.
- After that, the 32-core configuration is not significantly enhanced!
- In 32-core-HT configuration, the maximum number of requests reaches 0.35 million/second!
- From the above special (aggressive) read-only load test, we can easily see that the results we get in 32 cores are better than 16, at the same time, we have not started hyper-threading (in 32-core-HT )... niu !; -)
On the other hand, there is still room for improvement. Competition for trx_sys continues. We do not fully use the CPU capability to do useful work (there are still many CPU cycles used in lock rotation )... however, the current results are much better than before and are much better than 5.6, so there is no reason to continue mining to improve this performance, we mainly focus on improving the performance of the read/write load that we used to spend a huge amount of space on.
By the end of May, that is, during our performance conference, Sunny added several new changes to the try_sys mutex race, from which the maximum queries per second (QPS) will be available in the future) up to 375 K! Is this enough Performance Improvement for 5.7, right ?; -)
At the same time, we continue to exchange views with the Percona team who suggested other ways to manage the TRX list. Their solution looks very interesting, but in 5.5, such code cannot display a higher number of queries per second (QPS), and such code on 5.6 (once tested on Percona Server 5.6) the maximum number of queries per second (QPS) is no larger than MySQL 5.6. However, the discussion involves an interesting point of view: if some read/write loads are running at the same time, what is the impact on read-only performance ?... In addition, even under the same test conditions, the MySQL 5.7 code still runs better, and the effect is very obvious (you can check my analysis here, however, I will explain it again, during this period of time, I cannot display the result on 5.7, because its code has not been published to the public-it may be provided in a later article )..
Because it also has an impact on any pure read/write load, there is enough motivation to rewrite the Code related to the entire TRX list as Sunnys expects for a long time. However, this kind of experience is fascinating!
;-) Day after day, we are very happy to see that our query charts per second are getting higher and higher, it is still possible to query 440 KB per second on a 32-core hyper-threading server!
5.7 number of results obtained from Select 8 tables on Development Milestone Release 2:
..;-))
However, there is a little strange thing-we tried to analyze the impact of all bottlenecks and code changes with Sunny using different tools. In some tests, I was surprised that Sunny observed a higher number of queries per second than me. This "singularity" is related to the following factors:
- Under high load, the current 5.7 code is running near the hardware limit (mainly the CPU), so each command is very important!
- If the Unix socket or IP port is used, the distinction will be very obvious!
- Sysbench uses 30% of the CPU time, but if the same test load uses the old version of Sysbench (with a shorter code path), it only uses 20% of the CPU, the remaining 10% is used on the MySQL server.
- Therefore, in the case of the same test load, using a Unix socket instead of an IP port, and replacing the Sysbench-0.4.8 with a Sysbench-0.4.13, we will get more than 500 K queries per second! -It's easy, isn't it ?; -))
Let's compare the differences between "before" and "after ".
Observed results:
- CPU usage is reduced by Sysbench.
- Higher CPU availability on the MySQL server.
- We implemented 0.5 million queries per second.
What else?
I may only mention: kudos Sunny and the entire MySQL Development Team;
Let's take a look at the maximum number of queries per second when 8 tables are selected for work load.
- MySQL-5.7.2 (DMR2)
- MySQL-5.6.14
- MySQL-5.5.33
- Percona Server 5.6.13-rc60.5
- Percona Server 5.5.33-rel31.1
- MariaDB-10.0.4
- MariaDB-5.5.32
Each engine is tested in the following Configuration:
- CPU taskset: 8-core-HT, 16-core-HT, 32-core, 32-core-HT
- Concurrent sessions: 8 to 16, 32... 1024
- InnoDB spin wait latency: 6, 96
The best result is a comparison between any two specific combinations. By comparing the database engine, I got a chart below, which I mentioned in my previous articles.
Below are some comments:
- You do not need to make too many comments on the results of the huge gap in Mysql5.7, because this is obvious.
- The interesting thing is that the code library Engine Based on MySQL5.5 does not have any results close to MySQL5.6.
- This confirms that Percona Server reached the level of MySQL5.6 after using the code library engine of MySQL5.6, but the MariaDB-10 is still on the way to exploration.
- Therefore, there is no doubt that MySQL5.6 is the cornerstone of the code!
- MySQL5.7 is another optimized Extension Based on MySQL5.6.
What kind of scalability does it have?
The answer is simple: MySQL5.7 is the only extension on this basis.
If you use an ip port and a heavyweight Sysbench-0.4.13, you will get the following results:
The QPS is only slightly lower, but the overall trend is the same.
Scalability is also very similar:
More results are coming soon;
Note: It is not good to bind too many workloads to a single table:
- Reducing the debate between InnoDB makes other arguments more apparent.
- When the load is bound to a single table, the MDL debate will become more dominant.
- This is expected and will remain unchanged in the next DMRS.
There are still many challenges in front of us ;-)
For reference, the hardware configuration I tested is as follows:
- Server: 32cores-HT (bi-thread) Intel 2300 Mhz, 128 GB RAM
- OS: Oracle Linux 6.2
- FS: Enable EXT4 mounted to "noatime, nodiratime, nobarrier"
My. conf:
max_connections=4000 key_buffer_size=200M low_priority_updates=1 table_open_cache = 8000 back_log=1500 query_cache_type=0 table_open_cache_instances=16# files innodb_file_per_table innodb_log_file_size=1024M innodb_log_files_in_group = 3 innodb_open_files=4000# buffers innodb_buffer_pool_size=32000M innodb_buffer_pool_instances=32 innodb_additional_mem_pool_size=20M innodb_log_buffer_size=64M join_buffer_size=32K sort_buffer_size=32K# innodb innodb_checksums=0 innodb_doublewrite=0 innodb_support_xa=0 innodb_thread_concurrency=0 innodb_flush_log_at_trx_commit=2 innodb_max_dirty_pages_pct=50 innodb_use_native_aio=1 innodb_stats_persistent = 1 innodb_spin_wait_delay= 6 / 96# perf special innodb_adaptive_flushing = 1 innodb_flush_neighbors = 0 innodb_read_io_threads = 4 innodb_write_io_threads = 4 innodb_io_capacity = 4000 innodb_purge_threads=1 innodb_adaptive_hash_index=0# monitoring innodb_monitor_enable = '%' performance_schema=OFF
If you need it, the binary version of Linux Sysbench is here:
- Sysbench-0.4.13-lux86
- Sysbench-0.4.8-lux86
Run the Sysbench command to run the Point-Selects test using UNIX socket as follows (start eight processes in parallel ):
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench-0.4.8 --num-threads=$1 --test=oltp --oltp-table-size=10000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_10M_$n \ --max-requests=0 --max-time=$2 --mysql-socket=/SSD_raid0/mysql.sock \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \ --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \ --oltp-read-only=on run > /tmp/test_$n.log &
Use the IP port to run the Sysbench command for the Point-Selects test as follows (start eight processes in parallel ):
LD_PRELOAD=/usr/lib64/libjemalloc.so /BMK/sysbench-0.4.13 --num-threads=$1 --test=oltp --oltp-table-size=10000000 \ --oltp-dist-type=uniform --oltp-table-name=sbtest_10M_$n \ --max-requests=0 --max-time=$2 --mysql-host=127.0.0.1 --mysql-port=5700 \ --mysql-user=dim --mysql-password=dim --mysql-db=sysbench \ --mysql-table-engine=INNODB --db-driver=mysql \ --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \ --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on \ --oltp-read-only=on run > /tmp/test_$n.log &
May you gain something,
-Dimitri