MySQL Proxy Learning (IV): Performance Testing


1. Hardware Configuration

Output of hwconfig:

Summary: Intel S5500WBV, 2 x Xeon E5620 2.40 GHz, 23.5 GB / 24 GB 1067 MHz
System: Intel S5500WBV
Processors: 2 x Xeon E5620 2.40 GHz, 133 MHz FSB (HT enabled, 8 cores, 16 threads)
Memory: 23.5 GB / 24 GB 1067 MHz == 6 x 4 GB, 2 x empty
NIC: Gigabit Ethernet

2. Test Case

Each persistent connection sends 100,000 "select 1" requests (request packet size 13 bytes, response packet size 56 bytes). Before testing the server, it is worth using netperf to verify the network environment: a normal LAN should be able to sustain about 300,000 PPS for packets of this size, and 60 concurrent connections directly to mysql-server can reach 90,000 QPS.

Command: bin/mysql-proxy --daemon --pid-file=log/proxy.pid --proxy-address=:4040 --log-level=debug --log-file=log/proxy.log --proxy-backend-addresses=10.232.64.76:17907 --event-threads=16

In other words, the tests below run with no Lua script and no connection pool (C-cons : B-cons =.).

3. Experiment Data

Event-threads  Concurrent-cons  QPS    CPU    CS
1              1                3800   30%    -
1              10               30000  96%    -
1              30               33000  100%   400
4              1                4200   48%    -
4              10               37400  300%   -
4              30               53000  360%   120000
4              50               69131  350%   150000
4              80               67000  350%   150000
12             1                3797   50%    110000
12             5                20000  300%   500000
12             12               37233  600%   600000
12             30               51890  800%   480000
12             50               53255  900%   400000
12             80               53426  950%   350000
16             10               29308  850%   832000
16             30               48390  1110%  580000
16             50               48056  1200%  400000
16             80               47000  1350%  300000

During the test, mpstat -I SUM -u -P ALL 1 showed that all NIC interrupts were running on CPU 0, even though NIC SMP affinity had been configured earlier. The machine then crashed several times while running perf, and the SMP affinity settings did not survive the reboot, so all of the data above was collected with NIC SMP affinity disabled (for NIC SMP affinity, see http://jasonwu.me/category/Linux).

I later re-enabled the NIC's SMP affinity and found the numbers slightly worse than above, though only by about 2,000 QPS. I had assumed that spreading hardware interrupts and softirqs across CPUs would improve performance; we did spread them out, but QPS did not rise. Evidently the bottleneck is not in kernel network processing: a single CPU handling all interrupts can already satisfy our QPS requirements and more. Under this insufficient load, enabling SMP affinity may even have a slightly negative effect; the reason is beyond my current knowledge (perhaps an affinity effect in the code, and corrections are welcome). This is not to say that enabling NIC SMP affinity is worse in general; it is just that our current bottleneck is not NIC processing. Once the upper layer can process more than a single CPU can receive from the NIC, enabling NIC SMP affinity should yield higher performance.

4. Network Model of MySQL-proxy

As mentioned in previous articles, mysql-proxy is an event-driven model based on libevent, but that description omits how multithreading fits into the model. In mysql-proxy, a thread is an event-thread, i.e. a thread that processes events. There are three kinds of event fds: the listen fd, socketpair[2], and the net-read/write sockets.

The first is obviously the default mysql-proxy listening port, 4040. This event is registered only on the main thread.

The second is a socket pair, i.e. two connected socket fds: socketpair[0] is used for reading (every thread watches this fd for readability; the callback is chassis_event_handle), and socketpair[1] is used for writing. When an event (of the third kind below) needs to be registered, chassis_event_add puts it on a global queue and then writes a "." to socketpair[1], which wakes up all threads. On waking, each thread takes pending events from the global queue and registers them with its own event base; when such an event later fires, it is handled by whichever thread acquired it. Moreover, an awakened thread grabs as many queued events as it can while it holds the queue's lock.

The third kind is the read/write sockets used for the actual message transfer (their callback is network_mysqld_con_handle). Its relationship to the second kind: when a recv or send on such a socket returns EAGAIN, the event must be re-registered (none of the third-kind events use EV_PERSIST), which is done by calling chassis_event_add; the re-registered event may then be processed by a different thread.

Expanding on what happens with multiple event-threads: ① all threads compete for the single event queue; ② whenever a third-kind socket hits EAGAIN, all threads are woken (a thundering herd); ③ a connected client/server socket pair is not pinned to one thread, so any third-kind event may run on any thread and may migrate on every interaction. In short, an event-thread does not care which connection an event belongs to, only whether the event is ready; when an event is not ready, it is pushed onto the global event queue and all threads are woken to compete for it.
That is the network model of mysql-proxy. Now consider the core processing path, which consists of four parts: (1) the packet event handler (network_mysqld_con_handle), which reads and writes packets and judges and advances the connection state; (2) plugin_call, which looks up the hook function for the current state; (3) the per-state callback (proxy_read_auth_result, proxy_send_query_result, proxy_connect_server, ...); and (4) the callback invokes the corresponding functions in the Lua script to perform customized work (read/write splitting, rewriting select packets and result sets, and so on). Note that the first three parts are fixed at compile time, while part 4 is ours to customize and modify; this is mysql-proxy's extension point. Even when no Lua script is specified, as above, the first three parts always run, but part 4 is not called.

5. Data Analysis

With the network model in mind, look back at the data. First, with one event-thread, 30 connections saturate it; adding concurrency beyond that is clearly useless. Raising event-threads to 4 lifts QPS at the same concurrency, and CS rises markedly too: this is the thundering herd described above. Each wakeup rouses every thread, and the woken threads then contend for the event queue, which silently adds context-switch time. This also explains why CS grows with the number of event-threads at the same concurrency, and grows with concurrency at the same event-thread count. The odd part is that CS then falls as concurrency keeps increasing; this is probably because chassis_event_handle grabs as many events as it can per wakeup, so at high concurrency one wakeup may harvest several events, damping the herd effect.
What puzzles me most, though, is why QPS falls as event-threads increase. Take the 4-thread numbers as the baseline: the machine has 16 CPUs, so even without linear scaling, half of linear should not be asking too much, i.e. roughly 69131 x 2 ~= 138,000. One possible explanation is the high CS, but in my experience CS itself is not the cause: either something else drives CS up and indirectly hurts QPS, or CS is not the culprit at all (it may well drop once the real bottleneck is fixed). Also, with 16 event-threads, each CPU shows user : sys : soft : idle of roughly 12% : 65% : 8% : 15%. These phenomena do not seem easy to explain from the code alone (my ability falls short there), so we turn to tools.

6. Tool Analysis

To keep a single logical CPU from being monopolized by interrupt handling and skewing the results, we spread the NIC's SMP affinity across multiple CPUs during this investigation. The detection tool here is perf; perf record -g -p <pid> collects detailed performance data:



The top two entries in the report are futex_wake and futex_wait (http://hi.baidu.com/luxiaoyi/blog/item/3db9a302ba9a0f074bfb51e3.html). We said above that the core processing path has four parts; the perf report shows the time is concentrated in the first two (network_mysqld_con_handle and plugin_call). Back to the code:

plugin_call(...)
{
    ...
    LOCK_LUA(srv->priv->sc);    /* <==> g_mutex_lock(sc->mutex); */
    ret = (*func)(srv, con);
    UNLOCK_LUA(srv->priv->sc);
    ...
}

perf really is a marvel. I had been staring at the network model with some skepticism and never even noticed this spot. The reason is simple: although plugin_call has not yet started calling Lua-related functions in part 2, part 3 must use them, and all threads share the same global Lua table. The lock granularity is clearly quite coarse.

7. Verification

For now, just the theory: 1. To avoid the thundering herd, replace the single socketpair with multiple pairs, i.e. one pair per thread. 2. Since every event is currently processed by an arbitrary thread, could a connection's pair of sockets (client and backend server) be pinned to one thread, reducing per-thread data migration? 3. Reduce the granularity of the lock in LOCK_LUA(srv->priv->sc), or remove the Lua part altogether.

Thank you.
