The thread pool is a core feature of MySQL 5.6. For server applications, whether a web application service or a database service, handling highly concurrent requests is a perennial topic. When a large number of requests arrive concurrently, resources must be created and released continuously, which lowers resource utilization and degrades service quality. A thread pool is a common remedy: a certain number of threads are created in advance, and when a request arrives the pool assigns one of them to serve it; once the request ends, the thread goes on to serve other requests. This avoids frequent creation and release of threads and memory objects, lowers the number of concurrently running threads on the server, reduces context switching and resource contention, and improves resource utilization. Thread pools in all kinds of services essentially aim to improve resource utilization, and their implementations are broadly similar. This article describes the implementation principles of the MySQL thread pool.
Before MySQL 5.6, MySQL handled connections in One-Thread-Per-Connection mode (also written one-connection-per-thread below): for each database connection, the MySQL server creates a dedicated thread to serve it and destroys that thread when the connection ends. The next connection request causes another thread to be created and later destroyed. Under high concurrency this leads to frequent thread creation and release. The thread cache allows threads to be kept for reuse, which avoids the frequent creation and release, but it does not solve the problem of a high connection count. In One-Thread-Per-Connection mode the number of service threads grows with the number of connections, and a large number of concurrent threads means high memory consumption, more context switches (and a lower CPU cache hit rate), and more resource contention, all of which cause service jitter. In One-Thread-Per-Connection mode one thread corresponds to one connection, whereas in thread pool mode the smallest unit a thread handles is a statement, so one thread can serve requests from multiple connections. As long as the thread pool size is set sensibly so that the hardware is fully utilized, this avoids the server jitter caused by a sudden surge in the number of connections.
Scheduling implementation
The MySQL server supports three connection management methods: No-Threads, One-Thread-Per-Connection, and Pool-Threads. No-Threads handles connections in the main thread without creating any additional threads and is mainly used for debugging. One-Thread-Per-Connection, the most common method before the thread pool appeared, creates one thread to serve each connection. Pool-Threads is the thread pool method discussed in this article. The server supports all three methods through a single set of function pointers: for a given method, the pointers are set to that method's callback functions. The method is selected by the thread_handling parameter, as the following code shows:
  if (thread_handling <= SCHEDULER_ONE_THREAD_PER_CONNECTION)
    one_thread_per_connection_scheduler(thread_scheduler, &max_connections, &connection_count);
  else if (thread_handling == SCHEDULER_NO_THREADS)
    one_thread_scheduler(thread_scheduler);
  else
    pool_of_threads_scheduler(thread_scheduler, &max_connections, &connection_count);
Connection Management Process
Listen for connection requests on the MySQL port through poll.
After receiving a connection, call the accept interface to create a communication socket.
Initialize the thd instance and the vio object.
Initialize the scheduler function pointers of the thd instance according to the thread_handling method.
Call the add_connection function specified by the scheduler to create the connection.
The following code shows the scheduler_functions template and the thread pool's implementation of its callback functions. This template is the core of how multiple connection management methods are supported.
  struct scheduler_functions
  {
    uint max_threads;
    uint *connection_count;
    ulong *max_connections;
    bool (*init)(void);
    bool (*init_new_connection_thread)(void);
    void (*add_connection)(THD *thd);
    void (*thd_wait_begin)(THD *thd, int wait_type);
    void (*thd_wait_end)(THD *thd);
    void (*post_kill_notification)(THD *thd);
    bool (*end_thread)(THD *thd, bool cache_thread);
    void (*end)(void);
  };

  static scheduler_functions tp_scheduler_functions=
  {
    0,                            // max_threads
    NULL,
    NULL,
    tp_init,                      // init
    NULL,                         // init_new_connection_thread
    tp_add_connection,            // add_connection
    tp_wait_begin,                // thd_wait_begin
    tp_wait_end,                  // thd_wait_end
    tp_post_kill_notification,    // post_kill_notification
    NULL,                         // end_thread
    tp_end                        // end
  };
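To make the function-pointer mechanism concrete, here is a minimal, self-contained sketch. It is not the server code: MiniScheduler, Conn, select_scheduler, and the three callbacks are hypothetical names invented for illustration. Only the idea of filling a table of callbacks once, according to the handling mode, is taken from the text above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for THD; the real class is far larger.
struct Conn { std::string handled_by; };

// Simplified callback table, loosely modeled on scheduler_functions.
struct MiniScheduler {
  void (*add_connection)(Conn *conn);
};

// Hypothetical mode values for this sketch only.
enum ThreadHandling { ONE_THREAD_PER_CONNECTION, NO_THREADS, POOL_OF_THREADS };

static void add_per_thread(Conn *c) { c->handled_by = "one-thread-per-connection"; }
static void add_no_threads(Conn *c) { c->handled_by = "no-threads"; }
static void add_pool(Conn *c)       { c->handled_by = "pool-of-threads"; }

// Fill the table once at startup; callers never branch on the mode again.
MiniScheduler select_scheduler(ThreadHandling mode) {
  MiniScheduler s{};
  switch (mode) {
    case ONE_THREAD_PER_CONNECTION: s.add_connection = add_per_thread; break;
    case NO_THREADS:                s.add_connection = add_no_threads; break;
    case POOL_OF_THREADS:           s.add_connection = add_pool;       break;
  }
  return s;
}
```

A caller would obtain the table with select_scheduler(POOL_OF_THREADS) and then invoke s.add_connection(&conn) for every new connection, without knowing which method is behind the pointer.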
Thread pool parameters
- thread_handling: selects the connection handling model, here the thread pool.
- thread_pool_size: the number of groups in the thread pool, usually set to the number of CPU cores. Ideally, each group has exactly one active worker thread so that the CPUs are fully used.
- thread_pool_stall_limit: the interval at which the timer thread checks whether a group is "stalled".
- thread_pool_idle_timeout: how long a worker thread may stay idle before it exits automatically, so that the pool keeps as few worker threads as possible while still keeping up with requests.
- thread_pool_oversubscribe: controls how many threads may be "oversubscribed" onto a CPU core. The value does not count the listener thread.
- thread_pool_high_prio_mode: the mode of the priority queue.
Thread pool implementation
The preceding section described how the MySQL server manages connections. This section describes the implementation framework of the thread pool and its key interfaces, as shown in Figure 1.
Each green box represents a group, and the number of groups is determined by the thread_pool_size parameter. Each group contains a priority queue and a normal queue, plus one listener thread and several worker threads. A thread can switch dynamically between the listener and worker roles. The number of worker threads is determined by the workload and is also limited by the thread_pool_oversubscribe setting. In addition, the thread pool as a whole has one timer thread that monitors the groups to keep them from getting "stuck".
Key interfaces
1. tp_add_connection [handle a new connection]
1) Create a connection object.
2) Determine the group the connection is assigned to from thread_id % group_count.
3) Put the connection into the queue of that group.
4) If the number of active threads is 0, create a worker thread.
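The steps of tp_add_connection can be sketched as follows. This is a simplified illustration, not the server code: Connection, Group, add_connection, and the threads_created counter are hypothetical, and a real implementation would take the group mutex and start an actual worker thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, simplified stand-ins for the server's structures.
struct Connection { unsigned long thread_id; };

struct Group {
  std::deque<Connection*> queue;   // pending connections
  int active_thread_count = 0;
  int threads_created = 0;         // counts the workers this sketch "creates"
};

// Pick the group by thread_id % group_count, enqueue the connection, and
// create a worker if the group has no active thread.
// `groups` must not be empty. Returns the index of the chosen group.
std::size_t add_connection(std::vector<Group> &groups, Connection *conn) {
  std::size_t idx = conn->thread_id % groups.size();
  Group &g = groups[idx];
  g.queue.push_back(conn);
  if (g.active_thread_count == 0) {
    ++g.threads_created;           // real code would start a worker thread here
    ++g.active_thread_count;
  }
  return idx;
}
```

With four groups, connections whose thread_id is 5 and 9 both land in group 1, and only the first of them triggers the creation of a worker.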
2. worker_main [worker thread]
1) Call get_event to obtain a request.
2) If a request exists, call handle_event to process it.
3) Otherwise, there is no request in the queue, and the thread exits.
3. get_event [get a request]
1) Try to get a connection request.
2) If one exists, return it immediately.
3) If the group has no listener at this point, the thread becomes the listener thread and blocks waiting.
4) If a listener already exists, the thread is added to the head of the waiting queue.
5) The thread sleeps for the specified time (thread_pool_idle_timeout).
6) If the thread has not been woken up when the timeout expires, it exits.
7) Otherwise, it was woken because a connection request arrived in the queue; go to step 1).
Note: before obtaining a connection request, the thread checks whether the number of active threads exceeds thread_pool_oversubscribe + 1. If it does, the thread goes to sleep.
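The sleep-with-timeout part of get_event (steps 5 to 7) can be illustrated with a condition variable. The sketch below is an assumption-laden simplification: GroupQueue, push_request, and this get_event signature are hypothetical, integers stand in for connection requests, and the listener conversion and the oversubscribe check are left out.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

struct GroupQueue {
  std::mutex mtx;
  std::condition_variable cond;
  std::deque<int> requests;        // ints stand in for connection requests
};

// Enqueue a request and wake one sleeping worker.
void push_request(GroupQueue &g, int req) {
  {
    std::lock_guard<std::mutex> lock(g.mtx);
    g.requests.push_back(req);
  }
  g.cond.notify_one();
}

// Returns true and fills `out` if a request was obtained. Returns false if
// the worker stayed idle for `idle_timeout`, in which case it should exit.
bool get_event(GroupQueue &g, std::chrono::milliseconds idle_timeout, int &out) {
  std::unique_lock<std::mutex> lock(g.mtx);
  // wait_for re-checks the predicate after every wake-up, which covers both
  // the "woken up, try again" branch and spurious wake-ups.
  bool ready = g.cond.wait_for(lock, idle_timeout,
                               [&g] { return !g.requests.empty(); });
  if (!ready) return false;
  out = g.requests.front();
  g.requests.pop_front();
  return true;
}
```

A worker loop would call get_event repeatedly and leave the loop the first time it returns false, which corresponds to the idle thread exiting after thread_pool_idle_timeout.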
4. handle_event [process a request]
1) Check whether the connection has been authenticated; if not, perform login authentication.
2) Attach the thd instance information.
3) Read the network packet and parse the request.
4) Call the do_command function in a loop to process requests.
5) Get the socket handle of the thd instance and check whether it is already in the epoll listening list.
6) If not, call epoll_ctl to register it.
7) End.
5. listener [listener thread]
1) Call epoll_wait to listen on the sockets associated with the group, blocking until an event occurs.
2) When a request arrives, the thread resumes from the blocked state.
3) Depending on the priority of the connection, put it into the normal queue or the priority queue.
4) Check whether the task queue is empty.
5) If the queue is empty, the listener turns into a worker thread.
6) If the group has no active thread, wake one up.
Note: epoll_wait listens on the sockets of all connections in the group and pushes the incoming requests onto the queue; worker threads then take tasks from the queue and execute them.
6. timer_thread [monitoring thread]
1) If the group has no listener thread and there has been no io_event recently,
2) wake up or create a worker thread.
3) If the group has not processed any request recently and there are requests in the queue,
4) the group is considered stalled, so a thread is woken up or created.
5) Check for connection timeouts.
Note: the timer thread calls check_stall to determine whether a group is stalled and timeout_check to determine whether a client connection has timed out.
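The two stall conditions checked by the timer thread can be written down as a small predicate. The sketch below only illustrates the logic of steps 1) to 4): GroupState and its counters are hypothetical, and this check_stall is a simplified stand-in rather than the server function of the same name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-group counters, reset on every timer tick.
struct GroupState {
  bool has_listener = true;
  unsigned long io_events = 0;          // events seen since the last tick
  unsigned long requests_dequeued = 0;  // requests picked up since the last tick
  std::size_t queued = 0;               // requests still waiting
  int wakeups = 0;                      // times the timer woke or created a thread
};

// One timer tick. Returns true if the timer had to intervene.
bool check_stall(GroupState &g) {
  bool intervene = false;
  if (!g.has_listener && g.io_events == 0) {
    intervene = true;                   // nobody is polling the sockets
  } else if (g.requests_dequeued == 0 && g.queued > 0) {
    intervene = true;                   // work is queued but nothing moved: stalled
  }
  if (intervene) ++g.wakeups;           // real code would wake or create a worker
  g.io_events = 0;
  g.requests_dequeued = 0;
  return intervene;
}
```

In this sketch the timer would call check_stall once per group every thread_pool_stall_limit interval.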
7. tp_wait_begin [enter the waiting state]
1) Decrement active_thread_count and increment waiting_thread_count.
2) Set connection->waiting = true.
3) If the number of active threads is 0 and the task queue is not empty or there is no listener thread,
4) wake up or create a thread.
8. tp_wait_end [leave the waiting state]
1) Set the connection's waiting state to false.
2) Increment active_thread_count and decrement waiting_thread_count.
Note:
1) The threads in the waiting_threads list are idle threads, not waiting threads. An idle thread is one that can take on a task at any time, whereas a waiting thread is one that cannot process tasks because it is waiting for a lock, an I/O operation, and so on.
2) The main purpose of tp_wait_begin and tp_wait_end is to report state, so that active_thread_count and waiting_thread_count are updated promptly.
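The bookkeeping done by tp_wait_begin and tp_wait_end can be sketched as two small functions. Everything here is hypothetical scaffolding (Group, Connection, wake_or_create_calls), locking is omitted, and the grouping of the condition in wait_begin is one possible reading of step 3), not something the text settles.

```cpp
#include <cassert>
#include <cstddef>

struct Group {
  int active_thread_count = 0;
  int waiting_thread_count = 0;
  std::size_t queued = 0;        // requests waiting in the group's queues
  bool has_listener = true;
  int wake_or_create_calls = 0;  // stands in for waking or creating a thread
};

struct Connection {
  Group *group = nullptr;
  bool waiting = false;
};

// Called just before a thread blocks on a lock or on I/O.
void wait_begin(Connection &c) {
  Group &g = *c.group;
  --g.active_thread_count;
  ++g.waiting_thread_count;
  c.waiting = true;
  // Assumed reading: no active thread AND (work queued OR no listener).
  if (g.active_thread_count == 0 && (g.queued > 0 || !g.has_listener))
    ++g.wake_or_create_calls;
}

// Called when the thread resumes after the wait.
void wait_end(Connection &c) {
  Group &g = *c.group;
  c.waiting = false;
  ++g.active_thread_count;
  --g.waiting_thread_count;
}
```

The point of the pair is that the group's counters stay accurate while a thread is blocked, so another thread can be brought in instead of leaving queued requests unattended.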
9. tp_init/tp_end
These call thread_group_init and thread_group_close, respectively, to initialize and destroy the thread pool.
Thread pool and connection pool
A connection pool is usually implemented on the client side: the application (client) creates a certain number of connections in advance and uses them to serve all of its database requests. If at some point there are fewer idle connections than database requests, the excess requests queue up and wait for a connection to become idle. Reusing connections through the pool avoids frequent connection creation and release, which reduces the average response time of requests, and when the application is busy the queuing of requests cushions the impact on the database.
A thread pool is implemented on the server side: a certain number of threads are created to serve database requests. In contrast to one-connection-per-thread, where a thread serves a single connection, the smallest unit served by the thread pool is a statement, so one thread can correspond to multiple active connections. The thread pool keeps the number of service threads on the server within a certain range, which reduces contention for system resources and the cost of thread context switches, and avoids the high-concurrency problems caused by a high connection count.
The connection pool and the thread pool complement each other. The connection pool reduces connection creation and release, lowers the average response time of requests, and effectively controls the number of database connections of a single application, but it cannot control the number of connections of the whole application cluster, which can therefore still be high. The thread pool copes well with a high connection count and keeps the service provided by the server stable. As shown in Figure 2, each web-server maintains a connection pool of three connections. A connection in the pool does not have a db-server worker to itself; it may share the worker with other connections.
The figure assumes that db-server has only three groups, that each group has only one worker, and that each worker handles requests from two connections.
Thread pool optimization
1. Solve the scheduling deadlock
Introducing the thread pool solves the problem of high concurrency across many threads, but it also brings a hidden risk. Suppose transactions A and B are assigned to different groups. Transaction A has started and holds a lock, but because A's group is busy, A is not scheduled again immediately after executing one statement. Transaction B depends on A releasing the lock, so although B can be scheduled, it cannot acquire the lock and still has to wait. This is the so-called scheduling deadlock. A group serves multiple connections at the same time, and those connections are not equal: some are sending their first request, while others have already opened a transaction and hold locks. To reduce lock contention, the latter should clearly take precedence over the former so that the locks are released as soon as possible. A priority queue can therefore be added to each group: requests from connections that already hold locks or have an open transaction go into the priority queue, and worker threads take tasks from the priority queue first.
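The two-queue idea can be shown in a few lines. This is a minimal sketch under stated assumptions: Connection, GroupQueues, enqueue, and dequeue are hypothetical, and a single in_transaction flag stands in for "holds locks or has an open transaction".

```cpp
#include <cassert>
#include <deque>

// Hypothetical connection descriptor for this sketch.
struct Connection {
  int id;
  bool in_transaction;   // true if a transaction is open or locks are held
};

struct GroupQueues {
  std::deque<Connection*> high_prio;
  std::deque<Connection*> normal;
};

// Requests from connections already inside a transaction jump the line.
void enqueue(GroupQueues &q, Connection *c) {
  (c->in_transaction ? q.high_prio : q.normal).push_back(c);
}

// Workers always drain the priority queue before the normal queue.
// Returns nullptr when both queues are empty.
Connection *dequeue(GroupQueues &q) {
  std::deque<Connection*> &src = !q.high_prio.empty() ? q.high_prio : q.normal;
  if (src.empty()) return nullptr;
  Connection *c = src.front();
  src.pop_front();
  return c;
}
```

If two fresh connections and one connection with an open transaction are enqueued in that order, the transactional one is dequeued first, so it can finish and release its locks sooner.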
2. big query processing
Consider a scenario in which all the connections in a group are running big queries. The number of worker threads in the group quickly reaches the value set by thread_pool_oversubscribe, subsequent connection requests are not served in time (no more threads are available to handle them), and the group stalls. As analyzed above, the timer thread checks for this situation periodically and creates a new worker thread to process the requests. If the long queries come from ordinary business requests, every group faces the same problem; the host may then become overloaded and appear to hang. The thread pool itself is powerless here, because the root cause may be a burst of bad SQL or SQL whose execution plan has gone wrong; other measures such as SQL high/low watermark throttling or SQL filtering can be used as an emergency response. A different case is the dump task. Many downstream systems depend on the raw data in the database, and the data is usually pulled downstream with the dump command. A dump task typically runs for a long time, so it can also be regarded as a big query. If dump tasks are concentrated in one group and leave other, normal business requests unable to get an immediate response, that is intolerable: the database is under no pressure at that moment, and requests are slow only because of the thread pool policy. To solve this problem, threads in a group that are processing dump tasks are not counted toward the thread_pool_oversubscribe total.
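The dump-task adjustment amounts to leaving certain threads out of the count that is compared against the oversubscribe limit. The sketch below is illustrative only: Worker, counted_active, and must_sleep are hypothetical names, and the exact comparison (greater than thread_pool_oversubscribe + 1) follows the note on get_event rather than any confirmed source code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-worker flags.
struct Worker {
  bool active;    // currently executing a request
  bool is_dump;   // executing a long-running dump task
};

// Count active workers, leaving out those busy with dump tasks.
int counted_active(const std::vector<Worker> &workers) {
  int n = 0;
  for (const Worker &w : workers)
    if (w.active && !w.is_dump) ++n;
  return n;
}

// A thread must sleep once the counted number of active threads exceeds
// oversubscribe + 1 (assumed comparison, taken from the get_event note).
bool must_sleep(const std::vector<Worker> &workers, int oversubscribe) {
  return counted_active(workers) > oversubscribe + 1;
}
```

With three dump threads and one ordinary thread active, only one thread is counted, so a group with an oversubscribe value of 1 can still admit ordinary requests.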
One-connection-per-thread
Based on the scheduler_functions template, we can also list several key functions in one-connection-per-thread mode.
  static scheduler_functions con_per_functions=
  {
    max_connections + 1,                   // max_threads
    NULL,
    NULL,
    NULL,                                  // init
    init_new_connection_handler_thread,    // init_new_connection_thread
    create_thread_to_handle_connection,    // add_connection
    NULL,                                  // thd_wait_begin
    NULL,                                  // thd_wait_end
    NULL,                                  // post_kill_notification
    one_thread_per_connection_end,         // end_thread
    NULL                                   // end
  };
1. init_new_connection_handler_thread
This interface is relatively simple. It mainly calls pthread_detach to put the thread in the detached state, so that all of its resources are released automatically when the thread ends.
2. create_thread_to_handle_connection
This interface handles new connections. Whereas the thread pool takes a thread from the group given by thread_id % group_count, the one-connection-per-thread method checks whether a thread from thread_cache can be used and creates a new thread if not. The logic is as follows:
(1) Check whether the cached threads are used up (compare blocked_pthread_count with wake_pthread).
(2) If a cached thread is available, add thd to the waiting_thd_list queue and wake up a thread waiting on COND_thread_cache.
(3) If not, create a new thread to handle the connection; the new thread runs do_handle_one_connection.
(4) Call add_global_thread to add thd to the global thd array.
3. do_handle_one_connection
This interface runs in the thread set up by create_thread_to_handle_connection and is the main implementation of request processing.
(1) Call do_command in a loop to read network packets from the socket and parse them.
(2) Exit the loop when the remote client sends a command that closes the connection (such as COM_QUIT or COM_SHUTDOWN).
(3) Call close_connection to close the connection (thd->disconnect()).
(4) Call one_thread_per_connection_end to check whether the thread can be reused.
(5) Depending on the result, either exit the worker thread or go back to executing commands in a loop.
4. one_thread_per_connection_end
The main purpose of this function is to determine whether the thread can be reused through thread_cache. The logic is as follows:
(1) Call remove_global_thread to remove the thd instance that belongs to the thread.
(2) Call block_until_new_connection to determine whether the thread can be reused.
(3) Check whether the number of cached threads exceeds the threshold; if not, blocked_pthread_count++.
(4) Block waiting on the condition variable COND_thread_cache.
(5) Being woken up means a new thd needs this thread. Remove thd from waiting_thd_list and initialize its thd->thread_stack for this thread.
(6) Call add_global_thread to add thd to the global thd array.
(7) If the thread can be reused, return false; otherwise return true.
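The counters behind the thread cache can be sketched without any real threads. The following is a simplified model, not the server code: ThreadCache and the three functions are hypothetical, integers stand in for THD pointers, the field names are borrowed from the text, and the mutex and the actual wait on COND_thread_cache are omitted.

```cpp
#include <cassert>
#include <deque>

struct ThreadCache {
  unsigned blocked_pthread_count = 0;  // threads parked in the cache
  unsigned wake_pthread = 0;           // parked threads already promised work
  unsigned max_blocked_pthreads = 0;   // cache capacity (assumed name)
  std::deque<int> waiting_thd_list;    // ints stand in for THD pointers
};

// Dispatch side: true if the connection was handed to a cached thread,
// false if a new thread has to be created.
bool hand_to_cached_thread(ThreadCache &c, int thd) {
  if (c.blocked_pthread_count > c.wake_pthread) {
    c.waiting_thd_list.push_back(thd);
    ++c.wake_pthread;                  // real code signals the condition here
    return true;
  }
  return false;
}

// Worker side, when its connection ends: true if the thread may park.
bool try_park(ThreadCache &c) {
  if (c.blocked_pthread_count < c.max_blocked_pthreads) {
    ++c.blocked_pthread_count;
    return true;
  }
  return false;
}

// Worker side, after being woken: leave the cache and pick up the new thd.
// Must only be called after hand_to_cached_thread has queued a thd.
int take_work(ThreadCache &c) {
  --c.blocked_pthread_count;
  --c.wake_pthread;
  int thd = c.waiting_thd_list.front();
  c.waiting_thd_list.pop_front();
  return thd;
}
```

Comparing blocked_pthread_count with wake_pthread ensures that a parked thread is promised at most one connection: once every parked thread has been promised work, further connections get a new thread.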
Thread pool and epoll
Before the thread pool was introduced, the server layer had a single listening thread that listened for requests on the MySQL port and the local Unix socket, and each new connection was handed to a dedicated thread. The listening thread's job was therefore light, and MySQL implemented the I/O multiplexing with poll or select. With the thread pool, in addition to the server-layer listening thread, each group has its own listener thread that listens for requests on the sockets of all connections in the group; worker threads do no listening and only process requests. For a thread pool configured with an oversubscribe setting of 1000, each listener thread has to listen for requests on 1000 sockets, and it does so with epoll.
select, poll, and epoll are all I/O multiplexing mechanisms. I/O multiplexing lets a program monitor multiple fds (file descriptors), such as sockets, and be notified to perform the corresponding read or write as soon as one becomes ready for reading or writing. epoll improves considerably on select and poll. First, fds are registered with epoll through epoll_ctl; each fd is copied into the kernel once at registration and never again, whereas every call to poll or select copies the whole fd set from user space to kernel space (with epoll, waiting is done through epoll_wait). Second, epoll attaches a callback to each descriptor; when the device becomes ready, the callback wakes the waiter and adds the descriptor to the ready list, so there is no need to scan all descriptors the way select and poll do. Finally, select supports only 1024 fds, whereas epoll has no such limit; the actual upper bound can be checked with cat /proc/sys/fs/file-max. epoll is used throughout the life of the thread pool. The following describes the epoll lifecycle: how it is created, used, and destroyed.
When the thread pool is initialized, the epoll file descriptor is created with epoll_create; this is implemented in thread_group_init.
After the port listening thread detects a request, it creates a socket, then creates the THD and connection objects and puts the connection into the queue of the corresponding group.
When a worker thread obtains the connection object, it performs login authentication if the connection has not logged in yet.
If the socket has not been registered with epoll, call epoll_ctl with EPOLL_CTL_ADD to register it, storing the connection object in the epoll_event struct.
For a request on an existing connection, epoll_ctl must still be called to register again, this time with EPOLL_CTL_MOD.
The listener thread in the group calls epoll_wait to listen on the registered fds. epoll is a synchronous I/O mechanism, so the call blocks.
When a request arrives, the connection is taken from the epoll_event struct and put into the group's queue.
When the thread pool is destroyed, thread_group_close is called to close epoll.
Note:
1. When an fd registered with epoll becomes ready, the corresponding event is placed in the events array and the event types registered for that fd are cleared. Therefore, for a request on an existing connection, epoll_ctl(pollfd, EPOLL_CTL_MOD, fd, &ev) must still be called to register again.
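The register, wait, and re-register cycle described above can be tried out on Linux with a pipe instead of a client socket. This sketch assumes that the "cleared after delivery" behavior in the note corresponds to the EPOLLONESHOT flag; register_fd and wait_one are hypothetical helpers, and the tag pointer stands in for the connection object stored in epoll_event.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Register (EPOLL_CTL_ADD) or re-arm (EPOLL_CTL_MOD) fd for one-shot read
// notification. `tag` stands in for the connection pointer kept in the event.
// Returns 0 on success, -1 on error, like epoll_ctl.
int register_fd(int epfd, int fd, int op, void *tag) {
  epoll_event ev{};
  ev.events = EPOLLIN | EPOLLONESHOT;   // assumption: one-shot delivery
  ev.data.ptr = tag;
  return epoll_ctl(epfd, op, fd, &ev);
}

// Wait up to timeout_ms for a single event.
// Returns the tag of the ready fd, or nullptr on timeout or error.
void *wait_one(int epfd, int timeout_ms) {
  epoll_event ev{};
  int n = epoll_wait(epfd, &ev, 1, timeout_ms);
  return n == 1 ? ev.data.ptr : nullptr;
}
```

In a small experiment, create the epoll instance with epoll_create(1), add the read end of a pipe with EPOLL_CTL_ADD, and write a byte to the other end: the first wait_one returns the tag, a second wait_one times out even though the byte is still unread, and the fd is reported again only after register_fd is called with EPOLL_CTL_MOD.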
Thread pool function call relationship
(1) create epoll
tp_init->thread_group_init->tp_set_threadpool_size->io_poll_create->epoll_create
(2) close epoll
tp_end->thread_group_close->thread_group_destroy->close(pollfd)
(3) associate the socket descriptor
handle_event->start_io->io_poll_associate_fd->io_poll_start_read->epoll_ctl
(4) handling connection requests
handle_event->threadpool_process_request->do_command->dispatch_command->mysql_parse->mysql_execute_command
(5) when the worker thread is idle
worker_main->get_event->pthread_cond_timedwait
Wait for thread_pool_idle_timeout and exit.
(6) listening to epoll
worker_main->get_event->listener->io_poll_wait->epoll_wait
(7) port listening thread
main->mysqld_main->handle_connections_sockets->poll
One-connection-per-thread function call relationship
(1) the worker thread waits for the request
handle_one_connection->do_handle_one_connection->do_command->my_net_read->net_read_packet->net_read_packet_header->net_read_raw_loop->vio_read->vio_socket_io_wait->vio_io_wait->poll
Note: unlike thread pool worker threads, which have a listener thread to listen for requests on their behalf, a one-connection-per-thread worker thread that is idle calls poll and blocks waiting for a network packet to arrive. Worker threads in the thread pool only need to concentrate on processing requests, so they are used more fully.
(2) port listening thread
Same as (7) in the thread pool function call relationship.