Analysis of the thread safety model of PHP and the Zend Engine

When I was reading the PHP source code and learning PHP extension development, I came across a large number of macros containing the word "TSRM". After looking into it, I learned that these macros are related to Zend's thread safety mechanism. However, most documents simply tell you to use these macros according to established rules, without explaining what they actually do.

Not understanding what is going on always makes me uncomfortable, so I read the source code and the limited documentation available to get a basic understanding of the mechanisms involved. This article summarizes what I learned. It first explains the concept of thread safety and why thread safety matters in PHP, then examines in detail PHP's thread safety mechanism ZTS (Zend Thread Safety) and its concrete implementation, TSRM, covering the relevant data structures, implementation details, and operating mechanism. Finally, it looks at how Zend is selectively compiled for single-threaded and multithreaded environments.

Thread Safety
Put simply, thread safety is about how to access shared resources safely in a multithreaded environment. Each thread has its own private stack but shares the heap of its process. In C, a variable declared outside any function is a global variable; it is allocated in the process's shared storage, so all threads reference the same address. If one thread modifies such a variable, the change affects all threads. This makes it convenient for threads to share data. However, PHP usually handles one request per thread, so each thread is expected to have its own copy of every global variable, and requests must not interfere with one another. Early PHP was mostly used in single-threaded environments, where each process ran only one thread, so thread safety was not an issue. Once PHP came to be used in multithreaded environments, Zend introduced the Zend Thread Safety (ZTS) mechanism to guarantee thread safety.
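To make the problem concrete, here is a small sketch (not taken from PHP) using POSIX threads: two "requests" run one after the other on separate threads, and each writes its request ID both to a plain global and to a C11 `_Thread_local` variable. The names `shared_request_id`, `private_request_id`, `handle_request`, and `run_two_requests` are hypothetical. `_Thread_local` appears only for contrast; it is not how the TSRM discussed in this article provides per-thread copies.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Ordinary global: a single copy shared by every thread in the process. */
static int shared_request_id = 0;

/* C11 thread-local storage: each thread gets its own copy (shown only for contrast). */
static _Thread_local int private_request_id = 0;

/* Simulates handling one request: records the request's ID in both variables. */
static void *handle_request(void *arg)
{
    int id = *(const int *)arg;
    shared_request_id = id;   /* visible to every other thread afterwards */
    private_request_id = id;  /* changes only this thread's copy */
    return NULL;
}

/* Runs request 1 and then request 2, each on its own thread, and reports what the
   calling thread sees in both variables after both threads have finished. */
void run_two_requests(int *shared_seen, int *private_seen)
{
    assert(shared_seen != NULL && private_seen != NULL);
    int ids[2] = {1, 2};
    for (int i = 0; i < 2; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, handle_request, &ids[i]) != 0) {
            *shared_seen = *private_seen = -1;  /* thread creation failed */
            return;
        }
        pthread_join(t, NULL);  /* run the requests strictly one after the other */
    }
    *shared_seen = shared_request_id;    /* request 2's write leaked out of its thread */
    *private_seen = private_request_id;  /* the caller's own copy was never written */
}
```

After `run_two_requests`, the caller sees 2 in the plain global but still 0 in its thread-local copy. The first behavior is what PHP has to avoid for per-request globals once several requests share one process.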

Basic principles and implementation of ZTS
Basic Ideas
The basic idea of ZTS is intuitive: if every global variable needs a copy in every thread, provide a mechanism for exactly that. In a multithreaded environment, requesting a global variable no longer means simply declaring a variable. Instead, the process allocates a block of memory on the heap as a "thread global variable pool" and initializes this pool when the process starts. Whenever a thread needs a global variable, it calls TSRM (Thread Safe Resource Manager, the concrete implementation of ZTS) through the corresponding function and passes the necessary parameters, such as the variable size. TSRM allocates a corresponding memory block in the pool and returns a reference ID for it. The next time the thread needs to read or write that variable, it passes this unique ID to TSRM, which performs the actual read or write. This is how thread-safe global variables are implemented. The principle of ZTS is illustrated below:
[Figure: principle of ZTS]
Thread1 and Thread2 belong to the same process, and each needs its own copy of a global variable. TSRM allocates a region for each of them in the global thread memory pool (the yellow part of the figure), identified by a unique ID, so the two threads can access their own variables through TSRM without interfering with each other. The following code snippets show how Zend implements this mechanism. I use the source code of PHP 5.3.8; the TSRM implementation is in the "TSRM" directory of the PHP source tree.
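Before turning to the actual TSRM sources, here is a toy model of the idea described above. It is a sketch under stated assumptions, not Zend's code: the names `pool_allocate_id`, `pool_get`, `pool_thread_ctx`, and `pool_thread_free` are hypothetical, the number of IDs is capped at a fixed constant, each thread is represented by an explicit context passed to every call, copies are allocated lazily on first access for brevity, and there is no locking.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_MAX_IDS 16 /* hypothetical fixed capacity, keeps the sketch short */

/* Process-wide registry shared by all threads: resource ID -> variable size. */
static size_t pool_sizes[POOL_MAX_IDS];
static int pool_id_count = 0;

/* One thread's private copies of every registered "thread global". */
typedef struct {
    void *storage[POOL_MAX_IDS];
} pool_thread_ctx;

/* Registers a new thread global of `size` bytes and returns its ID. */
int pool_allocate_id(size_t size)
{
    assert(pool_id_count < POOL_MAX_IDS);
    pool_sizes[pool_id_count] = size;
    return pool_id_count++;
}

/* Returns the given thread's copy of variable `id`, zero-allocating it on first use. */
void *pool_get(pool_thread_ctx *ctx, int id)
{
    assert(id >= 0 && id < pool_id_count);
    if (ctx->storage[id] == NULL) {
        ctx->storage[id] = calloc(1, pool_sizes[id]);
    }
    return ctx->storage[id];
}

/* Releases every copy owned by one thread. */
void pool_thread_free(pool_thread_ctx *ctx)
{
    for (int i = 0; i < POOL_MAX_IDS; i++) {
        free(ctx->storage[i]);
        ctx->storage[i] = NULL;
    }
}
```

Two contexts that ask `pool_get` for the same ID receive different memory blocks, so a write through one context is invisible through the other. The ID is the only thing the threads share.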

Data Structures
There are two important data structures in TSRM: tsrm_tls_entry and tsrm_resource_type. Let's start with tsrm_tls_entry, which is defined in TSRM/TSRM.c:
The code is as follows:
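typedef struct _tsrm_tls_entry tsrm_tls_entry;

struct _tsrm_tls_entry {
	void **storage;         /* this thread's pointers to its global variables */
	int count;              /* number of elements in storage */
	THREAD_T thread_id;     /* ID of the thread that owns this node */
	tsrm_tls_entry *next;   /* next node in the same list */
};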

Each tsrm_tls_entry structure represents all the global variable resources of one thread: thread_id stores the thread ID, count records the number of global variables, and next points to the next node. storage can be viewed as an array of pointers, each element pointing to one of the global variables of the thread this node represents. The tsrm_tls_entry nodes of all threads are chained into linked lists, and the list head pointers are stored in the global static variable tsrm_tls_table. Note: because tsrm_tls_table is a genuine global variable, all threads share it, which keeps memory management consistent across threads. The tsrm_tls_entry and tsrm_tls_table structures are illustrated below:
[Figure: the tsrm_tls_entry and tsrm_tls_table structures]
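The following simplified sketch shows how a table of list heads like tsrm_tls_table can be searched for a given thread's node. Assumptions: thread IDs are modeled as `unsigned long` instead of `THREAD_T`, the list for a thread is chosen by `thread_id % table_size` (a simplified hashing step), locking is omitted, and `toy_tls_entry`, `toy_insert`, and `toy_lookup` are hypothetical names rather than TSRM functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mirror of tsrm_tls_entry; thread_id is an unsigned long here. */
typedef struct toy_tls_entry toy_tls_entry;
struct toy_tls_entry {
    void **storage;           /* this thread's global-variable pointers */
    int count;                /* number of elements in storage */
    unsigned long thread_id;  /* owning thread */
    toy_tls_entry *next;      /* next node whose thread maps to the same slot */
};

/* Picks the list for a thread (simplified hashing assumption). */
static size_t toy_slot(unsigned long thread_id, size_t table_size)
{
    return (size_t)(thread_id % table_size);
}

/* Pushes a node onto the front of its slot's list. */
void toy_insert(toy_tls_entry **table, size_t table_size, toy_tls_entry *entry)
{
    assert(table_size > 0);  /* avoid modulo by zero */
    size_t slot = toy_slot(entry->thread_id, table_size);
    entry->next = table[slot];
    table[slot] = entry;
}

/* Walks the slot's list and returns the node for thread_id, or NULL if absent. */
toy_tls_entry *toy_lookup(toy_tls_entry **table, size_t table_size, unsigned long thread_id)
{
    for (toy_tls_entry *e = table[toy_slot(thread_id, table_size)]; e != NULL; e = e->next) {
        if (e->thread_id == thread_id) {
            return e;
        }
    }
    return NULL;
}
```

With a two-slot table, threads 10 and 12 share slot 0 and end up in the same list, while thread 7 sits alone in slot 1. A lookup only walks the list of the slot the ID maps to.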
The internal structure of tsrm_resource_type is relatively simple:
The code is as follows:
typedef struct {
	size_t size;            /* size of the resource in bytes */
	ts_allocate_ctor ctor;  /* constructor: initializes a thread's copy */
	ts_allocate_dtor dtor;  /* destructor: cleans up a thread's copy */
	int done;               /* set once the resource has been released */
} tsrm_resource_type;

As mentioned above, tsrm_tls_entry is organized per thread (one node per thread), while tsrm_resource_type is organized per resource (that is, per global variable). Each time a new resource is allocated, a tsrm_resource_type is created. All tsrm_resource_type entries are stored in an array (a linear table), resource_types_table, whose subscript is the resource ID. Each tsrm_resource_type stores the resource's size and its constructor and destructor pointers. To some extent, resource_types_table can be regarded as a hash table whose key is the resource ID and whose value is the tsrm_resource_type structure.
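To show how the per-resource table and the per-thread storage array fit together, here is a sketch that builds one thread's storage array from a table of resource descriptors, running each constructor on the new copy, and later tears it down with the destructors. It is an illustration under assumptions, not TSRM's own allocation routine: the callbacks take a single `void *` (the real ts_allocate_ctor/ts_allocate_dtor signatures differ), released resources are simply skipped, and `toy_resource_type`, `toy_alloc_thread_storage`, `toy_free_thread_storage`, and the example int callbacks are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified callback types (assumption: the real ts_allocate_ctor/dtor take extra arguments). */
typedef void (*toy_ctor)(void *);
typedef void (*toy_dtor)(void *);

/* Mirror of tsrm_resource_type: one descriptor per resource ID. */
typedef struct {
    size_t size;
    toy_ctor ctor;
    toy_dtor dtor;
    int done;  /* nonzero once the resource has been released */
} toy_resource_type;

/* Example callbacks for an int-sized resource (hypothetical). */
static int toy_dtor_calls = 0;
static void toy_int_ctor(void *p) { *(int *)p = 100; }
static void toy_int_dtor(void *p) { (void)p; toy_dtor_calls++; }

/* Builds one thread's storage array from the resource table:
   storage[id] becomes this thread's private copy of resource `id`. */
void **toy_alloc_thread_storage(const toy_resource_type *table, int count)
{
    assert(count >= 0);
    void **storage = calloc((size_t)count, sizeof(void *));
    if (storage == NULL) {
        return NULL;
    }
    for (int id = 0; id < count; id++) {
        if (table[id].done) {
            continue;                     /* released resource: slot stays NULL */
        }
        storage[id] = malloc(table[id].size);
        if (storage[id] != NULL && table[id].ctor != NULL) {
            table[id].ctor(storage[id]);  /* initialize this thread's copy */
        }
    }
    return storage;
}

/* Runs each destructor on one thread's copies, then frees them and the array. */
void toy_free_thread_storage(const toy_resource_type *table, void **storage, int count)
{
    for (int id = 0; id < count; id++) {
        if (storage[id] != NULL && table[id].dtor != NULL) {
            table[id].dtor(storage[id]);
        }
        free(storage[id]);
    }
    free(storage);
}
```

Calling `toy_alloc_thread_storage` once per thread gives every thread its own constructed copy of each live resource, indexed by the same resource ID. This is why a single ID is enough to reach the right copy in any thread.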

Implementation Details
This section analyzes the implementation details of some TSRM functions. Because TSRM as a whole involves a lot of code, I analyze two representative functions. The first is tsrm_startup, which the SAPI calls when the process starts in order to initialize the TSRM environment. Because tsrm_startup is somewhat long, here is an excerpt of the parts I think deserve attention:
The code is as follows:
/* Startup TSRM (call once for the entire process) */
TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, char *debug_filename)
{
	/* code ... */

	/* tsrm_tls_table: an array of tsrm_tls_table_size list heads of tsrm_tls_entry nodes */
	tsrm_tls_table_size = expected_threads;

	tsrm_tls_table = (tsrm_tls_entry **) calloc(tsrm_tls_table_size, sizeof(tsrm_tls_entry *));
	if (!tsrm_tls_table) {
		TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate TLS table"));
		return 0;
	}
	/* resource IDs are generated by incrementing this counter */
	id_count = 0;

	/* zero-initialized room for expected_resources resource descriptors */
	resource_types_table_size = expected_resources;
	resource_types_table = (tsrm_resource_type *) calloc(resource_types_table_size, sizeof(tsrm_resource_type));
	if (!resource_types_table) {
		TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate resource types table"));
		/* undo the first allocation before failing */
		free(tsrm_tls_table);
		tsrm_tls_table = NULL;
		return 0;
	}

	/* code ... */

	return 1;
}

In fact, the main job of tsrm_startup is to initialize the two data structures described above. The first interesting point is its first two parameters, expected_threads and expected_resources. Both are passed in by the SAPI and give the expected number of threads and resources, and tsrm_startup uses them to pre-allocate space with calloc. TSRM therefore starts out with room for expected_threads threads and expected_resources resources. To see what each SAPI passes by default, look at its source code under the sapi directory. I took a quick look:
Common SAPIs such as mod_php5, php-fpm, and cgi pre-allocate a single thread and a single resource. This avoids wasting memory, since in most cases PHP still runs in a single-threaded environment. Here we also see the id_count variable, a global static variable used to generate resource IDs by auto-increment; it is initialized to 0 here. TSRM's method for generating resource IDs is therefore very simple: it just increments an integer variable. The second function that deserves careful analysis is ts_allocate_id, which anyone who has written a PHP extension will certainly know. This function...
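The pre-allocation and auto-increment scheme described above can be modeled in a few lines. This is a sketch, not the real ts_allocate_id: it assumes a single caller (no locking), stores only a size per resource, and grows the table with realloc once more IDs are requested than were pre-allocated. The excerpt above does not show how TSRM handles that case, so the growth policy here is this sketch's own choice. `toy_startup`, `toy_allocate_id`, `toy_table`, and `toy_id_count` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor; only the size matters for this sketch. */
typedef struct {
    size_t size;
} toy_resource;

static toy_resource *toy_table = NULL;
static int toy_table_size = 0;  /* capacity, starts at expected_resources */
static int toy_id_count = 0;    /* next ID to hand out */

/* Pre-allocates zeroed room for expected_resources entries. Returns 1 on success. */
int toy_startup(int expected_resources)
{
    assert(expected_resources > 0);  /* calloc(0, ...) may legitimately return NULL */
    toy_table_size = expected_resources;
    toy_table = calloc((size_t)toy_table_size, sizeof(toy_resource));
    if (toy_table == NULL) {
        return 0;
    }
    toy_id_count = 0;
    return 1;
}

/* Hands out the next ID by incrementing toy_id_count, growing the table by realloc
   when the pre-allocated capacity is exhausted. Returns -1 if growth fails. */
int toy_allocate_id(size_t size)
{
    int id = toy_id_count++;
    if (toy_id_count > toy_table_size) {
        toy_resource *grown = realloc(toy_table, (size_t)toy_id_count * sizeof(toy_resource));
        if (grown == NULL) {
            toy_id_count--;  /* roll back so the ID can be reused */
            return -1;
        }
        toy_table = grown;
        toy_table_size = toy_id_count;
    }
    toy_table[id].size = size;
    return id;
}
```

With `toy_startup(1)`, matching the single resource that mod_php5, php-fpm, and cgi pre-allocate, the first ID fits in the pre-allocated slot and each later call grows the table by one entry. The IDs come out as 0, 1, 2, and so on.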
