Learning the PHP Source Code: Thread Safety
In terms of scope, C lets you define four kinds of variables: global variables, static global variables, local variables, and static local variables. Below we look at each kind from the perspective of function scope, assuming no two variables share a name.
Global variables are declared outside any function, for example int gVar;. They are shared by all functions: wherever the name appears, it refers to the same variable.
Static global variables (static int sgVar;) are in effect also shared by all functions, but only within the file that declares them. This restriction is enforced by the compiler and can be seen as a feature the compiler provides.
Local variables (int var; inside a function or block) are not shared. Each execution of the function works with its own independent variable; the executions are different variables that merely share a name.
Static local variables (static int sVar; inside a function) are shared within that function: every execution of the function works with the same variable.
The scopes above are defined from the perspective of functions and cover every way variables can be shared in single-threaded programming. Now let's look at the multi-threaded case. Threads in the same process share every resource except the function call stack, so the picture changes as follows.
Global variables are shared by all functions, and therefore by all threads: the name refers to the same variable in every thread.
Static global variables are shared by all functions, and therefore by all threads as well.
Local variables are independent for each execution of a function, so they are not shared between threads either.
Static local variables are shared within their function: every execution of the function works with the same variable, so they are shared by all threads.
I. Origin of TSRM
In a multi-threaded system, the process remains the owner of resources, while the threads running inside it are the concurrent streams of execution. In Apache2 worker, for example, the master control process spawns several child processes. Each child process contains a fixed number of threads, and each thread handles requests independently. To avoid having to create threads only when requests arrive, MinSpareThreads and MaxSpareThreads set the minimum and maximum number of idle threads, while MaxClients sets the total number of threads across all child processes. If the threads in the existing child processes cannot handle the load, the control process spawns a new child process.
When PHP runs on such a multi-threaded server, PHP lives in a multi-threaded lifecycle. For some period of time a process contains multiple threads, and the threads of one process share the global variables created during module initialization. If scripts were run the same way as in PHP's CLI mode, multiple threads would try to read and write shared resources stored in the process's memory space (for example, the many global variables that live outside functions and are set up by module initialization, which all threads share).
These threads access the same memory address space, so a modification made by one thread affects the others. Such sharing speeds up some operations, but it couples the threads tightly, and under concurrency it leads to the usual data consistency and resource contention problems, such as results that differ from one run to the next or from a single-threaded run. If every thread only read global and static variables and never wrote to them, those variables would be thread-safe, but that is not realistic.
To solve this thread-concurrency problem, PHP introduced TSRM, the Thread Safe Resource Manager. The TSRM implementation lives in the /TSRM directory of the PHP source and is called from everywhere; it is usually referred to as the TSRM layer. In general, the TSRM layer is enabled at compile time only when it is explicitly required (for example with Apache2 + worker MPM, a thread-based MPM). Because Apache on Win32 is multi-threaded, the layer is always enabled on Win32.
II. Implementation of TSRM
The process owns the resources and the threads access them concurrently. The TSRM layer that PHP introduces is concerned with access to shared resources, where the shared resources are the global variables that threads share in the process's memory space. In PHP's single-process mode, a variable declared outside any function simply becomes a global variable.
First, several important global variables are defined as follows (these global variables are shared by all threads).
    /* The memory manager table */
    static tsrm_tls_entry **tsrm_tls_table=NULL;
    static int tsrm_tls_table_size;
    static ts_rsrc_id id_count;

    /* The resource sizes table */
    static tsrm_resource_type *resource_types_table=NULL;
    static int resource_types_table_size;
**tsrm_tls_table: the thread safe resource manager thread local storage table, which stores the linked lists of tsrm_tls_entry nodes.
tsrm_tls_table_size: the size of **tsrm_tls_table.
id_count: the ID generator for global variable resources; it is globally unique and only ever incremented.
*resource_types_table: stores the resource descriptors corresponding to the global variables.
resource_types_table_size: the size of *resource_types_table.
Two key data structures are involved: tsrm_tls_entry and tsrm_resource_type.
    typedef struct _tsrm_tls_entry tsrm_tls_entry;

    struct _tsrm_tls_entry {
        void **storage;        // array of global variables of this node
        int count;             // number of global variables of this node
        THREAD_T thread_id;    // thread ID this node belongs to
        tsrm_tls_entry *next;  // pointer to the next node
    };

    typedef struct {
        size_t size;           // size of the registered global variable struct
        ts_allocate_ctor ctor; // constructor of the registered global variable
        ts_allocate_dtor dtor; // destructor of the registered global variable
        int done;
    } tsrm_resource_type;
When a global variable is added, id_count is incremented by 1 (under the thread mutex). A tsrm_resource_type is then built from the memory size, constructor and destructor the global variable requires, and stored in *resource_types_table. Based on that resource, the corresponding global variable is added to every tsrm_tls_entry node.
With this general understanding, we will carefully analyze the TSRM environment initialization and resource ID allocation to understand this complete process.
TSRM environment Initialization
In the module initialization phase, the main function of each SAPI calls tsrm_startup to initialize the TSRM environment. tsrm_startup takes two very important parameters: expected_threads, the expected number of threads, and expected_resources, the expected number of resources. Each SAPI passes its own initial values; mod_php5 and cgi, for example, both use one thread and one resource.
    TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, char *debug_filename)
    {
        /* code... */

        // expected number of threads passed in at SAPI initialization, generally 1
        tsrm_tls_table_size = expected_threads;
        tsrm_tls_table = (tsrm_tls_entry **) calloc(tsrm_tls_table_size, sizeof(tsrm_tls_entry *));

        /* code... */

        id_count = 0;

        // pre-allocated resource table size passed in at SAPI initialization, generally 1
        resource_types_table_size = expected_resources;
        resource_types_table = (tsrm_resource_type *) calloc(resource_types_table_size, sizeof(tsrm_resource_type));

        /* code... */

        return 1;
    }
Stripped down, the function does three important things: it initializes the tsrm_tls_table array of linked lists, the resource_types_table array, and id_count. These three global variables are shared by all threads, which keeps memory management consistent across threads.
Resource ID allocation
We know that a global variable must be initialized with the ZEND_INIT_MODULE_GLOBALS macro (the array extension example below illustrates this). Under ZTS the macro actually calls ts_allocate_id to request a global variable in the multi-threaded environment and returns the allocated resource ID. The code is fairly long but quite clear; it is annotated below:
    TSRM_API ts_rsrc_id ts_allocate_id(ts_rsrc_id *rsrc_id, size_t size, ts_allocate_ctor ctor, ts_allocate_dtor dtor)
    {
        int i;

        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Obtaining a new resource id, %d bytes", size));

        // take the thread mutex
        tsrm_mutex_lock(tsmm_mutex);

        /* obtain a resource id */
        *rsrc_id = TSRM_SHUFFLE_RSRC_ID(id_count++); // the static global id_count is incremented by 1
        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Obtained resource id %d", *rsrc_id));

        /* store the new resource type in the resource sizes table */
        // resource_types_table_size has an initial value (expected_resources),
        // so the table does not always need to grow
        if (resource_types_table_size < id_count) {
            resource_types_table = (tsrm_resource_type *) realloc(resource_types_table, sizeof(tsrm_resource_type)*id_count);
            if (!resource_types_table) {
                tsrm_mutex_unlock(tsmm_mutex);
                TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate storage for resource"));
                *rsrc_id = 0;
                return 0;
            }
            resource_types_table_size = id_count;
        }

        // save the size, constructor and destructor of the global variable struct
        // in the tsrm_resource_type array resource_types_table
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].size = size;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].ctor = ctor;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].dtor = dtor;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].done = 0;

        /* enlarge the arrays for the already active threads */
        // the PHP kernel now walks the tsrm_tls_entry of every thread
        for (i = 0; i < tsrm_tls_table_size; i++) {
            tsrm_tls_entry *p = tsrm_tls_table[i];

            while (p) {
                if (p->count < id_count) {
                    int j;

                    p->storage = (void *) realloc(p->storage, sizeof(void *)*id_count);
                    for (j = p->count; j < id_count; j++) {
                        // allocate the memory this thread needs for the global variable
                        p->storage[j] = (void *) malloc(resource_types_table[j].size);
                        if (resource_types_table[j].ctor) {
                            // initialize the global variable stored at p->storage[j].
                            // It is unclear why ts_allocate_ctor keeps a second parameter; it is
                            // never used anywhere in the project, and in PHP7 it has indeed been removed.
                            resource_types_table[j].ctor(p->storage[j], &p->storage);
                        }
                    }
                    p->count = id_count;
                }
                p = p->next;
            }
        }

        // release the thread mutex
        tsrm_mutex_unlock(tsmm_mutex);

        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Successfully allocated new resource id %d", *rsrc_id));
        return *rsrc_id;
    }
When ts_allocate_id allocates a global resource ID, the PHP kernel first takes a mutex to guarantee that the generated ID is unique. The lock serializes concurrent work in time, since concurrency is at root a problem of timing. With the lock held, id_count is incremented to produce the resource ID, and a slot is then allocated for that ID: every resource is stored in resource_types_table, and each newly allocated resource gets its own tsrm_resource_type. All the tsrm_resource_type entries together form the resource_types_table array, indexed by the (unshuffled) resource ID. We can in fact regard resource_types_table as a hash table whose key is the resource ID and whose value is the tsrm_resource_type structure (any array can be seen as a hash table when its index carries meaning).
After the resource ID is allocated, the PHP kernel walks over all threads and, in each thread's tsrm_tls_entry, allocates the memory that thread needs for the new global variable. The size of the global variable is the one given at the respective call (that is, the size of the global variable struct). Finally, the global variable stored at that address is initialized. I drew a diagram to illustrate this.
One question is still unanswered: how are entries added to tsrm_tls_table, and how are its linked lists built? We will leave this for later.
Every call to ts_allocate_id makes the PHP kernel traverse all threads and allocate the corresponding resource for each of them. If this happened during the request-processing stage of the PHP lifecycle, wouldn't it be repeated over and over?
PHP has accounted for this: ts_allocate_id is called during module initialization.
After TSRM starts up, module initialization walks through the initialization function of every extension module. An extension's global variables are declared at the top of its implementation and initialized in its MINIT function. During that initialization the extension informs TSRM of the global variables it needs and their size; this notification is the ts_allocate_id call described above. TSRM allocates and registers them in its memory pool and returns the resource ID to the extension.
Use of global variables
Taking the standard array extension as an example, the extension first declares its global variables.
ZEND_DECLARE_MODULE_GLOBALS(array)
Then, during module initialization, it calls the global variable initialization macro to initialize the array globals, for example to set up their memory.
    static void php_array_init_globals(zend_array_globals *array_globals)
    {
        memset(array_globals, 0, sizeof(zend_array_globals));
    }

    /* code... */

    PHP_MINIT_FUNCTION(array) /* {{{ */
    {
        ZEND_INIT_MODULE_GLOBALS(array, php_array_init_globals, NULL);
        /* code... */
    }
Both the declaration and the initialization distinguish between ZTS and non-ZTS builds.
    #ifdef ZTS

    #define ZEND_DECLARE_MODULE_GLOBALS(module_name) \
        ts_rsrc_id module_name##_globals_id;
    #define ZEND_INIT_MODULE_GLOBALS(module_name, globals_ctor, globals_dtor) \
        ts_allocate_id(&module_name##_globals_id, sizeof(zend_##module_name##_globals), (ts_allocate_ctor) globals_ctor, (ts_allocate_dtor) globals_dtor);

    #else

    #define ZEND_DECLARE_MODULE_GLOBALS(module_name) \
        zend_##module_name##_globals module_name##_globals;
    #define ZEND_INIT_MODULE_GLOBALS(module_name, globals_ctor, globals_dtor) \
        globals_ctor(&module_name##_globals);

    #endif
In a non-ZTS build, the variable is declared directly and then initialized. In a ZTS build, the PHP kernel goes through TSRM: the declaration no longer declares the global variable itself but a ts_rsrc_id instead, and the initialization no longer initializes a variable but calls ts_allocate_id to request a global variable for the current module in the multi-threaded environment, returning the resource ID. The name of the resource ID variable is the module name followed by _globals_id.
To access a global variable of the current extension, use ARRAYG(v). The macro is defined as:
    #ifdef ZTS
    #define ARRAYG(v) TSRMG(array_globals_id, zend_array_globals *, v)
    #else
    #define ARRAYG(v) (array_globals.v)
    #endif
In a non-ZTS build, the field of the global variable struct is accessed directly. In a ZTS build, it has to be obtained through TSRMG.
TSRMG definition:
#define TSRMG(id, type, element) (((type) (*((void ***) tsrm_ls))[TSRM_UNSHUFFLE_RSRC_ID(id)])->element)
Stripping away the layers of parentheses, the TSRMG macro fetches the global variable struct from tsrm_ls by resource ID and returns the requested field of that struct.
The question now is: where does tsrm_ls come from?
tsrm_ls initialization
tsrm_ls is initialized by ts_resource(0), which expands to a call to ts_resource_ex(0, NULL). In the listing of ts_resource_ex below, some macros have been expanded, taking pthread as the threading implementation.
    #define THREAD_HASH_OF(thr, ts) (unsigned long)thr%(unsigned long)ts

    static MUTEX_T tsmm_mutex;

    void *ts_resource_ex(ts_rsrc_id id, THREAD_T *th_id)
    {
        THREAD_T thread_id;
        int hash_value;
        tsrm_tls_entry *thread_resources;

        // tsrm_tls_table was already initialized in tsrm_startup
        if (tsrm_tls_table) {
            // th_id is NULL on the initial call
            if (!th_id) {
                // the first time, pthread_setspecific has not run yet,
                // so the thread_resources pointer is NULL
                thread_resources = pthread_getspecific(tls_key);

                if (thread_resources) {
                    TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
                }
                thread_id = pthread_self();
            } else {
                thread_id = *th_id;
            }

            // lock
            pthread_mutex_lock(tsmm_mutex);

            // take the remainder and use it as the array subscript,
            // spreading the threads over tsrm_tls_table
            hash_value = THREAD_HASH_OF(thread_id, tsrm_tls_table_size);
            // after the SAPI calls tsrm_startup, tsrm_tls_table_size = expected_threads
            thread_resources = tsrm_tls_table[hash_value];

            if (!thread_resources) {
                // not there yet: allocate it
                allocate_new_resource(&tsrm_tls_table[hash_value], thread_id);
                // after allocation, the recursive call goes through the else branch below
                return ts_resource_ex(id, &thread_id);
            } else {
                do {
                    // walk the chain looking for a match
                    if (thread_resources->thread_id == thread_id) {
                        break;
                    }
                    if (thread_resources->next) {
                        thread_resources = thread_resources->next;
                    } else {
                        // reached the end of the list without a match:
                        // allocate a new node and append it to the list
                        allocate_new_resource(&thread_resources->next, thread_id);
                        return ts_resource_ex(id, &thread_id);
                    }
                } while (thread_resources);
            }

            // unlock first: TSRM_SAFE_RETURN_RSRC returns from the function
            pthread_mutex_unlock(tsmm_mutex);
            TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
        }
        return NULL; // tsrm_tls_table is not initialized
    }
allocate_new_resource, in turn, allocates memory for the new thread in the corresponding linked list and places all the global variables into its storage pointer array.
    static void allocate_new_resource(tsrm_tls_entry **thread_resources_ptr, THREAD_T thread_id)
    {
        int i;

        (*thread_resources_ptr) = (tsrm_tls_entry *) malloc(sizeof(tsrm_tls_entry));
        (*thread_resources_ptr)->storage = (void **) malloc(sizeof(void *)*id_count);
        (*thread_resources_ptr)->count = id_count;
        (*thread_resources_ptr)->thread_id = thread_id;
        (*thread_resources_ptr)->next = NULL;

        // set the thread-local storage value; ts_resource_ex reads it back later
        pthread_setspecific(tls_key, *thread_resources_ptr);

        if (tsrm_new_thread_begin_handler) {
            tsrm_new_thread_begin_handler(thread_id, &((*thread_resources_ptr)->storage));
        }

        for (i = 0; i < id_count; i++) {
            if (resource_types_table[i].done) {
                (*thread_resources_ptr)->storage[i] = NULL;
            } else {
                // add the resources from resource_types_table to the new tsrm_tls_entry node
                (*thread_resources_ptr)->storage[i] = (void *) malloc(resource_types_table[i].size);
                if (resource_types_table[i].ctor) {
                    resource_types_table[i].ctor((*thread_resources_ptr)->storage[i], &(*thread_resources_ptr)->storage);
                }
            }
        }

        if (tsrm_new_thread_end_handler) {
            tsrm_new_thread_end_handler(thread_id, &((*thread_resources_ptr)->storage));
        }

        pthread_mutex_unlock(tsmm_mutex);
    }
One concept used above deserves a closer look: thread-local storage (TLS). There is a global variable tls_key that every thread can use and set a value for. On the surface it looks like an ordinary global variable that all threads can use, yet its value is stored separately for each thread. That is the point of thread-local storage. So how is thread-local storage implemented?
Putting together the relevant parts of tsrm_startup, ts_resource_ex and allocate_new_resource, an annotated example looks like this:
    // Take pthread as an example

    // 1. First, the global variable tls_key is defined
    static pthread_key_t tls_key;

    // 2. tsrm_startup calls pthread_key_create() to create the key
    pthread_key_create(&tls_key, 0);

    // 3. allocate_new_resource stores the *thread_resources_ptr pointer under the
    //    global key tls_key through tsrm_tls_set
    tsrm_tls_set(*thread_resources_ptr); // expands to pthread_setspecific(tls_key, *thread_resources_ptr);

    // 4. ts_resource_ex uses tsrm_tls_get() to fetch the *thread_resources_ptr that
    //    was set in this thread; concurrent threads do not affect each other
    thread_resources = tsrm_tls_get();
Now that we understand how the tsrm_tls_table array and its linked lists are built, let's look again at the return macro called in ts_resource_ex.
    #define TSRM_SAFE_RETURN_RSRC(array, offset, range) \
        if (offset==0) {                                \
            return &array;                              \
        } else {                                        \
            return array[TSRM_UNSHUFFLE_RSRC_ID(offset)]; \
        }
Given the storage array of a tsrm_tls_entry and the subscript offset, the macro returns the address of the corresponding global variable in that thread's storage array. With this we can finally make sense of the TSRMG macro, which fetches a global variable in a multi-threaded build.
In fact, we use the following macros all the time when writing extensions:
    #define TSRMLS_D  void ***tsrm_ls   /* no comma; used in a definition when it is the only parameter */
    #define TSRMLS_DC , TSRMLS_D        /* also used in a definition, but other parameters precede it, so a comma is needed */
    #define TSRMLS_C  tsrm_ls
    #define TSRMLS_CC , TSRMLS_C
NOTICE: When writing extensions, many people are unsure which of these to use. Expanding the macros shows that they differ in two ways: with or without a comma, and declaration versus call. In the names, "D" stands for Define, the final "C" stands for Comma, and the first "C" stands for Call.
These are the definitions in ZTS mode; in non-ZTS mode all four are defined as empty.
This article is from: https://github.com/zhoumengkang/tipi/blob/master/book/chapt08/08-03-zend-thread-safe-in-php.markdown