PHP Source Code Study: Thread Safety

Source: Internet
Author: User
Tags: mutex, sapi, zts

By scope, C distinguishes four kinds of variables: global variables, static global variables, local variables, and static local variables. The analysis below looks at each kind only from the perspective of function scope, and assumes that no two declarations share a name.
    • Global variables are declared outside any function, for example int gVar; . They are shared by all functions, and every use of the name anywhere refers to the same variable.

    • Static global variables ( static int sgVar; ) are also shared by all functions, but only within the file that declares them; this restriction is enforced by the compiler.

    • Local variables ( int var; inside a function or block) are not shared. Each execution of the function gets its own independent variable; these are simply different variables that happen to have the same name.

    • Static local variables ( static int sVar; inside a function) are shared across all executions of that function: every execution refers to the same variable.
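The four cases can be made concrete with a small sketch. The function and variable names below are invented for illustration and do not come from the PHP source; the point is only that the static local keeps its value between calls while the plain local starts fresh every time.

```c
#include <assert.h>

int g_calls;               /* global: visible to every function in the program */
static int sg_calls;       /* static global: visible only inside this file */

/* Returns how many times it has been called so far. */
int count_with_static_local(void)
{
    static int s_calls;    /* static local: one variable shared by all calls */

    s_calls++;
    g_calls++;
    sg_calls++;
    return s_calls;
}

/* Always returns 1, because `calls` is created anew on every call. */
int count_with_local(void)
{
    int calls = 0;         /* local: a fresh variable for each execution */

    calls++;
    g_calls++;
    sg_calls++;
    return calls;
}

/* Accessor so that code in other files can read the file-private counter. */
int file_private_calls(void)
{
    return sg_calls;
}
```

Calling count_with_static_local() twice yields 1 and then 2, while count_with_local() yields 1 both times; both globals end up counting all four calls.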

The scopes above are defined from the perspective of functions, and they cover every kind of variable sharing found in single-threaded programming. Now consider multithreading. In a multithreaded program, threads share every resource except the function call stack, so the scopes above change as follows.

    • Global variables are shared by all functions and therefore by all threads; a reference to the name in any thread is a reference to the same variable.

    • Static global variables are shared by all functions in their file, and therefore by all threads.

    • Local variables are independent across executions of a function, and are therefore not shared between threads.

    • Static local variables are shared across executions of a function; every execution refers to the same variable, so they are shared between threads.

First, the Origin of TSRM

In a multithreaded system, the process remains the unit of resource ownership, and threads are the concurrent flows of execution inside the process. Take the worker MPM in Apache2: the master control process spawns several child processes, each child process contains a fixed number of threads, and each thread handles requests independently. To avoid creating threads when a request arrives, MinSpareThreads and MaxSpareThreads set the minimum and maximum number of idle threads, and MaxClients sets the total number of threads across all child processes. If the threads in the existing child processes cannot handle the load, the control process spawns a new child process.

When PHP runs on a multithreaded server like the one above, it is in the multithreaded life cycle. For some period of time several threads exist in one process space, and the threads in the same process share the global variables created during module initialization. If scripts were run the same way as PHP runs them in CLI mode, multiple threads would try to read and write shared resources stored in the process memory space (for example, the many global variables that exist outside functions after the module initialization shared by the threads).

These threads then access the same memory address space: when one thread modifies a value, the other threads are affected. This sharing can speed up some operations, but it couples the threads tightly, and concurrent execution raises the usual problems of data consistency and resource contention, for example results that differ from run to run or from a single-threaded run. If every thread only reads the global and static variables and never writes to them, those variables are thread-safe, but that is not a realistic scenario.
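The consistency problem can be illustrated with a minimal sketch, assuming POSIX threads; none of the names below come from PHP. Several threads increment one global counter, and a mutex serializes each read-modify-write so the final value is deterministic. Without the lock, the increments would race and the result could come out lower.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_THREADS 4
#define DEMO_INCREMENTS 100000

static long shared_counter;   /* global state: every thread sees this variable */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *increment_worker(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < DEMO_INCREMENTS; i++) {
        pthread_mutex_lock(&counter_mutex);
        shared_counter++;     /* read-modify-write, serialized by the mutex */
        pthread_mutex_unlock(&counter_mutex);
    }
    return NULL;
}

/* Runs DEMO_THREADS workers; returns the final counter value, or -1 on error. */
long run_locked_counter_demo(void)
{
    pthread_t threads[DEMO_THREADS];
    int i, started = 0;

    shared_counter = 0;
    for (i = 0; i < DEMO_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, increment_worker, NULL) != 0) {
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return started == DEMO_THREADS ? shared_counter : -1;
}
```

A lock on every access is the simplest remedy, but it serializes the threads. TSRM, described next, takes a different route: each thread gets its own copy of the globals, so request handling needs no lock at all.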

To solve the thread concurrency problem, PHP introduced TSRM, the Thread Safe Resource Manager. The TSRM implementation lives in the /TSRM directory of the PHP source tree and is called from many places; it is usually referred to as the TSRM layer. In general, the TSRM layer is enabled at compile time only when it is requested (for example, for Apache2 with the worker MPM, a thread-based MPM). Because Apache on Win32 is thread-based, the layer is always enabled on Win32.

Second, the Implementation of TSRM

The process retains resource ownership while threads perform concurrent access. The TSRM layer introduced in PHP is concerned with access to shared resources, where the shared resources are the global variables that live in the process's memory space and are shared between threads. In PHP's single-process mode, a variable declared outside any function becomes a global variable.

The following are some of the most important global variables (global variables are shared in multiple threads).

    /* The memory manager table */
    static tsrm_tls_entry   **tsrm_tls_table=NULL;
    static int              tsrm_tls_table_size;
    static ts_rsrc_id       id_count;

    /* The resource sizes table */
    static tsrm_resource_type   *resource_types_table=NULL;
    static int                  resource_types_table_size;

**tsrm_tls_table: the name stands for "thread safe resource manager thread local storage table". It holds the tsrm_tls_entry linked lists of the individual threads.

tsrm_tls_table_size: the size of **tsrm_tls_table .

id_count: the ID generator for global variable resources. IDs are globally unique and incrementing.

*resource_types_table: stores the resource types that correspond to the global variables.

resource_types_table_size: the size of *resource_types_table .

Two key data structures are involved: tsrm_tls_entry and tsrm_resource_type .

    typedef struct _tsrm_tls_entry tsrm_tls_entry;

    struct _tsrm_tls_entry {
        void **storage;         /* this node's array of global variables */
        int count;              /* number of global variables in this node */
        THREAD_T thread_id;     /* ID of the thread this node belongs to */
        tsrm_tls_entry *next;   /* pointer to the next node */
    };

    typedef struct {
        size_t size;            /* size of the global variable structure being defined */
        ts_allocate_ctor ctor;  /* constructor of the global variable being defined */
        ts_allocate_dtor dtor;  /* destructor of the global variable being defined */
        int done;
    } tsrm_resource_type;

When a new global variable is added, id_count is incremented by 1 (under the thread mutex). A tsrm_resource_type is then created from the memory size, constructor, and destructor that the global variable requires, and is stored in *resource_types_table . Finally, based on that resource, the corresponding global variable is added to every tsrm_tls_entry node of every thread.

With this overview in place, the sections below walk through the whole process in detail: the initialization of the TSRM environment and the allocation of resource IDs.

Initialization of the TSRM environment

During the module initialization phase, each SAPI's main function calls tsrm_startup to initialize the TSRM environment. tsrm_startup takes two very important parameters: expected_threads , the expected number of threads, and expected_resources , the expected number of resources. Different SAPIs pass different initial values; mod_php5 and cgi, for example, use one thread and one resource.

    TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, char *debug_filename)
    {
        /* code... */

        /* number of threads the SAPI expects to allocate at initialization, typically 1 */
        tsrm_tls_table_size = expected_threads;

        tsrm_tls_table = (tsrm_tls_entry **) calloc(tsrm_tls_table_size, sizeof(tsrm_tls_entry *));

        /* code... */

        id_count=0;

        /* resource table size pre-allocated at SAPI initialization, typically 1 */
        resource_types_table_size = expected_resources;

        resource_types_table = (tsrm_resource_type *) calloc(resource_types_table_size, sizeof(tsrm_resource_type));

        /* code... */

        return 1;
    }

Stripped down, tsrm_startup does three important things: it initializes the tsrm_tls_table list, the resource_types_table array, and id_count. These three global variables are shared by all threads, which keeps memory management consistent across threads.

Allocation of resource IDs

Initializing a global variable requires the ZEND_INIT_MODULE_GLOBALS macro (explained in the array extension example below). Under the hood the macro calls the ts_allocate_id function, which requests a global variable in the multithreaded environment and returns the assigned resource ID. The function is fairly long but its logic is clear; it is shown below with annotations:

    TSRM_API ts_rsrc_id ts_allocate_id(ts_rsrc_id *rsrc_id, size_t size, ts_allocate_ctor ctor, ts_allocate_dtor dtor)
    {
        int i;

        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Obtaining a new resource id, %d bytes", size));

        /* take the multithreading mutex */
        tsrm_mutex_lock(tsmm_mutex);

        /* obtain a resource id */
        *rsrc_id = TSRM_SHUFFLE_RSRC_ID(id_count++); /* the global static variable id_count is incremented by 1 */
        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Obtained resource id %d", *rsrc_id));

        /* store the new resource type in the resource sizes table */
        /* resource_types_table_size has an initial value (expected_resources), so the table does not always need to grow */
        if (resource_types_table_size < id_count) {
            resource_types_table = (tsrm_resource_type *) realloc(resource_types_table, sizeof(tsrm_resource_type)*id_count);
            if (!resource_types_table) {
                tsrm_mutex_unlock(tsmm_mutex);
                TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate storage for resource"));
                *rsrc_id = 0;
                return 0;
            }
            resource_types_table_size = id_count;
        }

        /* store the size, constructor, and destructor of the global variable structure in the resource_types_table array */
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].size = size;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].ctor = ctor;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].dtor = dtor;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].done = 0;

        /* enlarge the arrays for the already active threads */
        /* the PHP kernel then walks the tsrm_tls_entry of every thread */
        for (i=0; i<tsrm_tls_table_size; i++) {
            tsrm_tls_entry *p = tsrm_tls_table[i];

            while (p) {
                if (p->count < id_count) {
                    int j;

                    p->storage = (void *) realloc(p->storage, sizeof(void *)*id_count);
                    for (j=p->count; j<id_count; j++) {
                        /* allocate the memory this thread needs for the global variable */
                        p->storage[j] = (void *) malloc(resource_types_table[j].size);
                        if (resource_types_table[j].ctor) {
                            /* finally, initialize the global variable stored at p->storage[j].
                               It is unclear why ts_allocate_ctor keeps its second parameter: it is not
                               used anywhere in the project, and PHP 7 has indeed removed it. */
                            resource_types_table[j].ctor(p->storage[j], &p->storage);
                        }
                    }
                    p->count = id_count;
                }
                p = p->next;
            }
        }

        /* release the thread mutex */
        tsrm_mutex_unlock(tsmm_mutex);

        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Successfully allocated new resource id %d", *rsrc_id));
        return *rsrc_id;
    }

When ts_allocate_id assigns a global resource ID, the PHP kernel first takes a mutex to guarantee that the generated ID is unique. The lock serializes concurrent work along the time dimension, because at root concurrency is a problem of timing. Once the lock is held, id_count is incremented to produce a resource ID, and a storage location is then assigned to that ID. Every resource is stored in resource_types_table: allocating a new resource creates a tsrm_resource_type, and all tsrm_resource_type entries together form the resource_types_table array, indexed by the resource ID (converted to a zero-based index by TSRM_UNSHUFFLE_RSRC_ID). In effect, resource_types_table can be viewed as a hash table whose key is the resource ID and whose value is the tsrm_resource_type structure (any array can be viewed as a hash table when its keys are meaningful).

After assigning the resource ID, the PHP kernel traverses all threads and, for each thread's tsrm_tls_entry, allocates the memory that the new global variable needs in that thread. The size used for each thread is the one passed in at the call (that is, the size of the global variable structure). Finally, the global variable stored at that address is initialized by its constructor.
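The allocation flow can be modelled in a few dozen lines. This is a single-threaded toy under stated assumptions, not the TSRM code: every toy_ name is invented, the mutex is omitted, threads are simulated by list nodes, and the thread table is one list rather than a hash table. What it keeps is the essential shape: one shared table of resource types, one storage array per thread, and IDs that are the table index plus one.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_ctor)(void *resource);
typedef struct { size_t size; toy_ctor ctor; } toy_resource_type;

typedef struct toy_entry {
    void **storage;            /* one slot per registered resource */
    int count;                 /* number of slots allocated so far */
    struct toy_entry *next;
} toy_entry;

static toy_resource_type *toy_types;   /* stands in for resource_types_table */
static int toy_id_count;
static toy_entry *toy_threads;         /* one list here, not a hash table */

/* Gives entry p a slot for every registered resource it lacks; 1 on success. */
static int toy_grow_entry(toy_entry *p)
{
    void **grown;
    int j;

    if (p->count >= toy_id_count) return 1;
    grown = realloc(p->storage, sizeof(void *) * toy_id_count);
    if (!grown) return 0;
    p->storage = grown;
    for (j = p->count; j < toy_id_count; j++) {
        p->storage[j] = malloc(toy_types[j].size);
        if (!p->storage[j]) break;
        if (toy_types[j].ctor) toy_types[j].ctor(p->storage[j]);
    }
    p->count = j;              /* slots actually filled */
    return j == toy_id_count;
}

/* Registers a simulated thread; it receives all resources known so far. */
toy_entry *toy_add_thread(void)
{
    toy_entry *e = calloc(1, sizeof(*e));

    if (!e) return NULL;
    e->next = toy_threads;
    toy_threads = e;
    return toy_grow_entry(e) ? e : NULL;
}

/* Registers a resource type; returns its ID (index + 1) or 0 on failure. */
int toy_allocate_id(size_t size, toy_ctor ctor)
{
    toy_resource_type *grown;
    toy_entry *p;

    assert(size > 0);
    grown = realloc(toy_types, sizeof(*grown) * (toy_id_count + 1));
    if (!grown) return 0;
    toy_types = grown;
    toy_types[toy_id_count].size = size;
    toy_types[toy_id_count].ctor = ctor;
    toy_id_count++;

    for (p = toy_threads; p; p = p->next) {
        if (!toy_grow_entry(p)) return 0;
    }
    return toy_id_count;       /* index + 1, so 0 can signal failure */
}

/* Returns the given thread's copy of resource id. */
void *toy_fetch(toy_entry *thread, int id)
{
    assert(id >= 1 && id <= thread->count);
    return thread->storage[id - 1];    /* ID back to zero-based index */
}

static void toy_int_ctor(void *resource) { *(int *) resource = 7; }

/* Two simulated threads share one resource ID but own separate copies. */
int toy_demo(void)
{
    toy_entry *a = toy_add_thread();
    int id = toy_allocate_id(sizeof(int), toy_int_ctor);
    toy_entry *b = toy_add_thread();   /* added later, still gets the slot */

    if (!a || !b || !id) return -1;
    *(int *) toy_fetch(a, id) = 42;    /* changes thread a's copy only */
    return *(int *) toy_fetch(b, id);
}
```

toy_demo() returns the value written by the constructor, because writing through thread a's slot leaves thread b's copy untouched. The sketch does no cleanup and is not safe to call from several threads at once.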

One point is still unclear: how elements are added to tsrm_tls_table and how its linked lists are built. We set this question aside for now and return to it later.

Every call to ts_allocate_id makes the PHP kernel traverse all threads and allocate the corresponding resource for each one. If this happened during the request-processing phase of the PHP life cycle, would it not be repeated for every request?

PHP takes this into account: ts_allocate_id is called during module initialization.

After TSRM has started, module initialization walks through the module initialization method of every extension. An extension declares its global variables at the top of its implementation code and initializes them in its MINIT method. During initialization it notifies TSRM of the global variable it is requesting and of its size; this notification is the ts_allocate_id function described earlier. TSRM allocates and registers the variable in its memory pool, and then returns the resource ID to the extension.

Use of global variables

Take the standard array extension as an example. The extension first declares its global variables.

    ZEND_DECLARE_MODULE_GLOBALS(array)

The global variable initialization macro is then called during module initialization to initialize array, for example to allocate its memory.

    static void php_array_init_globals(zend_array_globals *array_globals)
    {
        memset(array_globals, 0, sizeof(zend_array_globals));
    }

    /* code... */

    PHP_MINIT_FUNCTION(array) /* {{{ */
    {
        ZEND_INIT_MODULE_GLOBALS(array, php_array_init_globals, NULL);
        /* code... */
    }

Both the declaration and the initialization differ between ZTS and non-ZTS builds.

    #ifdef ZTS

    #define ZEND_DECLARE_MODULE_GLOBALS(module_name)                            \
        ts_rsrc_id module_name##_globals_id;
    #define ZEND_INIT_MODULE_GLOBALS(module_name, globals_ctor, globals_dtor)   \
        ts_allocate_id(&module_name##_globals_id, sizeof(zend_##module_name##_globals), (ts_allocate_ctor) globals_ctor, (ts_allocate_dtor) globals_dtor);

    #else

    #define ZEND_DECLARE_MODULE_GLOBALS(module_name)                            \
        zend_##module_name##_globals module_name##_globals;
    #define ZEND_INIT_MODULE_GLOBALS(module_name, globals_ctor, globals_dtor)   \
        globals_ctor(&module_name##_globals);

    #endif

In a non-ZTS build, the variable is declared and initialized directly. In a ZTS build, the PHP kernel goes through TSRM: instead of a global variable it declares a ts_rsrc_id, and instead of initializing a variable it calls ts_allocate_id, which requests a global variable for the current module in the multithreaded environment and returns the resource ID. The name of the resource ID variable is the module name followed by _globals_id.
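The name construction relies on the preprocessor's ## token-pasting operator. The sketch below models only the non-ZTS branch with invented TOY_ macros and a hypothetical module called demo; it is meant to show how the pasted names come out, not to reproduce the Zend headers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical globals structure for a module called "demo". */
typedef struct {
    int counter;
    char name[16];
} zend_demo_globals;

/* Modelled on the non-ZTS branch: paste the module name into identifiers. */
#define TOY_DECLARE_MODULE_GLOBALS(module_name) \
    zend_##module_name##_globals module_name##_globals;
#define TOY_INIT_MODULE_GLOBALS(module_name, globals_ctor) \
    globals_ctor(&module_name##_globals);

/* Expands to: zend_demo_globals demo_globals; */
TOY_DECLARE_MODULE_GLOBALS(demo)

static void demo_init_globals(zend_demo_globals *g)
{
    memset(g, 0, sizeof(*g));
    g->counter = 1;
}

/* Plays the role of the module initialization function. */
int demo_module_init(void)
{
    /* Expands to: demo_init_globals(&demo_globals); */
    TOY_INIT_MODULE_GLOBALS(demo, demo_init_globals)
    return demo_globals.counter;
}
```

In a thread-safe build the same two macro names would instead paste together demo_globals_id and hand the structure size and constructor to the allocator, so extension code can stay identical in both modes.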

To access the current extension's global variables, use ARRAYG(v). The macro is defined as follows:

    #ifdef ZTS
    #define ARRAYG(v) TSRMG(array_globals_id, zend_array_globals *, v)
    #else
    #define ARRAYG(v) (array_globals.v)
    #endif

In a non-ZTS build the macro accesses the field of the global variable directly; in a ZTS build the variable must be fetched through TSRMG.

Definition of TSRMG:

    #define TSRMG(id, type, element)  (((type) (*((void ***) tsrm_ls))[TSRM_UNSHUFFLE_RSRC_ID(id)])->element)

Stripping away the pile of parentheses, the TSRMG macro fetches the global variable from tsrm_ls by resource ID and returns the requested field of that variable.
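The pointer chain is easier to follow on a tiny stand-in. The sketch below builds a one-slot storage array by hand and reads a field through a macro of the same shape. All toy_ names are invented, the globals structure is hypothetical, and unlike TSRMG the macro takes the context pointer as an explicit first argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread globals structure. */
typedef struct {
    int sort_flags;
} toy_array_globals;

#define TOY_UNSHUFFLE_RSRC_ID(id) ((id) - 1)

/* Same shape as TSRMG, with the context pointer passed explicitly as `ls`. */
#define TOY_TSRMG(ls, id, type, element) \
    (((type) (*((void ***) (ls)))[TOY_UNSHUFFLE_RSRC_ID(id)])->element)

static toy_array_globals toy_globals_for_thread = { 5 };

/* Plays the role of tsrm_tls_entry.storage: one slot per resource ID. */
static void *toy_slots[1] = { &toy_globals_for_thread };
static void **toy_storage = toy_slots;

/* Plays the role of tsrm_ls: the address of the storage pointer. */
static void ***toy_ls = &toy_storage;

static int toy_array_globals_id = 1;   /* first ID handed out; index 0 */

int toy_read_sort_flags(void)
{
    return TOY_TSRMG(toy_ls, toy_array_globals_id, toy_array_globals *, sort_flags);
}

void toy_write_sort_flags(int value)
{
    /* The macro expands to an lvalue, so it can also be assigned to. */
    TOY_TSRMG(toy_ls, toy_array_globals_id, toy_array_globals *, sort_flags) = value;
}
```

Reading from the inside out: dereferencing the context pointer yields the storage array, the unshuffled ID selects a slot, the cast restores the structure type, and -> picks the field.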

So the question now is: where does tsrm_ls come from?

Initialization of tsrm_ls

tsrm_ls is initialized by ts_resource(0) , which expands to a call to ts_resource_ex(0, NULL) . Below is ts_resource_ex with some of its macros expanded, using pthread as the threading implementation.

    #define THREAD_HASH_OF(thr,ts)  (unsigned long)thr%(unsigned long)ts

    static MUTEX_T tsmm_mutex;

    void *ts_resource_ex(ts_rsrc_id id, THREAD_T *th_id)
    {
        THREAD_T thread_id;
        int hash_value;
        tsrm_tls_entry *thread_resources;

        /* tsrm_tls_table was initialized in tsrm_startup */
        if (tsrm_tls_table) {
            /* on the initializing call th_id is NULL */
            if (!th_id) {
                /* the first time, pthread_setspecific has not run yet, so thread_resources is NULL */
                thread_resources = pthread_getspecific(tls_key);

                if (thread_resources) {
                    TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
                }

                thread_id = pthread_self();
            } else {
                thread_id = *th_id;
            }
        }

        /* lock */
        pthread_mutex_lock(tsmm_mutex);

        /* take the remainder and use it as the array index, spreading threads over tsrm_tls_table */
        hash_value = THREAD_HASH_OF(thread_id, tsrm_tls_table_size);
        /* after the SAPI calls tsrm_startup, tsrm_tls_table_size = expected_threads */
        thread_resources = tsrm_tls_table[hash_value];

        if (!thread_resources) {
            /* not there yet: allocate a new entry (allocate_new_resource releases the mutex) */
            allocate_new_resource(&tsrm_tls_table[hash_value], thread_id);
            /* after allocation, the recursive call takes the else branch below */
            return ts_resource_ex(id, &thread_id);
        } else {
            do {
                /* walk the linked list looking for a match */
                if (thread_resources->thread_id == thread_id) {
                    break;
                }
                if (thread_resources->next) {
                    thread_resources = thread_resources->next;
                } else {
                    /* reached the end without a match: allocate a new entry and append it */
                    allocate_new_resource(&thread_resources->next, thread_id);
                    return ts_resource_ex(id, &thread_id);
                }
            } while (thread_resources);
        }

        /* unlock before returning; the return macro must come last */
        pthread_mutex_unlock(tsmm_mutex);

        TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
    }

allocate_new_resource allocates memory for the new thread in the corresponding linked list, and adds all the global variables to its storage pointer array.

    static void allocate_new_resource(tsrm_tls_entry **thread_resources_ptr, THREAD_T thread_id)
    {
        int i;

        (*thread_resources_ptr) = (tsrm_tls_entry *) malloc(sizeof(tsrm_tls_entry));
        (*thread_resources_ptr)->storage = (void *) malloc(sizeof(void *)*id_count);
        (*thread_resources_ptr)->count = id_count;
        (*thread_resources_ptr)->thread_id = thread_id;
        (*thread_resources_ptr)->next = NULL;

        /* set the thread-local storage variable; ts_resource_ex reads it back later */
        pthread_setspecific(tls_key, *thread_resources_ptr);

        if (tsrm_new_thread_begin_handler) {
            tsrm_new_thread_begin_handler(thread_id, &((*thread_resources_ptr)->storage));
        }
        for (i=0; i<id_count; i++) {
            if (resource_types_table[i].done) {
                (*thread_resources_ptr)->storage[i] = NULL;
            } else {
                /* add the resource_types_table resources to the new tsrm_tls_entry node */
                (*thread_resources_ptr)->storage[i] = (void *) malloc(resource_types_table[i].size);
                if (resource_types_table[i].ctor) {
                    resource_types_table[i].ctor((*thread_resources_ptr)->storage[i], &(*thread_resources_ptr)->storage);
                }
            }
        }

        if (tsrm_new_thread_end_handler) {
            tsrm_new_thread_end_handler(thread_id, &((*thread_resources_ptr)->storage));
        }

        pthread_mutex_unlock(tsmm_mutex);
    }
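The lookup in ts_resource_ex combined with the node creation in allocate_new_resource amounts to find-or-create in a hash table with chaining. The sketch below shows that pattern on its own, with invented chain_ names, a fixed table size, plain integers as thread IDs, and no locking or per-thread storage.

```c
#include <assert.h>
#include <stdlib.h>

#define CHAIN_TABLE_SIZE 4
#define CHAIN_HASH_OF(thr, ts) ((unsigned long) (thr) % (unsigned long) (ts))

typedef struct chain_node {
    unsigned long thread_id;
    struct chain_node *next;
} chain_node;

static chain_node *chain_table[CHAIN_TABLE_SIZE];   /* buckets start out NULL */

/* Returns the node for thread_id, appending a new one to its bucket if needed. */
chain_node *chain_find_or_create(unsigned long thread_id)
{
    chain_node **link = &chain_table[CHAIN_HASH_OF(thread_id, CHAIN_TABLE_SIZE)];

    /* Walk the bucket's list; `link` always points at the pointer to patch. */
    while (*link) {
        if ((*link)->thread_id == thread_id) {
            return *link;
        }
        link = &(*link)->next;
    }

    /* End of list reached without a match: append a new node here. */
    *link = calloc(1, sizeof(chain_node));
    if (*link) {
        (*link)->thread_id = thread_id;
    }
    return *link;
}

/* Number of nodes chained in the bucket that thread_id hashes to. */
int chain_bucket_length(unsigned long thread_id)
{
    chain_node *p = chain_table[CHAIN_HASH_OF(thread_id, CHAIN_TABLE_SIZE)];
    int n = 0;

    for (; p; p = p->next) {
        n++;
    }
    return n;
}
```

With a table size of 4, thread IDs 1 and 5 land in the same bucket and form a two-node list, which is the situation the do/while loop in ts_resource_ex handles. Using a pointer to the link lets one code path cover both the empty bucket and the end of a list, where the original uses two separate calls and recursion.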

One concept is needed here: thread-local storage. There is a global variable tls_key that every thread can use and set. On the surface it looks like an ordinary global variable available to all threads, yet its value is stored separately for each thread. That is what thread-local storage means. So how is thread-local storage implemented?

The answer combines tsrm_startup , ts_resource_ex , and allocate_new_resource ; the annotated excerpt below walks through them together:

    /* taking pthread as an example */

    /* 1. first the global variable tls_key is defined */
    static pthread_key_t tls_key;

    /* 2. then tsrm_startup calls pthread_key_create() to create the key */
    pthread_key_create(&tls_key, 0);

    /* 3. in allocate_new_resource, tsrm_tls_set stores the *thread_resources_ptr pointer under the global key tls_key */
    tsrm_tls_set(*thread_resources_ptr); /* expands to pthread_setspecific(tls_key, *thread_resources_ptr); */

    /* 4. in ts_resource_ex, tsrm_tls_get() retrieves the *thread_resources_ptr set in this thread;
          concurrent operations in different threads do not affect each other */
    thread_resources = tsrm_tls_get();
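The same four steps can be run as a self-contained sketch, assuming POSIX threads. The demo_ names are invented for illustration; pthread_once is used here only so the key is created exactly once, a job that tsrm_startup does in the article's flow.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t demo_key;            /* one key, shared by all threads */
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    pthread_key_create(&demo_key, NULL);  /* no destructor: values are not owned */
}

/* Stores a thread-specific value under the shared key, then reads it back. */
static void *demo_tls_worker(void *arg)
{
    pthread_once(&demo_key_once, demo_make_key);
    if (pthread_setspecific(demo_key, arg) != 0) {
        return NULL;
    }
    return pthread_getspecific(demo_key); /* this thread's value only */
}

/* Returns 1 if two threads each read back their own value, 0 otherwise. */
int demo_tls_is_per_thread(void)
{
    static int value_a = 1, value_b = 2;
    pthread_t ta, tb;
    void *seen_a = NULL, *seen_b = NULL;

    pthread_once(&demo_key_once, demo_make_key);
    if (pthread_create(&ta, NULL, demo_tls_worker, &value_a) != 0) {
        return 0;
    }
    if (pthread_create(&tb, NULL, demo_tls_worker, &value_b) != 0) {
        pthread_join(ta, NULL);
        return 0;
    }
    pthread_join(ta, &seen_a);
    pthread_join(tb, &seen_b);

    /* The calling thread never called pthread_setspecific, so it sees NULL. */
    return seen_a == &value_a && seen_b == &value_b
        && pthread_getspecific(demo_key) == NULL;
}
```

Both workers use the same key, yet each reads back only the pointer it stored, and the calling thread still sees NULL afterwards. TSRM relies on this property when it stores each thread's tsrm_tls_entry under tls_key.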

Now that the tsrm_tls_table array and the creation of its linked lists are clear, look at the return macro called in ts_resource_ex .

    #define TSRM_SAFE_RETURN_RSRC(array, offset, range)        \
        if (offset==0) {                                       \
            return &array;                                     \
        } else {                                               \
            return array[TSRM_UNSHUFFLE_RSRC_ID(offset)];      \
        }

Given the storage array of the tsrm_tls_entry and the array index offset , the macro returns the address of that global variable in the thread's storage array (or, when offset is 0, the address of the storage array itself). This explains how the TSRMG macro fetches global variables in a multithreaded build.

In fact, the following macros are used all the time when writing extensions:

    #define TSRMLS_D    void ***tsrm_ls   /* no comma: usually the only parameter; used in definitions */
    #define TSRMLS_DC   , TSRMLS_D        /* also for definitions, but preceded by other parameters, so it needs a comma */
    #define TSRMLS_C    tsrm_ls
    #define TSRMLS_CC   , TSRMLS_C

NOTICE: when writing extensions, many people cannot tell which macro to use. The expansions show that the macros differ in whether they carry a comma and in whether they are for declarations or calls: "D" stands for Define, a trailing "C" stands for Comma, and a leading "C" stands for Call.

The definitions above are for ZTS mode. In non-ZTS mode the macros expand to nothing (TSRMLS_D, which must still form a valid parameter list, expands to void).
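A short sketch shows where each of the four forms goes. The TOY_ macros and toy_ functions are invented stand-ins that mirror the pattern; TOY_ZTS is a hypothetical switch for trying both modes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ZTS 1   /* set to 0 to model a non-thread-safe build */

#if TOY_ZTS
#define TOY_TSRMLS_D   void ***toy_ls
#define TOY_TSRMLS_DC  , TOY_TSRMLS_D
#define TOY_TSRMLS_C   toy_ls
#define TOY_TSRMLS_CC  , TOY_TSRMLS_C
#else
#define TOY_TSRMLS_D   void
#define TOY_TSRMLS_DC
#define TOY_TSRMLS_C
#define TOY_TSRMLS_CC
#endif

/* Definition with the context as the only parameter: _D form. */
static int toy_has_context(TOY_TSRMLS_D)
{
#if TOY_ZTS
    return toy_ls != NULL;
#else
    return 0;
#endif
}

/* Definition with other parameters first: _DC form (leading comma). */
static int toy_add(int a, int b TOY_TSRMLS_DC)
{
    /* Call with the context as the only argument: _C form. */
    return a + b + toy_has_context(TOY_TSRMLS_C);
}

/* Entry point that owns a context and passes it on with the _CC form. */
int toy_run(void)
{
#if TOY_ZTS
    void **storage = NULL;
    void ***toy_ls = &storage;
#endif
    return toy_add(1, 2 TOY_TSRMLS_CC);
}
```

With TOY_ZTS set to 1, toy_add receives a third argument and toy_run() returns 4; with it set to 0 the extra parameter disappears from every signature and call, and the result is 3. The function bodies do not change between the two builds, which is the purpose of the macro family.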


This article is from: HTTPS://GITHUB.COM/ZHOUMENGKANG/TIPI/BLOB/MASTER/BOOK/CHAPT08/08-03-ZEND-THREAD-SAFE-IN-PHP.MARKDOWN?SPM =5176.100239.blogcont60787.4.mvv5xg&file=08-03-zend-thread-safe-in-php.markdown
