[Repost] Implementation Principles and Performance Analysis of PHP Functions


Author: Baidu engineer hdk

Preface
Functions are the most basic building block in any language. What is distinctive about PHP functions? How is a function call implemented? How do PHP functions perform, and how should they be used? This article analyzes the underlying principles and, backed by actual performance tests, tries to answer these questions, so that understanding the implementation helps you write better PHP programs. Some common PHP functions are also discussed.

PHP function classification
In PHP, functions fall into two broad categories: user functions and internal (built-in) functions. The former are the functions and methods that users define in their programs; the latter are the library functions provided by PHP itself (such as sprintf and array_push). Users can also write their own library functions as extensions, which is described later. User functions can be further divided into functions and class methods. All of these are analyzed and tested in this article.
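The two categories can be observed from PHP code itself. The following is a minimal sketch using the Reflection API and get_defined_functions(); the function greet_user is a hypothetical name introduced only for this illustration.

```php
<?php
// Hypothetical user-defined function, used only for this illustration.
function greet_user(string $name): string
{
    return sprintf('Hello, %s', $name);
}

// ReflectionFunction reports whether a function is user-defined or internal.
$user = new ReflectionFunction('greet_user');
$internal = new ReflectionFunction('sprintf');

var_dump($user->isUserDefined());  // bool(true)
var_dump($internal->isInternal()); // bool(true)

// get_defined_functions() groups function names under 'user' and 'internal'.
$functions = get_defined_functions();
var_dump(in_array('greet_user', $functions['user'], true));  // bool(true)
var_dump(in_array('sprintf', $functions['internal'], true)); // bool(true)
```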

PHP function implementation
How is a PHP function executed? What is the process like?
To answer this question, let's take a look at the process of executing the PHP code.

As shown in Figure 1, PHP follows the typical execution process of a dynamic language: after a piece of code is read, it goes through lexical analysis, syntax analysis, and other stages, and the source program is translated into instructions (opcodes). The Zend virtual machine then executes these instructions in order to complete the operation. PHP itself is implemented in C, so every function that is ultimately called is a C function; in effect, PHP can be regarded as a piece of software written in C. It follows from this description that function execution in PHP is also translated into opcodes, and each function call actually executes one or more instructions.
Zend describes each function based on the following data structure:

typedef union _zend_function {
    zend_uchar type;    /* MUST be the first element of this struct! */
    struct {
        zend_uchar type;    /* never used */
        char *function_name;
        zend_class_entry *scope;
        zend_uint fn_flags;
        union _zend_function *prototype;
        zend_uint num_args;
        zend_uint required_num_args;
        zend_arg_info *arg_info;
        zend_bool pass_rest_by_reference;
        unsigned char return_reference;
    } common;

    zend_op_array op_array;
    zend_internal_function internal_function;
} zend_function;

typedef struct _zend_function_state {
    HashTable *function_symbol_table;
    zend_function *function;
    void *reserved[ZEND_MAX_RESERVED_RESOURCES];
} zend_function_state;

The type field indicates the function type: user function, built-in function, or overloaded function. The common struct holds the basic information about a function, including the function name, parameter information, and function flags (ordinary function, static method, abstract method, and so on). In addition, user functions have a function symbol table that records local variables and the like; this is described in detail later. Zend maintains a global function_table, which is a large hash table. When a function is called, the corresponding zend_function is first looked up in this table by function name. The virtual machine then decides how to make the call based on the type, because different types of functions are executed differently.

Built-in functions
Built-in functions are essentially real C functions. After compilation, every built-in function is expanded into a C function named zif_xxxx; for example, the familiar sprintf corresponds to zif_sprintf at the bottom layer. When Zend executes a call and finds that the target is a built-in function, it simply forwards the call.
Zend provides a set of APIs for these functions to use, covering parameter retrieval, array operations, memory allocation, and so on. Built-in functions obtain their parameters through zend_parse_parameters. For parameters such as arrays and strings, Zend makes only a shallow copy, so this is very efficient. It is fair to say that a PHP built-in function is almost as efficient as the corresponding C function; the only extra cost is one forwarding call.
Built-in functions can be loaded dynamically from shared objects (.so files). Users can also write their own shared objects as needed, which are what is commonly called extensions. Zend provides a set of APIs for extensions to use.

User functions
Compared with built-in functions, user-defined functions written in PHP have a completely different execution process and implementation. As mentioned above, PHP code is translated into opcodes for execution, and user functions are no exception. Each function corresponds to a set of opcodes, which are stored in its zend_function. Calling a user function therefore amounts to executing that set of opcodes.
Saving local variables and implementing recursion
Function recursion is normally implemented with a stack, and PHP uses a similar approach. Zend assigns each PHP function an active symbol table (active_sym_table) that records the state of all local variables in the current function. All symbol tables are maintained as a stack: each time a function is called, a new symbol table is allocated and pushed onto the stack, and when the call ends, the current symbol table is popped. This is how state is preserved and recursion is made possible.
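The effect of the per-call symbol table is visible from PHP code: every invocation has its own local variables, so values held by outer calls survive the inner calls. A minimal sketch, with sum_to as a hypothetical function name chosen for this illustration:

```php
<?php
// Each call gets its own $n and $partial, so the values held by the outer
// calls are still intact when the inner calls return.
function sum_to(int $n): int
{
    if ($n <= 0) {
        return 0;
    }
    $partial = sum_to($n - 1); // inner calls cannot overwrite this call's $n
    return $n + $partial;
}

echo sum_to(10), PHP_EOL; // 55
```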
Zend optimizes the maintenance of this stack. It pre-allocates a static array of length N to simulate the stack. Using a static array to simulate a dynamic data structure is a technique we often use in our own programs; it avoids the memory allocation and destruction that each call would otherwise cause. At the end of a function call, Zend only needs to clean the data in the symbol table at the top of the stack. Because the static array has length N, the program does not overflow once the call depth exceeds N, but from that point on Zend has to allocate and destroy symbol tables for each call, and performance drops considerably. In Zend, N is currently set to 32. When writing PHP programs, it is therefore best to keep the function call depth at no more than 32. For web applications, of course, the call depth can usually be kept under control.
Parameter passing
Unlike built-in functions, which call zend_parse_parameters to obtain their parameters, user functions receive their parameters through instructions: a function with several parameters has a corresponding number of receive instructions. In terms of implementation, each one is an ordinary variable assignment. The analysis above shows that user functions perform worse than built-in functions, because they maintain the symbol-table stack themselves and every instruction they execute is itself a C function call. A detailed comparison is given later. Therefore, if PHP already provides a built-in implementation of some functionality, try not to re-implement it yourself.

Class methods
A class method is executed in the same way as a user function: it is also translated into opcodes that are executed in order. Classes are implemented in Zend with the data structure zend_class_entry, which stores basic information about the class. This entry is already set up when the PHP code is compiled.
The common struct of zend_function has a member called scope, which points to the zend_class_entry of the class that the current method belongs to. The implementation of object-oriented features in PHP is not covered in detail here; a separate article will be devoted to it. As far as function calls are concerned, a method is implemented in the same way as a function, so in theory its performance should be similar. A detailed performance comparison follows later.

Performance Comparison

Impact of function name length on performance

Test method
Compare empty functions whose names are 1, 2, 4, 8, and 16 characters long, measure the number of calls executed per second for each, and determine the impact of function name length on performance.

Test results are as follows:

Result analysis
The figure shows that the length of the function name does have some impact on performance. An empty function with a 1-character name and one with a 16-character name differ in performance by a factor of about two. The cause is easy to find in the source code. As described above, when a function is called, Zend first looks up the relevant information by function name in the global function_table, which is a hash table. Inevitably, the longer the name, the longer the lookup takes. For functions that are called many times, it is therefore advisable to keep the name from getting too long.

Although the length of the function name has some impact on performance, how much does it really matter? That depends on the actual situation. If the function itself is fairly complex, the name length has little effect on overall performance. One suggestion is to give concise names to functions that are simple and called very frequently.
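A test of this kind could be sketched as follows. This is not the author's original test script: the function names, the iteration count, and the use of microtime() are assumptions made for illustration, and the measured difference may be smaller or absent on PHP versions other than the one analyzed in this article.

```php
<?php
// Two empty functions whose names are 1 and 16 characters long (hypothetical names).
function a() {}
function abcdefghijklmnop() {}

$iterations = 200000; // assumed iteration count

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    a();
}
$shortName = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    abcdefghijklmnop();
}
$longName = microtime(true) - $start;

// max() guards against a zero elapsed time on very coarse clocks.
printf("1-character name:  %.0f calls/s\n", $iterations / max($shortName, 1e-9));
printf("16-character name: %.0f calls/s\n", $iterations / max($longName, 1e-9));
```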

Impact of the number of functions on performance

Test method
Test function calls in the following three environments: 1. the program contains only 1 function; 2. the program contains 100 functions; 3. the program contains 1000 functions. Measure the number of function calls per second in each case.

Test results are as follows:

Result analysis
The test results show that performance is almost the same in all three cases. As the number of functions increases, performance drops only slightly, and the drop can be ignored. In terms of implementation, the only difference between the three cases is the function lookup. As described above, all functions are stored in a hash table, so lookup efficiency should stay close to O(1) regardless of the number of entries, and the performance gap is small.

Cost of different types of function calls

Test method
Pick one each of a user function, a class method, a static method, and a built-in function. Each function does nothing and returns immediately, so the test mainly measures the cost of an empty function call. The result is the number of executions per second. All function names in this test have the same length.

Test results are as follows:

Result analysis
The test results show that functions written by users in PHP run at about 2.8 million calls per second, regardless of which kind they are. As expected, even an empty call to a built-in function is far more efficient, reaching 7.8 million calls per second, nearly three times the former. The overhead of calling a built-in function is thus much lower than that of a user function. From the earlier analysis, the main difference lies in operations such as initializing the symbol table and receiving parameters when a user function is called.
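The comparison could be sketched along the following lines. The names empty_function, Sample, and time_block are hypothetical, pi() is used only as a cheap stand-in for an empty built-in function, and the numbers obtained will depend on the PHP version and machine, so they should not be expected to match the figures quoted above.

```php
<?php
function empty_function() {}

class Sample
{
    public function emptyMethod() {}
    public static function emptyStatic() {}
}

// Runs $body once and returns the elapsed wall-clock time in seconds.
function time_block(callable $body): float
{
    $start = microtime(true);
    $body();
    return microtime(true) - $start;
}

$n = 200000; // assumed iteration count
$object = new Sample();

$timings = [
    'user function' => time_block(function () use ($n) {
        for ($i = 0; $i < $n; $i++) { empty_function(); }
    }),
    'class method' => time_block(function () use ($n, $object) {
        for ($i = 0; $i < $n; $i++) { $object->emptyMethod(); }
    }),
    'static method' => time_block(function () use ($n) {
        for ($i = 0; $i < $n; $i++) { Sample::emptyStatic(); }
    }),
    'built-in function' => time_block(function () use ($n) {
        for ($i = 0; $i < $n; $i++) { pi(); }
    }),
];

foreach ($timings as $label => $seconds) {
    printf("%-18s %.4f s for %d calls\n", $label, $seconds, $n);
}
```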

Performance comparison between built-in functions and user functions

Test method
To compare the performance of built-in functions and user functions, we pick several common built-in functions and implement the same functionality in PHP. The test covers typical string, math, and array operations: string truncation (substr), decimal-to-binary conversion (decbin), finding the minimum value (min), and returning the keys of an array (array_keys).

Test results are as follows:

Result analysis
As expected, the test results show that built-in functions perform far better overall than ordinary user functions. For functions involving string operations in particular, the gap reaches an order of magnitude. Therefore, if a built-in function provides the functionality you need, use it instead of writing a PHP function yourself. For features that involve a large amount of string processing, consider implementing them as an extension to improve performance; rich-text filtering is a common example.
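As an illustration of what such a comparison looks like for min, here is a minimal sketch. The userland function user_min is a hypothetical re-implementation written for this example, not the code used in the original test, and a single timed call like this is only a rough indication.

```php
<?php
// Hypothetical userland re-implementation of min() for a non-empty array.
function user_min(array $values)
{
    $smallest = reset($values);
    foreach ($values as $value) {
        if ($value < $smallest) {
            $smallest = $value;
        }
    }
    return $smallest;
}

$data = range(1, 100000);
shuffle($data);

$start = microtime(true);
$builtIn = min($data);
$builtInTime = microtime(true) - $start;

$start = microtime(true);
$userland = user_min($data);
$userlandTime = microtime(true) - $start;

var_dump($builtIn === $userland); // bool(true): both find the same value
printf("min(): %.6f s, user_min(): %.6f s\n", $builtInTime, $userlandTime);
```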

Performance comparison with C functions

Test method
We compare three kinds of functions covering arithmetic and string operations, with the PHP side implemented as an extension: a simple arithmetic operation, a string comparison, and repeated arithmetic operations. Besides the two versions of each function, the test also measures performance with the empty-call overhead of the function removed. This compares the performance of the two kinds of function (C and PHP built-in) and also indirectly verifies the empty-call overhead measured earlier. The metric is the time taken to perform 10 million operations.

Test results are as follows:

Result analysis
Once the empty-call overhead of the PHP function is removed, the gap between built-in functions and C functions is relatively small, and as the functions become more complex, the performance of the two converges. This is easy to explain from the earlier implementation analysis, since built-in functions are implemented in C after all. The more complex the function, the smaller the performance gap between C and PHP. Compared with C, however, the overhead of a PHP function call is much larger, which hurts the performance of simple functions. Functions in PHP should therefore not be nested or wrapped too deeply.

Pseudo functions and their performance

PHP has some constructs that are used exactly like standard functions but whose underlying implementation is completely different from a real function call. They do not belong to any of the three kinds of functions described above. Each is essentially a separate opcode; we will call them pseudo-functions or instruction-style functions here.

As mentioned, pseudo-functions are used just like standard functions. At execution time, however, Zend maps each of them to a corresponding instruction (opcode), so their implementation is closer to that of constructs such as if, for, and arithmetic operators.
Pseudo-functions in PHP
isset
empty
unset
eval
Because pseudo-functions are translated directly into instructions, they avoid the function-call overhead that normal functions incur, so they perform better. The following test makes the comparison: both array_key_exists and isset can determine whether a key exists in an array, so we compare their performance.

As the figure shows, isset performs far better than array_key_exists, about 4 times faster, and it is about twice as fast as even an empty function call. This again shows that the overhead of a PHP function call is relatively large.
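One way to see that these constructs are not real functions is that they are not registered as functions at all, so function_exists() returns false for them. A short sketch:

```php
<?php
// Language constructs are not entries in the function table,
// so function_exists() reports false for them.
$names = ['isset', 'empty', 'unset', 'eval', 'array_key_exists'];

foreach ($names as $name) {
    printf("%-17s %s\n", $name, function_exists($name) ? 'function' : 'language construct');
}
```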

Common PHP Functions

count
count is a function we use often. It returns the number of elements in an array.
What is the complexity of count? A common claim is that count traverses the whole array to count its elements, so its complexity is O(n). Is that true? Looking at the implementation, the source code shows that the call path for count on an array is zif_count -> php_count_recursive -> zend_hash_num_elements, and zend_hash_num_elements simply does return ht->nNumOfElements. This is clearly an O(1) operation, not O(n). An array in PHP is a hash table at the bottom layer, and Zend's hash table has a member nNumOfElements that records the current number of elements, so an ordinary count just returns this value directly. We can therefore conclude:
count has O(1) complexity, regardless of the size of the array.
How does count behave for non-array variables? In the PHP version analyzed here, it returns 0 for a variable that is not set and 1 for variables such as int, double, and string. (PHP 7.2 and later emit a warning in these cases, and PHP 8 throws a TypeError.)
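A brief usage sketch follows. The arrays are made up for illustration; the COUNT_RECURSIVE case is included because, unlike an ordinary count, it also has to visit nested arrays.

```php
<?php
$small = range(1, 10);
$large = range(1, 100000);

// An ordinary count() reads the stored element count, whatever the array size.
var_dump(count($small)); // int(10)
var_dump(count($large)); // int(100000)

// COUNT_RECURSIVE also counts the elements of nested arrays.
$nested = ['a' => [1, 2, 3], 'b' => [4, 5]];
var_dump(count($nested));                  // int(2)
var_dump(count($nested, COUNT_RECURSIVE)); // int(7)
```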

strlen
strlen returns the length of a string. How is it implemented? In C, strlen is an O(n) function: it walks the string until it meets \0 and then returns the length. Is the same true in PHP? No. A PHP string is described by a composite structure that holds both a pointer to the data and the string length (similar to string in C++), so strlen returns the stored length directly, which is a constant-time operation. If strlen is called on a non-string variable, the variable is first converted to a string and then its length is returned.
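Because the length is stored rather than found by scanning for a terminator, a NUL byte inside a PHP string does not cut the length short. A small sketch (the second call assumes strict_types is not enabled, so the integer is converted to a string):

```php
<?php
$binary = "ab\0cd"; // "\0" inside double quotes is a NUL byte

var_dump(strlen($binary)); // int(5): the NUL byte does not end the string

// A non-string argument is converted to a string first.
var_dump(strlen(12345));   // int(5)
```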

isset and array_key_exists
The most common use of both is to determine whether a key exists in an array, although isset can also be used to determine whether a variable has been set. As mentioned above, isset is not a real function, so it is much more efficient than array_key_exists, and it is recommended as a replacement. Note that the two are not fully equivalent: isset returns false when the key exists but its value is null.
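Before swapping one for the other, it helps to check how they treat null values. The array below is a made-up example:

```php
<?php
$config = ['host' => 'localhost', 'port' => null];

var_dump(isset($config['host']));            // bool(true)
var_dump(array_key_exists('host', $config)); // bool(true)

// The key exists but holds null, so the two checks disagree.
var_dump(isset($config['port']));            // bool(false)
var_dump(array_key_exists('port', $config)); // bool(true)

var_dump(isset($config['missing']));            // bool(false)
var_dump(array_key_exists('missing', $config)); // bool(false)
```

If null is a legitimate value in the array, array_key_exists is the safer choice despite the extra call overhead.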

array_push and array[]
Both append an element to the end of an array; the difference is that array_push can push several elements at once. The biggest difference is that one is a function and the other is a language construct, which makes the latter more efficient. If you only need to append a single element, array[] is recommended.
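The two forms produce the same result, as this short sketch with made-up values shows:

```php
<?php
$viaFunction = [1, 2];
array_push($viaFunction, 3);    // function call
array_push($viaFunction, 4, 5); // several elements in one call

$viaSyntax = [1, 2];
$viaSyntax[] = 3;               // language construct, no function call
$viaSyntax[] = 4;
$viaSyntax[] = 5;

var_dump($viaFunction === $viaSyntax); // bool(true)
```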

rand and mt_rand
Both generate random numbers. rand uses the standard rand from libc, while mt_rand is based on the Mersenne Twister and can generate random values on average four times faster than the rand() provided by libc. If performance matters, use mt_rand instead of rand. As is well known, rand generates pseudo-random numbers, and in C you have to call srand explicitly to set the seed. In PHP, rand calls srand once for you by default, so an explicit call is generally unnecessary. If you do need to seed explicitly in special cases, the calls must be paired correctly: srand goes with rand, and mt_srand goes with mt_rand. They must not be mixed, or the seeding has no effect. (Since PHP 7.1, rand and srand are aliases of mt_rand and mt_srand, so this distinction applies to earlier versions.)
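Explicit seeding with the matching function makes the sequence reproducible. A minimal sketch, with 42 as an arbitrary seed chosen for illustration:

```php
<?php
// Seed with mt_srand(), then draw with mt_rand(): the pair must match.
mt_srand(42);
$first = [mt_rand(), mt_rand(), mt_rand()];

mt_srand(42);
$second = [mt_rand(), mt_rand(), mt_rand()];

var_dump($first === $second); // bool(true): same seed, same sequence
```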

sort and usort
Both are used for sorting. The difference is that usort lets you specify the comparison strategy, much like qsort in C and sort in C++. In the version analyzed here, both are implemented with standard quicksort. Unless you have special requirements, call these functions provided by PHP rather than implementing sorting yourself, which would be much less efficient. For the reason, see the earlier comparison of user functions and built-in functions.
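A short usage sketch of the two, with made-up data; the comparison callback uses the <=> operator, which assumes PHP 7 or later:

```php
<?php
$numbers = [5, 3, 8, 1];
sort($numbers);    // built-in ascending order
print_r($numbers); // 1, 3, 5, 8

$words = ['banana', 'fig', 'apple'];
// usort() takes a user-supplied comparison; here, order by string length.
usort($words, function (string $a, string $b): int {
    return strlen($a) <=> strlen($b);
});
print_r($words);   // fig, apple, banana
```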

urlencode and rawurlencode
Both are used for URL encoding. Every non-alphanumeric character in the string except - _ . is replaced with a percent sign (%) followed by two hexadecimal digits. The only difference between the two is the space character: urlencode encodes it as +, while rawurlencode encodes it as %20. In general, apart from search engines, our convention is to encode spaces as %20, so rawurlencode is used most of the time. Note that the encode and decode functions must be used in matching pairs.
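The difference in space handling, and the need to decode with the matching function, can be seen with a made-up input string:

```php
<?php
$value = 'a b&c=d/e';

echo urlencode($value), PHP_EOL;    // a+b%26c%3Dd%2Fe
echo rawurlencode($value), PHP_EOL; // a%20b%26c%3Dd%2Fe

// Decoding with the matching function restores the original string.
var_dump(urldecode(urlencode($value)) === $value);       // bool(true)
var_dump(rawurldecode(rawurlencode($value)) === $value); // bool(true)

// rawurldecode() does not turn "+" back into a space.
var_dump(rawurldecode('a+b')); // string(3) "a+b"
```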

strcmp family
This family includes strcmp, strncmp, strcasecmp, and strncasecmp, which provide the same functionality as the C functions of the same names. The implementation differs, however. Because PHP strings may contain \0, the underlying comparison uses the memcmp family rather than strcmp, which is faster in theory. In addition, because PHP can obtain the string length directly, the lengths are checked first, which makes many cases considerably more efficient.
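A short sketch of the family, including strings with embedded NUL bytes. The results are compared against 0 rather than against specific values, because the exact non-zero return value differs between PHP versions.

```php
<?php
var_dump(strcmp('apple', 'apple') === 0);           // bool(true)
var_dump(strcmp('apple', 'banana') < 0);            // bool(true)
var_dump(strncmp('prefix-a', 'prefix-b', 6) === 0); // bool(true): first 6 bytes match
var_dump(strcasecmp('HELLO', 'hello') === 0);       // bool(true)

// The comparison continues past embedded NUL bytes.
var_dump(strcmp("a\0b", "a\0c") < 0); // bool(true)
var_dump(strcmp("a\0b", "a") > 0);    // bool(true)
```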

is_int and is_numeric
These two functions are similar but not identical, so pay attention to the difference when using them.
is_int determines whether a variable's type is integer. A PHP variable has a field that records its type, so the check simply reads that field; it is strictly an O(1) operation.
is_numeric determines whether a variable is a number or a numeric string. In other words, besides returning true for integer variables, it also returns true for string variables such as "1234" and "1e4". In the string case, it has to traverse the string to decide.
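The difference shows up as soon as the value is a string:

```php
<?php
var_dump(is_int(1234));       // bool(true)
var_dump(is_int('1234'));     // bool(false): a string, even though it looks numeric

var_dump(is_numeric(1234));   // bool(true)
var_dump(is_numeric('1234')); // bool(true)
var_dump(is_numeric('1e4'));  // bool(true)
var_dump(is_numeric('12ab')); // bool(false)
```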

Summary and Suggestions

Summary:
From the analysis of how functions are implemented and from the performance tests, we draw the following conclusions:
1. The overhead of a PHP function call is relatively large.
2. Function information is stored in a large hash table, and every call looks up the function name in it, so the length of the function name also has some impact on performance.
3. Returning a reference from a function has no practical benefit.
4. Built-in PHP functions perform far better than user functions, especially for string operations.
5. Class methods, ordinary functions, and static methods are almost equally efficient; there is no significant difference.
6. Apart from the empty-call overhead, built-in functions perform about the same as C functions with the same functionality.
7. All parameter passing uses reference-counted shallow copies, so the cost is very small.
8. The number of functions has a negligible impact on performance.

Suggestions:

Based on the above, we suggest the following when using PHP functions:
1. If something can be done with a built-in function, use it instead of writing a PHP function yourself.
2. If a feature has demanding performance requirements, consider implementing it as an extension.
3. PHP function calls are expensive, so do not wrap code in too many layers. If a piece of functionality is called a great many times and takes only one or two lines of code, consider not wrapping it in a function at all.
4. Do not become infatuated with design patterns. As described above, excessive encapsulation degrades performance, so weigh the trade-off between the two. PHP has its own characteristics, and imitating Java patterns too closely is not advisable.
5. Do not nest function calls too deeply, and use recursion with caution.
6. Pseudo-functions perform very well; prefer them when they provide the same functionality. For example, use isset instead of array_key_exists.
7. Returning a reference from a function is of little use and has no practical effect; we recommend not considering it.
8. Class member methods are no less efficient than ordinary functions, so there is no need to worry about a performance loss. Consider static methods more often for better readability and safety.
9. Unless there is a special need, pass parameters by value rather than by reference. Of course, if the parameter is a large array that needs to be modified, passing by reference is worth considering.
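To illustrate the last suggestion, the following sketch contrasts the two ways of passing an array. The function names are hypothetical and the void return type assumes PHP 7.1 or later. It shows only the semantics; the cost claim rests on the reference-counted shallow copy described in the summary.

```php
<?php
// Receives the array by value: the caller's array is not affected.
function append_by_value(array $items): array
{
    $items[] = 'extra';
    return $items;
}

// Receives the array by reference: the caller's array is modified in place.
function append_by_reference(array &$items): void
{
    $items[] = 'extra';
}

$original = ['a', 'b'];
$copy = append_by_value($original);
var_dump(count($original)); // int(2): unchanged
var_dump(count($copy));     // int(3)

append_by_reference($original);
var_dump(count($original)); // int(3): modified through the reference
```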
