Implementation Principles and Performance Analysis of PHP Functions

Source: Internet
Author: User

Objective

In any language, the function is the most basic building block. What are the characteristics of PHP functions? How is a function call implemented? How well do PHP functions perform, and what advice follows for using them?
This article starts from the implementation principles, backs the analysis with actual performance tests, and tries to answer these questions, so that understanding the implementation helps us write better PHP programs. It also introduces some common PHP functions.

Classification of PHP functions

PHP functions fall into two main categories: user functions and internal (built-in) functions.
The former are functions and methods that the user defines in a program; the latter are the library functions that PHP itself provides (such as sprintf, array_push, and so on).
Users can also write library functions of their own as extensions, which is described later.
User functions can be further divided into functions and methods (class methods). This article analyzes and tests these three kinds of functions separately.
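
The three kinds of functions can be told apart at run time with PHP's reflection classes. The following is a minimal sketch; the names greet and Greeter are hypothetical and exist only for illustration.

```php
<?php
// User function (hypothetical name, for illustration only).
function greet($name) {
    return "Hello, $name";
}

class Greeter {
    // Class method (also hypothetical).
    public function greet($name) {
        return "Hello, $name";
    }
}

// Reflection reports whether a callable is internal or user-defined.
$builtin = new ReflectionFunction('sprintf');
$user    = new ReflectionFunction('greet');
$method  = new ReflectionMethod('Greeter', 'greet');

var_dump($builtin->isInternal());    // built-in (internal) function
var_dump($user->isUserDefined());    // user function
var_dump($method->isUserDefined());  // class method
```

All three var_dump calls print bool(true): sprintf is reported as internal, while greet and Greeter::greet are reported as user-defined.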

Implementation of PHP functions

How does a PHP function ultimately execute, and what is the process like?

To answer this question, let's take a look at the process through which the PHP code executes.

As Figure 1 shows, PHP follows the execution process typical of a dynamic language: the source code goes through lexical analysis, parsing, and other stages, and is translated into instructions (opcodes), which the Zend virtual machine then executes in sequence. PHP itself is implemented in C, so what is ultimately called is a C function; in fact, we can think of PHP as a piece of software written in C.
It follows that a function in PHP is also translated into opcodes for execution, and each function call actually executes one or more instructions.

Zend describes each function with the following data structures:

    typedef union _zend_function {
        zend_uchar type;    /* MUST be the first element of this struct! */

        struct {
            zend_uchar type;  /* never used */
            char *function_name;
            zend_class_entry *scope;
            zend_uint fn_flags;
            union _zend_function *prototype;
            zend_uint num_args;
            zend_uint required_num_args;
            zend_arg_info *arg_info;
            zend_bool pass_rest_by_reference;
            unsigned char return_reference;
        } common;

        zend_op_array op_array;
        zend_internal_function internal_function;
    } zend_function;

    typedef struct _zend_function_state {
        HashTable *function_symbol_table;
        zend_function *function;
        void *reserved[ZEND_MAX_RESERVED_RESOURCES];
    } zend_function_state;

Here, type identifies the kind of function: user function, built-in function, or overloaded function. common holds the basic information about the function, including its name, parameter information, and so on.

Zend maintains a global function_table, which is a large hash table. A function call first looks up the corresponding zend_function in this table by function name.
Different types of functions are executed in different ways.
Built-in functions

A built-in function is essentially a real C function. After macro expansion and compilation, each built-in function becomes a C function named zif_xxxx; the familiar sprintf, for example, corresponds to zif_sprintf underneath. When Zend finds at execution time that the target is a built-in function, it simply forwards the call.

Zend provides a set of APIs for built-in functions to call, covering parameter fetching, array manipulation, memory allocation, and so on. A built-in function obtains its parameters through zend_parse_parameters.
For parameters such as arrays and strings, Zend makes only a shallow copy, so this is very efficient. It is fair to say that a PHP built-in function is almost as efficient as the corresponding C function, with just one extra forwarding call.

Built-in functions are loaded into PHP dynamically as shared objects (.so files). Users can also write their own .so files as needed, which is what we usually call an extension.
Zend provides a range of APIs for extensions to use.

User functions

Compared with built-in functions, user-defined functions written in PHP are executed and implemented in a completely different way. As mentioned earlier, PHP code is translated into opcodes for execution, and user functions are no exception: each function corresponds to a set of opcodes, and this set of instructions is stored in the zend_function. Calling a user function therefore ultimately means executing its set of opcodes.

    • Saving local variables and implementing recursion
      We know that function recursion is implemented with a stack, and PHP uses a similar approach. Zend assigns each PHP function an active symbol table (active_sym_table), which records the state of all local variables in the current function. All symbol tables are maintained as a stack: each function call allocates a new symbol table and pushes it onto the stack, and when the call ends, the current symbol table is popped off. This is how state is preserved and recursion is made possible.

Zend optimizes the maintenance of this stack. It pre-allocates a static array of length N to simulate the stack. Simulating a dynamic data structure with a static array is a technique we often use in our own programs, and it avoids allocating and freeing memory on every call. At the end of a function call, Zend simply clears the data in the symbol table at the top of the stack.
Because the static array has length N, the program does not overflow the stack once the call depth exceeds N; instead, Zend falls back to allocating and destroying a symbol table for each call, which degrades performance considerably.
In Zend, the current value of N is 32. When writing PHP programs, it is therefore best to keep the function call depth within 32.
Of course, in a web application the call hierarchy can be deep by its nature.
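
The effect of one symbol table per call can be observed from PHP code: a local variable keeps its value across a nested recursive call. The sketch below uses a hypothetical function, countdown, and only illustrates the visible behavior, not Zend's internal data structures.

```php
<?php
// Hypothetical helper: every invocation has its own $local,
// so the value set before the recursive call is intact after it returns.
function countdown($n, &$log) {
    $local = "level $n";          // local to this invocation
    if ($n > 0) {
        countdown($n - 1, $log);  // the inner call gets fresh local variables
    }
    $log[] = $local;              // still "level $n" after the inner call
}

$log = array();
countdown(3, $log);
print_r($log);
```

The log ends up as level 0, level 1, level 2, level 3: each frame appends its own value as the recursion unwinds.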

    • Passing of parameters
      Unlike built-in functions, which call zend_parse_parameters to obtain their parameters, a user function receives its parameters through instructions: a function with several parameters has a corresponding number of receive instructions. In terms of implementation, each one is an ordinary variable assignment.
      The analysis above shows that user functions perform relatively poorly compared with built-in functions, because Zend has to maintain the symbol table stack itself and every instruction executed is itself a C function call. A specific comparison follows later. Therefore, if PHP already provides a built-in function for a task, try not to re-implement it yourself.
Class method

A class method is executed in the same way as a user function: it is also translated into opcodes that are invoked in sequence. Zend implements classes with the data structure zend_class_entry, which holds the basic information about a class. This entry is already populated when the PHP code is compiled.

The common member of zend_function has a field called scope, which points to the zend_class_entry of the class that the method belongs to. The object-oriented implementation in PHP is not covered in more detail here; a dedicated article will describe it in the future. As far as calls are concerned, methods and functions are implemented on exactly the same principle, so in theory their performance is similar. A detailed performance comparison follows later.

Performance comparison

Effect of function name length on performance
    • Test method
      Compare empty functions whose names are 1, 2, 4, 8, and 16 characters long, and measure how many calls can be executed per second, to determine the effect of function name length on performance.
    • Test results are shown in the figure

    • Results analysis
      As the figure shows, the length of the function name does have some effect on performance. An empty function with a 1-character name and one with a 16-character name differ in call performance by roughly a factor of two.
      The reason is easy to find in the source code. As described above, when a function is called, Zend first looks up its information by name in the global function_table, which is a hash table. Inevitably, the longer the name, the longer the lookup takes.
      Therefore, when writing real programs, functions that are called many times should not be given overly long names.

Function name length does affect performance, but how much? The question should be weighed against the actual situation: if the function itself is fairly complex, the effect on overall performance is small.
One suggestion is to give short, concise names to functions that are called many times and do relatively simple work.
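
A test of this kind could be sketched as follows. This is not the original test code: the function names, the helper calls_per_second, and the iteration count are assumptions. The call goes through a variable function name so that the lookup by name happens at run time; newer engines may resolve direct calls ahead of time, so measured differences will vary with the PHP version.

```php
<?php
// Two empty functions whose names differ only in length (hypothetical names).
function a() {}
function abcdefghijklmnop() {}

// Hypothetical helper: calls the function named $name $iterations times
// and returns the measured number of calls per second.
function calls_per_second($name, $iterations) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $name(); // variable function call: the name is looked up at run time
    }
    $elapsed = microtime(true) - $start;
    return $iterations / max($elapsed, 1e-9); // guard against a zero interval
}

$short = calls_per_second('a', 200000);
$long  = calls_per_second('abcdefghijklmnop', 200000);

printf("1-char name:  %.0f calls/s\n", $short);
printf("16-char name: %.0f calls/s\n", $long);
```

Running the script several times and comparing the averages gives a more stable picture than a single run.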

The effect of the number of functions on performance
    • Test method
      Run the function call test in the following three environments and analyze the results: 1. the program contains only 1 function; 2. the program contains 100 functions; 3. the program contains 1000 functions.
      Measure the number of function calls per second in each of the three cases.
    • Test results are shown in the figure

    • Results analysis
      The test results show that performance is almost the same in the three cases; the drop in performance as the number of functions grows is negligible.
      In terms of implementation, the only difference between the cases is the step that looks up the function. As mentioned earlier, all functions are stored in a hash table, whose lookup cost should stay close to O(1) regardless of the number of entries, so the performance gap is small.
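
One way to approximate this test in a single script is to time calls to the same function before and after defining many additional functions. The sketch below makes several assumptions: the helper names, the bench_fn_ prefix, and the use of eval to generate the extra functions are all illustrative choices, not the original test setup.

```php
<?php
// Hypothetical helper: defines $count empty functions named
// {$prefix}0 .. {$prefix}{$count-1} by generating source code and eval-ing it.
function define_empty_functions($prefix, $count) {
    $code = '';
    for ($i = 0; $i < $count; $i++) {
        $code .= "function {$prefix}{$i}() {}\n";
    }
    eval($code);
}

// Hypothetical helper: calls per second for the function named $name.
function calls_per_second($name, $iterations) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $name();
    }
    return $iterations / max(microtime(true) - $start, 1e-9);
}

function bench_target() {}

$before = calls_per_second('bench_target', 200000); // few functions defined
define_empty_functions('bench_fn_', 1000);          // grow the function table
$after  = calls_per_second('bench_target', 200000); // 1000 more functions defined

printf("before: %.0f calls/s, after: %.0f calls/s\n", $before, $after);
```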
Call overhead of different types of functions
    • Test method
      Pick one each of a user function, a class method, a static method, and a built-in function. The functions themselves do nothing and return immediately, so the test mainly measures the overhead of an empty function call.
      The result is the number of calls executed per second.
      All function names in the test have the same length, to rule out other effects.
    • Test results are shown in the figure

    • Results analysis
      The test results show that user-written PHP functions of every type are about equally efficient, at around 2.8 million calls per second. As expected, even for empty calls, built-in functions are much more efficient, reaching 7.8 million calls per second, nearly 3 times the former. The overhead of calling a built-in function is clearly much lower than that of a user function.
      From the earlier analysis of the implementation, we know that the main difference lies in operations such as initializing the symbol table and receiving parameters when a user function is called.
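
A rough version of this measurement is sketched below. The names user_fn and Bench are hypothetical, and mt_getrandmax() is only an assumed stand-in for "a cheap built-in function", since the text does not say which built-in was used. Unlike the original test, the names here do not all have the same length.

```php
<?php
// Empty callables of each kind (all names are hypothetical).
function user_fn() {}

class Bench {
    public function method() {}
    public static function staticMethod() {}
}

$n = 200000;
$obj = new Bench();
$seconds = array();

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { user_fn(); }
$seconds['user function'] = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { $obj->method(); }
$seconds['class method'] = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { Bench::staticMethod(); }
$seconds['static method'] = microtime(true) - $t;

// Assumption: mt_getrandmax() stands in for a cheap built-in function.
$t = microtime(true);
for ($i = 0; $i < $n; $i++) { mt_getrandmax(); }
$seconds['built-in function'] = microtime(true) - $t;

foreach ($seconds as $kind => $elapsed) {
    printf("%-18s %.4f s for %d calls\n", $kind, $elapsed, $n);
}
```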
Performance comparison of built-in functions and user functions
    • Test method
      To compare built-in functions with user functions, we pick a few common built-in functions and implement the same functionality in PHP for comparison.
      We choose typical representatives from the string, math, and array categories: substring extraction (substr), decimal-to-binary conversion (decbin), finding the minimum value (min), and returning all keys of an array (array_keys).
    • Test results are shown in the figure

    • Results analysis
      As the test results show, built-in functions, as expected, perform far better overall than ordinary user functions. For functions that manipulate strings in particular, the gap reaches one order of magnitude. One principle for using functions is therefore: if a built-in function exists for a task, use it rather than writing a PHP function yourself.
      For features that involve a lot of string manipulation, consider implementing them as an extension to improve performance; a common example is a rich-text filter.
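
As an illustration of such a comparison, the sketch below times the built-in decbin against a user-land equivalent. The function user_decbin is a hypothetical re-implementation written for this example (non-negative integers only); it is not the code used in the original test.

```php
<?php
// Hypothetical user-land equivalent of decbin() for non-negative integers.
function user_decbin($n) {
    if ($n === 0) {
        return '0';
    }
    $bits = '';
    while ($n > 0) {
        $bits = ($n & 1) . $bits; // prepend the lowest bit
        $n = $n >> 1;             // drop the lowest bit
    }
    return $bits;
}

$iterations = 100000;

$t = microtime(true);
for ($i = 0; $i < $iterations; $i++) { decbin($i); }
$builtinTime = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < $iterations; $i++) { user_decbin($i); }
$userTime = microtime(true) - $t;

printf("decbin:      %.4f s\n", $builtinTime);
printf("user_decbin: %.4f s\n", $userTime);
```

Checking that both versions return the same strings before timing them keeps the comparison fair.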
Performance comparison with C functions
    • Test method
      We pick three functions covering string manipulation and arithmetic for comparison, with the PHP side implemented as an extension. The three functions are a simple single arithmetic operation, a string comparison, and repeated arithmetic operations.
      Besides the two kinds of functions themselves, we also measure the overhead of an empty function call, both to compare the performance of the two kinds of functions (C and PHP built-in) themselves and to confirm the cost of an empty call.
      The metric is the time taken to perform 100,000 operations.
    • Test results are shown in the figure

    • Results analysis
      Once the overhead of the PHP function call is excluded, the gap between built-in functions and C functions is small, and as the functions become more complex, the performance of the two converges.
      This is easy to justify from the earlier implementation analysis; after all, built-in functions are implemented in C.
      The more complex the function, the smaller the performance gap between C and PHP.
      Compared with C, a PHP function call is much more expensive, which has a noticeable effect on simple functions. PHP functions should therefore not be wrapped in too many layers.
Pseudo-functions and their performance

PHP has some constructs that are used exactly like standard functions but whose underlying implementation is completely different from a real function call. They belong to none of the three kinds of functions described above. Each is in essence a separate opcode; here we call them pseudo-functions or instruction functions.

As stated above, pseudo-functions are used like standard functions and appear to have the same characteristics. When they are finally executed, however, Zend maps each of them to a corresponding instruction (opcode), so their implementation is closer to constructs such as if, for, and arithmetic operations.

    • Pseudo-functions in PHP
      isset
      empty
      unset
      eval

As shown above, a pseudo-function is translated directly into an instruction and so avoids the function call overhead that an ordinary function incurs, which gives it better performance.
Let's compare with the following test. Both array_key_exists and isset can tell whether a key exists in an array; let's look at their performance.

As the figure shows, isset performs much better than array_key_exists, at about 4 times the speed, and it is even about twice as fast as an empty function call. This again shows that PHP function calls are still relatively expensive.
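
A comparison like this one could be timed roughly as follows. The array contents and iteration count are assumptions, and the size of the gap depends on the PHP version, so the numbers should not be expected to match the figure.

```php
<?php
$data = array('key' => 1);
$n = 500000;

// Time the isset language construct.
$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $found = isset($data['key']);
}
$issetTime = microtime(true) - $t;

// Time the array_key_exists function call.
$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $found = array_key_exists('key', $data);
}
$functionTime = microtime(true) - $t;

printf("isset:            %.4f s\n", $issetTime);
printf("array_key_exists: %.4f s\n", $functionTime);
```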

Implementation and introduction of common PHP functions

count

count is a function we use often; it returns the length of an array.

What is the complexity of the count function?
A common claim is that count traverses the entire array to count its elements, so its complexity is O(n). Is that really the case?
Going back to the implementation, the source shows that for an array the call path is zif_count -> php_count_recursive -> zend_hash_num_elements, and zend_hash_num_elements simply does return ht->nNumOfElements. This is clearly an O(1) operation rather than O(n).
In fact, an array in PHP is a hash table underneath, and Zend keeps a dedicated field, nNumOfElements, that records the current number of elements, so an ordinary count just returns that value directly. We conclude that count has O(1) complexity, independent of the size of the array.

What does count do for variables that are not arrays?
In the PHP version analyzed here, it returns 0 for a variable that is not set, and 1 for an int, double, string, and so on.
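
The constant-time behavior can be checked informally by timing count on a small and a large array. In this sketch the helper time_count and the array sizes are assumptions; only an ordinary, non-recursive count is expected to be independent of array size, while COUNT_RECURSIVE has to visit nested arrays.

```php
<?php
$small  = range(1, 10);
$large  = range(1, 100000);
$nested = array(1, array(2, 3));

// Hypothetical helper: seconds taken by $iterations calls to count($arr).
function time_count(array $arr, $iterations) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        count($arr);
    }
    return microtime(true) - $start;
}

printf("10 elements:      %.4f s\n", time_count($small, 200000));
printf("100000 elements:  %.4f s\n", time_count($large, 200000));

var_dump(count($small));                   // int(10)
var_dump(count($large));                   // int(100000)
var_dump(count($nested));                  // int(2): top-level elements only
var_dump(count($nested, COUNT_RECURSIVE)); // int(4): 2 top-level + 2 nested
```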

strlen

strlen returns the length of a string. How is it implemented?
We all know that in C, strlen is an O(n) function that walks the string until it encounters \0. Is the same true in PHP? The answer is no.
A PHP string is described by a composite structure that holds both a pointer to the data and the string length (similar to string in C++), so strlen returns the length directly, which is a constant-time operation.
Also note that when strlen is called on a non-string variable, it first casts the variable to a string and then takes the length.
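
Two consequences of the stored length are visible from PHP code: an embedded NUL byte does not end the string, and a non-string argument is measured after conversion to a string. The sketch assumes the default (non-strict) type mode; the variable names are illustrative.

```php
<?php
// A string with an embedded NUL byte: the stored length covers all 7 bytes.
$withNul = "abc\0def";
$lenWithNul = strlen($withNul);

// An integer is converted to the string "12345" before its length is taken.
$lenOfInt = strlen(12345);

var_dump($lenWithNul); // int(7)
var_dump($lenOfInt);   // int(5)
```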

isset and array_key_exists

The most common use of these two is to determine whether a key exists in an array. The former can also be used to determine whether a variable has been set.
As mentioned earlier, isset is not a real function, so it is much more efficient than array_key_exists. It is recommended as a replacement for array_key_exists.
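
Before swapping one for the other, note that the two do not behave identically: isset returns false when the key exists but its value is null. The array below is a made-up example that shows the difference.

```php
<?php
$row = array('name' => 'php', 'note' => null);

$nameSet     = isset($row['name']);             // key exists, value is not null
$noteSet     = isset($row['note']);             // key exists, but value is null
$noteExists  = array_key_exists('note', $row);  // checks the key only
$missingSet  = isset($row['missing']);          // key does not exist

var_dump($nameSet);    // bool(true)
var_dump($noteSet);    // bool(false)
var_dump($noteExists); // bool(true)
var_dump($missingSet); // bool(false)
```

If null values are meaningful in the array, array_key_exists is still the right check.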

array_push and array[]

Both append an element to the end of an array. The former can push several elements at once. Their biggest difference is that the former is a function while the latter is a language construct, so the latter is more efficient. If you are just appending a single element, array[] is recommended.
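
The two forms side by side, as a minimal sketch with illustrative values:

```php
<?php
$list = array();

$list[] = 'a';               // language construct: appends one element
array_push($list, 'b', 'c'); // function call: can append several at once

print_r($list);              // contains a, b, c in that order
```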

rand and mt_rand

Both generate random numbers. The former uses the standard rand from libc, while the latter uses a random number generator with known characteristics, the Mersenne Twister, which can produce random numbers on average four times faster than the rand() provided by libc. If performance requirements are high, consider using mt_rand instead of rand.
As we all know, rand produces pseudo-random numbers, and in C the seed has to be specified explicitly with srand. In PHP, however, rand calls srand for you by default, so in general you do not need to call it explicitly yourself.
Note that if you do need to seed the generator in special cases, the calls must be paired: srand goes with rand, and mt_srand goes with mt_rand. They must not be mixed, otherwise the seed has no effect.
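
Pairing the seed function with its generator can be illustrated with mt_srand and mt_rand: seeding twice with the same value reproduces the same sequence. The seed 42 and the variable names are arbitrary choices for this sketch.

```php
<?php
// Seed the Mersenne Twister and draw three numbers.
mt_srand(42);
$first = array(mt_rand(), mt_rand(), mt_rand());

// Re-seed with the same value: the same three numbers come out again.
mt_srand(42);
$second = array(mt_rand(), mt_rand(), mt_rand());

var_dump($first === $second); // bool(true)
```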

sort and usort

Both are used for sorting. The difference is that the latter lets you specify the sort strategy, similar to qsort in C and sort in C++.
Both are implemented with standard quicksort. Unless you have special requirements, simply call these PHP functions for sorting instead of re-implementing it yourself, which would be much less efficient.
The reason was given in the earlier analysis of user functions versus built-in functions.
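
A short sketch of the two calls follows; the data and the length-based comparison are illustrative only.

```php
<?php
// sort: default ordering, no callback.
$numbers = array(3, 1, 2);
sort($numbers);

// usort: the ordering is defined by a user-supplied comparison callback,
// here "shorter strings first".
$words = array('pear', 'fig', 'banana');
usort($words, function ($a, $b) {
    return strlen($a) - strlen($b);
});

print_r($numbers); // 1, 2, 3
print_r($words);   // fig, pear, banana
```

Note that the usort callback is itself a user function, so it is invoked for every comparison and carries the call overhead discussed earlier.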

urlencode and rawurlencode

Both are used for URL encoding. All non-alphanumeric characters in the string except -_. are replaced with a percent sign (%) followed by two hexadecimal digits.
The only difference between the two is how they handle spaces: urlencode encodes a space as +, while rawurlencode encodes it as %20.
In general, apart from search engines, our strategy is to encode the space as %20, so the latter is used in most cases.
Note that the encode and decode functions must be used in matching pairs.
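
The difference in space handling, and the need to decode with the matching function, can be seen in a small example. The input string is made up for illustration.

```php
<?php
$text = 'a b&c-_.';

$plus = urlencode($text);     // space becomes +
$raw  = rawurlencode($text);  // space becomes %20

var_dump($plus); // string(10) "a+b%26c-_."
var_dump($raw);  // string(12) "a%20b%26c-_."

// Decoding must use the matching function: rawurldecode leaves + untouched.
var_dump(urldecode($plus));    // "a b&c-_."
var_dump(rawurldecode($plus)); // "a+b&c-_."
```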

The strcmp family of functions

This family includes strcmp, strncmp, strcasecmp, and strncasecmp, which provide the same functionality as the C functions of the same names. There are differences, however: because a PHP string is allowed to contain \0, the underlying comparison uses the memcmp family rather than strcmp, which in theory is faster.
In addition, because PHP can obtain the length of a string directly, it can check the lengths first, which makes it much more efficient in many cases.
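
The effect of comparing by length rather than stopping at \0 shows up when two strings differ only after an embedded NUL byte. The strings below are made up for illustration.

```php
<?php
// The two strings differ only after the embedded NUL byte.
$left  = "ab\0c";
$right = "ab\0d";

// PHP compares all 4 bytes, so the strings are different.
// (A C strcmp would stop at the NUL byte and report them as equal.)
$binarySafe = strcmp($left, $right);

$caseInsensitive = strcasecmp('HELLO', 'hello');   // 0: equal ignoring case
$prefixOnly      = strncmp('abcdef', 'abcxyz', 3); // 0: first 3 bytes match

var_dump($binarySafe < 0);  // bool(true)
var_dump($caseInsensitive); // int(0)
var_dump($prefixOnly);      // int(0)
```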

is_int and is_numeric

These two functions are similar but not identical, and you must pay attention to the difference when using them.
is_int: determines whether a variable is of integer type. A PHP variable has a dedicated field that identifies its type, so checking the type directly is a strict O(1) operation.
is_numeric: determines whether a variable is a number or a numeric string. Besides returning true for integer variables, it also returns true for string variables of the form "1234" or "1e4". In that case the string has to be traversed to make the decision.

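The difference is easiest to see by passing the same values to both functions. The values in this sketch are illustrative.

```php
<?php
$values = array(1234, '1234', '1e4', '12abc');

foreach ($values as $value) {
    printf(
        "%-8s is_int: %-5s is_numeric: %s\n",
        var_export($value, true),
        var_export(is_int($value), true),
        var_export(is_numeric($value), true)
    );
}
```

Only the integer 1234 passes is_int, while is_numeric also accepts the strings '1234' and '1e4' and rejects '12abc'.
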
Summary and Suggestions

From the analysis of how functions are implemented and the performance tests, we draw the following conclusions:

1. Function calls in PHP are relatively expensive.

2. Function information is stored in one large hash table, and each call looks the function up by name, so function name length has some effect on performance.

3. Returning a reference from a function has no practical benefit.

4. Built-in PHP functions perform much better than user functions, especially for string operations.

5. Class methods, ordinary functions, and static methods are almost equally efficient; there is little difference.

6. Apart from the overhead of the empty function call, a built-in function performs about the same as a C function with the same functionality.

7. All parameters are passed as reference-counted shallow copies, at very low cost.

8. The effect of the number of functions on performance is almost negligible.

Accordingly, here are some suggestions for using PHP functions:

1. If a task can be done with a built-in function, use it rather than writing a PHP function yourself.

2. If a feature has high performance requirements, consider implementing it as an extension.

3. PHP function calls are expensive, so do not over-encapsulate. If a piece of functionality is called very often and can be implemented in one or two lines of code, it is better not to wrap it in a function.

4. Do not indulge in design patterns; as described above, excessive encapsulation degrades performance, and the trade-off between the two needs to be weighed. PHP has its own characteristics; do not blindly imitate the Java model.

5. Functions should not be nested too deeply, and recursion should be used with caution.

6. Pseudo-functions perform well, so prefer them when they provide the same functionality; for example, use isset instead of array_key_exists.

7. Returning a reference from a function makes little sense and brings no practical benefit, so it is not recommended.

8. Class member methods are no less efficient than ordinary functions, so there is no need to worry about a performance loss. Static methods are worth considering, as they are more readable and safer.

9. Unless there is a special need, pass parameters by value rather than by reference. Of course, passing by reference can be considered when the parameter is a large array that needs to be modified.
