Article from: http://www.aintnot.com/2016/02/10/understanding-phps-internal-function-definitions-ch
Original: https://nikic.github.io/2012/03/16/Understanding-PHPs-internal-function-definitions.html
Welcome to the second part of the "PHP Source for PHP developers" series.
In the previous article, ircmaxell explained where you can find the PHP source code and how its directories are laid out, and gave a brief introduction to C (because PHP is written in C). If you missed that article, you may want to read it before continuing with this one.
In this article, we look at how PHP's internal functions are defined and how to read those definitions.
How to find the definition of a function
As a start, let's try to find the definition of the strpos function.
The first step is to go to the PHP 5.4 root directory and enter strpos in the search box at the top of the page. The result of the search is a long list of every location where strpos appears in the PHP source code.
Because this result is not very helpful, we use a little trick: instead of strpos we search for "PHP_FUNCTION strpos" (don't leave out the double quotes, they are important).
Now we get just two results:
/PHP_5_4/ext/standard/
php_string.h 48 PHP_FUNCTION(strpos);
string.c 1789 PHP_FUNCTION(strpos)
The first thing to note is that both locations are in the ext/standard folder. This is what we would expect, because the strpos function (like most string, array and file functions) is part of the standard extension.
Now open both links in new tabs and see what code hides behind them.
The first link takes you to the php_string.h file, which contains the following code:
    // ...
    PHP_FUNCTION(strpos);
    PHP_FUNCTION(stripos);
    PHP_FUNCTION(strrpos);
    PHP_FUNCTION(strripos);
    PHP_FUNCTION(strrchr);
    PHP_FUNCTION(substr);
    // ...
This is what a typical header file (a file ending in .h) looks like: a plain list of functions that are defined elsewhere. We are not really interested in it, because we already know what we are looking for.
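To make the split between header and source file a bit more concrete, here is a minimal sketch of how such a macro can work. It is not PHP's real definition: DEMO_FUNCTION and the demo_fn_ prefix are names I made up for illustration, and I only assume that the macro expands a short name into an ordinary C function signature.

```c
/* Hypothetical stand-in for a macro like PHP_FUNCTION: it expands a short
 * name into a full C function signature with a fixed prefix. */
#define DEMO_FUNCTION(name) int demo_fn_##name(const char *arg)

/* What a header file would contain: declarations only (note the semicolon). */
DEMO_FUNCTION(length);
DEMO_FUNCTION(first_char);

/* What the .c file would contain: the definitions (no semicolon, a body). */
DEMO_FUNCTION(length)
{
    int n = 0;
    while (arg[n] != '\0') {
        n++;
    }
    return n;
}

DEMO_FUNCTION(first_char)
{
    return (unsigned char) arg[0];
}
```

In this sketch the line ending in a semicolon is only a declaration, while the one followed by a body in braces is the definition.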
The second link is more interesting: it takes us to the string.c file, which contains the actual source code of the function.
Before I walk you through the function step by step, I recommend that you try to understand it on your own. It is a very simple function, and even if you do not know the details, most of the code should look fairly clear.
The skeleton of a PHP function
All PHP functions share the same basic structure. Variables are defined at the top of the function, then zend_parse_parameters is called, and then comes the main logic, with RETURN_*** and php_error_docref calls interspersed.
So let's start with the variable definitions:
    zval *needle;
    char *haystack;
    char *found = NULL;
    char  needle_char[2];
    long  offset = 0;
    int   haystack_len;
The first line defines needle as a pointer to a zval. A zval is PHP's internal representation of an arbitrary PHP variable. What exactly it looks like will be discussed in the next article.
The second line defines haystack as a pointer to a single character. At this point you need to remember that in C, arrays are represented by pointers to their first element. So the haystack variable will point to the first character of the $haystack string that you passed. haystack + 1 will point to the second character, haystack + 2 to the third, and so on. By incrementing the pointer one step at a time, you can read the entire string.
The problem is that PHP needs to know where the string ends. Otherwise it would keep incrementing the pointer without ever stopping. To solve this, PHP also stores the explicit length of the string, which is the haystack_len variable.
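The pointer-plus-length idea can be sketched in a few lines of plain C. This is only an illustration: demo_count_char is a hypothetical helper, not something from the PHP source.

```c
#include <stddef.h>

/* Count how often `c` occurs in a buffer given as pointer plus length.
 * No terminating NUL byte is needed: the length tells us where to stop. */
size_t demo_count_char(const char *haystack, int haystack_len, char c)
{
    size_t count = 0;
    int i;

    for (i = 0; i < haystack_len; i++) {
        /* haystack + i points at the character with index i */
        if (*(haystack + i) == c) {
            count++;
        }
    }
    return count;
}
```

Passing a smaller length than the buffer really has simply makes the function look at a prefix of the string, which shows that the length, not the content, decides where reading stops.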
Another definition of interest is the offset variable, which holds the third argument of the function: the offset at which to start searching. It is defined as a long, which, like int, is an integer data type. The difference between the two is not important right now, but you should know that in PHP integer values are stored as long and string lengths are stored as int.
Now take a look at the following three lines:
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sz|l", &haystack, &haystack_len, &needle, &offset) == FAILURE) {
        return;
    }
What these three lines do is fetch the arguments passed to the function and store them in the variables declared above.
The first argument passed to zend_parse_parameters is the number of arguments that were passed. This number is provided by the ZEND_NUM_ARGS() macro.
Next comes the TSRMLS_CC macro, which is a peculiarity of PHP. You will find this strange macro scattered all over the PHP code base. It is part of the Thread-Safe Resource Manager (TSRM), which makes sure that PHP does not mix up variables between threads. This is not very important for us: when you see TSRMLS_CC (or TSRMLS_DC) in the code, just ignore it. (One odd thing you may notice is that there is no comma before this "argument". The reason is that, depending on whether or not PHP is built with thread safety, the macro expands either to nothing or to ", tsrm_ls". So the comma is part of the macro.)
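The comma trick is easier to see in isolation. The following is a rough sketch under my own assumptions: DEMO_THREAD_SAFE, DEMO_DC, DEMO_CC and ctx are hypothetical names that only imitate the pattern and are not taken from the PHP source.

```c
/* Hypothetical sketch of the "comma inside the macro" trick.
 * Define DEMO_THREAD_SAFE before this block to simulate a thread-safe build. */
#ifdef DEMO_THREAD_SAFE
# define DEMO_DC , void *ctx   /* used in declarations */
# define DEMO_CC , ctx         /* used in calls */
#else
# define DEMO_DC               /* expands to nothing */
# define DEMO_CC
#endif

/* No comma before DEMO_DC: the macro supplies it when needed. */
int demo_add(int a, int b DEMO_DC)
{
    return a + b;
}

int demo_add_three(int a, int b, int c DEMO_DC)
{
    /* Same in calls: no comma before DEMO_CC. */
    return demo_add(demo_add(a, b DEMO_CC), c DEMO_CC);
}
```

In a build without DEMO_THREAD_SAFE both macros vanish and the functions take only their visible arguments; with it, every function silently gains one extra trailing parameter.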
Now we come to the important part: the "sz|l" string describes the parameters that the function accepts:
    s  // the first parameter is a string
    z  // the second parameter is a zval, i.e. an arbitrary variable
    |  // the parameters that follow are optional
    l  // the third parameter is a long (integer)
Besides s, z and l there are more type specifiers, but most of them are easy to guess from the character. For example, b is a boolean, d is a double (floating-point number), a is an array, f is a callback (function) and o is an object.
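To get a feeling for how such a spec string can be read, here is a small sketch that counts required and optional parameters. It is a simplification of my own, not how zend_parse_parameters is implemented: demo_count_params is a hypothetical helper, and it assumes that every character other than '|' stands for exactly one parameter, which ignores any modifier characters a real spec may contain.

```c
/* Count the required and the total number of parameters described by a
 * type spec such as "sz|l". Simplified: every character other than '|'
 * is treated as exactly one parameter. */
void demo_count_params(const char *spec, int *required, int *total)
{
    int seen_pipe = 0;
    const char *p;

    *required = 0;
    *total = 0;
    for (p = spec; *p != '\0'; p++) {
        if (*p == '|') {
            seen_pipe = 1;      /* everything after this is optional */
            continue;
        }
        (*total)++;
        if (!seen_pipe) {
            (*required)++;
        }
    }
}
```

For "sz|l" this reports two required parameters out of three in total, which matches strpos($haystack, $needle, $offset = 0).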
The remaining arguments, &haystack, &haystack_len, &needle and &offset, specify the variables that the parameters should be written to. As you can see, they are all passed by reference (&), meaning that not the variables themselves are passed, but pointers to them.
After this function call, haystack will contain the haystack string, haystack_len the length of that string, needle the needle value and offset the start offset.
Furthermore, the result of the call is checked against FAILURE (which occurs when you pass invalid arguments to the function, for example an array where a string is expected). In this case zend_parse_parameters throws a warning and the function returns immediately (returning null to the userland PHP code).
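The pattern of writing results through pointers and signalling problems with a status code can be sketched as follows. Everything here is hypothetical (demo_parse_args, DEMO_SUCCESS and DEMO_FAILURE are my own names), and the arguments are plain C strings rather than PHP values. Note how the optional output keeps its default when the argument is missing, which is why a variable like offset is initialized before the call.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_SUCCESS 0
#define DEMO_FAILURE -1

/* Hypothetical, much simplified argument fetcher: one required string and
 * one optional integer. Results are written through the pointers passed in;
 * the optional output is left untouched when the argument is missing. */
int demo_parse_args(int argc, const char **argv,
                    const char **str, int *str_len, long *offset)
{
    if (argc < 1 || argc > 2) {
        return DEMO_FAILURE;    /* wrong number of arguments */
    }
    *str = argv[0];
    *str_len = (int) strlen(argv[0]);
    if (argc == 2) {
        *offset = strtol(argv[1], NULL, 10);
    }
    return DEMO_SUCCESS;
}
```

A caller would declare its variables first, set the default for the optional one, and bail out early if the function reports DEMO_FAILURE.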
After the parameter parsing is done, the main function body begins:
    if (offset < 0 || offset > haystack_len) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Offset not contained in string");
        RETURN_FALSE;
    }
What this code does is fairly obvious: if offset is out of bounds, an E_WARNING level error is thrown through the php_error_docref function, and the function returns false using the RETURN_FALSE macro.
php_error_docref is an error function that you will find all over the extension code (that is, the ext folder). Its name comes from the fact that it puts a reference to the documentation (a docref) into the error message, pointing to the manual page of the function in which the error occurred. There is also a zend_error function, which is used mainly by the Zend Engine but also shows up in extension code quite often.
Both functions use sprintf-style formatting, so the error message can contain placeholders that are filled in from the arguments that follow. Here is an example:
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "Failed to write %d bytes to %s", Z_STRLEN_PP(tmp), filename);
    // %d is filled with Z_STRLEN_PP(tmp)
    // %s is filled with filename
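If you have never seen a function with placeholders implemented in C, this is roughly how it can be done with the standard library. The sketch is my own and has nothing to do with how php_error_docref is actually written: demo_format_error is a hypothetical helper that only formats into a buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical error helper: formats a message printf-style into `buf`
 * and returns the number of characters the full message needs. */
int demo_format_error(char *buf, size_t buf_size, const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);
    written = vsnprintf(buf, buf_size, format, args);
    va_end(args);
    return written;
}
```

Calling it with the format string from the example above plus a number and a file name fills %d and %s in order, exactly like printf would.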
Let's continue to parse the code:
    if (Z_TYPE_P(needle) == IS_STRING) {
        if (!Z_STRLEN_P(needle)) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "Empty delimiter");
            RETURN_FALSE;
        }

        found = php_memnstr(haystack + offset,
                            Z_STRVAL_P(needle),
                            Z_STRLEN_P(needle),
                            haystack + haystack_len);
    }
The first five lines are pretty clear: this branch is only executed if needle is a string, and an error is thrown if that string is empty. Then comes the more interesting part: php_memnstr is called, and this function does the main work. As always, you can click on the function name to view its source code.
php_memnstr returns a pointer to the first occurrence of needle in haystack (which is why the found variable is defined as char *, i.e. a pointer to a character). From this pointer the offset can be computed by simple subtraction, as can be seen at the end of the function:
    RETURN_LONG(found - haystack);
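Here is a small self-contained sketch of the search and the pointer subtraction. It is a simplified stand-in that I wrote for illustration, not the real php_memnstr: demo_memnstr and demo_strpos are hypothetical names, and a missing match is reported as -1 because plain C has no false return value to fall back on.

```c
#include <string.h>

/* Simplified stand-in for a function like php_memnstr: find the first
 * occurrence of needle (needle_len bytes) in the range [haystack, end).
 * Returns a pointer to the match, or NULL if there is none. */
const char *demo_memnstr(const char *haystack, const char *needle,
                         int needle_len, const char *end)
{
    const char *p;

    if (needle_len <= 0 || end - haystack < needle_len) {
        return NULL;
    }
    /* The last position at which a full needle still fits is end - needle_len. */
    for (p = haystack; p <= end - needle_len; p++) {
        if (memcmp(p, needle, (size_t) needle_len) == 0) {
            return p;
        }
    }
    return NULL;
}

/* strpos-like wrapper: returns the offset of the match, or -1 if not found. */
long demo_strpos(const char *haystack, int haystack_len,
                 const char *needle, int needle_len, long offset)
{
    const char *found;

    if (offset < 0 || offset > haystack_len) {
        return -1;
    }
    found = demo_memnstr(haystack + offset, needle, needle_len,
                         haystack + haystack_len);
    /* The offset is the distance between the two pointers. */
    return found ? (long) (found - haystack) : -1;
}
```

Because the subtraction is done against haystack and not against haystack + offset, the result is always relative to the start of the string: demo_strpos("hello world", 11, "o", 1, 5) yields 7, not 2.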
Finally, let's take a look at the branch that is taken when needle is not a string:
    else {
        if (php_needle_char(needle, needle_char TSRMLS_CC) != SUCCESS) {
            RETURN_FALSE;
        }
        needle_char[1] = 0;

        found = php_memnstr(haystack + offset,
                            needle_char,
                            1,
                            haystack + haystack_len);
    }
I'll just quote the manual here: "If needle is not a string, it is converted to an integer and applied as the ordinal value of a character." This basically means that instead of writing strpos($str, 'A') you can also write strpos($str, 65), because the character code of A is 65.
If you look at the variable definitions again, you can see that needle_char is defined as char needle_char[2], i.e. a string of two characters. php_needle_char writes the actual character (here 'A') into needle_char[0]. The strpos function then sets needle_char[1] to 0. The reason is that in C, strings are terminated by '\0', that is, the last character is NUL (the character with code 0). In a PHP context this would not be necessary, because PHP stores the length of every string (so it does not need NUL to find the end of a string), but for compatibility with C functions it is still done in PHP's internal implementation.
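As a minimal sketch of that two-character buffer: the helper below is hypothetical (the real php_needle_char works on a zval, not on a long), and it assumes an ASCII-compatible character set so that 65 maps to 'A'.

```c
/* Turn an ordinal value such as 65 into a one-character C string.
 * out must have room for two chars: the character and the NUL byte. */
void demo_needle_char(long ordinal, char out[2])
{
    out[0] = (char) ordinal;
    out[1] = '\0';   /* terminate, so C string functions know where to stop */
}
```

Without the second assignment, any C function that scans for the terminator would read past the end of the buffer.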
Zend functions
Enough of strpos; let's look at another function: strlen. We use the same method as before:
Start by searching for strlen from the PHP 5.4 source root.
You will see a lot of unrelated uses of the function, so search for "PHP_FUNCTION strlen" instead. When you do, something strange happens: there are no results.
The reason is that strlen is one of the few functions that are defined by the Zend Engine and not by a PHP extension. In this case the function is not defined with PHP_FUNCTION(strlen), but with ZEND_FUNCTION(strlen). So we have to search for "ZEND_FUNCTION strlen" as well.
As we already know, we need to click the link without the trailing semicolon to jump to the definition in the source code. This link takes us to the following function definition:
    ZEND_FUNCTION(strlen)
    {
        char *s1;
        int s1_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &s1, &s1_len) == FAILURE) {
            return;
        }

        RETVAL_LONG(s1_len);
    }
The implementation of this function is so simple that I don't think it needs any further explanation.
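Still, it may be worth seeing what returning a stored length buys you compared to C's own strlen. The sketch below uses a hypothetical struct demo_string of my own that keeps the length next to the character data; it is not PHP's actual string representation.

```c
#include <string.h>

/* Hypothetical length-prefixed string, loosely modelled on the idea that
 * the length is stored next to the character data. */
struct demo_string {
    const char *val;
    int len;
};

/* Returning the stored length takes constant time and is binary safe. */
int demo_strlen(const struct demo_string *s)
{
    return s->len;
}

/* C's strlen() scans for the first NUL byte instead. */
int demo_c_strlen(const struct demo_string *s)
{
    return (int) strlen(s->val);
}
```

For a value that contains a NUL byte in the middle, the two functions disagree: the stored length still covers the whole value, while the scanning version stops at the first NUL.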
Methods
We'll talk about how classes and objects work in more detail in other articles, but as a little spoiler: you can find object methods by entering ClassName::methodName in the search box. For example, try searching for SplFixedArray::getSize.
Next part
The next part of the series will be published separately. It will cover what zvals are, how they work, and how they are used in the source code (all those Z_* macros).
Understanding the definition of PHP intrinsics (PHP Source for PHP Developers-Part Two)