Understanding PHP's internal function definitions (PHP source code for PHP developers, part 2)

Original article: https://nikic.github.io/2012/03/16/Understanding-PHPs-internal-function-definitions.html

Welcome to the second part of the "PHP source code for PHP developers" series.

In the previous article, ircmaxell explained where you can find the PHP source code and how it is basically structured, and gave a brief introduction to C (because PHP is written in C). If you missed that article, you may want to read it before starting this one.

In this article, we will look at how to locate the definitions of PHP's internal functions and how to understand them.

How to find the function definition

To start, let's try to find the definition of the strpos function.

As a first step, go to the PHP 5.4 root directory and enter strpos in the search box at the top of the page. The result is a long list of every place where strpos occurs in the PHP source code.

Because this result is not very helpful, we use a small trick: instead of strpos, we search for "PHP_FUNCTION strpos" (do not leave out the double quotes, they are important).

Now we get just two results:

/PHP_5_4/ext/standard/
    php_string.h    48      PHP_FUNCTION(strpos);
    string.c        1789    PHP_FUNCTION(strpos)

The first thing to note is that both locations are in the ext/standard folder. This is what we would expect, because the strpos function (like most string, array, and file functions) is part of the standard extension.

Now open both links in new tabs and see what code is hidden behind them.

You will see that the first link takes you to the php_string.h file, which contains the following code:

// ...
PHP_FUNCTION(strpos);
PHP_FUNCTION(stripos);
PHP_FUNCTION(strrpos);
PHP_FUNCTION(strripos);
PHP_FUNCTION(strrchr);
PHP_FUNCTION(substr);
// ...

This is what a typical header file (a file ending in .h) looks like: a plain list of functions that are defined elsewhere. We are not really interested in it, because we already know what we are looking for.

The second link is more interesting: it takes us to the string.c file, which contains the actual source code of the function.

Before I take you through this function step by step, I recommend that you try to understand it yourself. It is a very simple function, and even if you don't know the details, most of the code should look reasonably clear.

Skeleton of PHP functions

All PHP functions share the same basic structure. The variables are declared at the top of the function, then zend_parse_parameters is called, and then comes the main logic, interspersed with calls to RETURN_*** and php_error_docref.

Let's start with the variable declarations:

zval *needle;
char *haystack;
char *found = NULL;
char  needle_char[2];
long  offset = 0;
int   haystack_len;

The first line declares needle as a pointer to a zval. A zval is PHP's internal representation of an arbitrary PHP variable. What exactly it looks like will be the topic of the next article.

The second line declares haystack as a pointer to a single character. At this point you need to know that in C, arrays are handled through pointers to their first element. So the haystack variable points to the first character of the $haystack string you passed. haystack + 1 points to the second character, haystack + 2 to the third, and so on. By incrementing the pointer one step at a time, you can read the entire string.

The problem is that PHP needs to know where the string ends. Otherwise it would keep incrementing the pointer and never stop. To solve this, PHP also stores an explicit length, which is the haystack_len variable.
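The pointer-plus-length idea can be sketched in plain C. This is only an illustration; count_byte is a hypothetical helper and not part of the PHP source:

```c
#include <stddef.h>

/* Hypothetical helper (not from PHP): counts how often the byte c occurs
 * in the first len bytes of str. The explicit length, not a terminating
 * NUL byte, tells the loop where to stop. */
size_t count_byte(const char *str, size_t len, char c)
{
    size_t count = 0;
    const char *end = str + len;   /* one past the last byte of the string */
    const char *p;

    for (p = str; p < end; p++) {  /* visits str, str + 1, str + 2, ... */
        if (*p == c) {
            count++;
        }
    }
    return count;
}
```

Because the length is passed separately, a call such as count_byte("a\0a", 3, 'a') still sees the byte after the embedded NUL.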

Of the remaining declarations, the interesting one is offset, which holds the third parameter of the function: the offset at which to start searching. It is declared as long, which, like int, is an integer data type. The difference between the two is not important here, but you should know that in PHP integer values are stored as long, and string lengths are stored as int.

Now let's take a look at the following three lines:

if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sz|l", &haystack, &haystack_len, &needle, &offset) == FAILURE) {
    return;
}

These three lines fetch the parameters passed to the function and store them in the variables declared above.

The first argument passed to zend_parse_parameters is the number of parameters that were passed. This number is provided by the ZEND_NUM_ARGS() macro.

Next comes the TSRMLS_CC macro, which is a peculiarity of PHP. You will find this strange macro scattered all over the PHP codebase. It is part of the Thread Safe Resource Manager (TSRM), which ensures that PHP does not mix up variables between multiple threads. This is not very important for us: when you see TSRMLS_CC (or TSRMLS_DC) in the code, just ignore it. (Note that there is no comma before this "argument". That is because, depending on whether PHP was built thread-safe or not, the macro expands either to nothing or to ", tsrm_ls". So the comma is part of the macro.)
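To see why the missing comma works, here is a minimal sketch of the same trick. The DEMO_DC and DEMO_CC macros and the demo_ctx type are hypothetical stand-ins for illustration, not PHP's real definitions:

```c
/* Hypothetical stand-ins for the TSRMLS_* macros; PHP's real definitions differ. */
#ifdef DEMO_THREAD_SAFE
typedef struct { int thread_id; } demo_ctx;
# define DEMO_DC , demo_ctx *ctx   /* "declaration comma": appended to parameter lists */
# define DEMO_CC , ctx             /* "call comma": appended to argument lists */
#else
# define DEMO_DC                   /* expands to nothing in a non-thread-safe build */
# define DEMO_CC
#endif

/* Compiles with either two or three parameters, depending on the build. */
int add_inner(int a, int b DEMO_DC)
{
    return a + b;
}

/* Passes the context along without ever writing a comma by hand. */
int add_outer(int a, int b DEMO_DC)
{
    return add_inner(a, b DEMO_CC);
}
```

Without DEMO_THREAD_SAFE defined, add_outer(2, 3) is an ordinary two-argument call. With it defined, every function in the chain silently gains an extra context parameter.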

Now we come to the important part: the "sz|l" string describes the parameters the function accepts:

s  // The first parameter is a string
z  // The second parameter is a zval, i.e. an arbitrary variable
|  // The following parameters are optional
l  // The third parameter is a long (integer)

Besides s, z, and l there are more type specifiers, but most of them should be obvious from the character. For example, b is a boolean, d is a double (floating-point number), a is an array, f is a callback (function), and o is an object.
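As a rough illustration of how such a specification string splits into required and optional parameters, consider the following sketch. count_spec_args is a hypothetical helper, not a Zend API, and it assumes that every character other than | stands for exactly one parameter, which ignores the modifier characters the real function also accepts:

```c
/* Hypothetical helper (not a Zend API): counts the required and optional
 * parameters described by a type specification such as "sz|l".
 * Assumption: every character other than '|' denotes one parameter. */
int count_spec_args(const char *spec, int *required, int *optional)
{
    int seen_pipe = 0;

    *required = 0;
    *optional = 0;

    for (; *spec != '\0'; spec++) {
        if (*spec == '|') {
            seen_pipe = 1;          /* everything after '|' is optional */
        } else if (seen_pipe) {
            (*optional)++;
        } else {
            (*required)++;
        }
    }
    return *required + *optional;   /* total number of parameters */
}
```

For "sz|l" this reports two required parameters and one optional one, which matches how strpos can be called with either two or three arguments.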

The remaining arguments &haystack, &haystack_len, &needle, &offset specify the variables that the parameters should be written to. As you can see, they are all passed by reference (&), meaning that not the variables themselves are passed, but pointers to them.

After this call, haystack contains the haystack string, haystack_len is the length of that string, needle is the needle value, and offset is the starting offset.

In addition, the return value is checked against FAILURE (which occurs when you pass invalid arguments to the function, for example an array where a string is expected). In this case zend_parse_parameters throws a warning and the function returns immediately (null is returned to the PHP userland code).
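The pattern of writing results through pointers and bailing out early on failure can be sketched like this. All demo_* names and the DEMO_* constants are hypothetical; the real zend_parse_parameters is variadic and considerably more involved:

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SUCCESS 0
#define DEMO_FAILURE (-1)

/* Hypothetical parser: fills the caller's variables through pointers,
 * loosely mirroring how &haystack and &haystack_len are filled. */
int demo_parse_string(const char *input, const char **str, int *str_len)
{
    if (input == NULL) {
        return DEMO_FAILURE;        /* invalid argument: outputs stay untouched */
    }
    *str = input;
    *str_len = (int) strlen(input);
    return DEMO_SUCCESS;
}

/* Caller following the same shape as the PHP code: return early on failure. */
int demo_length_or_minus_one(const char *input)
{
    const char *s;
    int s_len;

    if (demo_parse_string(input, &s, &s_len) == DEMO_FAILURE) {
        return -1;
    }
    return s_len;
}
```

The caller never reads s or s_len unless parsing succeeded, which is why the early return matters.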

After the parameter parsing, the main function body starts:

if (offset < 0 || offset > haystack_len) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "Offset not contained in string");
    RETURN_FALSE;
}

What this code does should be obvious: if the offset is out of bounds, an E_WARNING-level error is thrown through the php_error_docref function, and the function then returns false using the RETURN_FALSE macro.

php_error_docref is the error function you will find in extensions (that is, in the ext folder). Its name comes from the fact that it includes a documentation reference in the error message (a link to the documentation of the function that failed). There is also a zend_error function, which is mainly used by the Zend Engine but also often appears in extension code.

Both functions use sprintf-style formatting, so the error message can contain placeholders that are filled in from the subsequent arguments. Here is an example:

php_error_docref(NULL TSRMLS_CC, E_WARNING, "Failed to write %d bytes to %s", Z_STRLEN_PP(tmp), filename);
// %d is filled with Z_STRLEN_PP(tmp)
// %s is filled with filename
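A function that accepts a format string followed by a variable number of arguments can be built on the standard vsnprintf. The sketch below shows only that mechanism; demo_format_warning is a hypothetical name, and it does not reproduce what php_error_docref does beyond the formatting:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: formats a message into buf using printf-style
 * placeholders. Returns the value of vsnprintf, i.e. the length the full
 * message would have had (it may exceed size if the output was truncated). */
int demo_format_warning(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);                 /* start reading the extra arguments */
    written = vsnprintf(buf, size, format, args);
    va_end(args);

    return written;
}
```

Calling it with the format "Failed to write %d bytes to %s" and the arguments 12 and "out.txt" fills the buffer with the completed message.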

Let's continue walking through the code:

if (Z_TYPE_P(needle) == IS_STRING) {
    if (!Z_STRLEN_P(needle)) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Empty delimiter");
        RETURN_FALSE;
    }
    found = php_memnstr(haystack + offset,
                        Z_STRVAL_P(needle),
                        Z_STRLEN_P(needle),
                        haystack + haystack_len);
}

The first five lines are clear: this branch is only executed when needle is a string, and an error is thrown if that string is empty. Then comes the interesting part: php_memnstr is called, which is the function that does the main work. As usual, you can click the function name to view its source code.

php_memnstr returns a pointer to the first occurrence of needle in haystack (which is why found is declared as char *, i.e. a pointer to a character). From this pointer the offset can be computed by simple subtraction, as can be seen at the end of the function:

RETURN_LONG(found - haystack);
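The search-then-subtract step can be reproduced in standalone C. demo_memnstr below is a simplified stand-in written for this illustration, not PHP's actual php_memnstr implementation, and demo_strpos uses -1 in place of PHP's false:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for php_memnstr (not PHP's implementation):
 * returns a pointer to the first occurrence of needle in [haystack, end),
 * or NULL if there is none. */
const char *demo_memnstr(const char *haystack, const char *needle,
                         size_t needle_len, const char *end)
{
    const char *p;

    if (needle_len == 0 || end < haystack) {
        return NULL;
    }
    /* Stop as soon as fewer than needle_len bytes remain. */
    for (p = haystack; (size_t) (end - p) >= needle_len; p++) {
        if (memcmp(p, needle, needle_len) == 0) {
            return p;
        }
    }
    return NULL;
}

/* Returns the position of needle at or after offset, or -1 if not found. */
long demo_strpos(const char *haystack, size_t haystack_len,
                 const char *needle, size_t needle_len, size_t offset)
{
    const char *found;

    if (offset > haystack_len) {
        return -1;
    }
    found = demo_memnstr(haystack + offset, needle, needle_len,
                         haystack + haystack_len);
    if (found == NULL) {
        return -1;
    }
    return (long) (found - haystack);   /* pointer difference = offset */
}
```

Note that the subtraction is always relative to the start of the haystack, so the result is an absolute position even when the search started at a later offset.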

Finally, let's look at the branch that is taken when needle is not a string:

else {
    if (php_needle_char(needle, needle_char TSRMLS_CC) != SUCCESS) {
        RETURN_FALSE;
    }
    needle_char[1] = 0;
    found = php_memnstr(haystack + offset,
                        needle_char,
                        1,
                        haystack + haystack_len);
}

I'll just quote the manual on what this does: "If needle is not a string, it is converted to an integer and applied as the ordinal value of a character." So instead of strpos($str, 'A') you can also write strpos($str, 65), because the character A has the code 65.

If you look back at the variable declarations, you can see that needle_char is declared as char needle_char[2], i.e. a string of two characters. php_needle_char writes the actual character (here 'A') to needle_char[0]. Then the strpos function sets needle_char[1] to 0. The reason is that in C, strings are terminated by '\0', i.e. the last character is set to NUL (the character with code 0). In the PHP context this is not strictly necessary, because PHP stores the length of every string (so it does not need the 0 to find the end of the string), but it is done anyway to stay compatible with C functions.
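The effect of these two steps can be sketched in a few lines. demo_needle_from_int is a hypothetical helper that only mimics the outcome described above for an integer needle; it is not php_needle_char itself, and it assumes an ASCII-compatible character set:

```c
/* Hypothetical helper: mimics what php_needle_char and the following
 * assignment achieve together for an integer needle. */
void demo_needle_from_int(long value, char needle_char[2])
{
    needle_char[0] = (char) value;  /* 65 becomes 'A' in ASCII */
    needle_char[1] = '\0';          /* NUL terminator for C string functions */
}
```

After the call, the two-byte buffer can be handed both to length-aware code (with a length of 1) and to C functions that expect a NUL-terminated string.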

Zend functions

I've had enough of strpos by now, so let's look at another function: strlen. We proceed as usual:

Search for strlen from the PHP 5.4 source root directory.

You will see a bunch of irrelevant uses of the function, so search for "PHP_FUNCTION strlen" instead. When you do, you will notice something strange: there are no results.

The reason is that strlen is one of the few functions that are defined by the Zend Engine rather than by a PHP extension. In this case the function is not defined as PHP_FUNCTION(strlen), but as ZEND_FUNCTION(strlen). So we need to search for "ZEND_FUNCTION strlen" as well.

As we already know, we need to click the link without the trailing semicolon to jump to the definition in the source code. This link takes us to the following function definition:

ZEND_FUNCTION(strlen)
{
    char *s1;
    int s1_len;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &s1, &s1_len) == FAILURE) {
        return;
    }

    RETVAL_LONG(s1_len);
}

This function is so simple that I don't think it needs any further explanation.

Methods

We will talk about how classes and objects work in more detail in other articles, but as a little spoiler: you can find object methods by entering ClassName::methodName in the search box. For example, try searching for SplFixedArray::getSize.

Next part

The next part will be published here again. It will cover what zvals are, how they work, and how they are used in the source code (all those Z_*** macros).
