[Translated from nikic] Methods on primitive types in PHP
Introduction
This is my first time translating someone else's article, and my English is only at CET-4 level, so there are probably many inaccuracies. Where I could not find a suitable translation, I have included the original English text alongside my rough translation.
Translating this article took me three evenings and a lunch break. I first made sure I understood the technical content, and only then translated it.
My goals were to practice my English and to learn new technical ideas from abroad.
The article describes how a PHP extension can add object-oriented method calls to PHP's native types, as a way of addressing the inconsistent function names, inconsistent parameter order and poor readability of PHP's standard library.
The extension works by replacing the handler that the Zend Engine runs for method calls. The replacement handler first checks the type of the value the method is called on: if it is IS_STRING, the custom handling takes over; otherwise the call is passed on to the default Zend handler. The details are explained below.
HOOKPHP code
PHP is an interpreted language: code is compiled into an intermediate bytecode, which the Zend Engine then executes. PHP calls the instructions of this bytecode opcodes. Each opcode corresponds to a handler function inside the Zend Engine, and executing an opcode means running its handler. To hook an opcode, you only need to replace the handler registered for it.
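The hooking idea can be modeled in plain PHP. This is only a toy simulation, not Zend Engine code: the $handlers table, the execute() function and the string methods are hypothetical, and the key INIT_METHOD_CALL merely echoes the name of the engine's ZEND_INIT_METHOD_CALL opcode. It shows the two steps described above: keep the original handler, then install a replacement that handles string receivers itself and passes everything else on to the original.

```php
<?php
// Toy model of an opcode handler table (hypothetical: the real table is a
// C array inside the Zend Engine). Each "opcode" maps to a callable.
$handlers = [
    // Default behaviour: only objects can receive method calls.
    'INIT_METHOD_CALL' => function ($receiver, string $method, array $args) {
        if (!is_object($receiver)) {
            throw new Error("Call to a member function $method() on " . gettype($receiver));
        }
        return $receiver->$method(...$args);
    },
];

// Hypothetical custom methods for string receivers.
$stringMethods = [
    'length' => fn (string $s) => strlen($s),
    'upper'  => fn (string $s) => strtoupper($s),
];

// Install the hook: keep the original handler, then replace it.
$original = $handlers['INIT_METHOD_CALL'];
$handlers['INIT_METHOD_CALL'] = function ($receiver, string $method, array $args) use ($original, $stringMethods) {
    if (is_string($receiver) && isset($stringMethods[$method])) {
        // String receiver (IS_STRING in the engine): run the custom code.
        return $stringMethods[$method]($receiver, ...$args);
    }
    // Anything else: defer to the original handler.
    return $original($receiver, $method, $args);
};

// Hypothetical "executor": runs whatever handler is currently installed.
function execute(array $handlers, $receiver, string $method, array $args = [])
{
    return $handlers['INIT_METHOD_CALL']($receiver, $method, $args);
}
```

With the hook installed, execute($handlers, 'hello', 'length') returns 5, while a call on an integer still ends in the original handler and throws an Error, as PHP does for method calls on non-objects.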
Methods on primitive types in PHP
A few days ago, Anthony Ferrara wrote down some thoughts on the future of PHP. I agree with most of his points, but not all of them. This article focuses on one particular aspect: turning primitive types like strings and arrays into "pseudo-objects" on which methods can be called.
Let's start with a few examples of what this would look like:
$str = "test foo bar";
$str->length();       // == strlen($str) == 12
$str->indexOf("foo"); // == strpos($str, "foo") == 5
$str->split(" ");     // == explode(" ", $str) == ["test", "foo", "bar"]
$str->slice(5, 3);    // == substr($str, 5, 3) == "foo"

$array = ["test", "foo", "bar"];
$array->length();     // == count($array) == 3
$array->join(" ");    // == implode(" ", $array) == "test foo bar"
$array->slice(1, 2);  // == array_slice($array, 1, 2) == ["foo", "bar"]
$array->flip();       // == array_flip($array) == ["test" => 0, "foo" => 1, "bar" => 2]
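Without the extension, the same mapping can be approximated with ordinary wrapper objects. The sketch below is only an illustration: PseudoString and PseudoArray are hypothetical classes, and each method simply forwards to the standard function listed in the comments of the example. Unlike real pseudo-methods, the values have to be wrapped explicitly first.

```php
<?php
// Hypothetical wrapper: each method forwards to an existing string function.
final class PseudoString
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function length(): int { return strlen($this->value); }

    // Like strpos(), this returns false when the needle is not found.
    public function indexOf(string $needle) { return strpos($this->value, $needle); }

    public function split(string $delimiter): array { return explode($delimiter, $this->value); }

    public function slice(int $start, int $length): string { return substr($this->value, $start, $length); }
}

// Hypothetical wrapper: each method forwards to an existing array function.
final class PseudoArray
{
    private array $value;

    public function __construct(array $value)
    {
        $this->value = $value;
    }

    public function length(): int { return count($this->value); }

    public function join(string $glue): string { return implode($glue, $this->value); }

    public function slice(int $offset, int $length): array { return array_slice($this->value, $offset, $length); }

    public function flip(): array { return array_flip($this->value); }
}

$str = new PseudoString("test foo bar");
$array = new PseudoArray(["test", "foo", "bar"]);
```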
Here $str is just an ordinary string and $array is just an ordinary array; they are not objects. All we do is give them a little object-like behavior so that methods can be called on them.
Note that this is not some far-off idea: it already exists. The scalar_objects PHP extension lets you define methods on PHP's primitive types.
Introducing method calls on primitive types has a number of benefits, which I list below:
An opportunity to make APIs more concise
Probably the most common complaint about PHP is the inconsistency of its standard library: unclear function names, mixed naming conventions, and inconsistent parameter order. Some typical examples:
// Different naming conventions
strpos
str_replace

// Totally unclear names
strcspn
strpbrk

// Inverted parameter order
strpos($haystack, $needle)
array_search($needle, $haystack)
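The parameter-order inconsistency is easy to trip over in practice. In this small sketch (the variable names are only illustrative), both calls search for "foo", yet the value being searched is the first argument of strpos() and the second argument of array_search().

```php
<?php
$haystackString = "test foo bar";
$haystackArray  = ["test", "foo", "bar"];

// strpos(haystack, needle): the string being searched comes first.
$pos = strpos($haystackString, "foo");

// array_search(needle, haystack): the array being searched comes second.
$key = array_search("foo", $haystackArray);
```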
While this issue is often overemphasized (we have IDEs, after all), it is hard to deny that the situation is rather suboptimal. It should also be noted that many functions have problems that go beyond a strange name. Often their edge-case behavior was not fully thought through, so the calling code has to special-case it. (For string functions, the edge cases usually involve an empty string or an offset at or beyond the end of the string.)
A common suggestion is to add a large number of aliases in PHP 6 that unify function names and parameter order, giving us string\pos(), string\replace(), string\complement_span() or something similar. Personally (and this seems to be the view of many php-src developers as well), I see little point in this. These function names are deeply ingrained in the memory of PHP programmers, and making purely cosmetic changes to them does not seem worthwhile.
Introducing an object-oriented API for primitive types, on the other hand, offers an opportunity to redesign the API as a side effect of switching to a new paradigm (Original article: "The introduction of an OO API for primitive types on the other hand offers an opportunity of an API redesign as a side effect of switching to a new paradigm"). It also lets us really start from scratch, without having to meet the expectations set by the old API. Two examples:
- I would really like to have $string->split($delimiter) and $array->join($delimiter), because these are the commonly accepted names for this functionality (rather than explode and implode). On the other hand, I would be very unhappy to see string\split($delimiter), because the existing str_split function does something completely different (chunking).
- I would also definitely like the new API to use exceptions for error reporting. Since it is an object-oriented API, this comes naturally. Using exceptions in a renamed procedural API, on the other hand, would go against convention, because all procedural functions report errors with warnings. This is not set in stone, but it is certainly a point of contention I would rather avoid :)
This is my primary motivation for an object-oriented API on primitive types: we can start from scratch and implement a properly designed set of APIs. But of course this is not the only benefit. The object-oriented syntax also offers a number of deeper advantages, which I discuss below.
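Both points can be illustrated with existing functions plus one invented helper. explode() and str_split() are real and show why a procedural string\split() would clash; split_by() is hypothetical, a sketch of a function named after what it does that reports invalid input with an exception instead of a warning.

```php
<?php
// Existing functions: explode() splits on a delimiter, while str_split()
// cuts the string into chunks of a fixed length (here 2 bytes).
$parts  = explode(",", "a,b,c");   // ["a", "b", "c"]
$chunks = str_split("a,b,c", 2);   // ["a,", "b,", "c"]

// Hypothetical helper in the spirit of the bullets above: named after what
// it does, and reporting invalid input with an exception, not a warning.
function split_by(string $str, string $delimiter): array
{
    if ($delimiter === '') {
        throw new InvalidArgumentException('Delimiter must not be empty');
    }
    return explode($delimiter, $str);
}
```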
Improved readability
Procedural calls generally do not chain well. Consider the following example:
$output = array_map(function($value) {
    return $value * 42;
}, array_filter($input, function($value) {
    return $value > 10;
}));
Which parts belong to array_map and which to array_filter? (Original article: "what are array_map and array_filter applied?") In which order are they called? The variable $input is hidden between the two closures, and the functions are written in the reverse of the order in which they actually run. Now the same example using object-oriented syntax:
$output = $input->filter(function($value) {
    return $value > 10;
})->map(function($value) {
    return $value * 42;
});
I dare say that in this version the order of operations (first filter, then map) and the initial input array $input are much more obvious.
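A chainable wrapper can be emulated in userland to compare both forms side by side. ChainableArray is a hypothetical class, and the input values are made up; the sketch only shows that the chained calls read in the order they execute while producing the same result as the nested procedural version.

```php
<?php
// Hypothetical chainable wrapper: each method returns a new wrapper, so the
// calls read left to right in the order they run.
final class ChainableArray
{
    private array $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function filter(callable $predicate): self
    {
        // array_values() re-indexes, because array_filter() keeps the old keys.
        return new self(array_values(array_filter($this->items, $predicate)));
    }

    public function map(callable $fn): self
    {
        return new self(array_map($fn, $this->items));
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

$input = [5, 12, 8, 20];

// Procedural: written map-first, but array_filter() runs first.
$procedural = array_values(array_map(function ($value) {
    return $value * 42;
}, array_filter($input, function ($value) {
    return $value > 10;
})));

// Chained: written in the same order it runs.
$chained = (new ChainableArray($input))
    ->filter(function ($value) { return $value > 10; })
    ->map(function ($value) { return $value * 42; })
    ->toArray();
```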
Admittedly this example is a bit contrived, because array_map and array_filter are yet another case of swapped parameter order (which is why the input array ends up in the middle). Let's look at another example, taken from real code, where the input is always in the same position:
substr(strtr(rtrim($className, '_'), '\\', '_'), 15);
Here the call ends in a long tail of extra arguments, '_'), '\\', '_'), 15, and it is hard to tell which argument belongs to which function. Compare this with the object-oriented version:
$className->trimRight('_')->replace('\\', '_')->slice(15);
Here each operation sits right next to its arguments, and the order of the method calls matches the order in which they execute.
Another readability benefit of this syntax is that the needle/haystack ambiguity goes away. Aliases would solve this by introducing a consistent parameter-order convention; with an object-oriented API the problem basically does not exist in the first place:
$string->contains($otherString);
$array->contains($someValue);
$string->indexOf($otherString);
$array->indexOf($someValue);
Here there is no confusion about which value plays which role.
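To check that the two forms really compute the same thing, here is a sketch with a hypothetical ChainableString wrapper whose method names mirror the example. The class name $className value is made up, chosen so that the prefix removed by slice(15) is exactly "Vendor_Package_".

```php
<?php
// Hypothetical chainable string wrapper mirroring the method names above.
final class ChainableString
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // rtrim() with an explicit character list.
    public function trimRight(string $chars): self { return new self(rtrim($this->value, $chars)); }

    // Three-argument strtr() replaces character by character.
    public function replace(string $from, string $to): self { return new self(strtr($this->value, $from, $to)); }

    public function slice(int $start): self { return new self(substr($this->value, $start)); }

    public function __toString(): string { return $this->value; }
}

$className = 'Vendor\\Package\\Foo_';  // hypothetical class name

$nested  = substr(strtr(rtrim($className, '_'), '\\', '_'), 15);
$chained = (string) (new ChainableString($className))->trimRight('_')->replace('\\', '_')->slice(15);
```

Both expressions yield "Foo"; the chained one just lists the steps in execution order.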
Polymorphism
Currently PHP provides a Countable interface that a class can implement to customize the result of count($obj). Why is this necessary? Because our functions are not polymorphic. Methods, however, are:
If arrays implemented $array->count() as a method, code would no longer care whether $array really is an array; it could be any object that implements a count() method. This basically gives us the same behavior as Countable, just without the engine hacks it requires. (Original article: "This basically gives us the same behavior as Countable, just without the engine hackery it requires.")
This is a general approach. For example, you could implement a UnicodeString class that provides all the string methods, and then use normal strings and UnicodeStrings interchangeably. Well, at least in theory. This would obviously only work as long as the code sticks to the string methods, and would fail as soon as the concatenation operator is used (operator overloading is currently only supported for internal classes). (Original article: "This would obviously only work as long as the usage is limited to just the string methods, and would fail once the concatenation operator is employed.")
Still, I hope this makes the idea clear. The same applies to arrays: by implementing the same interface, you could have an SplFixedArray behave the same way as an array. (Original article: "You could have an SplFixedArray behave the same way as an array, by implementing the same interface.")
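The difference between today's opt-in interface and method-based polymorphism can be sketched in current PHP. Playlist and describeSize() are hypothetical names: Playlist opts in to count() through Countable, while describeSize() relies only on a count() method, so it already accepts any such object, including the built-in SplFixedArray. Under the proposal a plain array would satisfy it too; that part needs the extension and is not shown.

```php
<?php
// Today: count() is a function, so objects must opt in through Countable.
final class Playlist implements Countable
{
    private array $songs;

    public function __construct(array $songs)
    {
        $this->songs = $songs;
    }

    public function count(): int
    {
        return count($this->songs);
    }
}

// Method-based code only needs "something with a count() method".
function describeSize(object $collection): string
{
    return 'size ' . $collection->count();
}

$playlist = new Playlist(['a', 'b', 'c']);
$fixed = SplFixedArray::fromArray([1, 2]);
```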
Now that we have summarized some of the advantages of this method, let's take a look at its problems:
Loose typing
Excerpted from Anthony's blog:
Scalars are not objects, but more importantly, they are not any particular type. PHP relies on a type system that treats strings and numbers as the same thing. Much of the flexibility in the system comes from the fact that any scalar can easily be converted to any other scalar.
More importantly, because of this loose type system, you cannot know at all times what type a variable is. You can say what type you want it to be, but you do not know what it is internally. Even with casting or scalar type hinting it isn't a perfect situation, since there are cases where types can still change.
To clarify this problem, consider the following example:
$num = 123456789;
$sumOfDigits = array_sum(str_split($num));
Here $num is treated as a string of digits, split up with str_split, and then summed with array_sum. Now try the same thing with method calls:
$num = 123456789;
$sumOfDigits = $num->chunk()->sum();
Here the string method chunk() is called on a number. What should happen? Anthony suggests the following solution:
This means that every scalar operation would need to be available on every scalar type. That leads to an object model where scalars have all the math methods as well as all the string methods. What a nightmare...
As already noted, this is by no means an acceptable solution. However, I think we can get away with simply throwing an error (exception!) in such cases. To explain why this approach is feasible, let's look at which types PHP values can have.
Native types in PHP
In addition to objects, PHP has the following variable types:
null, bool, int, float, string, array, resource
Now let's consider which of these would actually need methods. First, we can drop resource, and then look at the rest. null and bool obviously do not need methods, unless you want to write boring conversions like $bool->invert().
The vast majority of mathematical functions are also a poor fit for method syntax. Consider the following examples:
log($n)     vs.  $n->log()
sqrt($n)    vs.  $n->sqrt()
acosh($n)   vs.  $n->acosh()
I think you will agree that for mathematical functions the function notation is much more readable. There are, of course, a few methods that could reasonably apply to numbers; for example, $num->format(10) reads quite nicely. However, there is no real need for an object-oriented number API here, since only a handful of functions would benefit. (Besides, the current math API has few naming problems, because the names of mathematical operations are fairly standard.)
That leaves only strings and arrays, and we have already seen that many nice APIs are possible for those two types. But what about loose typing? The important points are:
We often treat strings as numbers (for example, values coming from HTTP or a database), but the reverse is rare: numbers are seldom used directly as strings. For example, the following code would confuse us:
strpos(54321, 32, 1);
Treating a number as a string is an odd thing to do, and in that case you can simply add an explicit cast. Using the digit-sum example from above:
$num = 123456789;
$sumOfDigits = ((string) $num)->chunk()->sum();
Now it is clear that, yes, we really do want to treat the number as a string. Using the technique this way is acceptable to me.
Arrays are even simpler: array operations make no sense on a value that is not an array, so there is nothing to convert.
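PHP 7.0, released after this article was written, added a strict_types mode that behaves much like the "throw an error" option argued for here. The sketch below is an analogy using that real feature, not a description of how scalar_objects behaves: passing the int to str_split() is rejected with a TypeError, while the explicit cast states the intent and works.

```php
<?php
declare(strict_types=1);

// In strict mode, an int passed to a string parameter is rejected
// instead of being silently converted.
$num = 123456789;

try {
    $sumOfDigits = array_sum(str_split($num));
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}

// The explicit cast says "treat this number as a string", like
// ((string) $num)->chunk()->sum() in the pseudo-method version.
$sumOfDigits = array_sum(str_split((string) $num));
```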
Another thing that improves the situation is scalar type hints, which I fully assume will be present in whatever PHP version this ends up in; it is really embarrassing that we still don't have them. (Original article: "which I totally assume to be present in any PHP version this gets in - really embarrassing that we still don't have them".) If a function has an internal string type hint, the input you receive will be a string, even if something else was passed to the function (this depends on the specifics of the type hint implementation).
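Scalar type declarations did arrive in PHP 7.0. In the default (coercive) mode, a string declaration converts scalar arguments, so the function body always sees a string, which is the guarantee described above. The helper describe() below is hypothetical and only reports the type and value it receives.

```php
<?php
// Coercive (default) mode: a string type declaration converts scalar
// arguments, so inside the function $value is always a string.
function describe(string $value): string  // hypothetical helper
{
    return gettype($value) . ':' . $value;
}

$fromInt   = describe(123);
$fromFloat = describe(1.5);
$fromBool  = describe(true);
```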
Of course, I am not claiming there are no problems here. Because of poor function design, unexpected types can occasionally show up in code; for example, substr($str, strlen($str)) returns bool(false) instead of string(0) "". (However, this problem is specific to substr. An object-oriented API would not have that issue, so you would not run into it there.)
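A method in a redesigned API could simply guarantee a string result. The slice() function below is a hypothetical sketch of that idea. For context: on the PHP 5 versions current when the article was written, substr($str, strlen($str)) returned false; PHP 7.0 changed that case to return "", and PHP 8.0 returns "" for any start offset past the end.

```php
<?php
// Hypothetical slice() helper that always returns a string, whatever
// the running PHP version's substr() does at the end of the string.
function slice(string $str, int $start): string
{
    $result = substr($str, $start);
    // Normalize the legacy false return value to an empty string.
    return $result === false ? '' : $result;
}
```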
Object passing semantics
Apart from the typing issue, pseudo-methods on primitive types also raise a semantic problem: objects in PHP have different passing semantics (somewhat similar to references) than other types. If strings and arrays can have methods called on them, they will look a lot like objects, and some people may expect them to be passed with object semantics as well. The problem affects both strings and arrays:
function change($arg) {
    echo $arg->length(); // $arg looks like an object...
    $arg[0] = 'x';       // ...but has no object passing semantics
}

$str = 'foo';
change($str);   // $str stays the same

$array = ['f', 'o', 'o'];
change($array); // $array stays the same
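The passing semantics can be checked in plain PHP without the extension. This sketch drops the $arg->length() line (which needs scalar_objects) and adds ArrayObject as a real object for comparison: only the write through the object handle is visible to the caller.

```php
<?php
// Strings and arrays are passed by value; objects are passed as handles.
function change($arg)
{
    $arg[0] = 'x';
}

$str = 'foo';
change($str);     // the function writes to its own copy

$array = ['f', 'o', 'o'];
change($array);   // copy-on-write: the caller's array is untouched

$object = new ArrayObject(['f', 'o', 'o']);
change($object);  // same object: the caller sees the write
```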
Of course, we could change the passing semantics. For one thing, in my eyes passing large data structures like arrays by value is a rather bad idea; I would much prefer them to be passed like objects. However, that would be a pretty big backwards-compatibility break, and one that is not easy to refactor automatically (Original article: "However, that would be a pretty big backwards-compatibility break and one that's not easy to refactor automatically") (at least I think so; I did not try to investigate the actual impact of such a change). For strings, on the other hand, object passing semantics would be a disaster, unless we also made strings fully immutable and gave up all the current local mutability (which I personally find very handy; try changing a single byte of a Python string).
I don't know of a good way to resolve this mismatch of expectations, other than emphasizing in the documentation that strings and arrays are merely "pseudo-objects" with methods, not real objects.
The issue extends to other object-related features too. For example, should $string instanceof string be true? I have not yet decided how the whole thing should be handled. It might be best to stick strictly to method calls and stress that these are not real objects. On the other hand, it might be better to support deeper features of the object system as well. This needs more thought.
Current status
All in all, this approach has a number of problems, but I do not consider them particularly serious. At the same time, it offers a great opportunity to introduce clean, concise APIs for our basic types and to improve the readability (and writability) of code that operates on them.
So what is the current status of this idea? From what I have gathered, people internally are not particularly opposed to it, and prefer it over renaming all the functions. The main thing holding it back is the lack of an API proposal.
For this purpose I created the scalar_objects project, which is implemented as a PHP extension. It allows you to register a class that handles method calls on a given primitive type. Let's look at an example:
class StringHandler {
    public function length() {
        return strlen($this);
    }

    public function contains($str) {
        return false !== strpos($this, $str);
    }
}

register_primitive_type_handler('string', 'StringHandler');

$str = 'foo bar baz';
var_dump($str->length());          // int(11)
var_dump($str->contains('bar'));   // bool(true)
var_dump($str->contains('hello')); // bool(false)
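Without the extension, $this can never be a string, so the closest userland approximation passes the value explicitly. The sketch below is hypothetical: UserlandStringHandler, register_handler() and call_method() are invented names, not part of scalar_objects, and the registry is keyed by gettype() names. It only mimics the idea of mapping a primitive type to a handler class.

```php
<?php
// Userland stand-in for the handler above: the value is passed explicitly
// because $this cannot be a string outside the extension.
final class UserlandStringHandler
{
    public static function length(string $self): int
    {
        return strlen($self);
    }

    public static function contains(string $self, string $str): bool
    {
        return false !== strpos($self, $str);
    }
}

// Registry mapping gettype() names to handler classes.
$typeHandlers = [];

function register_handler(array &$registry, string $type, string $class): void
{
    $registry[$type] = $class;
}

// Dispatch a "method call" on a primitive value to its registered handler.
function call_method(array $registry, $value, string $method, ...$args)
{
    $type = gettype($value);
    if (!isset($registry[$type])) {
        throw new Error("No handler registered for type $type");
    }
    return call_user_func([$registry[$type], $method], $value, ...$args);
}

register_handler($typeHandlers, 'string', UserlandStringHandler::class);
```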
A while ago I started working on a string handler that includes an API specification, but I never really finished that project (I hope I will find the motivation to pick it up again soon). Of course, there are also numerous other projects working on such APIs.
So, this is one of the improvements I would like to see in PHP 6. I will probably write another article about my plans in that direction.
Reference
Link: http://nikic.github.io/2014/03/14/Methods-on-primitive-types-in-PHP.html
HOOKPHP: http://netsecurity.51cto.com/art/201407/446430.htm