This article discusses PHP's binary safety from three angles: 1. What binary safety means in PHP; 2. Which structure makes PHP binary safe; 3. Where else this structure is used.
Know not only that it works, but also why.
In one sentence:
PHP's internal functions, such as str_replace, stristr, and strcmp, are guaranteed to produce the expected results when operating on binary data; we say that these functions are binary safe.
As an example:
Let's compare the strcmp functions under C and PHP.
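The C code is as follows:
#include <stdio.h>
#include <string.h>

int main(void) {
    char ab[] = "aa\0b";
    char ac[] = "aa\0c";
    printf("%d\n", strcmp(ab, ac));  /* comparison stops at the first '\0' */
    printf("%zu\n", strlen(ab));     /* counts bytes before the first '\0' */
    return 0;
}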
Results:
0
2
Interpretation:
In other words, C considers the two strings ab and ac to be equal, and the length of ab to be 2.
The PHP code is as follows:
<?php
$ab = "aa\0b";
$ac = "aa\0c";
var_dump(strcmp($ab, $ac));
var_dump(strlen($ab));
?>
Results:
int(-1)
int(4)
Interpretation:
That is, PHP considers the two strings $ab and $ac to be different, and the length of $ab is 4.
As you have probably noticed, the problem is that in C, '\0' is the string terminator. For the string "aa\0b", C's string functions assume the string has ended as soon as they read '\0' and ignore the 'b' that follows, which is why strlen("aa\0b") returns 2.
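To make the point concrete, here is a minimal sketch in C showing that the bytes after '\0' are still in memory and that it is only the string functions that stop looking. The helper names c_visible_length and hidden_bytes are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns how many of the n bytes in buf a C string
   function would see, i.e. the offset of the first '\0', or n if the
   buffer contains no '\0' at all. */
size_t c_visible_length(const char *buf, size_t n) {
    assert(buf != NULL);
    const char *nul = memchr(buf, '\0', n);
    return nul ? (size_t)(nul - buf) : n;
}

/* Hypothetical helper: returns how many of the n bytes lie at or after
   the first '\0' and are therefore invisible to functions like strlen. */
size_t hidden_bytes(const char *buf, size_t n) {
    return n - c_visible_length(buf, n);
}
```

For the 4 data bytes of "aa\0b", c_visible_length reports 2 and hidden_bytes reports 2 (the '\0' itself and the 'b'), which matches the strlen result above.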
That raises another question: PHP itself is written in C, so how does PHP achieve binary safety?
Let's take a look at zval, the structure PHP uses to store variables.
PHP decides which member of value to access based on type. For a string, we access the str structure (marked with a red box in the figure), which is the underlying storage for strings. It has two members: val, a pointer to the string data, and len, which records the length of the string. It is because of len that PHP is binary safe: unlike C, it does not decide where the string ends by looking for a terminator, but reads exactly the number of bytes specified by len.
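The idea can be sketched in a few lines of C. This is not PHP's actual source: the type lstr and the functions lstr_len and lstr_cmp are hypothetical names, loosely modeled on the val/len pair described above, and the comparison is normalized to -1, 0, or 1 as an assumption made for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string: a pointer plus an explicit length. */
typedef struct {
    const char *val;  /* pointer to the bytes; may contain '\0' */
    size_t len;       /* number of bytes, recorded explicitly */
} lstr;

/* The length comes from len, never from scanning for '\0'. */
size_t lstr_len(lstr s) {
    return s.len;
}

/* Binary-safe comparison: compare the common prefix byte by byte with
   memcmp, then fall back to comparing the lengths. Returns -1, 0 or 1. */
int lstr_cmp(lstr a, lstr b) {
    assert(a.val != NULL && b.val != NULL);
    size_t n = a.len < b.len ? a.len : b.len;
    int r = memcmp(a.val, b.val, n);
    if (r != 0) return r < 0 ? -1 : 1;
    if (a.len == b.len) return 0;
    return a.len < b.len ? -1 : 1;
}
```

With the two strings from the earlier example, each stored with a length of 4, lstr_cmp sees the differing fourth byte and reports them as unequal, while C's strcmp on the same pointers still reports 0.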
As we can see, a small improvement to a data structure opens up a lot of possibilities. It can be summed up as:
A small step in structure, a big step in functionality.
Extension:
Such a useful structure is naturally used in many places. Redis, which many of us use, relies on it for its underlying data storage. Redis does not directly use the traditional C string representation (a character array terminated by a null character); instead, it builds its own type called simple dynamic string (SDS) and uses SDS as its default string representation.
Here is the structure definition of SDS:
struct sdshdr {
    // number of bytes used in the buf array,
    // equal to the length of the string held by the SDS
    int len;
    // number of unused bytes in the buf array
    int free;
    // byte array that holds the string
    char buf[];
};
As you can see, the familiar len field appears again, and it is what makes Redis storage binary safe.
The following excerpt illustrates the point (from http://redisbook.com/preview/sds/different_between_sds_and_c_string.html#id6):
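As a rough illustration of how such a header can hold arbitrary bytes, here is a simplified sketch in C. It is not Redis's actual implementation: demo_sdshdr and demo_sds_new are hypothetical names, and the trailing '\0' written after the data is an assumption made so the buffer can still be passed to C string functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Demo header mirroring the fields shown above; not Redis's actual code. */
struct demo_sdshdr {
    int len;     /* bytes used in buf */
    int free;    /* unused bytes in buf */
    char buf[];  /* raw bytes, followed by one extra '\0' */
};

/* Hypothetical constructor: copies exactly len bytes from init, embedded
   '\0' bytes included. Returns NULL if len is negative or allocation
   fails; the caller releases the result with free(). */
struct demo_sdshdr *demo_sds_new(const void *init, int len) {
    assert(init != NULL || len == 0);
    if (len < 0) return NULL;
    struct demo_sdshdr *sh = malloc(sizeof *sh + (size_t)len + 1);
    if (sh == NULL) return NULL;
    sh->len = len;
    sh->free = 0;
    if (len > 0) memcpy(sh->buf, init, (size_t)len);
    sh->buf[len] = '\0';
    return sh;
}
```

Creating a value from the 4 bytes of "aa\0b" records len as 4 and keeps all four bytes in buf, even though strlen on buf would still stop at 2.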
The characters in a C string must conform to some encoding (such as ASCII), and apart from the end of the string, the string must not contain null characters; otherwise, the first null character the program reads is mistaken for the end of the string. These restrictions mean that C strings can only hold text data, not binary data such as images, audio, video, or compressed files.
For example, if you have a special data format that uses null characters to separate multiple words, as shown in Figure 2-17, then this format cannot be saved using a C string, because the functions used with C strings will only recognize the "Redis" in it and ignore the "Cluster" that follows.
Although databases are typically used to store text data, there are many scenarios where they are used to store binary data. To ensure that Redis can be used in a variety of scenarios, the SDS API is binary safe (binary-safe): all SDS APIs treat the data stored in the SDS buf array as binary, and the program places no restrictions, filters, or assumptions on that data; it is read back exactly as it was written.
This is why we call the SDS buf attribute a byte array: Redis does not use this array to hold characters, but to hold a series of binary data.
For example, using SDS to store the special data format mentioned earlier poses no problem, because SDS uses the value of the len attribute, not a null character, to determine where the string ends, as shown in Figure 2-18.
By using binary-safe SDS instead of C strings, Redis can store not only text data but also binary data in any format.
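The word-separated format from the excerpt can be walked safely once the length is known. The sketch below is in C and assumes the data looks like "Redis\0Cluster\0" (the exact bytes in the original figure are an assumption here); count_words is a hypothetical helper, not part of the SDS API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty '\0'-separated words in the
   first len bytes of buf. The end of the data is decided by len, not by
   the first '\0' encountered. */
size_t count_words(const char *buf, size_t len) {
    assert(buf != NULL);
    size_t words = 0;
    size_t i = 0;
    while (i < len) {
        /* Find the next separator inside the remaining bytes. */
        const char *nul = memchr(buf + i, '\0', len - i);
        size_t word_len = nul ? (size_t)(nul - (buf + i)) : len - i;
        if (word_len > 0) words++;
        i += word_len + 1;  /* skip the word and its separator */
    }
    return words;
}
```

Given the 14 bytes of "Redis\0Cluster\0", count_words finds both words, whereas strlen on the same buffer stops after "Redis" and reports 5.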
Binary Safety in PHP