This is part three of the Chinese character encoding research series, from the PHP functions chapter: mastering the ord() and chr() functions. The previous installment, [PHP Basics: ASCII code comparison and character conversion], covered ASCII codes and character conversion methods. Two special-purpose functions turn up in that kind of conversion, both working between characters and decimal numbers: ord() converts a character to a decimal number, and chr() converts a decimal number back into a character. Together they act as a bridge between characters and their binary, octal, decimal, and hexadecimal representations.
1. Application of the ord() function
The ord() function returns the ASCII value of a character (more precisely, the value of the first byte of the string, an integer from 0 to 255). Its most basic use is looking up a character's ASCII value: ord('a') returns 97. In actual development, though, it is used most often in string truncation functions, where the decimal value helps pick out the high and low bytes of a Chinese character. For typical truncation functions, see substrs() or cutstr() in the source code of the PHPWind or Discuz! forums. They call ord() to get the value of a byte; if the return value is greater than 127, the byte is the first half of a Chinese character, so the function fetches the second half and combines the two into a complete character, taking the character encoding (such as GBK or UTF-8) into account.
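The truncation idea described above can be sketched as follows. This is only a minimal illustration, not the PHPWind or Discuz! code: the function name gbk_cut() is hypothetical, and it assumes GBK input in which every byte above 127 starts a two-byte character. The sample is written with hex escapes (D6 D0 CE C4 are the GBK bytes of "中文") so the result does not depend on how the file is saved.

```php
<?php
// Hypothetical helper (not the PHPWind or Discuz! implementation): cut a
// GBK string to at most $maxBytes bytes without splitting a character.
function gbk_cut(string $str, int $maxBytes): string
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // Assumption: a byte above 127 is the first half of a two-byte character.
        $width = (ord($str[$i]) > 127 && $i + 1 < $len) ? 2 : 1;
        if (strlen($out) + $width > $maxBytes) {
            break; // the next character would not fit as a whole
        }
        $out .= substr($str, $i, $width);
        $i += $width - 1; // skip the second byte that was already copied
    }
    return $out;
}

// "a" followed by two GBK characters (bytes D6 D0 and CE C4).
$sample = "a\xD6\xD0\xCE\xC4";
echo bin2hex(gbk_cut($sample, 4)), "\n"; // prints 61d6d0
```

With a limit of 4 bytes the second Chinese character is dropped whole rather than being cut in half, which is the point of checking ord() before copying.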
Taking GBK encoding as an example, the following code uses ord() to test each byte of a string of Chinese characters and collect the two bytes of every character:
// Five Chinese characters ("Don't be infatuated with brother"); save this file as GBK.
$string = "不要迷恋哥";
$length = strlen($string);
var_dump($string); // original Chinese
var_dump($length); // length in bytes
$result = array();
for ($i = 0; $i < $length; $i++) {
    if (ord($string[$i]) > 127) {
        // first byte, a space, then the second byte (++$i moves on to it)
        $result[] = $string[$i] . ' ' . $string[++$i];
    }
}
var_dump($result);
Code description
1. Define a variable $string whose value is a string of Chinese characters.
2. Get the length of the variable (the number of bytes).
3. Print the variable and its length.
4. Use the for loop to read each byte of the variable; the two bytes of each Chinese character are stored together, separated by a space.
Figure: the string, which means "Don't be infatuated with brother", is 5 Chinese characters and 10 bytes in total (2 bytes per Chinese character). Printed one byte at a time, the bytes cannot be displayed properly.
Keeping the original value unchanged, modify the for loop so that it displays the value of each individual byte:
$result = array();
for ($i = 0; $i < $length; $i++) {
    if (ord($string[$i]) > 127) {
        // decimal value of the first byte, a space, then that of the second byte
        $result[] = ord($string[$i]) . ' ' . ord($string[++$i]);
    }
}
var_dump($result);
The code above uses ord() to print the value of each byte. Unlike the raw bytes, the values of the individual bytes can be viewed normally once they have been converted through ord().
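To make the output concrete, here is a small self-contained variant of the loop. The sample is an assumption chosen for illustration, not the article's string: the four bytes D6 D0 CE C4 are the GBK encoding of "中文", written as hex escapes so the script behaves the same whatever encoding the file is saved in.

```php
<?php
// Assumed sample: two GBK characters given as raw bytes (D6 D0 and CE C4).
$string = "\xD6\xD0\xCE\xC4";
$length = strlen($string);

$result = array();
for ($i = 0; $i < $length; $i++) {
    if (ord($string[$i]) > 127) {
        // Decimal value of the first byte, a space, then that of the second byte.
        $result[] = ord($string[$i]) . ' ' . ord($string[++$i]);
    }
}

echo implode(', ', $result), "\n"; // prints 214 208, 206 196
```

Each array element holds the two decimal byte values of one character, which are ordinary digits and therefore display correctly anywhere.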
2. Application of the chr() function
The chr() function is the inverse of ord(): it returns the character for a given value. For example, chr(97) returns a.
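As a quick illustration of the round trip, the sketch below converts a character to its decimal value and back. It also renders that value in other bases with PHP's built-in decbin(), decoct(), dechex(), and hexdec() functions; the variable name $code is chosen here for illustration only.

```php
<?php
$code = ord('a');
echo $code, "\n";             // prints 97
echo chr($code), "\n";        // prints a

// The decimal value can be shown in other bases with built-in functions.
echo decbin($code), "\n";     // prints 1100001
echo decoct($code), "\n";     // prints 141
echo dechex($code), "\n";     // prints 61

// A hexadecimal string is converted back to decimal before calling chr().
echo chr(hexdec('61')), "\n"; // prints a
```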
Continuing the example above: once you have the byte values of the Chinese characters, you can use chr() to reassemble the characters. The code is as follows:
// Five Chinese characters ("Don't be infatuated with brother"); save this file as GBK.
$string = "不要迷恋哥";
$length = strlen($string);
var_dump($string); // original Chinese
var_dump($length); // length in bytes
$result = array();
for ($i = 0; $i < $length; $i++) {
    if (ord($string[$i]) > 127) {
        $result[] = ord($string[$i]) . ' ' . ord($string[++$i]);
    }
}
var_dump($result);
foreach ($result as $v) {
    // split "high low" on the space; an empty delimiter is an error in explode()
    $decs = explode(' ', $v);
    echo chr((int) $decs[0]) . chr((int) $decs[1]);
}
Figure: the output shows the Chinese characters printed normally. The code never outputs the Chinese characters directly, yet they come out intact. The principle is to take the value of each byte, turn it back into a byte with chr(), and then join the two bytes to form a complete Chinese character.
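The reassembly step described above can be sketched in isolation. The input pairs are an assumption for illustration: 214 208 and 206 196 are the decimal values of the GBK bytes D6 D0 and CE C4, which encode "中文". The result is checked with bin2hex() because the raw bytes display as Chinese only on a GBK terminal or page.

```php
<?php
// Assumed input: one "high low" pair of decimal byte values per character.
$pairs = array('214 208', '206 196');

$rebuilt = '';
foreach ($pairs as $pair) {
    // Split the pair on the space, then turn each number back into a byte.
    list($high, $low) = explode(' ', $pair);
    $rebuilt .= chr((int) $high) . chr((int) $low);
}

echo bin2hex($rebuilt), "\n"; // prints d6d0cec4
```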
With this discussion of ord() and chr(), we now have a preliminary understanding of how Chinese characters are encoded: a Chinese character occupies two bytes in GBK encoding, and ord() and chr() can be used to convert each of those bytes. The next installment of the Chinese character encoding research series covers the principles of character encoding conversion.
Resources
Performance comparison of substrs and cutstr, the string truncation functions in PHPWind and Discuz!