To support multiple languages, strings in a database are often stored in UTF-8, and during website development you may need to extract part of such a string with PHP. Cutting at an arbitrary byte offset can split a multibyte character and produce garbled output, so use a UTF-8-aware truncation function such as the ones below.
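Program 1: Truncating Chinese strings in PHP
Here is a simple way to truncate Chinese strings. Note that it treats any byte above 0xA0 as the first half of a two-byte character, so it fits double-byte encodings such as GB2312 rather than UTF-8.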
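function msubstr($str, $start, $len) {
    $tmpstr = "";
    // Stop at the requested end or at the end of the string, whichever comes first
    $end = min($start + $len, strlen($str));
    for ($i = 0; $i < $end; $i++) {
        // A lead byte above 0xA0 starts a two-byte character
        $size = ord(substr($str, $i, 1)) > 0xa0 ? 2 : 1;
        // Scan from byte 0 so character boundaries stay aligned, but copy only from $start on
        if ($i >= $start) {
            $tmpstr .= substr($str, $i, $size);
        }
        $i += $size - 1;
    }
    return $tmpstr;
}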
Program 2: Truncating a UTF-8 string in PHP without splitting a character
<?php
/******************************************************************
 * Truncates a UTF-8 string in PHP without leaving half a character.
 * English letters and digits (halfwidth) take 1 byte (8 bits); Chinese (fullwidth) characters take 3 bytes.
 * Each fullwidth character uses 2 units of $len; each halfwidth character uses 1.
 * @param  $str Source string
 * @param  $len Length of the substring taken from the left
 * @return The truncated string; when $len is less than or equal to 0, the entire string is returned.
 ******************************************************************/
function utf_substr($str, $len)
{
    if ($len <= 0)
    {
        return $str;
    }
    $new_str = array();
    // Stop when the requested length is reached or the source is exhausted
    for ($i = 0; $i < $len && strlen($str) > 0; $i++)
    {
        $temp_str = substr($str, 0, 1);
        if (ord($temp_str) > 127)
        {
            // Non-ASCII lead byte: assumed to start a 3-byte character that uses two units
            $i++;
            if ($i < $len)
            {
                $new_str[] = substr($str, 0, 3);
                $str = substr($str, 3);
            }
        }
        else
        {
            $new_str[] = substr($str, 0, 1);
            $str = substr($str, 1);
        }
    }
    return join('', $new_str);
}
?>
UTF-8 string truncation in PHP using a regular expression
<?php
function cutstr($string, $length) {
    // Split the string into complete UTF-8 characters (1 to 4 bytes each)
    preg_match_all("/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/", $string, $info);
    $wordscut = '';
    $j = 0;
    for ($i = 0; $i < count($info[0]); $i++) {
        $wordscut .= $info[0][$i];
        // A multibyte character counts as width 2, an ASCII character as width 1
        $j = ord($info[0][$i]) > 127 ? $j + 2 : $j + 1;
        // Leave room for the three-character ellipsis
        if ($j > $length - 3) {
            return $wordscut . "...";
        }
    }
    return join('', $info[0]);
}
$string = "242432 objection: 456 is equivalent to 7890 in a wide range of embassy places";
// Try every length up to the byte length of the string
for ($i = 0; $i < strlen($string); $i++) {
    echo cutstr($string, $i) . "\n";
}
?>
A UTF-8 string truncation function
For background on how UTF-8 works, see the UTF-8 FAQ.
A UTF-8 encoded character may consist of 1 to 3 bytes; the exact number is determined by the first byte. (In theory a character can be longer, but this function assumes it never exceeds 3 bytes.)
If the first byte is 224 or greater, it forms one UTF-8 character together with the two bytes that follow it.
If the first byte is at least 192 but less than 224, it forms one UTF-8 character together with the one byte that follows it.
Otherwise, the first byte is a single-byte character on its own (English letters, digits, and a small set of punctuation marks).
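The three rules above can be sketched as a small lookup on the first byte. The function name utf8_char_len() is hypothetical, and the sketch keeps the same assumption as the text: no character is longer than 3 bytes, and the byte passed in is the first byte of a character.

```php
<?php
// Hypothetical helper: number of bytes in the UTF-8 character that starts
// with the given lead byte, assuming characters are at most 3 bytes long.
function utf8_char_len($lead)
{
    $value = ord($lead);   // ord() looks only at the first byte of its argument
    if ($value >= 224) {
        return 3;   // lead byte plus the two bytes that follow
    }
    if ($value >= 192) {
        return 2;   // lead byte plus the one byte that follows
    }
    return 1;       // single-byte character
}

echo utf8_char_len("A"), "\n";   // 1
echo utf8_char_len("é"), "\n";   // 2
echo utf8_char_len("中"), "\n";  // 3
```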
The function below is the one used to truncate text to a fixed length for display on the home page.
// $sourcestr is the string to be processed
// $cutlength is the truncation length (that is, the number of characters)
function cut_str($sourcestr, $cutlength)
{
    $returnstr = '';
    $i = 0;
    $n = 0;
    $str_length = strlen($sourcestr); // number of bytes in the string
    while (($n < $cutlength) and ($i < $str_length))
    {
        $temp_str = substr($sourcestr, $i, 1);
        $ascnum = ord($temp_str); // value of the byte at offset $i in the string
        if ($ascnum >= 224) // byte value of 224 or higher: first byte of a 3-byte character
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 3); // per the UTF-8 encoding, treat 3 consecutive bytes as a single character
            $i = $i + 3; // advance by 3 bytes
            $n++; // counts as 1 toward the length
        }
        elseif ($ascnum >= 192) // byte value of 192 or higher: first byte of a 2-byte character
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 2); // per the UTF-8 encoding, treat 2 consecutive bytes as a single character
            $i = $i + 2; // advance by 2 bytes
            $n++; // counts as 1 toward the length
        }
        elseif ($ascnum >= 65 && $ascnum <= 90) // uppercase letter
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1; // still only 1 byte
            $n++; // for visual balance, an uppercase letter counts as one full character width
        }
        else // everything else, including lowercase letters and halfwidth punctuation
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1; // advance by 1 byte
            $n = $n + 0.5; // lowercase letters and halfwidth punctuation count as half a character width
        }
    }
    if ($i < $str_length) {
        $returnstr = $returnstr . "..."; // the string was cut short, so append an ellipsis
    }
    return $returnstr;
}
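Where the mbstring extension is available, PHP's built-in mb_strimwidth() performs a similar width-based truncation: halfwidth characters count as 1, fullwidth characters as 2, and a trim marker is appended when the text is cut. The weighting differs from cut_str() above, so the two are not interchangeable. The sketch below assumes mbstring is installed, and the wrapper name trim_to_width() is hypothetical.

```php
<?php
// Hypothetical wrapper around mb_strimwidth(); requires the mbstring extension.
// $width is the maximum display width of the result, including the "..." marker.
function trim_to_width($text, $width)
{
    return mb_strimwidth($text, 0, $width, "...", "UTF-8");
}

// Guard the call so the script still runs where mbstring is missing
if (function_exists('mb_strimwidth')) {
    echo trim_to_width("Hello World", 10), "\n"; // Hello W...
}
```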