PHP provides many string functions, such as strlen() for length and substr() for truncation. However, these functions count bytes, so they work for English (single-byte) text but give wrong results for Chinese strings. The following summarizes how to truncate Chinese strings and calculate their length.
Common String Processing Functions
The code is as follows:
strstr(string, string)          // alias: strchr(). Returns the part from the first occurrence of a string to the end
strrchr(string, string)         // returns the part from the last occurrence of a character to the end
strpos(string, string [, int])  // position where a string appears for the first time
strrpos(string, string)         // position of the last occurrence of a string
substr(string, int [, int])     // returns the part starting at a given position; the length is optional
strlen(string)                  // returns the length of the string in bytes
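As a quick illustration of the functions listed above, the following sketch applies each of them to a plain ASCII string. The sample value is an assumption chosen for this example and does not come from the original text.

```php
<?php
// Sample ASCII string (hypothetical value chosen for illustration).
$s = "user@example.com";

$fromFirst = strstr($s, "@");    // from the first "@" to the end: "@example.com"
$fromLast  = strrchr($s, ".");   // from the last "." to the end: ".com"
$firstPos  = strpos($s, "e");    // offset of the first "e": 2
$lastPos   = strrpos($s, "e");   // offset of the last "e": 11
$head      = substr($s, 0, 4);   // 4 bytes starting at offset 0: "user"
$bytes     = strlen($s);         // length in bytes: 16

echo "$fromFirst $fromLast $firstPos $lastPos $head $bytes\n";
```

All offsets are zero-based byte offsets, which is why these functions are safe for ASCII text but not for multibyte text.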
Suppose
$str = "this is a string";
where the original string is Chinese text that also contains one half-width (single-byte) character, and the following is executed:
The code is as follows:
if (strlen($str) > 10) $str = substr($str, 10);
The 10th and 11th bytes of the original string $str together make up one Chinese character.
The cut splits that character into two halves, so the truncated string contains garbled characters.
To avoid this, we can first count the characters in the string.
The code is as follows:
<?php
header('Content-Type: text/html; charset=UTF-8');
$str = "sdfsdfcxvzv in Shifu";
// Match one Chinese character in the range U+4E00 to U+9FA5 (u = UTF-8 mode)
$pa = '/[\x{4e00}-\x{9fa5}]/siu';
preg_match_all($pa, $str, $r);
$count = count($r[0]);
echo "the current string contains $count Chinese characters";
if ($count > 10) {
    // If the number of Chinese characters is greater than 10, your code
}
?>
Supplement
This section covers how to calculate the length of a string in PHP for English, GBK, and UTF-8 text.
English String Length
strlen() is the built-in PHP function for this. It returns the number of bytes, which for English (ASCII) strings equals the number of characters.
GBK String Length
In GBK, each Chinese character counts as 2 and each English character counts as 1. The following function computes the length of a string containing Chinese characters in this way.
The code is as follows:
function abslength($str)
{
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        // A byte in the range 0xA1 to 0xFF is the first byte of a double-byte character
        if (preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "]+$/", $str[$i])) {
            $i += 2;
        } else {
            $i += 1;
        }
    }
    return $i;
}
UTF-8 String Length
The strlen_utf8 function defined below counts the length of a UTF-8 string. Unlike strlen(), it counts characters rather than bytes. This is somewhat similar to the length of a string in JavaScript, where a Chinese character counts as one.
The code is as follows:
<?php
function strlen_utf8($str)
{
    $i = 0;
    $count = 0;
    $len = strlen($str);
    while ($i < $len) {
        $chr = ord($str[$i]);
        $count++;
        $i++;
        if ($i >= $len) break;
        // High bit set: this is the lead byte of a multibyte character
        if ($chr & 0x80) {
            $chr <<= 1;
            // Each further leading 1 bit means one continuation byte to skip
            while ($chr & 0x80) {
                $i++;
                $chr <<= 1;
            }
        }
    }
    return $count;
}
$str = "www.111cn.net-PHP information";
echo strlen_utf8($str);
?>
In this way, you can accurately truncate and measure strings that mix Chinese and English text. For example:
The code is as follows:
A Chinese truncation method that supports gb2312, gbk, UTF-8, and big5:
/**
 * Chinese truncation. Supports gb2312, gbk, UTF-8, and big5.
 *
 * @param string $str     string to be truncated
 * @param int    $start   start position
 * @param int    $length  truncation length
 * @param string $charset encoding: UTF-8 | gb2312 | gbk | big5
 * @param bool   $suffix  whether to append a suffix
 */
// Written as a class method; drop "public" to use it as a standalone function.
public function csubstr($str, $start = 0, $length, $charset = "UTF-8", $suffix = true)
{
    if (function_exists("mb_substr")) {
        if (mb_strlen($str, $charset) <= $length) return $str;
        $slice = mb_substr($str, $start, $length, $charset);
    } else {
        // Each pattern matches exactly one character in the given encoding
        $re['utf-8']  = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
        $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
        $re['gbk']    = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
        $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
        // The keys are lowercase, so normalize the charset name before the lookup
        preg_match_all($re[strtolower($charset)], $str, $match);
        if (count($match[0]) <= $length) return $str;
        $slice = join("", array_slice($match[0], $start, $length));
    }
    if ($suffix) return $slice . "...";
    return $slice;
}