- $encoding = mb_detect_encoding($string, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
Then cut the string with mb_substr(string $str, int $start [, int $length [, string $encoding]]). If you implement mb_substr yourself in PHP, it will not be very efficient. Hand-written encoding-related functions usually test bytes with ord(substr($str, $i, 1)) > 0xa0. ord($string) returns the numeric value (0 to 255) of the first byte of the string, which tells you whether a truncated string starts inside a Chinese character. This matters because a Chinese character takes 2 bytes in GB2312 and typically 3 bytes in UTF-8, so cutting at an arbitrary byte offset can split a character. In GB2312, a byte value greater than 0xA0 (160) belongs to a Chinese character. Regular expressions:
- Match Chinese characters: preg_match_all('/[\x80-\xff]?./', $string, $match);
- Match English: preg_match_all('/[\x01-\x7f]+/', $string, $match);
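The byte test described above can be sketched as a small truncation routine. This is only an illustration under the assumption that the input is GB2312; the function name gb2312_truncate is hypothetical and not part of PHP, and for real work mb_substr with an explicit encoding is the safer choice.

```php
<?php
// Hypothetical helper (not a PHP built-in): keep at most $maxChars characters
// of a GB2312 byte string without splitting a two-byte character.
function gb2312_truncate(string $str, int $maxChars): string
{
    $out = '';
    $i = 0;
    $len = strlen($str); // length in bytes, not characters
    while ($i < $len && $maxChars > 0) {
        // In GB2312 a byte above 0xA0 starts a two-byte character.
        $width = (ord($str[$i]) > 0xa0 && $i + 1 < $len) ? 2 : 1;
        $out .= substr($str, $i, $width);
        $i += $width;
        $maxChars--;
    }
    return $out;
}

// "\xD6\xD0\xB9\xFA" is the GB2312 byte sequence for two Chinese characters.
$gb = "ab\xD6\xD0\xB9\xFA";
echo bin2hex(gb2312_truncate($gb, 3)), "\n"; // 6162d6d0
```

The same test is not reliable for UTF-8, where continuation bytes fall in the range 0x80 to 0xBF.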
Encoding Conversion
- iconv(string $in_charset, string $out_charset, string $str)
- For example, GB2312 to UTF-8: iconv("GB2312", "UTF-8", $text)
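A minimal sketch of a round trip with iconv follows. It assumes the iconv extension on your system knows the GB2312 charset, which is common but depends on the underlying library. The variable names are made up for the illustration.

```php
<?php
// GB2312 bytes for two Chinese characters (D6D0 and B9FA).
$gbText = "\xD6\xD0\xB9\xFA";

// Convert GB2312 -> UTF-8; iconv() returns false on failure.
$utf8Text = iconv('GB2312', 'UTF-8', $gbText);
if ($utf8Text === false) {
    exit("conversion failed\n");
}
echo bin2hex($utf8Text), "\n"; // e4b8ade59bbd

// Convert back to check that the round trip is lossless.
$back = iconv('UTF-8', 'GB2312', $utf8Text);
echo bin2hex($back), "\n";     // d6d0b9fa
```

Printing with bin2hex avoids depending on how the terminal or browser interprets the bytes.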
URL encoding with urlencode
urlencode returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding used for data posted from a WWW form, that is, the application/x-www-form-urlencoded media type. Note: encode only the individual parts of a URL (such as query values), not the whole URL, otherwise the colon and slashes in the URL will be escaped as well. The result of urlencode depends on the byte encoding of the input string: the traditional form is based on GB2312, the other on UTF-8. For example:
- $url = '中国'; // "China"
- echo urlencode($url);
- UTF-8 result: %E4%B8%AD%E5%9B%BD
- GB2312 result: %D6%D0%B9%FA
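The two results above can be reproduced without depending on how the source file is saved by writing the bytes explicitly. This is a sketch; the host example.com is only a placeholder and the variable names are illustrative.

```php
<?php
$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBD"; // the two characters as UTF-8 bytes
$gb   = "\xD6\xD0\xB9\xFA";         // the same text as GB2312 bytes

echo urlencode($utf8), "\n"; // %E4%B8%AD%E5%9B%BD
echo urlencode($gb), "\n";   // %D6%D0%B9%FA

// Encode only the query value, not the whole URL.
$link = 'http://example.com/s?wd=' . urlencode($utf8);
echo $link, "\n"; // http://example.com/s?wd=%E4%B8%AD%E5%9B%BD
```

urlencode works on bytes and knows nothing about character sets, which is why the same text yields two different strings.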
For example, open Baidu in a browser and search for "中国" (China). The address bar shows: http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD&rsv_bp=0&ch=&tn=baidu&bar=&rsv_spt=3&ie=utf-8&rsv_sug3=16&rsv_sug=0&rsv_sug4=302&rsv_sug1=11&inputt=22928 That is, the browser automatically converted "中国" to %E4%B8%AD%E5%9B%BD.
The difference between urlencode and rawurlencode: urlencode encodes a space as a plus sign (+), while rawurlencode encodes a space as %20.
URL decoding with urldecode and rawurldecode
1. To decode, use the corresponding urldecode() or rawurldecode(). rawurldecode() does not decode a plus sign (+) into a space, while urldecode() does.
2. Neither function converts character sets: the decoded string has the same byte encoding that was used when it was encoded. URLs are normally UTF-8 encoded, so if a URL contains Chinese text in another encoding, the decoded string has to be converted afterwards. In the example that follows, save the PHP file in GB2312 encoding first. When the output is displayed as UTF-8, the first part is garbled and the second part is correct.
- $url = '中国'; // "China"; this file is saved as GB2312
- echo $a = urldecode(urlencode($url)), " ";
- echo iconv('GB2312', 'UTF-8', $a);
- Output: ?й? 中国
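The differences between the two encoder and decoder pairs can be checked with a short script. This is a sketch using made-up sample values, and the last step assumes the iconv extension supports GB2312.

```php
<?php
$value = 'a b+c';

echo urlencode($value), "\n";    // a+b%2Bc   (space becomes +)
echo rawurlencode($value), "\n"; // a%20b%2Bc (space becomes %20)

echo urldecode('a+b%20c'), "\n";    // a b c (+ is decoded to a space)
echo rawurldecode('a+b%20c'), "\n"; // a+b c (+ is left untouched)

// Decoding a GB2312-encoded value yields GB2312 bytes, so convert afterwards.
$decoded = urldecode('%D6%D0%B9%FA');
$utf8    = iconv('GB2312', 'UTF-8', $decoded);
echo urlencode($utf8), "\n"; // %E4%B8%AD%E5%9B%BD
```

Pairing urlencode with urldecode, and rawurlencode with rawurldecode, avoids surprises with literal plus signs.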