Analysis of PHP string encoding problem

Source: Internet
Author: User
Tags form post alphanumeric characters
    $encoding = mb_detect_encoding($string, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
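As a hedged sketch of how the detection call might be wrapped: short inputs can be valid in several encodings at once, so this example checks for valid UTF-8 first with mb_check_encoding and only then falls back to mb_detect_encoding in strict mode. The helper name guess_encoding and the fallback candidate list are assumptions for illustration.

```php
<?php
// Hypothetical helper: guess the encoding of a byte string.
function guess_encoding(string $bytes): string
{
    // Valid UTF-8 (which includes plain ASCII) is accepted first.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }
    // In strict mode (third argument true) the call returns false
    // when the bytes are not valid in any candidate encoding.
    $found = mb_detect_encoding($bytes, ['GB2312', 'GBK', 'BIG5'], true);
    return $found === false ? 'unknown' : $found;
}

echo guess_encoding("\xE4\xB8\xAD\xE5\x9B\xBD"), "\n"; // UTF-8 bytes of 中国
echo guess_encoding("\xD6\xD0\xB9\xFA"), "\n";         // GB2312 bytes of 中国
```

The name returned for the second input depends on how mbstring labels its encodings, so treat it as a hint rather than a guarantee.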

Then pass the detected encoding to mb_substr: mb_substr(string $str, int $start [, int $length [, string $encoding]])

If you implement the equivalent of mb_substr yourself, performance is not very good.
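To illustrate why the encoding-aware function matters, here is a minimal sketch comparing byte-based substr with mb_substr. The sample text is an assumption, written as byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// UTF-8 bytes of "中文abc" (3 bytes per Chinese character).
$text = "\xE4\xB8\xAD\xE6\x96\x87abc";

$byteCut = substr($text, 0, 2);             // first 2 bytes: an incomplete character
$charCut = mb_substr($text, 0, 2, 'UTF-8'); // first 2 characters: 中文

var_dump(strlen($text));                           // int(9)
var_dump(mb_strlen($text, 'UTF-8'));               // int(5)
var_dump(mb_check_encoding($byteCut, 'UTF-8'));    // bool(false)
var_dump($charCut === "\xE4\xB8\xAD\xE6\x96\x87"); // bool(true)
```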

Hand-written encoding-related PHP functions typically test ord(substr($str, $i, 1)) > 0xa0.

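ord($string) returns the value (0 to 255) of the first byte of the string. This is used to decide whether the truncated string begins with a Chinese character, because a Chinese character takes 2 bytes in GB2312 and usually 3 bytes in UTF-8. That is, in GB2312 a byte value greater than 0xa0 (160) belongs to a Chinese character. In UTF-8, any byte of 0x80 or above belongs to a multibyte character, so the 0xa0 threshold applies to GB2312 only.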
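The byte test above can be sketched as a hand-written truncation routine. This is an illustration under stated assumptions, not a replacement for mb_substr: the function name gb_truncate is hypothetical, and the input is assumed to be GB2312, where every Chinese character is two bytes and its first byte is above 0xa0.

```php
<?php
// Hypothetical helper: keep at most $maxChars characters of a GB2312 string.
function gb_truncate(string $str, int $maxChars): string
{
    $out = '';
    $i = 0;
    $n = strlen($str);
    $count = 0;
    while ($i < $n && $count < $maxChars) {
        if (ord($str[$i]) > 0xa0 && $i + 1 < $n) {
            // Lead byte of a double-byte character: copy both bytes.
            $out .= $str[$i] . $str[$i + 1];
            $i += 2;
        } else {
            // ASCII byte (or a dangling lead byte at the very end).
            $out .= $str[$i];
            $i++;
        }
        $count++;
    }
    return $out;
}

// GB2312 bytes of 中国 followed by "abc"; keeping 3 characters yields 中国a.
var_dump(gb_truncate("\xD6\xD0\xB9\xFAabc", 3) === "\xD6\xD0\xB9\xFAa"); // bool(true)
```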

Regular expressions:

    Match characters, including double-byte Chinese ones: preg_match_all('/[\x80-\xff]?./', $string, $match);
    Match English (ASCII): preg_match_all('/[\x01-\x7f]+/', $string, $match);
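A small sketch of byte-range matching follows. It uses a simpler pattern than the first one above, [\x80-\xff]+, to collect runs of non-ASCII bytes; the sample string is an assumption. Without the /u modifier, PCRE matches byte by byte, which is what these ranges rely on.

```php
<?php
// "abc", then the GB2312 bytes of 中国, then "def".
$string = "abc\xD6\xD0\xB9\xFAdef";

preg_match_all('/[\x80-\xff]+/', $string, $high);  // runs of non-ASCII bytes
preg_match_all('/[\x01-\x7f]+/', $string, $ascii); // runs of ASCII bytes

var_dump($ascii[0] === ['abc', 'def']);      // bool(true)
var_dump($high[0] === ["\xD6\xD0\xB9\xFA"]); // bool(true)
```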

Encoding Conversion

    iconv(string $in_charset, string $out_charset, string $str)
    For example, GB2312 to UTF-8: iconv("GB2312", "UTF-8", $text)
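A minimal sketch of the conversion with a failure check, since iconv returns false when it cannot convert the input. The wrapper name gb2312_to_utf8 is hypothetical, and the example assumes the local iconv implementation supports GB2312.

```php
<?php
// Hypothetical wrapper: convert GB2312 text to UTF-8, or return null on failure.
function gb2312_to_utf8(string $text): ?string
{
    $converted = iconv('GB2312', 'UTF-8', $text);
    return $converted === false ? null : $converted;
}

$gb = "\xD6\xD0\xB9\xFA"; // GB2312 bytes of 中国
var_dump(gb2312_to_utf8($gb) === "\xE4\xB8\xAD\xE5\x9B\xBD"); // bool(true)
```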

URL encoding: urlencode

In the returned string, all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and a space is encoded as a plus sign (+). This is the same encoding used for WWW form POST data, that is, the application/x-www-form-urlencoded media type.

Note: Encode only the individual parts of a URL, such as query values, and not the whole URL; otherwise the colon and forward slashes in the URL will be escaped as well.
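The note above can be sketched as follows. The host www.example.com and the parameter name wd are placeholders chosen for illustration.

```php
<?php
$base  = 'http://www.example.com/s'; // placeholder URL
$value = 'a b&c';                    // raw query value

// Encode only the query value, then assemble the URL.
$url = $base . '?wd=' . urlencode($value);
echo $url, "\n";              // http://www.example.com/s?wd=a+b%26c

// Encoding the whole URL escapes the colon and slashes too.
echo urlencode($base), "\n";  // http%3A%2F%2Fwww.example.com%2Fs
```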

urlencode works on the bytes of the string, so the result depends on the string's encoding. There are two common forms: the traditional GB2312-based encoding and the UTF-8-based encoding. For example:

    $url = '中国';
    echo urlencode($url);
    // UTF-8:  %E4%B8%AD%E5%9B%BD
    // GB2312: %D6%D0%B9%FA
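To reproduce both results from a single script regardless of how the file is saved, one option is to start from the UTF-8 bytes and convert them with iconv. This is a sketch that assumes the local iconv supports GB2312.

```php
<?php
$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBD";     // UTF-8 bytes of 中国
$gb   = iconv('UTF-8', 'GB2312', $utf8); // same text as GB2312 bytes

echo urlencode($utf8), "\n"; // %E4%B8%AD%E5%9B%BD
echo urlencode($gb), "\n";   // %D6%D0%B9%FA
```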

For example, open Baidu in a browser and search for "中国". The address bar shows: http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD&rsv_bp=0&ch=&tn=baidu&bar=&rsv_spt=3&ie=utf-8&rsv_sug3=16&rsv_sug=0&rsv_sug4=302&rsv_sug1=11&inputt=22928

That is, the browser automatically converts "中国" to %E4%B8%AD%E5%9B%BD.

The difference between urlencode and rawurlencode: urlencode encodes a space as a plus sign (+), while rawurlencode encodes a space as %20.
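The space handling of the two functions can be checked directly; the sample value is arbitrary.

```php
<?php
$value = 'hello world';

echo urlencode($value), "\n";    // hello+world
echo rawurlencode($value), "\n"; // hello%20world
```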

URL decoding: urldecode and rawurldecode

1. To decode, use the corresponding urldecode() or rawurldecode(). rawurldecode() does not decode a plus sign ('+') into a space, while urldecode() does.
2. urldecode() and rawurldecode() return the same bytes that were encoded. If the page works in UTF-8 and the URL contains Chinese text that was not UTF-8 encoded, the decoded string must be converted. In the example below, first save the PHP file in GB2312 encoding. Part of the output is garbled and part is normal.

    $url = '中国';
    echo $a = urldecode(urlencode($url)), "\n";
    echo iconv('gb2312', 'utf-8', $a);
    // Viewed as UTF-8, the output is �й� followed by 中国
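Both decoding points can be sketched with percent-encoded literals, which avoids depending on the encoding of the source file. The input strings are assumptions for illustration, and the conversion step assumes the local iconv supports GB2312.

```php
<?php
// Point 1: only urldecode() turns '+' into a space.
echo urldecode('a+b%20c'), "\n";    // a b c
echo rawurldecode('a+b%20c'), "\n"; // a+b c

// Point 2: decoding returns the original bytes, here GB2312,
// which must be converted before use on a UTF-8 page.
$decoded = urldecode('%D6%D0%B9%FA');
$utf8    = iconv('GB2312', 'UTF-8', $decoded);
var_dump($utf8 === "\xE4\xB8\xAD\xE5\x9B\xBD"); // bool(true)
```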