A detailed look at string encoding problems in PHP, with examples

As you know, different character encodings use a different number of bytes per character in memory. For example, an ASCII character occupies 1 byte, a Chinese character encoded in UTF-8 typically occupies 3 bytes, and the same character in GBK occupies 2 bytes.
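The byte counts above can be checked with strlen, which counts bytes. This is a minimal sketch; the character is written as explicit byte escapes (an assumption made here so the result does not depend on the encoding the file is saved in), and mb_strlen assumes the mbstring extension is loaded.

```php
<?php
// "中" written as raw bytes so the file's own encoding does not matter.
$utf8 = "\xE4\xB8\xAD"; // UTF-8 bytes of "中"
$gbk  = "\xD6\xD0";     // GBK / GB2312 bytes of "中"

$asciiBytes = strlen('A');               // strlen counts bytes, not characters
$utf8Bytes  = strlen($utf8);
$gbkBytes   = strlen($gbk);
$utf8Chars  = mb_strlen($utf8, 'UTF-8'); // mb_strlen counts characters

echo "ASCII: $asciiBytes, UTF-8: $utf8Bytes, GBK: $gbkBytes, chars: $utf8Chars\n";
```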

PHP comes with several functions for extracting part of a string; the most commonly used are substr and mb_substr.

Using substr on Chinese text can produce garbled output, because substr cuts by byte. For Chinese text encoded in UTF-8, a cut that does not fall on a character boundary keeps only part of a character (for example, 1/3 of it), and the result is garbled.

mb_substr ( string $str , int $start [, int $length [, string $encoding ]] ): the $encoding parameter specifies the character encoding; if it is omitted, the internal character encoding is used.
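The difference between the two functions can be sketched as follows. The sample text and variable names are only illustrative, and the mbstring extension is assumed to be available.

```php
<?php
// "中国PHP" with the Chinese part written as explicit UTF-8 bytes.
$text = "\xE4\xB8\xAD\xE5\x9B\xBD" . 'PHP';

// substr cuts by byte: 4 bytes is "中" plus the first byte of "国".
$byByte = substr($text, 0, 4);

// mb_substr cuts by character: 3 characters is "中国P".
$byChar = mb_substr($text, 0, 3, 'UTF-8');

// The byte-wise cut is no longer valid UTF-8, which is why it shows up garbled.
$byteCutIsValid = mb_check_encoding($byByte, 'UTF-8');
$charCutIsValid = mb_check_encoding($byChar, 'UTF-8');

var_dump($byteCutIsValid, $charCutIsValid);
```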

If you do not know the encoding format of the string, you can check with mb_detect_encoding:

$encoding = mb_detect_encoding($string, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));

And then pass the detected encoding to mb_substr:

mb_substr($string, $start, $length, $encoding)
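Putting the two steps together might look like the sketch below. The helper name leading_chars and the UTF-8 fallback are assumptions made for illustration, not part of PHP; detection is only a guess for short or ambiguous input, so a known encoding should be preferred whenever one is available.

```php
<?php
// Hypothetical helper: detect the encoding, then cut by character.
function leading_chars(string $string, int $length, array $candidates, string $fallback = 'UTF-8'): string
{
    // Strict mode (third argument) rejects encodings in which the string is invalid.
    $encoding = mb_detect_encoding($string, $candidates, true);
    if ($encoding === false) {
        $encoding = $fallback; // assumption: fall back when nothing matches
    }
    return mb_substr($string, 0, $length, $encoding);
}

$candidates = array('ASCII', 'UTF-8', 'GB2312', 'GBK', 'BIG5');
echo leading_chars('hello world', 5, $candidates), "\n";
```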

If you implement mb_substr yourself in PHP, the efficiency is not very good.

Such an implementation uses encoding-related PHP functions, for example:

ord(substr($str, $i, 1)) > 0xa0

ord($string) returns the value (0 to 255) of the first byte of the string. This can be used to decide whether the character at the cut point is a Chinese character, because a GB2312 Chinese character is 2 bytes, each above 0xa0, and a UTF-8 Chinese character is 3 bytes. That is, a byte value greater than 127 (0x7f) is outside ASCII and belongs to a multi-byte character such as a Chinese character.
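A hand-written truncation for GB2312 text based on this ord() check could look like the sketch below. The function name gb_truncate is hypothetical, and the code assumes the input really is GB2312, where every byte of a double-byte character is above 0xa0.

```php
<?php
// Hypothetical helper: cut a GB2312 string to at most $maxBytes bytes
// without splitting a two-byte character.
function gb_truncate(string $str, int $maxBytes): string
{
    $out = '';
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        // A lead byte above 0xa0 starts a two-byte character.
        $width = (ord($str[$i]) > 0xa0 && $i + 1 < $len) ? 2 : 1;
        if (strlen($out) + $width > $maxBytes) {
            break; // the next character would not fit completely
        }
        $out .= substr($str, $i, $width);
        $i += $width;
    }
    return $out;
}

$gb = "\xD6\xD0\xB9\xFA" . 'ab'; // "中国ab" in GB2312
echo strlen(gb_truncate($gb, 3)), "\n"; // only the whole first character is kept
```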


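Regular expressions:

Match characters, treating a high byte plus the byte after it as one Chinese character (double-byte encodings such as GB2312): preg_match_all('/[\x80-\xff]?./', $string, $match);

Match English: preg_match_all('/[\x01-\x7f]+/', $string, $match);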
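Applied to a GB2312 byte string, the two patterns behave roughly as in this sketch. The sample string is an assumption chosen for illustration. Note that in GBK the second byte of a character may fall in the ASCII range, so the second pattern is only reliable for GB2312-style text.

```php
<?php
// "中a国" in GB2312, written as explicit bytes.
$string = "\xD6\xD0" . 'a' . "\xB9\xFA";

// An optional high byte followed by any byte: one match per character.
preg_match_all('/[\x80-\xff]?./', $string, $chars);

// Runs of ASCII bytes only.
preg_match_all('/[\x01-\x7f]+/', "\xD6\xD0" . 'abc' . "\xB9\xFA" . ' de', $english);

echo count($chars[0]), ' characters, ', count($english[0]), " ASCII runs\n";
```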


Encoding Conversion

iconv ( string $in_charset , string $out_charset , string $str )

For example, to convert from GB2312 to UTF-8: iconv("GB2312", "UTF-8", $text)
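A small round trip shows the conversion. This is a sketch that assumes the iconv extension is enabled; the input is written as explicit bytes so the result does not depend on how the file is saved.

```php
<?php
$gb = "\xD6\xD0\xB9\xFA"; // "中国" in GB2312

// iconv returns false on failure, so check the result before using it.
$utf8 = iconv('GB2312', 'UTF-8', $gb);
if ($utf8 === false) {
    exit("conversion failed\n");
}

// Converting back should give the original bytes again.
$back = iconv('UTF-8', 'GB2312', $utf8);

echo bin2hex($utf8), "\n";
```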
URL encoding: urlencode

In the string returned by urlencode, every non-alphanumeric character except -_. is replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding used for data posted from a WWW form, that is, the application/x-www-form-urlencoded media type.

Note, however, that only part of a URL (for example, a query-string value) should be encoded; otherwise the colon and slashes in the URL will be escaped as well.
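The difference between encoding a whole URL and encoding only a value can be sketched like this. The URL and variable names are illustrative only.

```php
<?php
$keyword = "\xE4\xB8\xAD\xE5\x9B\xBD"; // "中国" as UTF-8 bytes
$base = 'http://www.baidu.com/s?wd=';

// Encoding the whole URL also escapes ':', '/', '?' and '='.
$whole = urlencode($base . $keyword);

// Encoding only the query value keeps the URL structure intact.
$partial = $base . urlencode($keyword);

echo $whole, "\n", $partial, "\n";
```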

urlencode works on bytes, so the same text gives two different results depending on its encoding: one based on the traditional GB2312 encoding and one based on UTF-8. For example:

$url = '中国';
echo urlencode($url);
// If the string is UTF-8:  %E4%B8%AD%E5%9B%BD
// If the string is GB2312: %D6%D0%B9%FA
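Both results can be reproduced in a single script by writing the bytes explicitly, so the outcome does not depend on the encoding the file is saved in. This is only a sketch; the variable names are illustrative.

```php
<?php
$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBD"; // "中国" as UTF-8 bytes
$gb   = "\xD6\xD0\xB9\xFA";         // "中国" as GB2312 bytes

// urlencode escapes each non-ASCII byte as %XX, whatever the encoding is.
$encodedUtf8 = urlencode($utf8);
$encodedGb   = urlencode($gb);

echo $encodedUtf8, "\n", $encodedGb, "\n";
```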

For example, open Baidu in a browser and search for "中国". The address bar shows: http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD&rsv_bp=0&ch=&tn=baidu&bar=&rsv_spt=3&ie=utf-8&rsv_sug3=16&rsv_sug=0&rsv_sug4=302&rsv_sug1=11&inputt=22928

That is, the browser automatically converts "中国" to %E4%B8%AD%E5%9B%BD.

The difference between urlencode and rawurlencode: urlencode encodes a space as a plus sign "+", while rawurlencode encodes a space as "%20".
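A quick comparison of the two functions on a string containing a space (the sample string is arbitrary):

```php
<?php
$value = 'hello world';

$form = urlencode($value);    // space becomes '+', as in form data
$raw  = rawurlencode($value); // space becomes '%20'

echo $form, "\n", $raw, "\n";
```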

URL decoding: urldecode and rawurldecode

1. To decode, use the corresponding urldecode() or rawurldecode(). rawurldecode() does not decode a plus sign ('+') into a space, whereas urldecode() does.

2. urldecode() and rawurldecode() return the decoded bytes as they are and do not convert between character encodings. If the URL contains Chinese text in a non-UTF-8 encoding and the page is displayed as UTF-8, the decoded string must be converted. For example, save the PHP file in GB2312 encoding first; the output is then partly garbled and partly normal:

$url = '中国';
echo $a = urldecode(urlencode($url)), "\n";
echo iconv('gb2312', 'utf-8', $a);
// Viewed as UTF-8: the first line is garbled, the second line is 中国
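Both points can be sketched without depending on the file encoding by starting from an already encoded GB2312 value. The iconv extension is assumed, and the variable names are illustrative.

```php
<?php
// Point 1: only urldecode turns '+' into a space.
$plusDecoded = urldecode('a+b');
$plusRaw     = rawurldecode('a+b');

// Point 2: decoding gives back the original bytes (GB2312 here);
// converting them to UTF-8 is a separate step.
$gbBytes = urldecode('%D6%D0%B9%FA');
$utf8    = iconv('GB2312', 'UTF-8', $gbBytes);

echo bin2hex($gbBytes), "\n", bin2hex($utf8), "\n";
```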
