PHP string Encoding

Source: Internet
Author: User

The number of bytes a character occupies depends on the encoding. An ASCII character takes 1 byte, a (common) Chinese character takes 3 bytes in UTF-8, and the same character takes 2 bytes in GBK.
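A quick way to see these byte counts is strlen(), which counts bytes rather than characters. The sketch below assumes the PHP file is saved as UTF-8; the GBK bytes for 中 (D6 D0) are written as escapes so no conversion extension is needed.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal '中' is stored as 3 bytes.
echo strlen('A'), "\n";        // 1: ASCII
echo strlen('中'), "\n";       // 3: UTF-8
echo strlen("\xD6\xD0"), "\n"; // 2: '中' in GB2312/GBK
echo bin2hex('中'), "\n";      // e4b8ad: the three UTF-8 bytes
```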

PHP provides several functions for extracting part of a string, including substr and mb_substr.

substr counts bytes, not characters, so using it to cut Chinese text can split a character in the middle and produce garbled output. With UTF-8, for example, substr($str, 0, 1) on a Chinese string returns only the first of the character's three bytes, which cannot be displayed.

mb_substr(string $str, int $start [, int $length [, string $encoding]]) works on characters instead of bytes. The $encoding parameter specifies the character encoding; if it is omitted, the internal character encoding is used.
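The following minimal sketch contrasts the two functions on a short UTF-8 string. It assumes the file is saved as UTF-8 and the mbstring extension is enabled; mb_check_encoding() is used only to show that the byte-based cut leaves an incomplete character behind.

```php
<?php
// Assumptions: the file is saved as UTF-8 and mbstring is enabled.
$s = '中国人'; // 3 characters, 9 bytes in UTF-8

$bad  = substr($s, 0, 4);             // first 4 BYTES: "中" plus one stray byte of "国"
$good = mb_substr($s, 0, 2, 'UTF-8'); // first 2 CHARACTERS

var_dump(strlen($bad));                     // int(4)
var_dump(mb_check_encoding($bad, 'UTF-8')); // bool(false): the string ends mid-character
echo $good, "\n";                           // 中国
```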

If you do not know the string's encoding, you can detect it with mb_detect_encoding:

$encoding = mb_detect_encoding($string, array('ASCII', 'UTF-8', 'GB2312', 'GBK', 'BIG5'));

Then pass the detected encoding to mb_substr:

mb_substr($string, $start, $length, $encoding);
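Putting the two steps together might look like the sketch below. The helper name detect_and_substr, the shorter candidate list and the UTF-8 fallback are illustrative assumptions, not part of the original text. Strict mode (the third argument) is used so that only encodings in which the bytes are fully valid can be reported; the mbstring extension is required.

```php
<?php
// Hypothetical helper: detect the encoding, then cut by characters in that encoding.
function detect_and_substr(string $str, int $start, ?int $length = null): string
{
    // strict = true: an encoding is reported only if the bytes are valid in it.
    $encoding = mb_detect_encoding($str, ['ASCII', 'UTF-8', 'GBK'], true);
    if ($encoding === false) {
        $encoding = 'UTF-8'; // assumption: fall back to UTF-8 when nothing matches
    }
    return mb_substr($str, $start, $length, $encoding);
}

echo detect_and_substr('中国人', 0, 2), "\n"; // 中国
echo detect_and_substr('hello', 1, 3), "\n";  // ell
```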

A hand-written replacement for mb_substr in PHP code tends to be considerably slower than the built-in function.

Encoding-related PHP functions

ord(substr($str, $i, 1)) > 0xa0

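ord($string) returns the value (0-255) of the first byte of the string; it is not a character code. The check above tests whether the byte at position $i starts a Chinese character. In GB2312 both bytes of a Chinese character lie in the range 0xA1-0xFE, while ASCII bytes are all below 0x80, so a byte greater than 0xA0 marks the start of a two-byte character. Keep in mind that a Chinese character is 2 bytes in GB2312 but 3 bytes in UTF-8, so the byte arithmetic differs per encoding.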
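The ord() test above is what a hand-written substring routine for GB2312 relies on. The helper gb2312_substr below is a hypothetical sketch of that idea: it counts characters by skipping two bytes whenever a byte is greater than 0xA0. It assumes GB2312 input only; it does not handle UTF-8, nor the full GBK range, where a lead byte can be as low as 0x81.

```php
<?php
// Hypothetical helper: return the first $length characters of a GB2312 string.
function gb2312_substr(string $str, int $length): string
{
    $bytes = strlen($str);
    $i = 0;
    $count = 0;
    while ($i < $bytes && $count < $length) {
        // A byte above 0xA0 starts a 2-byte GB2312 character; anything else is 1 byte.
        $i += ord(substr($str, $i, 1)) > 0xa0 ? 2 : 1;
        $count++;
    }
    return substr($str, 0, $i);
}

$gb = "a\xD6\xD0\xB9\xFAb"; // "a中国b" as GB2312 bytes
echo bin2hex(gb2312_substr($gb, 2)), "\n"; // 61d6d0, i.e. "a中"
```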

Regular expressions:

Match the characters of a GB2312/GBK string, where a Chinese character is a high byte (0x80-0xFF) followed by one more byte: preg_match_all('/[\x80-\xff]?./', $string, $match);

Match runs of ASCII (English) characters: preg_match_all('/[\x01-\x7f]+/', $string, $match);
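For UTF-8 strings, a common alternative is the /u modifier with the \p{Han} Unicode property. The sketch below shows it next to the ASCII pattern and the byte pattern applied to GB2312 bytes; the sample strings are assumptions and the file is assumed to be saved as UTF-8.

```php
<?php
$s = 'PHP中文abc编码'; // UTF-8 sample string

// /u makes PCRE treat pattern and subject as UTF-8; \p{Han} matches Chinese characters.
preg_match_all('/\p{Han}+/u', $s, $han);
// UTF-8 bytes of Chinese characters are all >= 0x80, so only ASCII runs match here.
preg_match_all('/[\x01-\x7f]+/', $s, $ascii);
// Byte pattern on GB2312 bytes for "a中国b": each match is one character.
preg_match_all('/[\x80-\xff]?./', "a\xD6\xD0\xB9\xFAb", $gb);

echo implode(', ', $han[0]), "\n";   // 中文, 编码
echo implode(', ', $ascii[0]), "\n"; // PHP, abc
echo count($gb[0]), "\n";            // 4
```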


Encoding conversion

iconv(string $in_charset, string $out_charset, string $str)

For example, to convert from GB2312 to UTF-8: iconv("GB2312", "UTF-8", $text)

URL encoding

urlencode: all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding used for data posted from WWW forms, i.e. the application/x-www-form-urlencoded media type.
Note, however, that only the parts of a URL that need it (such as individual query-string values) should be encoded; otherwise the colon and slashes in the URL are escaped as well.
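A minimal sketch of the difference, using a Baidu search URL as the base; the query value "PHP 中国" is an assumption and the file is assumed to be saved as UTF-8.

```php
<?php
$base  = 'http://www.baidu.com/s';
$query = 'PHP 中国'; // example value: only this part is passed through urlencode()

$right = $base . '?wd=' . urlencode($query);
$wrong = urlencode($base . '?wd=' . $query); // ':', '/', '?' and '=' get escaped too

echo $right, "\n"; // http://www.baidu.com/s?wd=PHP+%E4%B8%AD%E5%9B%BD
echo $wrong, "\n"; // http%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3DPHP+%E4%B8%AD%E5%9B%BD
```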

URL encoding is generally done in one of two ways: based on GB2312 or based on UTF-8. urlencode() works on bytes, so the result depends on the encoding of the string (for a literal, the encoding the PHP file is saved in). For example:
$url = '中国';
echo urlencode($url);
// UTF-8: %E4%B8%AD%E5%9B%BD
// GB2312: %D6%D0%B9%FA
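Both results can be reproduced in a single script by writing the GB2312 bytes of 中国 (D6 D0 B9 FA) as escapes instead of saving a second copy of the file; the sketch assumes the file itself is saved as UTF-8.

```php
<?php
$utf8 = '中国';             // assumption: this file is saved as UTF-8
$gb   = "\xD6\xD0\xB9\xFA"; // the same two characters as GB2312 bytes

echo urlencode($utf8), "\n"; // %E4%B8%AD%E5%9B%BD
echo urlencode($gb), "\n";   // %D6%D0%B9%FA
```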

For example, open Baidu in a browser and search for "中国" (China). The address bar shows http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD&rsv_bp=0&ch=&tn=baidu&bar=&rsv_spt=3&ie=UTF-8&rsv_sug3=16&rsv_sug=0&rsv_sug4=302&rsv_sug1=11&inputT=22928, where "中国" has been converted automatically to %E4%B8%AD%E5%9B%BD, its UTF-8 form.
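Going the other way, PHP can split such a URL and decode its query parameters. This sketch uses a shortened version of the Baidu URL: parse_url() extracts the query string and parse_str() URL-decodes each value into an array.

```php
<?php
// Shortened Baidu search URL; only a few of the original parameters are kept.
$url = 'http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD&tn=baidu&ie=UTF-8';

parse_str(parse_url($url, PHP_URL_QUERY), $params);

echo $params['wd'], "\n"; // 中国 (UTF-8 bytes, matching ie=UTF-8)
echo $params['ie'], "\n"; // UTF-8
```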
The difference between urlencode and rawurlencode: urlencode encodes a space as a plus sign (+), while rawurlencode encodes it as %20.
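A small side-by-side sketch, assuming a UTF-8 source file:

```php
<?php
$s = 'PHP 中国'; // example value; the file is assumed to be saved as UTF-8

echo urlencode($s), "\n";    // PHP+%E4%B8%AD%E5%9B%BD
echo rawurlencode($s), "\n"; // PHP%20%E4%B8%AD%E5%9B%BD
```

Because rawurlencode() follows RFC 3986, it is the usual choice for path segments, while urlencode() matches what HTML forms send in query strings.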

URL decoding: urldecode and rawurldecode

1. Decode with the counterpart of the encoding function: urldecode() or rawurldecode(). The difference is that urldecode() decodes the plus sign (+) as a space, while rawurldecode() leaves it unchanged.

2. urldecode() and rawurldecode() only turn each %XX sequence back into the original byte; they do not change the character encoding. If the URL contains Chinese characters in an encoding other than the one your output uses (for example GB2312 bytes shown on a UTF-8 page), convert the decoded string with iconv. In the following example the PHP file is saved in GB2312, so the decoded string is GB2312: printed directly on a UTF-8 page it is garbled, and after conversion to UTF-8 it displays correctly.

$url = '中国';
echo $a = urldecode(urlencode($url)), '<br>';
echo iconv('GB2312', 'UTF-8', $a);

Output:
��
中国
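A minimal sketch of both points, assuming the iconv extension is enabled and supports GB2312. The first part shows the plus-sign difference; the second decodes the GB2312 bytes of 中国 and converts them to UTF-8 for display.

```php
<?php
$encoded = 'a+b%20c';
echo urldecode($encoded), "\n";    // a b c
echo rawurldecode($encoded), "\n"; // a+b c

// %D6%D0%B9%FA is "中国" in GB2312; decoding returns those raw bytes unchanged.
$gbBytes = urldecode('%D6%D0%B9%FA');
$utf8    = iconv('GB2312', 'UTF-8', $gbBytes); // convert for a UTF-8 page
echo $utf8, "\n"; // 中国
```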
