PHP String Encoding

Last Update:2018-12-08 Source: Internet

Author: User

Different character encodings use different numbers of bytes per character. For example, an ASCII character occupies 1 byte, a Chinese character typically occupies 3 bytes in UTF-8, and a Chinese character occupies 2 bytes in GBK.

PHP comes with several string truncation functions, including substr and mb_substr.

Using substr to cut Chinese text can produce garbled characters, because substr counts bytes rather than characters. With UTF-8 encoded Chinese, a cut that does not fall on a 3-byte boundary keeps only 1/3 or 2/3 of the last character, and the leftover bytes are displayed as garbage.
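
The difference can be seen in a minimal sketch like the following. The sample string and variable names are illustrative only, and the Chinese text is written as explicit UTF-8 bytes so that the result does not depend on how the source file is saved. It assumes the mbstring extension is enabled.

```php
<?php
// "中国人" written as explicit UTF-8 bytes (3 bytes per character).
$str = "\xE4\xB8\xAD\xE5\x9B\xBD\xE4\xBA\xBA";

$byteCut = substr($str, 0, 4);             // 4 bytes: one whole character plus 1 stray byte
$charCut = mb_substr($str, 0, 2, 'UTF-8'); // 2 characters: 6 bytes

echo strlen($str), "\n";              // byte length: 9
echo mb_strlen($str, 'UTF-8'), "\n";  // character length: 3
echo bin2hex($byteCut), "\n";         // e4b8ade5
echo bin2hex($charCut), "\n";         // e4b8ade59bbd

// The byte-based cut is no longer valid UTF-8; the character-based cut is.
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```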

mb_substr(string $str, int $start[, int $length[, string $encoding]]) The $encoding parameter specifies the character encoding. If it is omitted, the internal character encoding is used.

If you do not know the encoding of a string, you can use mb_detect_encoding to check:

$encoding = mb_detect_encoding($string, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));

Then pass the detected encoding to mb_substr:

mb_substr($string, $start, $length, $encoding)
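
A minimal sketch of combining the two calls is given here. The helper name truncate_detected and the candidate list are assumptions made for illustration, not part of PHP. Detection is only a guess: short byte sequences are often valid in several encodings, so it is safer to know the encoding up front where possible.

```php
<?php
// Hypothetical helper (not part of PHP): detect the encoding from a list of
// candidates, then cut by characters using the detected encoding.
function truncate_detected(string $str, int $length, array $candidates): string
{
    // strict = true rejects candidates for which the bytes are not valid.
    $encoding = mb_detect_encoding($str, $candidates, true);
    if ($encoding === false) {
        // No candidate fits; return the input unchanged rather than guess.
        return $str;
    }
    return mb_substr($str, 0, $length, $encoding);
}

$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBDabc"; // "中国abc" as UTF-8 bytes
$cut  = truncate_detected($utf8, 3, ['ASCII', 'UTF-8']);
echo bin2hex($cut), "\n"; // e4b8ade59bbd61, that is "中国a"
```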

If you implement mb_substr yourself in PHP code, it will generally be less efficient than the built-in function.

Encoding-related PHP Functions

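ord(substr($str, $i, 1)) > 0xa0

ord($string) returns the value (0 to 255) of the first byte of the string. The test above is used to determine whether the byte at the position where the string is to be cut belongs to a Chinese character. For example, a Chinese character is 2 bytes in GB2312 and 3 bytes in UTF-8, and none of those bytes falls in the ASCII range. That is, if the byte value is greater than 0x7F (127), the byte is part of a multibyte character. In GB2312, both bytes of a Chinese character are greater than 0xA0, which is the threshold used in the test above. ord never returns a value greater than 255.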
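
The byte test can be tried out with a short sketch like this one. The helper name is_high_byte and the sample strings are assumptions for illustration. It also shows why the 0xa0 threshold suits GB2312 but not UTF-8, where continuation bytes can be as low as 0x80.

```php
<?php
// Hypothetical helper: is the byte at offset $i outside the ASCII range,
// and therefore part of a multibyte character?
function is_high_byte(string $str, int $i): bool
{
    return ord($str[$i]) > 0x7F;
}

$gb2312 = "a\xD6\xD0";             // "a中" as GB2312 bytes
$utf8   = "a\xE5\x9B\xBD";         // "a国" as UTF-8 bytes

echo ord($gb2312[1]), "\n";        // 214 (0xD6)
echo ord($utf8[1]), "\n";          // 229 (0xE5)

var_dump(is_high_byte($gb2312, 0)); // bool(false): 'a' is ASCII
var_dump(is_high_byte($gb2312, 1)); // bool(true)

// The 0xa0 threshold misses the UTF-8 continuation byte 0x9B.
var_dump(ord($utf8[2]) > 0xa0);     // bool(false)
var_dump(is_high_byte($utf8, 2));   // bool(true)
```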

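Regular expressions:

Match Chinese characters (the pattern splits a GB2312 string into characters, so each 2-byte Chinese character is one match and each single-byte character is also one match): preg_match_all('/[\x80-\xff]?./', $string, $match);

Match English (runs of ASCII characters): preg_match_all("/[\x01-\x7f]+/", $string, $match);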
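
The two patterns can be exercised on a GB2312 byte string as in the sketch below. The sample data is illustrative. The last part uses /\p{Han}+/u, which is an alternative for UTF-8 input and is not taken from the text above.

```php
<?php
$gb2312 = "ab\xD6\xD0\xB9\xFAcd"; // "ab中国cd" as GB2312 bytes

// Byte mode: an optional high byte followed by any byte, i.e. one character.
preg_match_all('/[\x80-\xff]?./', $gb2312, $chars);
echo count($chars[0]), "\n";        // 6: a, b, 中, 国, c, d
echo bin2hex($chars[0][2]), "\n";   // d6d0

// Runs of ASCII bytes only.
preg_match_all('/[\x01-\x7f]+/', $gb2312, $ascii);
echo implode(',', $ascii[0]), "\n"; // ab,cd

// Alternative for UTF-8 input (assumption, not from the text above):
// the /u modifier plus the Unicode script property Han.
$utf8 = "ab\xE4\xB8\xAD\xE5\x9B\xBDcd"; // "ab中国cd" as UTF-8 bytes
preg_match_all('/\p{Han}+/u', $utf8, $han);
echo bin2hex($han[0][0]), "\n";     // e4b8ade59bbd
```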

Encoding conversion

iconv(string $in_charset, string $out_charset, string $str)

For example, GB2312 to UTF-8: iconv("GB2312", "UTF-8", $text)

URL encoding: urlencode

All non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding that is used for WWW form POST data, that is, the application/x-www-form-urlencoded media type.
However, note that only part of a URL (for example, a query parameter value) should be encoded. Otherwise, the colon and slashes in the URL will be escaped as well.
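
A minimal sketch of encoding only a parameter value is shown here. The host example.com and the parameter name q are placeholders chosen for illustration.

```php
<?php
$value = 'a b&c=d-_.';

// Letters, digits and -_. pass through; space becomes +, the rest %XX.
echo urlencode($value), "\n"; // a+b%26c%3Dd-_.

// Encode only the parameter value, then append it to the URL.
$query = 'http://example.com/search?q=' . urlencode($value);
echo $query, "\n"; // http://example.com/search?q=a+b%26c%3Dd-_.

// Encoding the whole URL escapes the colon and slashes too.
$whole = urlencode('http://example.com/');
echo $whole, "\n"; // http%3A%2F%2Fexample.com%2F
```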

URL encoding of Chinese text generally takes one of two forms: the traditional one based on GB2312, and the other based on UTF-8. For example:
$url = '中国'; // "China"
echo urlencode($url);
// UTF-8: %E4%B8%AD%E5%9B%BD
// GB2312: %D6%D0%B9%FA
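
Since the result depends on the bytes that urlencode receives, the two forms can be reproduced independently of the source file encoding by writing the bytes explicitly. This is a sketch with illustrative variable names.

```php
<?php
$utf8   = "\xE4\xB8\xAD\xE5\x9B\xBD"; // "中国" as UTF-8 bytes
$gb2312 = "\xD6\xD0\xB9\xFA";         // "中国" as GB2312 bytes

// urlencode works on bytes, so the same text gives different output.
echo urlencode($utf8), "\n";   // %E4%B8%AD%E5%9B%BD
echo urlencode($gb2312), "\n"; // %D6%D0%B9%FA
```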

For example, if we open Baidu in a browser and search for "中国" ("China"), the address bar shows: http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD&rsv_bp=0&ch=&tn=baidu&bar=&rsv_spt=3&ie=UTF-8&rsv_sug3=16&rsv_sug=0&rsv_sug4=302&rsv_sug1=11&inputT=22928. We can see that "中国" has been automatically converted to %E4%B8%AD%E5%9B%BD.
The difference between urlencode and rawurlencode: urlencode encodes a space as the plus sign "+", while rawurlencode encodes a space as "%20".
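
The difference is easy to check with a short sketch. The sample string is illustrative.

```php
<?php
$s = 'hello world+1';

$form = urlencode($s);    // space becomes +, a literal + becomes %2B
$raw  = rawurlencode($s); // space becomes %20, a literal + becomes %2B

echo $form, "\n"; // hello+world%2B1
echo $raw, "\n";  // hello%20world%2B1
```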

URL decoding: urldecode and rawurldecode

1. To decode, use the corresponding urldecode() or rawurldecode(). rawurldecode() does not decode the plus sign ('+') as a space, while urldecode() does.
2. urldecode() and rawurldecode() only turn %XX sequences back into bytes and do not convert between character encodings. The decoded string is UTF-8 only if the URL was encoded from UTF-8 text. If the URL contains Chinese characters encoded in a non-UTF-8 encoding, the decoded string must be converted.

Save the following PHP file in GB2312 encoding. Viewed as UTF-8, part of the output is garbled and part is normal.
$url = '中国'; // "China"
echo $a = urldecode(urlencode($url)), '';
echo iconv('gb2312', 'utf-8', $a);
The first echo prints the raw GB2312 bytes, which appear garbled, and the second prints 中国 correctly.
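
The same round trip can be sketched without depending on the file encoding by starting from the percent-encoded GB2312 form. Variable names are illustrative, and the sketch assumes the iconv and mbstring extensions are available.

```php
<?php
$encoded = '%D6%D0%B9%FA'; // "中国" percent-encoded from GB2312 bytes

$decoded = urldecode($encoded); // still GB2312 bytes, not UTF-8
var_dump(mb_check_encoding($decoded, 'UTF-8')); // bool(false)

$utf8 = iconv('GB2312', 'UTF-8', $decoded);     // convert after decoding
echo bin2hex($utf8), "\n"; // e4b8ade59bbd

// Plus-sign handling differs between the two decoders.
echo urldecode('a+b'), "|", rawurldecode('a+b'), "\n"; // a b|a+b
```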

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
