Problems Caused by GBK Encoding

Source: Internet
Author: User

Yesterday I heard a talk about the architecture of the store system, and it mentioned an interesting problem. Because the Chinese-language sites all use GBK encoding, searching can produce false matches: strstr is used to find a substring so that tags can be added around it in the string, and the result shows up as garbled text on the front end.

The GBK encoding of "Xia Xin" (夏新) is 0xcf 0xc4 0xd0 0xc2, and the GBK encoding of "male" (男) is 0xc4 0xd0, which exactly matches the two middle bytes: the second byte of the first character followed by the first byte of the second. If more Chinese characters follow "Xia Xin", inserting tags at this false match shifts the byte pairing, so those characters are corrupted as well. With UTF-8 the problem does not arise. Most Chinese characters take three bytes in UTF-8 (1110xxxx 10xxxxxx 10xxxxxx): the high nibble of the first byte is always 0xE, while the next two bytes start with the bits 10, so their high nibble is at most 0xB. A first byte can therefore never be mistaken for a following byte, and a byte-wise search cannot match across a character boundary.
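The false match can be reproduced in a few lines. The following is a minimal Python sketch, not the code from the system described above: the byte values are the ones quoted in the text, the byte-wise `find` stands in for strstr, and the variable names and the `<b>` tag are assumptions chosen for illustration.

```python
# GBK bytes quoted in the text above.
gbk_haystack = bytes([0xCF, 0xC4, 0xD0, 0xC2])  # "Xia Xin", two characters
gbk_needle = bytes([0xC4, 0xD0])                # "male", one character

# Byte-wise search, comparable to strstr on a GBK buffer.
# The match starts in the middle of the first character.
gbk_pos = gbk_haystack.find(gbk_needle)
print(gbk_pos)  # 1

# Wrapping the false match in a tag (hypothetical <b>) splits both characters,
# so the result is no longer valid GBK and decodes with replacement characters.
tagged = (gbk_haystack[:gbk_pos] + b"<b>" + gbk_needle + b"</b>"
          + gbk_haystack[gbk_pos + len(gbk_needle):])
garbled = tagged.decode("gbk", errors="replace")

# The same two strings in UTF-8: lead bytes and continuation bytes use
# disjoint value ranges, so a byte-wise search cannot straddle characters.
haystack_text = gbk_haystack.decode("gbk")
needle_text = gbk_needle.decode("gbk")
utf8_pos = haystack_text.encode("utf-8").find(needle_text.encode("utf-8"))
print(utf8_pos)  # -1

# Searching decoded text (characters, not bytes) also avoids the false match.
text_pos = haystack_text.find(needle_text)
print(text_pos)  # -1
```

Decoding first and searching on characters is the safer choice whenever the data has to stay in GBK, since the search then never sees half of a character.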

In the early days, bandwidth and storage constraints led many websites to adopt GBK, and now there is no choice but to live with it. Linux supports UTF-8 very well, but these days I have to switch encodings every day. Running into garbled text in screen is a real nuisance: the window simply freezes, and sometimes it recovers after a while, but sometimes the window has to be killed. Hadoop's support for GBK is also poor. When writing an MR job, you often have to convert the input to UTF-8 first, run the task, and then convert the output back to GBK.
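The conversion step around such a job can be sketched as below. This is only an illustration of the transcoding itself, using Python's built-in codecs; it does not use any Hadoop API, and the function names `gbk_to_utf8` and `utf8_to_gbk` are hypothetical.

```python
def gbk_to_utf8(raw: bytes) -> bytes:
    """Convert GBK-encoded input to UTF-8 before processing."""
    return raw.decode("gbk").encode("utf-8")


def utf8_to_gbk(raw: bytes) -> bytes:
    """Convert UTF-8 results back to GBK for output."""
    return raw.decode("utf-8").encode("gbk")


# Round trip using the GBK bytes of "Xia Xin" from the example above.
original = bytes([0xCF, 0xC4, 0xD0, 0xC2])
as_utf8 = gbk_to_utf8(original)
restored = utf8_to_gbk(as_utf8)
print(len(as_utf8))          # 6: three bytes per character
print(restored == original)  # True
```

Both functions raise an exception on invalid input rather than silently dropping bytes, which makes encoding mistakes in the data visible early.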
