Character encoding: my understanding

Source: Internet
Author: User

Character encoding is often a nuisance. To avoid forgetting, I am recording what I understand here.

Source code encoding:

The source code encoding is determined at editing time and can usually be set in the editor. For example, Notepad++ makes it easy to set the encoding of a source file.

Whether the language is dynamic or a compiled static language, some program has to read our source code. Dynamic languages are read by an interpreter, compiled languages by a compiler. Each of these "programs that read source code" has a default input encoding. The Python interpreter defaults to ASCII in 2.x and to UTF-8 in 3.x. g++ defaults to UTF-8 without BOM. MSVC decides between UTF-8 and GBK based on the BOM: with a BOM the file is read as UTF-8, and without one it is read in the system code page (GBK on a Simplified Chinese Windows). So if your source code does not match the default encoding of the interpreter or compiler, you should declare the encoding in the source so that the interpreter or compiler knows how to read the file. The way to declare it differs from one interpreter or compiler to another. For example, in Python you add the comment # -*- coding: utf-8 -*- at the top of the source file. This declaration must match the actual encoding of the file and should not be written arbitrarily.
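The BOM-based choice described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MSVC's actual algorithm: the helper name guess_source_encoding and the GBK fallback are assumptions made for the example.

```python
import codecs

def guess_source_encoding(raw: bytes, fallback: str = "gbk") -> str:
    """Hypothetical helper: pick UTF-8 when a UTF-8 BOM is present, else a fallback code page."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM while decoding
    return fallback

text = 'print("中文")'

# The same source text saved in three different ways.
utf8_with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
utf8_no_bom = text.encode("utf-8")
gbk_file = text.encode("gbk")

# With a BOM, or with real GBK bytes, the guess is right and the text round-trips.
decoded_bom = utf8_with_bom.decode(guess_source_encoding(utf8_with_bom))
decoded_gbk = gbk_file.decode(guess_source_encoding(gbk_file))

# UTF-8 without a BOM is wrongly read as GBK, so the Chinese characters are garbled.
misread = utf8_no_bom.decode(guess_source_encoding(utf8_no_bom), errors="replace")

print(decoded_bom == text, decoded_gbk == text, misread == text)
```

The third case is the one an explicit encoding declaration is meant to prevent: the bytes are fine, but the reader assumes the wrong encoding.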

Runtime encoding:

This problem only exists in compiled languages such as C/C++. When you pass source code to the compiler, the compiler may judge the source encoding correctly, yet it still processes the string literals in the source. For example, if your source file is UTF-8 with BOM, MSVC recognizes it as UTF-8 from the BOM and therefore reads the source correctly. However, it then converts the string literals to GBK (the system code page in this example) and stores them in the constant data area. When the program runs, what it reads is the runtime encoding, that is, GBK.

// Source code: UTF-8

cout << "I am Chinese" << endl;

Although "I am Chinese" is given to the compiler as UTF-8, the statement should be understood as cout << "I am Chinese" << endl; in which the literal now holds the GBK-encoded bytes of "I am Chinese".

This problem does not occur with g++ or MinGW: by default both the source encoding and the runtime encoding are UTF-8.
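The difference between the two compilers can be imitated in Python. This is a rough sketch under stated assumptions: transcode_literal is a hypothetical stand-in for what a compiler does with a string literal, and the Chinese string is only an example literal.

```python
def transcode_literal(raw: bytes, source_charset: str, execution_charset: str) -> bytes:
    """Hypothetical model: decode the literal with the source charset, re-encode with the runtime charset."""
    return raw.decode(source_charset).encode(execution_charset)

source_text = "我是中国人"                    # example literal as typed in the editor
source_bytes = source_text.encode("utf-8")   # its bytes inside a UTF-8 source file

# MSVC-like behaviour on a GBK system: UTF-8 source, GBK at run time.
msvc_like = transcode_literal(source_bytes, "utf-8", "gbk")

# g++-like default behaviour: UTF-8 source, UTF-8 at run time, so the bytes are unchanged.
gcc_like = transcode_literal(source_bytes, "utf-8", "utf-8")

print(len(source_bytes), len(msvc_like), gcc_like == source_bytes)
```

The text is the same in both cases. Only the bytes stored in the program differ, which is why code that later interprets those bytes has to know which encoding it received.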

Encoding used inside libraries:

Libraries generally process text in one particular character encoding internally. For example, Python uses Unicode internally, and so does Qt. This creates a conversion problem for functions that take string parameters: functions such as string.split() and print("Hello World") generally expect their arguments in the library's internal encoding, and functions that return strings return them in the internal encoding as well. Conversions are therefore unavoidable.

For example, the MSVC runtime encoding is GBK while Qt requires Unicode internally, so whatever the encoding of your source file, you need QString::fromLocal8Bit("I am Chinese").

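In Python, decode() and encode() convert between Unicode and other encodings such as GBK: decode() turns encoded bytes into Unicode, and encode() turns Unicode into encoded bytes. In Python 2 these are str.decode() and unicode.encode(); in Python 3 they are bytes.decode() and str.encode().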
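A small Python 3 sketch of these conversions follows. The sample bytes and variable names are chosen only for illustration. It also shows that a library function working on Unicode strings refuses an argument that is still encoded bytes.

```python
# "中文" as GBK bytes, e.g. as they might arrive from a GBK runtime or file.
gbk_bytes = b"\xd6\xd0\xce\xc4"

# bytes -> Unicode: decode with the encoding the bytes are actually in.
text = gbk_bytes.decode("gbk")

# Unicode -> bytes: encode into whatever the receiving side expects.
utf8_bytes = text.encode("utf-8")

# str methods work on Unicode strings and take Unicode arguments.
parts = "a,中文,b".split(",")

# Passing encoded bytes where the library expects Unicode is rejected.
try:
    "a,b".split(b",")
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(text, utf8_bytes, parts, mixed_ok)
```

The pattern is the same as with QString::fromLocal8Bit: convert into the library's internal encoding at the boundary, work with Unicode inside, and encode again only when the data leaves.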

To sum up, character encoding really is a nuisance. The best approach is to use UTF-8 uniformly for the source code, at run time, and inside libraries.
