Strings and encodings

Source: Internet
Author: User
Tags: coding standards

Chinese text: how to write a program that contains Chinese
    • Declare the encoding on the first (or second) line of the file: # encoding=utf-8
    • Save the file in UTF-8 format; Chinese string literals in the file need the u prefix, for example: u'中文'
    • Python only looks for a comment containing "coding", followed directly by ":" or "=" and then the encoding name, so you may also see the following declaration, which some people prefer for reasons such as aesthetics: # -*- coding: utf-8 -*-
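
The declaration rule above can be checked with the standard library. This is a minimal sketch that assumes Python 3, where `tokenize.detect_encoding` applies the same comment pattern and falls back to UTF-8 when no declaration is found (the Python 2 interpreter this article targets fell back to ASCII). The helper name `declared_encoding` is hypothetical.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python 3 would use to read this source code.

    `declared_encoding` is a hypothetical helper for illustration only.
    """
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Both declaration styles are recognized.
print(declared_encoding(b"# -*- coding: gbk -*-\nx = 1\n"))
print(declared_encoding(b"# encoding=gbk\nx = 1\n"))

# A space before "=" breaks the pattern, so the declaration is ignored
# and Python 3 falls back to its default, UTF-8.
print(declared_encoding(b"# encoding =gbk\nx = 1\n"))
```

The last case shows why the spelling of the declaration matters: only the text immediately after "coding:" or "coding=" is read as the encoding name.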

Common encodings:
    • GB2312: a Chinese national standard character set, designed for exchanging information between Chinese character processing systems and Chinese communication systems
    • GBK: one of the Chinese character encoding standards; an internal-code extension specification based on the GB2312-80 standard that uses two bytes per Chinese character
    • ANSI: on Windows, the name for the system default encoding, which depends on the language of the Windows version you are using; on the Simplified Chinese version of Windows 7 it is GBK (one byte per English character, two bytes per Chinese character)
    • ASCII: a standard mapping between English characters and binary values
    • Unicode: a character set that covers all the characters in the world, but it does not by itself define how characters are stored
    • UTF-8: short for 8-bit Unicode Transformation Format; UTF-8 is one way to implement Unicode. It is a variable-length encoding that uses 1 to 4 bytes per character, depending on the symbol.
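
The byte counts in the list above can be observed directly. This is a small sketch that assumes Python 3, where `str.encode` returns `bytes`; the helper `encoded_length` and the sample characters are illustrative choices, not part of the original article.

```python
samples = ["A", "中", "😀"]
encodings = ["ascii", "gb2312", "gbk", "utf-8"]


def encoded_length(text, encoding):
    """Return the number of bytes `text` needs in `encoding`,
    or None if the encoding cannot represent it.
    (Hypothetical helper for illustration.)"""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in samples:
    sizes = {enc: encoded_length(ch, enc) for enc in encodings}
    print(repr(ch), sizes)
```

An English letter takes one byte in every encoding listed, a common Chinese character takes two bytes in GB2312 and GBK but three in UTF-8, and a character outside the Chinese standards' repertoire can only be encoded by UTF-8 here.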

Two string types
    • Byte string: a sequence of bytes. Type: str (<type 'str'>; the other string type is <type 'unicode'>). When needed, Python converts the bytes into characters based on the computer's default locale settings. The default encoding on Mac OS X is UTF-8, but on most other systems it is ASCII.
# Create a byte string
byteString = "Hello world!"
Code example:
# -*- coding: utf-8 -*-
s = "Hello normal string"
print u"byte string", type(s)
# Decode the bytes as UTF-8 to get a Unicode string
u = s.decode("UTF-8")
print u"Unicode string", type(u)
# Encode the Unicode string back into UTF-8 bytes
backtobytes = u.encode("UTF-8")
print u"byte string", type(backtobytes)
More explanation:
Here the byte string s is interpreted as a sequence of UTF-8 bytes to create the Unicode string u; the next step encodes u with UTF-8 to convert it back into the byte string backtobytes.

    • Unicode string. Type: unicode
# Create a Unicode string:
unicodeString = u"Hello Unicode world!"
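
For readers on Python 3, the same two types exist under different names. The sketch below assumes Python 3, where the byte string type is `bytes` and the Unicode string type is `str`; the variable names are illustrative.

```python
# Assumption: Python 3, where Python 2's `str` corresponds to `bytes`
# and Python 2's `unicode` corresponds to `str`.
raw = "你好, world".encode("utf-8")  # byte string
text = raw.decode("utf-8")           # Unicode string
back = text.encode("utf-8")          # byte string again

print(type(raw), len(raw))    # length counted in bytes
print(type(text), len(text))  # length counted in characters
print(back == raw)
```

The two lengths differ because each Chinese character occupies three bytes in UTF-8 but counts as one character in the Unicode string.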

Encoding conversion
Strings inside Python are generally represented as Unicode. The default encoding of a string literal in the code matches the encoding of the source file itself. Encoding conversion therefore usually uses Unicode as the intermediate form: first decode the string from its original encoding into Unicode, then encode it from Unicode into the target encoding.
    • decode converts a string in another encoding to Unicode. For example, name.decode("GB2312") converts the GB2312-encoded string name to Unicode
    • encode converts a Unicode string to another encoding. For example, name.encode("GB2312") converts the Unicode string name to GB2312
    • To convert, you must first know which encoding name is in, then decode it to Unicode, and finally encode it into the required encoding
    • If name is already Unicode, no decode step is needed; call encode directly to get the encoding you need
    • Calling decode on a Unicode string that contains non-ASCII characters such as Chinese raises an error, because Python 2 first tries to encode it implicitly with ASCII
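
The decode-then-encode pattern described above can be sketched as a single function. This assumes Python 3 (`bytes` in, `bytes` out); the function name `transcode` is hypothetical and not from the original article.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via Unicode.

    Hypothetical helper: decode with the known source encoding,
    then encode with the target encoding.
    """
    text = data.decode(source)   # bytes -> Unicode
    return text.encode(target)   # Unicode -> bytes


gb_bytes = "中文".encode("gb2312")
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")
print(len(gb_bytes), len(utf8_bytes))

# Decoding with the wrong source encoding fails, which is why the
# original encoding has to be known first.
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong source encoding:", exc.reason)
```

Keeping Unicode as the only intermediate form means any pair of encodings can be converted with the same two calls.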
 
Encoding of files:
    • In a UTF-8 file, string literals are UTF-8 encoded; a literal's encoding depends on the encoding of the file that contains it
    • In a GB2312 file, string literals are GB2312 encoded
    • To output strings in two different encodings within the same text, an encoding conversion is required: first use decode to convert the original encoding to Unicode, then use encode to convert it into the encoding you need
Example:
# -*- coding: utf-8 -*-
fp1 = open('D:\\testfile.txt', 'r')  # file created by hand and saved as ANSI (GBK)
info1 = fp1.read()
fp1.close()  # close the file before reopening it for writing
# Known to be GBK encoded; decode into Unicode
tmp = info1.decode('GBK')

fp2 = open('D:\\testfile.txt', 'w')
# Encode into a UTF-8-encoded str
info2 = tmp.encode('UTF-8')
fp2.write(info2)  # write the UTF-8 bytes
fp2.close()  # the file is now saved in UTF-8
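
In Python 3 the same conversion can let `open` do the decoding and encoding. The following is a sketch under that assumption; `convert_file` is a hypothetical helper, and a temporary file stands in for D:\testfile.txt so the example runs anywhere.

```python
import os
import tempfile


def convert_file(path, source="gbk", target="utf-8"):
    """Rewrite the file at `path` from `source` encoding to `target`.

    Hypothetical helper; newline="" keeps line endings unchanged.
    """
    with open(path, "r", encoding=source, newline="") as f:
        text = f.read()  # decoded to a Unicode string on read
    with open(path, "w", encoding=target, newline="") as f:
        f.write(text)    # encoded to the target encoding on write


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "testfile.txt")
    # Create a GBK-encoded file, standing in for one saved as ANSI.
    with open(path, "wb") as f:
        f.write("中文 text".encode("gbk"))
    convert_file(path)
    with open(path, "rb") as f:
        result = f.read()

print(result == "中文 text".encode("utf-8"))
```

Passing `encoding` to `open` keeps the decode and encode steps next to the file operations, and the `with` blocks close each handle before the file is reopened.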
