Python character encoding and file methods

Source: Internet
Author: User
This article looks at Python character encoding and file handling in some depth and may be useful as a reference.

The development of character encoding

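ASCII: supports only English letters, digits, and some special characters; each character takes 1 byte. Standard ASCII defines 128 code points, and 8-bit extensions use up to 256.

Unicode: in its 2-byte form (UCS-2/UTF-16), both Chinese and English characters take 2 bytes each.

UTF-8: variable length. A Chinese character usually takes 3 bytes; an English character takes 1 byte.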
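The byte counts above can be checked directly in Python 3. The following is a minimal sketch; the sample characters and the `sizes` dictionary are illustrative names, and "utf-16-le" is used here as a stand-in for the 2-byte form of Unicode.

```python
# Compare how many bytes one English and one Chinese character need.
# "utf-16-le" stands in for 2-byte Unicode (no byte-order mark is added).
samples = {"english": "a", "chinese": "中"}

sizes = {
    name: {
        # ASCII cannot represent Chinese characters, so record None there.
        "ascii": len(ch.encode("ascii")) if ch.isascii() else None,
        "utf-16-le": len(ch.encode("utf-16-le")),
        "utf-8": len(ch.encode("utf-8")),
    }
    for name, ch in samples.items()
}

for name, row in sizes.items():
    print(name, row)
```

Characters outside the Basic Multilingual Plane (for example many emoji) take 4 bytes in both UTF-16 and UTF-8, so the 2-byte figure only holds for common characters.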

Bytes type

In Python 3, text data is always represented as Unicode by the str type, while binary data is represented by the bytes type.

Binary data is used for files such as video and audio, and for data sent over network sockets.

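String to binary: str.encode(encoding="utf-8")

Binary to string: b'\xe2\x82\xac'.decode(encoding="utf-8"). The byte sequence must be complete: the two bytes b'\xe2\x82' on their own are a truncated UTF-8 sequence and raise UnicodeDecodeError.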
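A short round trip may make the two conversions clearer. This is only a sketch: the euro sign is an arbitrary sample character, chosen because its UTF-8 form begins with the bytes \xe2\x82, and the variable names are illustrative.

```python
text = "€"                                  # text data (str)
data = text.encode(encoding="utf-8")        # str -> bytes
restored = data.decode(encoding="utf-8")    # bytes -> str

print(data)
print(restored)

# Dropping the last byte leaves an incomplete UTF-8 sequence,
# which cannot be decoded.
truncated_error = None
try:
    data[:2].decode(encoding="utf-8")
except UnicodeDecodeError as exc:
    truncated_error = exc

print(type(truncated_error).__name__)
```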

The file handle is the file object returned by open(); the variable that holds it refers to that object in memory.
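As an illustration of a file handle in use, the sketch below writes text through one handle and reads it back through two others. The file name demo.txt and the temporary directory are assumptions made for the example, not part of the original article.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# f is the file handle; text mode encodes str to bytes on write.
with open(path, "w", encoding="utf-8") as f:
    f.write("中文 abc")

# Binary mode: read() returns the raw bytes stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: read() decodes the bytes back into str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(len(raw), repr(text))
```

Passing encoding explicitly avoids depending on the platform default, which differs between systems.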

Character encoding and transcoding

The ASCII code table cannot represent Chinese characters. On Simplified Chinese Windows, the default system character encoding is GBK.

Unicode can represent all the characters in the world, but in its fixed 2-byte form every character, English included, occupies two bytes.

As a result, English text that takes 2 MB in ASCII occupies 4 MB when stored in 2-byte Unicode.
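The doubling in storage can be reproduced with a quick measurement. The sketch below assumes a hypothetical 2 MB of ASCII-only text and again uses "utf-16-le" as the 2-byte Unicode form.

```python
# 2 MB of English-only text (hypothetical sample data).
text = "a" * (2 * 1024 * 1024)

ascii_size = len(text.encode("ascii"))       # 1 byte per character
utf16_size = len(text.encode("utf-16-le"))   # 2 bytes per character
utf8_size = len(text.encode("utf-8"))        # 1 byte per ASCII character

mb = 1024 * 1024
print(ascii_size // mb, utf16_size // mb, utf8_size // mb)
```

UTF-8 keeps English text at the ASCII size, which is one reason it is widely used for storage and transmission.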

Converting a UTF-8 string to GBK

To convert between any two encodings, the data must first be decoded to Unicode and then encoded to the target encoding.
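The two-step conversion can be sketched as follows. The helper name transcode is hypothetical and introduced only for this example; it simply chains the built-in bytes.decode and str.encode.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via Unicode (str)."""
    text = data.decode(source)    # step 1: source bytes -> Unicode
    return text.encode(target)    # step 2: Unicode -> target bytes


utf8_bytes = "中文".encode("utf-8")
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")

print(utf8_bytes)
print(gbk_bytes)
```

The same text takes 6 bytes in UTF-8 and 4 bytes in GBK, which shows that the bytes really are re-encoded rather than relabeled.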

Unicode (also called the unified code, universal code, or single code) is a character encoding standard used on computers. It was created to overcome the limitations of traditional character encoding schemes: it assigns a uniform, unique binary code to every character in every language.

Garbled characters generally appear in one of two situations:

1. No character encoding is specified.

2. The character encodings conflict: the character set the author specified when writing the program does not match the character set we use to read it.
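An encoding conflict of the second kind can be reproduced in a few lines. This sketch assumes the writer used GBK and the reader assumes UTF-8; the variable names are illustrative.

```python
text = "中文"
gbk_bytes = text.encode("gbk")     # the writer saved the text as GBK

# A reader that assumes UTF-8 fails outright in strict mode...
try:
    gbk_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ...or gets replacement characters if errors are suppressed.
garbled = gbk_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
correct = gbk_bytes.decode("gbk")

print(strict_failed, garbled, correct)
```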

In Python 2.x, when Python interprets a .py file, it assumes ASCII encoding by default.

In Python 3, source files are read as UTF-8 by default, and strings (str) are Unicode.
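On Python 3 the default can be inspected at runtime. The sketch below is meant for Python 3 only; the coding comment on the first line is the kind of declaration a Python 2 source file would need and is harmless here.

```python
# -*- coding: utf-8 -*-
import sys

default_encoding = sys.getdefaultencoding()   # 'utf-8' on Python 3

s = "中文"                        # a str literal is Unicode text
is_text = isinstance(s, str)
length = len(s)                   # counts characters, not bytes

print(default_encoding, is_text, length)
```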

Because Python 2.x defaults to ASCII, you have to declare the encoding in the file yourself, for example UTF-8. Even then, UTF-8 cannot be converted directly to GBK: the data has to pass through Unicode as an intermediate step.

Strings are immutable: any modification creates a new string.
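A small check of this behavior, as a sketch with illustrative variable names:

```python
s = "hello"

# Assigning to a character in place is not allowed.
try:
    s[0] = "H"
    in_place_failed = False
except TypeError:
    in_place_failed = True

# Methods that "modify" a string return a new object instead.
t = s.replace("h", "H")

print(in_place_failed, s, t, t is s)
```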

