Sorting out the relationship between ASCII, Unicode, UTF-8, and GBK in Python

Source: Internet
Author: User

Encoding problems come up often when working with computers. This article sorts out how the ASCII, Unicode, UTF-8, and GBK encodings relate to one another.

ASCII

Inside a computer, all data is represented as 0s and 1s. Early on there was relatively little content to represent, and ASCII was used to encode it.

ASCII (American Standard Code for Information Interchange) is a character encoding based on the Latin alphabet, used mainly for modern English. It defines 128 characters using 7 bits (2**7 = 128, codes 0 to 127), and each character is normally stored in one 8-bit byte with the high bit set to 0. A full byte can hold 2**8 = 256 values (0 to 255). The codes 128 to 255 are not part of ASCII; various incompatible "extended ASCII" code pages used them for other Western European characters.

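The largest value one byte can hold is binary 11111111 = 2**0 + 2**1 + 2**2 + 2**3 + 2**4 + 2**5 + 2**6 + 2**7 = 2**8 - 1 = 255. The largest ASCII code is the 7-bit value 1111111 = 2**7 - 1 = 127.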
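The following is a quick Python 3 check of this byte arithmetic and of the fact that ASCII itself stops at 127. The helper names `max_value` and `is_ascii_text` are illustrative and do not come from any library. Only the built-ins `ord`, `bin`, and `str.encode` with the standard `ascii` codec are real APIs.

```python
def max_value(bits: int) -> int:
    """Largest unsigned value with the given number of bits: the sum of 2**i."""
    return sum(2 ** i for i in range(bits))

def is_ascii_text(text: str) -> bool:
    """True if every character fits in 7-bit ASCII (codes 0..127)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

print(max_value(8), 0b11111111)  # 255 255: the largest byte value
print(max_value(7))              # 127: the largest ASCII code
print(ord("A"), bin(ord("A")))   # 65 0b1000001
print(is_ascii_text("Hello"), is_ascii_text("caf\u00e9"))  # True False
```

The accented "é" is written as the escape `\u00e9` so that the example does not depend on how the file itself is saved.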

Unicode, UTF-8, GBK

As computers spread, ASCII clearly could not represent the world's many scripts and symbols. A new encoding was needed that could represent every character and symbol: Unicode.

Unicode (also called the unified code or universal code) is a character standard created to overcome the limitations of traditional character encoding schemes. It assigns every character in every language a single, unique number, called a code point, to support cross-language, cross-platform text conversion and processing. Early versions of Unicode were designed around 16 bits (2**16 = 65,536 code points, 0 to 65535). The code space now runs from U+0000 to U+10FFFF (1,114,112 code points), so not every character fits in 2 bytes. How code points are stored as bytes depends on the encoding form: UTF-32 uses 4 bytes per character, UTF-16 uses 2 or 4, and UTF-8 uses 1 to 4.
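To see code points directly, Python 3 exposes them through `ord`, and the standard `unicodedata` module gives each character's official name. This is a small sketch in which the helper `code_point` is illustrative. It also shows that the highest code point Python supports, `sys.maxunicode`, is larger than any 16-bit value.

```python
import sys
import unicodedata

def code_point(ch: str) -> str:
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# A, é, 中 and 😀, written as escapes so the file's own encoding does not matter
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(code_point(ch), unicodedata.name(ch))

# Highest code point Python supports: more than 16 bits can hold
print(hex(sys.maxunicode))           # 0x10ffff
print(sys.maxunicode > 2 ** 16 - 1)  # True
```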

UTF-8 is a variable-length encoding of Unicode. Instead of a fixed minimum of 2 bytes, it uses only as many bytes as each character needs:

- ASCII characters take 1 byte.
- Most other European characters, such as accented Latin, Greek, and Cyrillic letters, take 2 bytes.
- Most East Asian characters take 3 bytes.
- Characters outside the Basic Multilingual Plane, such as emoji, take 4 bytes.
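The byte counts follow a fixed table of code-point ranges. The sketch below writes that table out as a hypothetical helper, `utf8_len_by_rule`, and checks it against Python's real `utf-8` codec for one character from each class.

```python
def utf8_len_by_rule(cp: int) -> int:
    """Bytes UTF-8 uses for a code point, per the standard range table."""
    if cp < 0x80:
        return 1  # U+0000..U+007F: exactly the ASCII range
    if cp < 0x800:
        return 2  # U+0080..U+07FF: e.g. accented Latin, Greek, Cyrillic
    if cp < 0x10000:
        return 3  # U+0800..U+FFFF: rest of the BMP, including most CJK
    return 4      # U+10000..U+10FFFF: supplementary planes, e.g. emoji

# A, é, 中, 😀
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    # The real codec and the range table should agree
    assert len(encoded) == utf8_len_by_rule(ord(ch))
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) {encoded.hex()}")
```

Note that the 1-byte range is identical to ASCII, which is why plain ASCII files are already valid UTF-8.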

GBK is not a Unicode encoding. It is a Chinese national encoding that extends GB2312, and like UTF-8 it is ASCII-compatible: English letters and other ASCII characters take 1 byte, while Chinese characters take 2 bytes.

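The relationship between Unicode, UTF-8, and GBK: Unicode defines the characters and their code points, while UTF-8 and GBK are two different ways of turning text into bytes. To convert text from GBK to UTF-8 (or the reverse), first decode the bytes into Unicode text, then encode that text into the target encoding.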
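In Python 3, conversion between byte encodings goes through a Unicode `str`: `bytes.decode` produces Unicode text, and `str.encode` turns it back into bytes. The function name `transcode` below is a hypothetical helper. The sketch converts "中文" from GBK to UTF-8 and shows what happens if the Unicode step is skipped and GBK bytes are read as UTF-8.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from encoding src to encoding dst via a Unicode str."""
    text = data.decode(src)  # src bytes -> Unicode text (code points)
    return text.encode(dst)  # Unicode text -> dst bytes

gbk_bytes = "\u4e2d\u6587".encode("gbk")  # "中文" stored as GBK
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'

# Reading GBK bytes as if they were UTF-8 fails
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("Wrong codec:", exc.reason)
```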

Python environment

In Python 2, when the interpreter loads the code in a .py file, it decodes the file's contents as ASCII by default.

If the file contains Chinese characters, which ASCII cannot represent, the interpreter rejects the file with a SyntaxError. You should therefore declare explicitly, in the .py file, which encoding the interpreter should use to read the source code, for example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
print "Hello, World"
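Python 2 may not be available for running this, but under Python 3 the standard library's `tokenize.detect_encoding` applies the same PEP 263 rule the interpreter uses: it looks for a coding declaration in the first two lines of the file. In the sketch below the helper `declared_encoding` and the sample sources are illustrative. The example shows what the function finds with and without a declaration, and why ASCII, the old Python 2 default, fails on Chinese text.

```python
import io
import tokenize

def declared_encoding(source: bytes) -> str:
    """Encoding Python 3 would use to read this source code."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

utf8_header = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nprint('hi')\n"
gbk_header = b"#!/usr/bin/env python\n# -*- coding: gbk -*-\nprint('hi')\n"
no_header = b"print('hi')\n"

print(declared_encoding(utf8_header))  # utf-8
print(declared_encoding(gbk_header))   # gbk
print(declared_encoding(no_header))    # utf-8 (the Python 3 default)

# Python 2's default: reading UTF-8 encoded Chinese as ASCII fails
try:
    "\u4e2d".encode("utf-8").decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII cannot decode:", exc.reason)
```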

In Python 3, the interpreter decodes source files as UTF-8 by default, and the str type holds Unicode text. Chinese can therefore be used in source code without declaring an encoding.
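The distinction in Python 3 is between `str` (Unicode text, measured in code points) and `bytes` (an encoded form, measured in bytes). This is a minimal sketch using only built-ins and `sys.getdefaultencoding`. It encodes one string two ways and shows that decoding with the matching codec recovers the same text.

```python
import sys

text = "Hello, \u4e16\u754c"  # "Hello, 世界" as a str: a sequence of code points
utf8 = text.encode("utf-8")    # bytes: one possible encoding of that text
gbk = text.encode("gbk")       # bytes: a different encoding of the same text

print(type(text).__name__, len(text))  # str 9
print(type(utf8).__name__, len(utf8))  # bytes 13
print(type(gbk).__name__, len(gbk))    # bytes 11
print(sys.getdefaultencoding())        # utf-8

# Decoding either byte string with the matching codec recovers the same str
print(utf8.decode("utf-8") == gbk.decode("gbk") == text)  # True
```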
