The following applies to Python version 2.7.
1. About ASCII, Unicode, and UTF-8:
ASCII: code values 0 to 127, covering digits and letters (A-Z, a-z, with 'z' at 122). One byte represents one character (a byte can hold values up to 255).
Unicode: typically two bytes (values up to 65535) represent a character, and an uncommon character may need four bytes. The Unicode value of an ASCII character is its ASCII value padded with leading zeros. (In memory, the computer works with Unicode.)
If a document consists entirely of English characters, storing it as two-byte Unicode wastes space, which is why UTF-8 (a variable-length encoding) was introduced.
In UTF-8, an ASCII character takes one byte, while a Chinese character usually takes three bytes.
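The size difference described above can be checked directly. The following is a minimal sketch, not part of the original notes; it uses `utf-16-be` as a stand-in for the "two bytes per character" form (an assumption made for illustration), and the variable names are hypothetical. It is written so that it runs on both Python 2.7 and Python 3.

```python
# Compare how many bytes the same characters need in different encodings.
ascii_char = u'A'
chinese_char = u'\u4e2d'  # a common Chinese character, written as an escape

# UTF-8 is variable length: 1 byte for ASCII, 3 bytes for this Chinese character.
utf8_ascii_len = len(ascii_char.encode('utf-8'))
utf8_chinese_len = len(chinese_char.encode('utf-8'))

# UTF-16 (big-endian, no byte order mark) uses 2 bytes for both characters.
utf16_ascii_len = len(ascii_char.encode('utf-16-be'))
utf16_chinese_len = len(chinese_char.encode('utf-16-be'))

print('UTF-8 : %d byte(s) for A, %d byte(s) for the Chinese character'
      % (utf8_ascii_len, utf8_chinese_len))
print('UTF-16: %d byte(s) for A, %d byte(s) for the Chinese character'
      % (utf16_ascii_len, utf16_chinese_len))
```

For English-only text, the UTF-8 form is half the size of the two-byte form, which is the saving the note refers to.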
2. Unicode is used in memory; when text is saved to the hard disk, it is converted to UTF-8. For example, this is what Notepad does when it saves a file.
Similarly, for a web page, the Unicode data is converted to UTF-8 before it is sent to the browser for display.
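The conversion between in-memory Unicode and UTF-8 on disk can be sketched with `io.open`, which is available in both Python 2.7 and Python 3. This is an illustrative example only; the file name `demo.txt` and the temporary directory are assumptions chosen for the sketch.

```python
import io
import os
import tempfile

# Two Chinese characters held as a Unicode string in memory.
text = u'\u4e2d\u6587'

# Hypothetical location for the demo file.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# io.open encodes the Unicode text to UTF-8 as it is written to disk.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading in binary mode shows the raw UTF-8 bytes that were stored.
with io.open(path, 'rb') as f:
    raw = f.read()

# Reading with the same encoding restores the Unicode string.
with io.open(path, 'r', encoding='utf-8') as f:
    restored = f.read()

print(repr(raw))
print(restored == text)
```

The file holds six bytes (three per character), while the string in memory is two characters long.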
3. Converting between ASCII characters and numbers in Python:
>>> ord('A')
65
>>> chr(65)
'A'
4. Python later added support for Unicode. A Unicode string is written as u'...':
>>> print u'中文'
中文
>>> u'中'
u'\u4e2d'
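To make the link between a Unicode string and its code points concrete, here is a small sketch that is not part of the original notes. It uses `\u` escapes so that the source stays pure ASCII, and it runs on both Python 2.7 and Python 3; the variable names are illustrative.

```python
# A Unicode string is a sequence of characters, each with a numeric code point.
s = u'\u4e2d\u6587'

# ord() works on a single Unicode character as well as on an ASCII one.
code_points = [ord(ch) for ch in s]

print(len(s))
print(['0x%04x' % cp for cp in code_points])
```

The length is counted in characters, not bytes, so this string has length 2.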
5. To convert a Unicode string u'xxx' to a UTF-8 encoded string 'xxx', use the encode('utf-8') method (Unicode to UTF-8):
>>> u'abc'.encode('utf-8')
'abc'
>>> u'中文'.encode('utf-8')
'\xe4\xb8\xad\xe6\x96\x87'
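In practice it is easy to call encode on a value that is already a byte string. The sketch below wraps the conversion in a helper to avoid that; `to_utf8` and `text_type` are hypothetical names introduced here, not something from the original notes. It is written to run on both Python 2.7 and Python 3.

```python
# unicode on Python 2.7, str on Python 3
text_type = type(u'')


def to_utf8(value):
    """Return UTF-8 bytes for a Unicode string; pass byte strings through unchanged."""
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value


encoded_ascii = to_utf8(u'abc')
encoded_chinese = to_utf8(u'\u4e2d\u6587')

print(repr(encoded_ascii))
print(repr(encoded_chinese))
```

The type check keeps an already-encoded byte string from being encoded a second time, which in Python 2.7 would trigger an implicit ASCII decode first.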
6. Conversely, to convert a UTF-8 encoded string 'xxx' to a Unicode string u'xxx', use the decode('utf-8') method (UTF-8 to Unicode):
>>> 'ABC'.decode('utf-8')
u'ABC'
>>> '\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
u'\u4e2d\u6587'
>>> print '\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
中文
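Decoding and encoding are inverse operations, which a short round trip can confirm. This is a minimal sketch with illustrative variable names; the `b'...'` prefix is used so that the same code runs on both Python 2.7 (where it is an ordinary str) and Python 3.

```python
# UTF-8 bytes for two Chinese characters.
utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'

# UTF-8 bytes -> Unicode string.
decoded = utf8_bytes.decode('utf-8')

# Unicode string -> UTF-8 bytes again.
round_trip = decoded.encode('utf-8')

print(decoded == u'\u4e2d\u6587')
print(round_trip == utf8_bytes)
print('%d bytes decode to %d characters' % (len(utf8_bytes), len(decoded)))
```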
7. A declaration in the Python file header specifies the encoding used to read the source file (needed when the file contains Chinese; the Chinese strings should also be written as Unicode strings):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
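A complete file using this header might look like the sketch below. It assumes the file itself is saved as UTF-8, and the variable name `greeting` is illustrative. On Python 2.7 the declaration is required as soon as the source contains non-ASCII characters; Python 3 already assumes UTF-8 by default.

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# With the declaration above, the interpreter reads this file as UTF-8,
# so the non-ASCII literal below is parsed correctly.
greeting = u'中文'

# The literal is the same string as the one written with escapes.
print(greeting == u'\u4e2d\u6587')
print(len(greeting))
```

Because the literal carries the u prefix, its length is counted in characters (2) rather than in UTF-8 bytes (6).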