Character encoding development
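ASCII: 1 byte per character; 7 bits define 128 characters (8-bit extensions reach 256), covering only English letters, digits, and special characters
Unicode: one unified character set for Chinese, English, and other languages; its early fixed-width form uses 2 bytes per character
UTF-8: variable length; English takes 1 byte, Chinese usually takes 3 bytes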
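As a rough check of these byte counts, the following Python 3 sketch encodes one English letter and one Chinese character and measures the results. The sample characters and variable names are chosen only for illustration, and utf-16-le is used here as a stand-in for the fixed 2-byte form of Unicode.

```python
# Python 3 sketch: how many bytes one character takes in each encoding.
english = "a"
chinese = "中"

ascii_len = len(english.encode("ascii"))         # ASCII: 1 byte
utf16_en_len = len(english.encode("utf-16-le"))  # 2-byte Unicode form
utf16_zh_len = len(chinese.encode("utf-16-le"))  # 2-byte Unicode form
utf8_en_len = len(english.encode("utf-8"))       # UTF-8, English letter
utf8_zh_len = len(chinese.encode("utf-8"))       # UTF-8, Chinese character

print(ascii_len, utf16_en_len, utf16_zh_len, utf8_en_len, utf8_zh_len)
```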
Bytes Type
In Python 3, text is always Unicode and is represented by the str type, while binary data is represented by the bytes type.
Binary data is used for video and audio files and for sending data over network sockets.
String to binary: str.encode(encoding="utf-8")
Binary to string: b'\xe2\x82\xac'.decode(encoding="utf-8")
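A minimal Python 3 sketch of this round trip follows. The euro sign is only a sample character, and the variable names are illustrative.

```python
# Python 3 sketch: str -> bytes -> str round trip with UTF-8.
text = "€"                                # a str holds Unicode text
data = text.encode(encoding="utf-8")      # str -> bytes
restored = data.decode(encoding="utf-8")  # bytes -> str

print(type(text), type(data), data, restored)
```

Decoding must use the same encoding that produced the bytes; otherwise the result is either an error or garbled text.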
The file handle is the file object returned by open(); the variable holding it refers to that object's address in memory
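To connect file handles with the str and bytes types, here is a small Python 3 sketch. The temporary file name demo.txt and the sample text are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:  # f is the file handle
    f.write("中文")

with open(path, "rb") as f:                   # binary mode returns bytes
    raw = f.read()

with open(path, "r", encoding="utf-8") as f:  # text mode returns str
    text = f.read()

print(raw, text)
```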
Character encoding and transcoding
The ASCII code table cannot represent Chinese characters, and the default system character encoding on Simplified Chinese Windows is GBK.
Unicode can represent all the characters in the world, but in its fixed 2-byte form every character occupies two bytes, so an English file that was originally 2M requires 4M of storage space once it is stored as Unicode.
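The doubling can be checked with a short Python 3 sketch. The sample text is arbitrary, and utf-16-le is again assumed as the 2-byte form of Unicode.

```python
# Python 3 sketch: English text doubles in size in a 2-byte encoding.
english_text = "hello world " * 1000

utf8_size = len(english_text.encode("utf-8"))          # 1 byte per letter
two_byte_size = len(english_text.encode("utf-16-le"))  # 2 bytes per letter

print(utf8_size, two_byte_size)
```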
Converting a UTF-8 string to GBK
To convert a string between any two encodings, first decode it to Unicode, then encode the Unicode text into the target encoding.
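The conversion through Unicode can be sketched in Python 3 as below. The helper name transcode and the sample text are hypothetical; only bytes.decode and str.encode come from Python itself.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes to Unicode text, then encode to the target encoding."""
    text = data.decode(source)   # source encoding -> Unicode (str)
    return text.encode(target)   # Unicode (str) -> target encoding


utf8_bytes = "你好".encode("utf-8")
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")

print(utf8_bytes, gbk_bytes)
```

The same helper works in the other direction by swapping the source and target names.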
-
Unicode (Uniform Code, Universal Code, Single Code) is a character encoding used on computers. Unicode was created to address the limitations of traditional character encoding schemes; it assigns a uniform and unique binary code to each character in every language.
-
Garbled text basically has two causes:
1, the required character encoding is not available
2, character encodings conflict: the character set specified when the program was written does not match the character set we are using
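Both situations can be reproduced in a few lines of Python 3. This is only a sketch: the sample text and the choice of ASCII and GBK as the mismatched encodings are assumptions for illustration.

```python
original = "你好"
utf8_bytes = original.encode("utf-8")

# Case 1: the chosen encoding has no code for these characters.
try:
    original.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Case 2: bytes written as UTF-8 are read back as GBK.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Reading with the encoding that was actually used restores the text.
recovered = utf8_bytes.decode("utf-8")

print(ascii_failed, garbled, recovered)
```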
-
When Python 2 interprets a .py file, it assumes the file is ASCII-encoded by default.
In Python 3, str is Unicode by default, and source files are read as UTF-8 unless another encoding is declared.
Because the default in Python 2.x is ASCII, you declare the file's encoding as UTF-8; but UTF-8 text cannot be converted directly to GBK, it needs Unicode as an intermediate step.
s = "Hello"  # Python 2: this byte string is encoded as UTF-8
new_str = s.decode('utf-8')  # decode from the original encoding (UTF-8) to Unicode
ret = new_str.encode('gbk')  # encode the Unicode string as a GBK byte string
s = u"Hello"  # the u prefix makes this a Unicode string
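The lines above are Python 2 code. For comparison, a rough Python 3 sketch of the same idea follows; it assumes Python 3.3 or later, where the u prefix is accepted again, and the variable names are illustrative.

```python
# Python 3 sketch: str is already Unicode, so there is no decode step.
s = "Hello"
has_decode = hasattr(s, "decode")  # str has no decode() method in Python 3
same = (u"Hello" == s)             # the u prefix is accepted but changes nothing
ret = s.encode("gbk")              # str -> GBK-encoded bytes

print(has_decode, same, ret)
```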
Encoding
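When the Python interpreter loads the code in a .py file, it decodes the content (Python 2 defaults to ASCII). So if you do not specify the encoding type
and the file contains Chinese, you will get an error. In Python 2, declare the encoding with a comment such as # -*- coding: utf-8 -*- on the first or second line of the file.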
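To see which encoding the interpreter would pick for a source file, the standard library function tokenize.detect_encoding can be used. The sketch below runs on Python 3, where the fallback is UTF-8 rather than ASCII; the helper name declared_encoding and the two sample sources are hypothetical.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding the tokenizer would use for this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


with_cookie = b"# -*- coding: gbk -*-\nprint('hi')\n"
without_cookie = b"print('hi')\n"

enc_declared = declared_encoding(with_cookie)     # taken from the comment
enc_default = declared_encoding(without_cookie)   # Python 3 fallback

print(enc_declared, enc_default)
```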
Python's working process
1, Python reads the code into memory
2, lexical analysis
3, the compiler generates bytecode
4, the virtual machine executes the bytecode, which ultimately runs as machine instructions on the CPU
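These stages can be observed from inside Python 3 with the built-in compile and exec functions and the dis module. This is only a sketch of the idea; the source string and names are made up for the example.

```python
import dis

source = "result = 1 + 2\n"

# Parsing and compilation: source text -> code object holding bytecode.
code_obj = compile(source, "<demo>", "exec")

# Execution: the interpreter runs the bytecode in the given namespace.
namespace = {}
exec(code_obj, namespace)

dis.dis(code_obj)           # print the bytecode instructions
print(namespace["result"])
```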
Variable
The value of a string cannot be modified in place. A string is stored contiguously in memory, so changing it in place would require space reserved after it; for that reason modification is not supported.
A property of strings: once "modified", a new string is created.
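A short Python 3 sketch of this behavior: item assignment on a str raises TypeError, and methods such as replace return a new object. The sample values are arbitrary.

```python
s = "hello"

# In-place modification is rejected.
try:
    s[0] = "H"
    modified_in_place = True
except TypeError:
    modified_in_place = False

# "Modifying" a string really builds a new string object.
t = s.replace("h", "H")

print(modified_in_place, s, t, t is s)
```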
This article is from the "Number One" blog; reprinting is declined.