Character sets have troubled me for a long time. My understanding of encodings and character sets is still limited to the bits and pieces I picked up in school, so today I set aside some time to study them properly.
1. Default encoding method
The first problem is the default character set, which has caused me no end of trouble.
Input typed into the IPython client, and the results:
In [6]: unicode_str = u'中国'
In [7]: unicode_str
Out[7]: u'\u4e2d\u56fd'
In [8]: default_str = '中国'
In [9]: default_str
Out[9]: '\xe4\xb8\xad\xe5\x9b\xbd'
In [10]: unicode_str.encode('utf-8')
Out[10]: '\xe4\xb8\xad\xe5\x9b\xbd'
Why is the default UTF-8? Is it because my system default is LANG=zh_CN.UTF-8? (Pending confirmation.)
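One way to look into the open question above is to ask Python which encodings it sees in the environment. This is only a minimal sketch, written so that it runs on Python 3 as well as Python 2. My assumption is that in an interactive session a plain byte string holds whatever bytes the terminal sent, so the terminal and locale settings are the things to inspect.

```python
import locale
import sys

# Encoding the locale reports as preferred, e.g. 'UTF-8' when LANG=zh_CN.UTF-8.
preferred = locale.getpreferredencoding()

# Encodings Python associates with the terminal streams.
# These can be None when the stream is not attached to a terminal.
stdin_encoding = getattr(sys.stdin, 'encoding', None)
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print('locale preferred encoding: %s' % preferred)
print('stdin encoding: %s' % stdin_encoding)
print('stdout encoding: %s' % stdout_encoding)
```

If all three report UTF-8, that would be consistent with the byte string above matching the result of encode('utf-8').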
In a .py file, enter:
#!/usr/bin/env python2

if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'
Error: SyntaxError: Non-ASCII character '\xe4' in file ***.py on line 4, but no encoding declared;
This happens because, when a Python 2 .py file contains Chinese text, you need to add a comment line declaring the file encoding; otherwise Python 2 treats the source as ASCII. This needs no further discussion.
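The reason ASCII cannot work here can be sketched directly on the bytes. This is an illustration on a byte string, not a reproduction of the parser error itself, and it is written to run on Python 3 as well as Python 2. The variable names are my own.

```python
# The UTF-8 bytes of the two Chinese characters used in this post.
utf8_bytes = b'\xe4\xb8\xad\xe5\x9b\xbd'

# ASCII only covers byte values 0-127, so 0xe4 cannot be decoded.
try:
    utf8_bytes.decode('ascii')
    decoded_as_ascii = True
except UnicodeDecodeError as exc:
    decoded_as_ascii = False
    print('ASCII cannot decode these bytes: %s' % exc)

# With the right encoding named, the same bytes decode cleanly.
text = utf8_bytes.decode('utf-8')
print(repr(text))
```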
So, with the encoding declared, the code looks like this:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'
    print '%r' % unicode_str
    print '%r' % default_str
No error this time, and the result displays normally:
u'\u4e2d\u56fd'
'\xe4\xb8\xad\xe5\x9b\xbd'
Line 6 of the script prints the Unicode code points, and line 7 prints the UTF-8 encoded bytes, because UTF-8 is the encoding declared for (and used to save) the source file. If you switch to GBK, the bytes are completely different:
#!/usr/bin/env python2
# -*- coding: gbk -*-
if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'
    print '%r' % unicode_str.encode('gbk')
    print '%r' % unicode_str
    print '%r' % default_str
The results are shown as:
'\xd6\xd0\xb9\xfa'
u'\u4e2d\u56fd'
'\xd6\xd0\xb9\xfa'
Clearly, the encoding of the Chinese byte string has changed to GBK.
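The two runs above can be compared side by side: the Unicode text is the same, and only the bytes change with the chosen encoding. A minimal sketch, written to run on Python 3 as well as Python 2, using escape sequences so the result does not depend on how this snippet itself is saved:

```python
# The same two characters, written as Unicode escapes.
text = u'\u4e2d\u56fd'

# One piece of text, two different byte sequences.
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')
print(repr(utf8_bytes))
print(repr(gbk_bytes))

# Decoding each byte sequence with its own encoding gives back the same text.
round_trip_ok = (utf8_bytes.decode('utf-8') == text and
                 gbk_bytes.decode('gbk') == text)
print(round_trip_ok)
```

UTF-8 uses three bytes per character here and GBK uses two, which is why the outputs differ in length as well as in content.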
============================ dividing line: end of the basics ==========================
By extension, what happens when I read from or write to a database, a file, or some other source?
CASE1:
Write a line in Chinese to the file to see how the file is encoded:
#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)
Use the command "file -i test.txt" to see how the file is encoded. (Why does it report ISO-8859-1? My guess is that file only detects the charset heuristically, and since the GBK bytes are not valid UTF-8 it falls back to ISO-8859-1. To be added tomorrow.)
test.txt: text/plain; charset=iso-8859-1
CASE2:
Change the declared encoding back to UTF-8 and see how the file is encoded:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)
View the file with "file -i test.txt":
test.txt: text/plain; charset=utf-8
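CASE1 and CASE2 can be reproduced without editing the coding declaration by naming the encoding explicitly when the file is opened. This is a sketch under my own assumptions: it uses io.open and a temporary file instead of ~/test.txt, and the helper name write_and_read_bytes is hypothetical. It runs on Python 3 as well as Python 2.

```python
import io
import os
import tempfile

text = u'\u4e2d\u56fd\n'

def write_and_read_bytes(encoding):
    """Write text with the given encoding, then return the raw file bytes."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        # newline='\n' keeps the line ending untranslated on every platform.
        with io.open(path, 'w', encoding=encoding, newline='\n') as f:
            f.write(text)
        # Binary mode returns the bytes exactly as stored on disk.
        with io.open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)

gbk_file_bytes = write_and_read_bytes('gbk')
utf8_file_bytes = write_and_read_bytes('utf-8')
print(repr(gbk_file_bytes))
print(repr(utf8_file_bytes))
```

The file contents differ only because the encoder differs, which matches what "file -i" reported in the two cases.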
CASE3:
Read a file whose encoding differs from the one Python declares (test.txt is UTF-8 encoded, and the Python script declares GBK):
#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'r') as f:
        lines = f.readlines()
    for line in lines:
        print '%r' % line
        print line
The results shown are:
'\xe4\xb8\xad\xe5\x9b\xbd\n'
中国
Now swap them: the Python script declares UTF-8, and test.txt is GBK encoded:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'r') as f:
        lines = f.readlines()
    for line in lines:
        print '%r' % line
        print line
The results shown are:
'\xd6\xd0\xb9\xfa\n'
? й?
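The garbled output above is what happens when GBK bytes are interpreted as UTF-8. A minimal sketch of that mismatch, assuming the display side expects UTF-8 (as my terminal does), written to run on Python 3 as well as Python 2:

```python
# The line read from the GBK-encoded file.
gbk_line = b'\xd6\xd0\xb9\xfa\n'

# Strict UTF-8 decoding rejects these bytes outright.
try:
    gbk_line.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD for what it cannot interpret,
# which is roughly what a UTF-8 terminal shows as garbage.
replaced = gbk_line.decode('utf-8', 'replace')

# Decoding with the encoding the file was actually written in works.
correct = gbk_line.decode('gbk')

print(strict_ok)
print(repr(replaced))
print(repr(correct))
```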
Two facts:
1. When writing and reading, Python and the operating system preserve the original bytes exactly, without any encoding conversion.
2. When printing, the bytes are interpreted by the system (the terminal). (This is for tomorrow's study.)
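The first fact can be checked with a binary round trip: bytes written to a file come back unchanged, and interpretation only happens when something decodes them. A short sketch using a temporary file rather than ~/test.txt (my assumption, to avoid touching real files), runnable on Python 3 as well as Python 2:

```python
import os
import tempfile

original = b'\xd6\xd0\xb9\xfa\n'  # GBK bytes, as in the example above

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Binary mode: no encoding or decoding is applied in either direction.
    with open(path, 'wb') as f:
        f.write(original)
    with open(path, 'rb') as f:
        read_back = f.read()
finally:
    os.remove(path)

print(read_back == original)

# Meaning only appears once an encoding is chosen for the bytes.
as_text = read_back.decode('gbk')
print(repr(as_text))
```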
I originally wanted to write a summary of character sets, but the default character set question alone has already taken an hour ~~~~~
Let's finish tomorrow.
Python's Character Set summary