Python's Character Set summary

Source: Internet
Author: User

Character sets have bothered me for a long time. My understanding of encodings and character sets is still the half-baked knowledge I picked up in school, so today I finally set aside some time to study them properly.

1. Default encoding method

The first problem is the default character set, a frequent source of complaints.

Input typed into the IPython shell, and the results:

In [6]: unicode_str = u'中国'
In [7]: unicode_str
Out[7]: u'\u4e2d\u56fd'
In [8]: default_str = '中国'
In [9]: default_str
Out[9]: '\xe4\xb8\xad\xe5\x9b\xbd'
In [10]: unicode_str.encode('utf-8')
Out[10]: '\xe4\xb8\xad\xe5\x9b\xbd'

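Why is the default UTF-8? Presumably because my system locale is LANG=zh_CN.UTF-8 (to be confirmed). In an interactive session, a plain string literal simply holds whatever bytes the terminal sent, and the terminal encodes input according to that locale.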
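To check what the locale implies on a given machine, here is a minimal Python 3 sketch. It is not the author's Python 2 setup. It assumes only the standard `locale` and `sys` modules, and the helper name `describe_text_encodings` is hypothetical. The reported values depend on the machine running it. On a system with LANG=zh_CN.UTF-8 they would typically all be UTF-8.

```python
import locale
import sys


def describe_text_encodings() -> dict:
    """Collect the encodings Python 3 derives from the environment.

    describe_text_encodings is a hypothetical helper for illustration;
    the values it returns depend on the locale of the running machine.
    """
    return {
        # Encoding implied by the locale (LANG / LC_ALL / LC_CTYPE) for text data.
        "locale": locale.getpreferredencoding(False),
        # Encoding of the standard output stream (None if stdout was replaced).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names and command-line arguments.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, enc in describe_text_encodings().items():
        print(f"{name:>10}: {enc}")
```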

In a .py file, enter:

#!/usr/bin/env python2

if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'

Error: SyntaxError: Non-ASCII character '\xe4' in file ***.py on line 4, but no encoding declared;

The cause: when a Python 2 .py file contains Chinese (or any non-ASCII) text, you must add a comment line declaring the file's encoding; otherwise the source is assumed to be ASCII. This is well known, so I won't dwell on it.
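Python 3's standard library exposes the same declaration lookup through `tokenize.detect_encoding`. The minimal sketch below (the wrapper name `declared_encoding` is hypothetical) reports which encoding the interpreter would apply to a piece of source. Unlike Python 2, Python 3 falls back to UTF-8 when no declaration is present.

```python
import codecs
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python 3 would use to decode `source`.

    tokenize.detect_encoding() looks for a UTF-8 BOM or a PEP 263 coding
    declaration in the first two lines; without one it falls back to
    UTF-8 (Python 2 fell back to ASCII, hence the SyntaxError above).
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    # Map aliases such as "GBK" or "utf8" to the codec's canonical name.
    return codecs.lookup(encoding).name


if __name__ == "__main__":
    with_cookie = b"#!/usr/bin/env python2\n# -*- coding: gbk -*-\nx = 1\n"
    without_cookie = b"#!/usr/bin/env python2\nx = 1\n"
    print(declared_encoding(with_cookie))     # gbk
    print(declared_encoding(without_cookie))  # utf-8
```

The declaration is only honoured on the first or second line, which is why it conventionally sits directly under the shebang line.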


With the encoding declaration added, the code looks like this:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'
    print '%r' % unicode_str
    print '%r' % default_str

This time there is no error, and the output displays normally:

u'\u4e2d\u56fd'
'\xe4\xb8\xad\xe5\x9b\xbd'

Line 6 (the first print) outputs the Unicode string, i.e. its code points, and line 7 outputs the UTF-8 encoded bytes. That is because the coding declaration makes UTF-8 the source encoding, so a plain string literal holds UTF-8 bytes. Switch the declaration (and the file itself) to GBK and the bytes are completely different:
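The same distinction can be shown in Python 3, where `str` holds code points and `.encode()` produces bytes. The minimal sketch below uses a hypothetical helper, `show_encodings`, to encode 中国 with both codecs. Its output matches the byte sequences printed by the Python 2 runs in this section.

```python
def show_encodings(text: str) -> dict:
    """Return the code points of `text` and its bytes under UTF-8 and GBK."""
    return {
        "code points": [f"U+{ord(ch):04X}" for ch in text],
        "utf-8": text.encode("utf-8"),
        "gbk": text.encode("gbk"),
    }


if __name__ == "__main__":
    # "\u4e2d\u56fd" is 中国, the string used in the examples above.
    for key, value in show_encodings("\u4e2d\u56fd").items():
        print(key, value)
    # code points ['U+4E2D', 'U+56FD']
    # utf-8 b'\xe4\xb8\xad\xe5\x9b\xbd'
    # gbk b'\xd6\xd0\xb9\xfa'
```

The code points stay the same no matter which codec is chosen. Only the byte representation changes: UTF-8 uses three bytes per character here, GBK uses two.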

#!/usr/bin/env python2
# -*- coding: gbk -*-
if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'
    print '%r' % unicode_str.encode('gbk')
    print '%r' % unicode_str
    print '%r' % default_str

The results are shown as:

'\xd6\xd0\xb9\xfa'
u'\u4e2d\u56fd'
'\xd6\xd0\xb9\xfa'

Clearly, plain string literals are now GBK encoded: default_str holds exactly the GBK encoding of unicode_str, while the Unicode string itself is unchanged.

============================ End of the basics ============================

Going further: what happens when I read text from, or write it to, a database, a file, or some other place?

CASE1:

Write a line of Chinese to a file and see how the file ends up encoded:

#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)

Use the command "file -i test.txt" to check the file's encoding. Why does it report ISO-8859-1? I'll come back to that tomorrow.

test.txt: text/plain; charset=iso-8859-1
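A tool like file can only guess the encoding from the raw bytes. The Python 3 sketch below shows how such a guess can land on ISO-8859-1 for GBK data. `guess_charset` is a hypothetical, deliberately simplified heuristic: it tries ASCII, then UTF-8, and otherwise falls back to a single-byte label. It is not the actual algorithm used by file(1).

```python
def guess_charset(data: bytes) -> str:
    """Very rough, hypothetical imitation of a byte-sniffing charset guess.

    This is NOT file(1)'s real logic; it only illustrates why bytes that
    are neither ASCII nor valid UTF-8 can end up labelled ISO-8859-1.
    """
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Every byte value decodes under ISO-8859-1, so with no way to
    # recognise GBK the bytes get the generic single-byte label.
    return "iso-8859-1"


if __name__ == "__main__":
    print(guess_charset("\u4e2d\u56fd\n".encode("gbk")))    # iso-8859-1 (as in CASE 1)
    print(guess_charset("\u4e2d\u56fd\n".encode("utf-8")))  # utf-8 (as in CASE 2)
```

The GBK bytes for 中国 (d6 d0 b9 fa) are not a valid UTF-8 sequence. A guesser that does not know about GBK therefore has nothing better to offer than a single-byte charset.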

CASE2:

Change the declared encoding back to UTF-8 and check the file's encoding again:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)

Checking with "file -i test.txt" now gives:

test.txt: text/plain; charset=utf-8
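In Python 3 the choice of encoding on write is explicit rather than inherited from the source file's declaration. A file opened in text mode encodes each `str` with its `encoding=` argument. The minimal sketch below (hypothetical helper `write_and_dump`, using a temporary file instead of ~/test.txt) writes 中国 with a chosen codec. It then reads the raw bytes back in binary mode to show what actually landed on disk.

```python
import os
import tempfile


def write_and_dump(text: str, encoding: str) -> bytes:
    """Write `text` plus a newline to a temporary file using `encoding`,
    then return the raw bytes that were stored on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Text mode: the str is encoded with the given codec on write.
        # newline="\n" disables platform newline translation.
        with open(path, "w", encoding=encoding, newline="\n") as f:
            f.write(text + "\n")
        # Binary mode: read the bytes back with no decoding at all.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(write_and_dump("\u4e2d\u56fd", "gbk"))    # b'\xd6\xd0\xb9\xfa\n'
    print(write_and_dump("\u4e2d\u56fd", "utf-8"))  # b'\xe4\xb8\xad\xe5\x9b\xbd\n'
```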

CASE3:

Read a file whose encoding differs from the one declared in the script (test.txt is UTF-8 encoded, while the script declares GBK):

#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'r') as f:
        lines = f.readlines()
    for line in lines:
        print '%r' % line
        print line

The results shown are:

'\xe4\xb8\xad\xe5\x9b\xbd\n'
中国

Now swap them: test.txt is GBK encoded (produced by the script below, the same one as in CASE 1), and the reading script above is run with its declaration changed to UTF-8:

#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)

The results shown are:

'\xd6\xd0\xb9\xfa\n'
? й?

Two observations:

1. When writing and reading, Python and the operating system keep the bytes exactly as they are, without any conversion.

2. When printing, the bytes are interpreted by the system: the terminal decodes them with its own encoding (UTF-8 here). I'll study this part tomorrow.
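The second observation can be reproduced in Python 3 by decoding the bytes as UTF-8, which is roughly what a UTF-8 terminal does with whatever print writes to it. In the minimal sketch below, `as_utf8_terminal_shows` is a hypothetical helper, and `errors="replace"` substitutes U+FFFD for invalid sequences instead of raising. `ascii()` keeps the printed result free of non-ASCII characters so it displays on any terminal.

```python
def as_utf8_terminal_shows(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, the way a UTF-8 terminal interprets what
    print writes to it; invalid sequences become U+FFFD instead of raising."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    utf8_line = b"\xe4\xb8\xad\xe5\x9b\xbd\n"  # bytes read in the first run
    gbk_line = b"\xd6\xd0\xb9\xfa\n"           # bytes read in the second run
    print(ascii(as_utf8_terminal_shows(utf8_line)))  # '\u4e2d\u56fd\n'
    print(ascii(as_utf8_terminal_shows(gbk_line)))   # '\ufffd\u0439\ufffd\n'
```

In the GBK bytes, d6 is not followed by a continuation byte and fa is never valid in UTF-8, so both become replacement characters. The pair d0 b9 happens to be valid UTF-8 for U+0439 (й). This is consistent with the garbled line in the second run.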

I originally meant to write a full summary of character sets, but even the default character set question kept me busy for an hour ~~~~~

Let's finish tomorrow.
