Python's Character Set summary

Last Update:2015-07-13 Source: Internet

Author: User


Character sets have troubled me for a long time. My understanding of encodings and character sets is still the half-baked one I picked up in school, so today I am taking some time to study them properly.

1. Default encoding method

The first question is the default character set, which people complain about endlessly.

Input and results in the IPython client:

In [6]: unicode_str = u'中国'

In [7]: unicode_str
Out[7]: u'\u4e2d\u56fd'

In [8]: default_str = '中国'

In [9]: default_str
Out[9]: '\xe4\xb8\xad\xe5\x9b\xbd'

In [10]: unicode_str.encode('utf-8')
Out[10]: '\xe4\xb8\xad\xe5\x9b\xbd'

Why are the default bytes UTF-8? Presumably because my system locale is LANG=zh_CN.UTF-8, so the terminal hands UTF-8 bytes to the interpreter. (Pending confirmation.)
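One way to check the locale theory is to ask Python which encodings it picked up from the environment. The following is a minimal sketch, not part of the original session. It is written for Python 3 and should also run on Python 2.7; the variable names are illustrative.

```python
import locale
import sys

# Encoding derived from the locale settings (LANG / LC_ALL on Linux).
preferred = locale.getpreferredencoding(False)

# Encoding Python associates with the terminal.
# This may be None when stdin is not attached to a terminal.
stdin_encoding = getattr(sys.stdin, 'encoding', None)

# The two characters from the session above, written as escapes
# so that this file itself stays pure ASCII.
text = u'\u4e2d\u56fd'
utf8_bytes = text.encode('utf-8')

print('locale preferred encoding: %s' % preferred)
print('stdin encoding: %s' % stdin_encoding)
print('same bytes as default_str: %s'
      % (utf8_bytes == b'\xe4\xb8\xad\xe5\x9b\xbd'))
```

If both reported encodings are UTF-8, that would be consistent with the byte string typed at the prompt holding UTF-8 bytes.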

In a .py file, enter:

#!/usr/bin/env python2
if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'

Error: SyntaxError: Non-ASCII character '\xe4' in file ***.py on line 4, but no encoding declared;

This happens because a Python 2 source file that contains Chinese needs a comment line declaring the file encoding; otherwise the file is treated as ASCII. There is not much more to say about this.
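How the declaration comment is recognized can be sketched with the regular expression given in PEP 263. This is a simplified illustration, not the interpreter's actual implementation (for example, it ignores byte order marks); the function name declared_encoding and the sample line lists are hypothetical.

```python
import re

# Pattern given in PEP 263 for the encoding declaration comment.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source_lines, default='ascii'):
    """Return the encoding declared in the first two lines, else the default.

    'ascii' is the Python 2 default described above; Python 3 uses UTF-8.
    """
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return default


without_declaration = ['#!/usr/bin/env python2',
                       'if __name__ == "__main__":']
with_declaration = ['#!/usr/bin/env python2',
                    '# -*- coding: utf-8 -*-']

print(declared_encoding(without_declaration))  # falls back to the default
print(declared_encoding(with_declaration))     # finds the declared name
```

Only the first two lines are examined, which is why the declaration has to sit directly below the shebang line.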


With the encoding declaration added, the code looks like this:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'
    print '%r' % unicode_str
    print '%r' % default_str

No error this time, and the output is as expected:

u'\u4e2d\u56fd'
'\xe4\xb8\xad\xe5\x9b\xbd'

Line 6 of the code prints the Unicode code points, and line 7 prints the UTF-8 encoded bytes. That is because the file declares (and is saved as) UTF-8, so the plain string literal holds UTF-8 bytes. If you switch the declaration to GBK, the bytes are completely different:

#!/usr/bin/env python2
# -*- coding: gbk -*-
if __name__ == "__main__":
    unicode_str = u'中国'
    default_str = '中国'
    print '%r' % unicode_str.encode('gbk')
    print '%r' % unicode_str
    print '%r' % default_str

The output is:

'\xd6\xd0\xb9\xfa'
u'\u4e2d\u56fd'
'\xd6\xd0\xb9\xfa'

Clearly, the default encoding of the Chinese string literal has changed to GBK.
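The relationship between the three printed values can be reproduced without depending on how the source file is saved. This is a small sketch that uses escape sequences instead of Chinese literals; it should run on both Python 2.7 and Python 3.

```python
# One Unicode string, two different byte encodings of it.
text = u'\u4e2d\u56fd'

utf8_bytes = text.encode('utf-8')   # three bytes per character here
gbk_bytes = text.encode('gbk')      # two bytes per character here

print('%d characters, %d UTF-8 bytes, %d GBK bytes'
      % (len(text), len(utf8_bytes), len(gbk_bytes)))

# Decoding each byte string with its own codec gives back the same text.
print(utf8_bytes.decode('utf-8') == gbk_bytes.decode('gbk') == text)
```

The Unicode string is the same in both scripts; only the bytes stored for the plain string literal follow the file encoding.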

==================== End of the basics ====================

By extension, what happens when I read or write text from a database, a file, or some other source?

CASE1:

Write a line of Chinese to a file and see how the file is encoded:

#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)

Use the command "file -i test.txt" to check how the file is encoded. (Why does it report an ISO format? To be added tomorrow.)

test.txt: text/plain; charset=iso-8859-1

CASE2:

Change the declared encoding back to UTF-8 and check how the file is encoded:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)

Check the file with "file -i test.txt":

test.txt: text/plain; charset=utf-8
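A plausible explanation for the two different charset reports is that such tools can only guess from the bytes. The sketch below is a very rough imitation of that idea and is not the actual algorithm of the file command; guess_charset is a hypothetical helper written for this illustration.

```python
def guess_charset(data):
    """Guess a charset label from raw bytes (rough illustration only)."""
    try:
        data.decode('ascii')
        return 'us-ascii'
    except UnicodeDecodeError:
        pass
    try:
        data.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        # Every byte sequence decodes as ISO-8859-1,
        # so it serves as a catch-all label here.
        return 'iso-8859-1'


line = u'\u4e2d\u56fd\n'
print(guess_charset(line.encode('gbk')))    # GBK bytes are not valid UTF-8
print(guess_charset(line.encode('utf-8')))
```

Under this assumption, the GBK file is labelled ISO-8859-1 simply because its bytes are neither ASCII nor valid UTF-8, not because the file really is Latin-1 text.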

CASE3:

Read a file whose encoding differs from the one the Python script declares (test.txt is UTF-8 encoded, and the script declares GBK):

#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'r') as f:
        lines = f.readlines()
    for line in lines:
        print '%r' % line
        print line

The output is:

'\xe4\xb8\xad\xe5\x9b\xbd\n'
中国

Now swap the two: the Python script declares UTF-8, and test.txt is GBK-encoded. The GBK file is produced by rerunning the CASE1 script:

#!/usr/bin/env python2
# -*- coding: gbk -*-
from os.path import expanduser
if __name__ == "__main__":
    default_str = '中国'
    with open(expanduser('~/test.txt'), 'w') as f:
        f.write('%s\n' % default_str)

Reading the file back then shows:

'\xd6\xd0\xb9\xfa\n'
? й?
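The garbled second line is consistent with GBK bytes being interpreted as UTF-8 by the terminal. The sketch below reproduces that effect in Python 3 by decoding with the wrong codec; it assumes the terminal substitutes a replacement character for invalid bytes, which is an assumption about the author's setup and not something the article states.

```python
gbk_bytes = u'\u4e2d\u56fd\n'.encode('gbk')   # b'\xd6\xd0\xb9\xfa\n'

# Decoding with the matching codec recovers the text.
repaired = gbk_bytes.decode('gbk')

# Decoding as UTF-8 fails for the first and last byte, while the middle
# pair \xd0\xb9 happens to be valid UTF-8 for the Cyrillic letter U+0439.
garbled = gbk_bytes.decode('utf-8', 'replace')

# Print code points instead of the characters themselves so the result
# does not depend on the encoding of the current terminal.
print([hex(ord(ch)) for ch in garbled])
```

The lone Cyrillic letter between two unreadable characters matches the shape of the printed line above.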

Two observations:

1. When writing and reading files, Python and the operating system keep the original bytes exactly as they are; no encoding conversion takes place.

2. When printing, it is the system (the terminal) that interprets the bytes. (This is tomorrow's topic.)
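The first observation can be checked with a byte-level round trip. This is a minimal sketch under stated assumptions: it uses a temporary file instead of ~/test.txt, opens the file in binary mode so that it behaves the same on Python 2.7 and Python 3, and uses io.open only to show where an explicit decoding step would go.

```python
import io
import os
import tempfile

text = u'\u4e2d\u56fd'
gbk_bytes = text.encode('gbk')

# Temporary stand-in for ~/test.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Write the bytes exactly as they are.
    with open(path, 'wb') as f:
        f.write(gbk_bytes + b'\n')

    # Read them back untouched: no conversion happens on the way.
    with open(path, 'rb') as f:
        raw = f.read()

    # Conversion only happens when a codec is requested explicitly.
    with io.open(path, 'r', encoding='gbk') as f:
        decoded = f.read()
finally:
    os.remove(path)

print(raw == gbk_bytes + b'\n')
print(decoded == text + u'\n')
```

The encoding declared at the top of a script plays no part here; it only affects how string literals in the source file are read.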

I originally wanted to write a full summary of character sets, but the default character set question alone has already taken an hour.

I'll finish tomorrow.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
