Python offers many operations that make otherwise awkward tasks easy to implement, and handling Chinese characters is one of them. This article looks at a few techniques for working with Chinese characters in Python.
Example code (Python 2 syntax):
- #!/usr/bin/python
- # -*- coding: utf-8 -*-
- s = "中国"    # byte string (str), stored in the file's encoding
- ss = u"中国"  # unicode string
- print s, type(s), len(s)
- print ss, type(ss), len(ss)
- print '-' * 40
- print repr(s)
- print repr(ss)
- print '-' * 40
- s1 = s.decode('utf-8')  # str -> unicode
- print s1, len(s1), type(s1)
- print '-' * 40
- s2 = s.decode('utf-8').encode('gbk')  # UTF-8 bytes -> unicode -> GBK bytes
- print s2
- print type(s2)
- print len(s2)
- print '-' * 40
- s3 = ss.encode('gbk')  # unicode -> GBK bytes
- print s3
- print type(s3)
- print len(s3)
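The execution result is as follows (the GBK-encoded strings print as garbled characters, presumably because the terminal expected UTF-8):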
- 中国 <type 'str'> 6
- 中国 <type 'unicode'> 2
- ----------------------------------------
- '\xe4\xb8\xad\xe5\x9b\xbd'
- u'\u4e2d\u56fd'
- ----------------------------------------
- 中国 2 <type 'unicode'>
- ----------------------------------------
- ��
- <type 'str'>
- 4
- ----------------------------------------
- ��
- <type 'str'>
- 4
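The example above uses Python 2, where a plain string literal is a byte string. For comparison, here is a minimal sketch of the same steps in Python 3, where str is already Unicode text and encoded data is a separate bytes type. The variable names are illustrative and not from the original, and ascii() is used for printing so the output does not depend on the terminal's encoding.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch (an assumption; the original example targets Python 2).

text = "中国"                       # str: Unicode text, like u"中国" in Python 2
utf8_bytes = text.encode("utf-8")   # bytes: like the plain "中国" literal in Python 2
gbk_bytes = text.encode("gbk")      # bytes in GBK encoding

print(ascii(text), type(text), len(text))   # 2 characters
print(repr(utf8_bytes), len(utf8_bytes))    # 6 bytes
print(repr(gbk_bytes), len(gbk_bytes))      # 4 bytes

# Decoding with the matching codec recovers the original text.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)
```

The lengths match the Python 2 output: 2 characters as text, 6 bytes in UTF-8, and 4 bytes in GBK.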
Supplement:
To check Python's default encoding:
- >>> import sys
- >>> sys.getdefaultencoding()
- 'ascii'
Because # -*- coding: utf-8 -*- is declared in the file header, the byte string s is encoded in UTF-8.
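The 'ascii' default matters in Python 2 because it is the codec used when a byte string is implicitly converted to unicode, which fails for Chinese text. The following is a rough Python 3 sketch of that failure, written with an explicit decode because Python 3 no longer converts implicitly. The variable names are illustrative only.

```python
import sys

# On Python 3 this reports 'utf-8'; the original article shows 'ascii' on Python 2.
print(sys.getdefaultencoding())

utf8_bytes = "中国".encode("utf-8")

# Decoding UTF-8 bytes with the ascii codec fails, because every byte is >= 0x80.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print("ascii cannot decode these bytes:", exc.reason)

# Naming the correct codec explicitly works.
decoded = utf8_bytes.decode("utf-8")
print(len(decoded))
```

Passing the encoding explicitly, as the main example does with decode('utf-8'), avoids relying on the default.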
In UTF-8, an English letter takes one byte, and a common Chinese character takes three bytes;
In a unicode string, a Chinese character counts as one character (two bytes);
In GBK encoding, a Chinese character takes two bytes. (Thanks to keakon for the correction.)
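These sizes can be checked directly. The sketch below is written for Python 3. The helper encoded_len is a hypothetical name introduced here, and UTF-16 (little-endian, no byte order mark) is used as an assumption about what the two-byte figure for unicode refers to.

```python
def encoded_len(ch, encoding):
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))

# "utf-16-le" is an assumption standing in for the two-byte unicode form.
encodings = ("utf-8", "gbk", "utf-16-le")

sizes = {}
for ch in ("a", "中"):
    sizes[ch] = {enc: encoded_len(ch, enc) for enc in encodings}
    print(ascii(ch), sizes[ch])
```

For "中" this reports 3 bytes in UTF-8 and 2 bytes in both GBK and UTF-16, while "a" takes 1 byte in UTF-8 and GBK.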
That covers the basics of handling Chinese characters in Python.