Python strings
In Python 3, strings are Unicode, which means that Python strings support multiple languages.
For a single character, Python provides the ord() function to get the character's integer code point, and the chr() function to convert a code point back to the corresponding character:
>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
If you know a character's integer code point, you can also write it in a str as a hexadecimal escape:
>>> '\u4e2d\u6587'
'中文'
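As a small check of how ord(), chr(), and \u escapes relate, the sketch below builds the same string in two ways and converts it back to escape notation. The helper name to_escape is hypothetical and introduced only for illustration; it assumes every character fits in four hexadecimal digits.

```python
def to_escape(text):
    """Return each character of text written as a backslash-u escape."""
    # ord() gives the code point; format it as four hexadecimal digits.
    return ''.join('\\u{:04x}'.format(ord(ch)) for ch in text)

word = '\u4e2d\u6587'                 # written with hexadecimal escapes
rebuilt = chr(0x4e2d) + chr(0x6587)   # built from the same code points

print(word == rebuilt)     # True
print(to_escape('中文'))   # \u4e2d\u6587
```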
Python writes data of type bytes as single- or double-quoted literals with a b prefix, for example b'ABC'.
Note the distinction between 'ABC' and b'ABC': the former is a str, while the latter, although it is displayed the same way, is bytes, in which each character occupies exactly one byte.
A str, which is held as Unicode, can be encoded into bytes in a specified encoding with the encode() method, for example:
>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
>>> '中文'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
Conversely, if we read a byte stream from the network or from disk, the data we get is bytes. To turn bytes into a str, use the decode() method:
>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
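Since encode() raises UnicodeEncodeError when the chosen encoding cannot represent a character, one possible way to handle this is to catch the error and fall back to UTF-8. This is only a sketch: the function name encode_with_fallback and the choice to return the codec name alongside the data are assumptions made for illustration.

```python
def encode_with_fallback(text, preferred='ascii'):
    """Encode text with the preferred codec, falling back to UTF-8.

    Returns a (data, codec_name) pair so the caller knows how to decode.
    """
    try:
        return text.encode(preferred), preferred
    except UnicodeEncodeError:
        # The preferred codec cannot represent at least one character.
        return text.encode('utf-8'), 'utf-8'

data, codec = encode_with_fallback('ABC')
print(data, codec)            # b'ABC' ascii

data, codec = encode_with_fallback('中文')
print(data, codec)            # b'\xe4\xb8\xad\xe6\x96\x87' utf-8
print(data.decode(codec))     # 中文
```

Returning the codec name keeps the round trip safe, because bytes carry no record of the encoding that produced them.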
To count how many characters a str contains, use the len() function:
>>> len('ABC')
3
>>> len('中文')
2
For a str, len() counts characters; for bytes, len() counts bytes:
>>> len(b'ABC')
3
>>> len(b'\xe4\xb8\xad\xe6\x96\x87')
6
>>> len('中文'.encode('utf-8'))
6
As can be seen, one Chinese character typically takes 3 bytes when encoded as UTF-8, while one English letter takes only 1 byte.
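The difference between character count and byte count can be made visible per character. The following is a minimal sketch; the helper name utf8_sizes is hypothetical.

```python
def utf8_sizes(text):
    """Pair each character with the number of bytes it takes in UTF-8."""
    return [(ch, len(ch.encode('utf-8'))) for ch in text]

mixed = 'A中'
print(utf8_sizes(mixed))             # [('A', 1), ('中', 3)]
print(len(mixed))                    # 2 characters
print(len(mixed.encode('utf-8')))    # 4 bytes
```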
We often need to convert between str and bytes when working with strings. To avoid garbled text, always use UTF-8 for these conversions.
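One way to stick to a single encoding is to route every conversion through a pair of small helpers. The names ENCODING, to_bytes, and to_text below are assumptions for this sketch, not standard library names. The last line shows what happens when the bytes are decoded with a different codec.

```python
ENCODING = 'utf-8'  # the single encoding used for every conversion here

def to_bytes(text):
    """Convert a str to bytes for storage or transmission."""
    return text.encode(ENCODING)

def to_text(data):
    """Convert bytes read from disk or the network back to a str."""
    return data.decode(ENCODING)

original = 'Hello, 中文'
wire = to_bytes(original)

print(type(wire).__name__)                   # bytes
print(to_text(wire) == original)             # True
print(wire.decode('latin-1') == original)    # False: wrong codec garbles the text
```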
Because Python source code is itself a text file, when your source code contains Chinese you must make sure to save it as UTF-8. So that the Python interpreter also reads the source as UTF-8, we usually write these two lines at the beginning of the file:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
The first comment tells Linux/OS X systems that this is an executable Python program; Windows ignores it.
The second comment tells the Python interpreter to read the source code as UTF-8; otherwise, Chinese text written in the source may come out garbled. (Python 3 already assumes UTF-8 for source files by default, so this line mainly guards against older interpreters and tools.)
Declaring UTF-8 encoding does not by itself make your .py file UTF-8 encoded; you must also make sure that your text editor saves the file as UTF-8 without BOM.
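If you are unsure how an editor saved a file, you can check its first three bytes for the UTF-8 byte order mark. The sketch below is illustrative only: the helper name starts_with_bom and the file names plain.py and bom.py are hypothetical, and the files are created in a temporary directory just for the demonstration.

```python
import codecs
import os
import tempfile

def starts_with_bom(path):
    """Return True if the file begins with the UTF-8 byte order mark."""
    with open(path, 'rb') as f:
        return f.read(3) == codecs.BOM_UTF8

# Write the same source text once without and once with a BOM.
with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, 'plain.py')
    bom_path = os.path.join(tmp_dir, 'bom.py')

    with open(plain_path, 'w', encoding='utf-8') as f:
        f.write("print('中文')\n")
    with open(bom_path, 'w', encoding='utf-8-sig') as f:  # utf-8-sig writes a BOM first
        f.write("print('中文')\n")

    plain_has_bom = starts_with_bom(plain_path)
    bom_has_bom = starts_with_bom(bom_path)

print(plain_has_bom)   # False
print(bom_has_bom)     # True
```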
If the .py file itself is saved as UTF-8 and also declares # -*- coding: utf-8 -*-, running it from the command prompt displays the Chinese text correctly.
Formatting
The last common question is how to output a formatted string. We often output strings such as 'Dear xxx, hello! Your phone bill for month xx is xx, and your balance is xx', where the xxx parts vary with variables, so we need a simple way to format a string.
In Python, formatting works the same way as in C and is done with the % operator, for example:
>>> 'Hello, %s' % 'world'
'Hello, world'
>>> 'Hi, %s, you have $%d.' % ('Michael', 1000000)
'Hi, Michael, you have $1000000.'
As you may have guessed, the % operator formats the string. Inside the string, %s is replaced by a string and %d is replaced by an integer. There must be as many variables or values after the % as there are %? placeholders, in matching order. If there is only one %?, the parentheses can be omitted.
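The phone-bill message mentioned above can be sketched with a template and a tuple of values. The template wording, the function name bill_message, and the sample values are all made up for illustration. The last lines show that supplying too few values raises TypeError.

```python
TEMPLATE = 'Dear %s, your phone bill for month %d is %d and your balance is %d'

def bill_message(name, month, bill, balance):
    # The tuple supplies one value per placeholder, in the same order.
    return TEMPLATE % (name, month, bill, balance)

print(bill_message('Michael', 3, 45, 12))
# Dear Michael, your phone bill for month 3 is 45 and your balance is 12

try:
    'Hi, %s, you have $%d.' % ('Michael',)   # two placeholders, one value
except TypeError as exc:
    mismatch_error = exc
    print('TypeError:', exc)
```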
Common placeholders are:
| %d | Integer |
| %f | Floating-point number |
| %s | String |
| %x | Hexadecimal integer |
When formatting integers and floating-point numbers, you can also specify whether to pad with zeros and how many integer and decimal digits to show:
>>> '%2d-%02d' % (3, 1)
' 3-01'
>>> '%.2f' % 3.1415926
'3.14'
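Width, zero padding, and precision can be combined in one format string. The following sketch formats a clock time; the function name format_time is hypothetical.

```python
def format_time(hours, minutes, seconds):
    """Format a time as HH:MM:SS.ss using zero padding."""
    # %02d   -> integer, at least 2 digits, padded with zeros
    # %05.2f -> float, 2 decimal places, total width 5, padded with zeros
    return '%02d:%02d:%05.2f' % (hours, minutes, seconds)

print(format_time(7, 5, 3.14159))   # 07:05:03.14
print('%5d|%05d' % (42, 42))        #    42|00042
```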
If you're not sure which placeholder to use, %s always works: it converts any data type to a string:
>>> 'Age: %s. Gender: %s' % (25, True)
'Age: 25. Gender: True'
What if the string itself needs to contain a literal % character? In that case, escape it by writing %% to represent a single %:
>>> 'growth rate: %d %%' % 7
'growth rate: 7 %'
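The %% escape also works next to real placeholders in the same string. A minimal sketch, with a hypothetical function name growth_report and made-up values:

```python
def growth_report(label, rate):
    # %s and %.1f are placeholders; %% produces one literal percent sign.
    return '%s growth rate: %.1f%%' % (label, rate)

print(growth_report('Q1', 7.5))   # Q1 growth rate: 7.5%
```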
Python Learning Note--2