String Manipulation and Unicode Encoding in Python

Source: Internet
Author: User
A string is a data type like any other, but it is special because it brings encoding problems with it. This article introduces string handling in Python and the details of Unicode encoding, for readers who need that background.

String types

str: a Unicode string. Literals written as '...' or as raw strings r'...' are all str, and the single quotes can be replaced by double or triple quotes. Whichever form is used, Python stores the result the same way internally.
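A small check, using throwaway variable names, that every quoting style produces an ordinary str and that the r prefix only changes how backslashes are read:

```python
# Single, double and triple quotes, plus the raw-string prefix r, all build str objects.
single = 'abc'
double = "abc"
triple = '''abc'''
raw = r'abc'

for literal in (single, double, triple, raw):
    assert type(literal) is str

# They compare equal: the quoting style leaves no trace in the stored value.
assert single == double == triple == raw

# The r prefix only changes how backslashes inside the literal are interpreted.
assert len('\n') == 1    # escape sequence: a single newline character
assert len(r'\n') == 2   # raw string: a backslash followed by 'n'
print(type(raw).__name__)
```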

bytes: a binary string. Data in other formats, such as JPG images, cannot be represented as str, so bytes is used instead; each element of a bytes object is an integer from 0 to 255. When bytes are printed, Python shows the values in the printable ASCII range as ASCII characters, which keeps the output readable. bytes supports almost all str methods except format() (%-style formatting has been available for bytes since Python 3.5), and it even works with the re module.
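A short sketch of how bytes behaves; the sample values are made up for illustration. It shows the 0-255 integers behind a bytes object, the ASCII-friendly printing, a few str-like methods, %-formatting and a bytes regular expression:

```python
import re

# A bytes object is an immutable sequence of integers in the range 0-255.
data = bytes([72, 105, 0, 255])
print(data)          # printable ASCII bytes appear as characters: b'Hi\x00\xff'
print(list(data))    # the underlying integers: [72, 105, 0, 255]
print(data[0])       # indexing yields an int (72), not a one-byte bytes object

# Most str methods have bytes counterparts that take bytes arguments.
line = b'name=alice;age=30'
fields = line.split(b';')
print(fields)
print(line.upper())

# bytes has no .format() method, but %-formatting works.
assert not hasattr(b'', 'format')
print(b'%s is %d' % (b'age', 30))

# The re module accepts bytes patterns, as long as the subject is bytes too.
print(re.findall(rb'\d+', line))
```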

bytearray: a mutable binary string whose contents can be changed in place.
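A minimal example of in-place changes to a bytearray, contrasted with the immutable bytes type (variable names are illustrative):

```python
# bytearray is the mutable counterpart of bytes: it can be changed in place.
buf = bytearray(b'hello')
buf[0] = ord('H')        # items are ints 0-255, so assign an int
buf.extend(b' world')    # append several bytes at once
buf += b'!'              # in-place concatenation mutates the same object
print(buf)

# Converting to bytes gives an immutable copy.
frozen = bytes(buf)
try:
    frozen[0] = ord('J')
except TypeError as exc:
    print('bytes is immutable:', exc)
```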

UTF-8 Encoding Range

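Code point range          Decimal range            Bytes  Byte pattern
0x0000 ~ 0x007F           0 ~ 127                  1      0xxxxxxx
0x0080 ~ 0x07FF           128 ~ 2047               2      110xxxxx 10xxxxxx
0x0800 ~ 0xFFFF           2048 ~ 65535             3      1110xxxx 10xxxxxx 10xxxxxx
0x10000 ~ 0x1FFFFF        65536 ~ 2097151          4      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0x200000 ~ 0x3FFFFFF      2097152 ~ 67108863       5      111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
0x4000000 ~ 0x7FFFFFFF    67108864 ~ 2147483647    6      1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

The 5- and 6-byte rows come from the original UTF-8 definition (RFC 2279). Since RFC 3629 (2003), UTF-8 is limited to code points U+0000 ~ U+10FFFF, so valid UTF-8 never uses more than 4 bytes, and Python's 'utf-8' codec rejects longer sequences.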
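To see the table in action, here is a minimal hand-written encoder for the 1- to 4-byte rows, checked against Python's built-in 'utf-8' codec. The function name utf8_encode_code_point is hypothetical, and unlike the real codec this sketch does not reject surrogates (U+D800 ~ U+DFFF):

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Hypothetical helper: encode one code point using the 1- to 4-byte rows of the table.

    Surrogates (U+D800 ~ U+DFFF) are not rejected here, unlike Python's own codec.
    """
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("outside the range allowed by RFC 3629")
    if cp <= 0x7F:    # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:   # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One character from each row: 1, 2, 3 and 4 bytes.
for ch in "Aé中😀":
    manual = utf8_encode_code_point(ord(ch))
    assert manual == ch.encode("utf-8")  # agrees with the built-in codec
    print(ch, len(manual), " ".join(f"{b:08b}" for b in manual))
```

Printing each byte in binary makes the leading 110/1110/11110 markers and the 10xxxxxx continuation bytes from the table directly visible.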

Byte order mark (BOM)

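BOM stands for byte order mark: the character U+FEFF placed at the very beginning of a file or stream. In UTF-16 and UTF-32 it tells the reader which byte order was used; in UTF-8, where byte order does not matter, it only signals that the data is UTF-8.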
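The BOM byte sequences for the common encodings are available as constants in the standard codecs module; this snippet prints them and checks that each one is simply U+FEFF encoded in that form:

```python
import codecs

# The BOM is the code point U+FEFF; its bytes depend on the encoding and byte order.
print(codecs.BOM_UTF8)
print(codecs.BOM_UTF16_LE)
print(codecs.BOM_UTF16_BE)

# Encoding U+FEFF yourself gives the same bytes.
assert '\ufeff'.encode('utf-8') == codecs.BOM_UTF8
assert '\ufeff'.encode('utf-16-le') == codecs.BOM_UTF16_LE
assert '\ufeff'.encode('utf-16-be') == codecs.BOM_UTF16_BE
```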

Rules when writing

When writing a file with the 'utf-8' encoding, Python does not write a BOM; specifying 'utf-8-sig' instead forces Python to write one.

Likewise, 'utf-16-be' does not write a BOM, while 'utf-16' does, in the machine's native byte order (b'\xff\xfe' on little-endian systems).

>>> open('h.txt', 'w', encoding='utf-8-sig').write('aaa')
3
>>> open('h.txt', 'rb').read()
b'\xef\xbb\xbfaaa'
>>> open('h.txt', 'w', encoding='utf-16').write('bbb')
3
>>> open('h.txt', 'rb').read()
b'\xff\xfeb\x00b\x00b\x00'
>>> open('hh.txt', 'w', encoding='utf-16-be').write('ccc')
3
>>> open('hh.txt', 'rb').read()
b'\x00c\x00c\x00c'
>>> open('h.txt', 'w', encoding='utf-8').write('ddd')
3
>>> open('h.txt', 'rb').read()
b'ddd'
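The same rules can be checked without touching the disk by calling str.encode() with the same encoding names. The exact bytes for plain 'utf-16' depend on the machine, so this sketch only checks that they start with the native-order BOM:

```python
import codecs
import sys

# str.encode() follows the same BOM rules as writing a file.
assert 'ddd'.encode('utf-8') == b'ddd'                    # no BOM
assert 'aaa'.encode('utf-8-sig') == b'\xef\xbb\xbfaaa'   # BOM forced
assert 'ccc'.encode('utf-16-be') == b'\x00c\x00c\x00c'   # explicit byte order, no BOM

# Plain 'utf-16' prepends a BOM in the machine's native byte order.
encoded = 'bbb'.encode('utf-16')
assert encoded.startswith(codecs.BOM_UTF16)  # BOM_UTF16 is the native-order BOM
print(sys.byteorder, encoded)
```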

Rules when reading

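If the matching encoding is specified (for example 'utf-8-sig'), the BOM is stripped when reading. Otherwise the BOM either shows up as stray characters (with plain 'utf-8' it appears as the character '\ufeff'; with a locale encoding such as GBK it becomes garbled text) or decoding raises an exception.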

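On a system whose default encoding is GBK (for example, Chinese-language Windows), reading a file that contains a UTF-8 BOM followed by 'ddd' gives:

>>> open('h.txt', 'r').read()
'锘縟dd'
>>> open('h.txt', 'r', encoding='utf-8-sig').read()
'ddd'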
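A self-contained version of the reading rules, assuming only a writable temporary directory. It writes a BOM-prefixed file and reads it back with 'utf-8' and 'utf-8-sig', avoiding the locale-dependent default so the result is the same on every system:

```python
import os
import tempfile

# Create a temporary file that starts with a UTF-8 BOM followed by 'ddd'.
with tempfile.NamedTemporaryFile('w', encoding='utf-8-sig', suffix='.txt', delete=False) as f:
    f.write('ddd')
    path = f.name

try:
    # Plain 'utf-8' keeps the BOM as the character U+FEFF.
    with open(path, encoding='utf-8') as f:
        as_utf8 = f.read()
    # 'utf-8-sig' strips the BOM.
    with open(path, encoding='utf-8-sig') as f:
        as_utf8_sig = f.read()
finally:
    os.remove(path)

print(repr(as_utf8))
print(repr(as_utf8_sig))

# 'utf-8-sig' also reads data that has no BOM at all.
assert b'ddd'.decode('utf-8-sig') == 'ddd'
```

Because 'utf-8-sig' accepts input with or without a BOM, it is a reasonable choice when reading UTF-8 files of unknown origin.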

Encoding and decoding

    • chr() and ord()

>>> ord('中')
20013
>>> chr(20013)
'中'

    • Writing Unicode characters into a string with escape sequences

'\xhh': two hexadecimal digits give one character (code points up to 0xFF)

'\uhhhh': four hexadecimal digits give one character

'\Uhhhhhhhh': eight hexadecimal digits give one character (needed for code points above 0xFFFF)

>>> s = 'py\x74h\u4e2don' #'pyth中on'
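A few assertions tying chr()/ord() to the three escape forms; the sample string is the one from the example above:

```python
# ord() maps a one-character string to its code point; chr() goes back.
assert ord('中') == 20013 == 0x4E2D
assert chr(20013) == '中'

# The three escape forms write code points by number.
assert '\x74' == 't'                  # \xhh: exactly 2 hex digits
assert '\u4e2d' == chr(0x4E2D)        # \uhhhh: exactly 4 hex digits
assert '\U0001F600' == chr(0x1F600)   # \Uhhhhhhhh: exactly 8 hex digits, capital U

s = 'py\x74h\u4e2don'
print(s, len(s))

# Show each character in the usual U+XXXX notation.
print(' '.join(f'U+{ord(ch):04X}' for ch in s))
```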

Converting between str, bytes and bytearray

str.encode(encoding='utf-8')       # str -> bytes

bytes(s, encoding='utf-8')         # str -> bytes

bytes.decode(encoding='utf-8')     # bytes -> str

str(b, encoding='utf-8')           # bytes -> str

bytearray(s, encoding='utf-8')     # str -> bytearray

bytearray(b)                       # bytes -> bytearray
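The conversion calls listed above, run on one sample string (text, b1, ba1 and the other names are illustrative). It also shows a common pitfall: str() without an encoding returns the repr of the bytes rather than decoding them:

```python
text = 'pyth中on'

# str -> bytes
b1 = text.encode(encoding='utf-8')
b2 = bytes(text, encoding='utf-8')
assert b1 == b2 == b'pyth\xe4\xb8\xadon'

# bytes -> str
assert b1.decode(encoding='utf-8') == text
assert str(b1, encoding='utf-8') == text
# Pitfall: without an encoding, str() returns the repr instead of decoding.
assert str(b1) == "b'pyth\\xe4\\xb8\\xadon'"

# str or bytes -> bytearray (mutable)
ba1 = bytearray(text, encoding='utf-8')
ba2 = bytearray(b1)
assert ba1 == ba2 == b1       # bytearray compares equal to bytes with the same content
ba2[0:4] = b'PYTH'            # change the copy in place
print(ba2.decode('utf-8'))
```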

Source file encoding declaration

Python 3 assumes that source files are UTF-8 encoded by default.

# -*- coding: latin-1 -*- : placed on the first or second line of a source file, this comment declares that the file is Latin-1 encoded.
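To see the declaration at work without creating a file, this sketch builds Latin-1 encoded source code in memory, asks tokenize.detect_encoding() which encoding the interpreter would use, and then compiles and runs it. The sample source is made up for illustration:

```python
import io
import tokenize

# Source code encoded as Latin-1, with a matching coding declaration on line 1.
source = "# -*- coding: latin-1 -*-\ns = 'caf\xe9'\n".encode('latin-1')

# tokenize.detect_encoding() reads the declaration the same way the interpreter does.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)   # the normalized name for latin-1 is 'iso-8859-1'

# Without a declaration, UTF-8 is assumed.
default, _ = tokenize.detect_encoding(io.BytesIO(b"s = 'x'\n").readline)
print(default)

# compile() honours the declaration when it is given bytes.
namespace = {}
exec(compile(source, '<latin-1 demo>', 'exec'), namespace)
assert namespace['s'] == 'café'
```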

Helper functions

import sys
sys.platform              # e.g. 'win32' on Windows
sys.getdefaultencoding()  # 'utf-8'
sys.byteorder             # e.g. 'little' on x86 machines
s.isalnum()               # s stands for any string
s.isalpha()
s.isdecimal()
s.isdigit()
s.isnumeric()
s.isprintable()
s.isspace()
s.isidentifier()          # True if the string can be used as a variable name
s.islower()
s.isupper()
s.istitle()
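The three numeric checks are easy to confuse: isdigit() accepts everything isdecimal() does and more, and isnumeric() accepts more still. This sketch compares them on a few sample characters (chosen for illustration) and exercises some of the other predicates listed above:

```python
# Compare the three numeric predicates on a few sample strings.
samples = ['123', '²', '½', '三', 'abc']
for s in samples:
    print(f'{s!r:6} decimal={s.isdecimal()!s:5} digit={s.isdigit()!s:5} numeric={s.isnumeric()}')

assert '123'.isdecimal() and '123'.isdigit() and '123'.isnumeric()
assert not '²'.isdecimal() and '²'.isdigit()      # superscript two
assert not '½'.isdigit() and '½'.isnumeric()      # vulgar fraction one half
assert '三'.isnumeric() and not '三'.isdigit()     # CJK numeral "three"

# isidentifier() only checks the syntax; keywords such as 'class' still pass.
assert 'my_var'.isidentifier() and not '2x'.isidentifier()
assert 'class'.isidentifier()

assert 'Hello World'.istitle() and not 'Hello world'.istitle()
assert 'abc'.isalpha() and 'abc1'.isalnum() and not 'abc 1'.isalnum()
assert ' \t\n'.isspace() and not '\n'.isprintable()
```

Because keywords pass isidentifier(), checking whether a string is usable as a variable name also needs keyword.iskeyword().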


