Converting between full-width and half-width characters in Python

In natural language processing, text that mixes full-width and half-width characters leads to inconsistent information extraction, so the two forms need to be unified first.

Full-width and half-width conversion rules

General rule (excluding the space):

Full-width characters: Unicode code points 65281 to 65374 (hex 0xFF01 to 0xFF5E)
Half-width characters: Unicode code points 33 to 126 (hex 0x21 to 0x7E)


The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).

Apart from the space, full-width and half-width characters appear in the same order in Unicode, separated by a fixed offset (half-width + 0xFEE0 = full-width, where 0xFEE0 = 65248). Non-space characters can therefore be converted directly by adding or subtracting this offset.
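
The offset can be checked with a few lines of arithmetic. This is a minimal sketch written for Python 3, where chr() and ord() work on Unicode code points; the names OFFSET and full_a are only illustrative.

```python
# Illustrative constant: distance between a half-width character and its
# full-width counterpart (0xFF01 - 0x21).
OFFSET = 0xFF01 - 0x21

print(OFFSET, hex(OFFSET))          # 65248 0xfee0

# Every half-width code point 33..126 lands inside 65281..65374.
assert all(0xFF01 <= cp + OFFSET <= 0xFF5E for cp in range(0x21, 0x7F))

# 'A' (U+0041) plus the offset is the full-width 'A' (U+FF21).
full_a = chr(ord("A") + OFFSET)
print(hex(ord(full_a)))             # 0xff21

# The space does not follow the rule: 0x20 + OFFSET is 0xFF00, not 0x3000.
print(hex(0x20 + OFFSET))           # 0xff00
```

The last line shows why the space needs its own branch in any conversion routine.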


1. Chinese characters are always full-width. Only English letters, digits, and symbols have both full-width and half-width forms. A letter or digit that occupies the width of one Chinese character is called full-width; one that occupies half that width is called half-width.

2. Quotation marks differ between Chinese and English, and between their full-width and half-width forms.
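
Both notes can be explored with the standard unicodedata module. The following is a rough sketch for Python 3; the helper in_fullwidth_block is a hypothetical name introduced here. Note that Unicode labels Chinese characters as "wide" (W) rather than "full-width" (F), which is reserved for characters that also have a narrow counterpart.

```python
import unicodedata

def in_fullwidth_block(ch):
    """Return True if ch lies in the convertible range U+FF01..U+FF5E."""
    return 0xFF01 <= ord(ch) <= 0xFF5E

# East Asian Width property: 'W' = wide, 'F' = full-width, 'Na' = narrow.
print(unicodedata.east_asian_width("\u4e2d"))   # W  (a Chinese character)
print(unicodedata.east_asian_width("\uff21"))   # F  (full-width 'A')
print(unicodedata.east_asian_width("A"))        # Na (ASCII 'A')

# The full-width quotation mark U+FF02 is in the convertible range,
# but the curly quote U+201C commonly used in Chinese text is not,
# so the offset rule leaves it unchanged.
print(in_fullwidth_block("\uff02"))   # True
print(in_fullwidth_block("\u201c"))   # False
```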


Reference Code

# -*- coding: cp936 -*-
# Python 2 code: it relies on unichr() and the print statement.

def strq2b(ustring):
    """Convert full-width characters to half-width."""
    rstring = u""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 12288:
            # The full-width space is converted directly.
            inside_code = 32
        elif 65281 <= inside_code <= 65374:
            # Other full-width characters are converted by the fixed offset.
            inside_code -= 65248
        rstring += unichr(inside_code)
    return rstring

def strb2q(ustring):
    """Convert half-width characters to full-width."""
    rstring = u""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 32:
            # The half-width space is converted directly.
            inside_code = 12288
        elif 33 <= inside_code <= 126:
            # Other half-width characters are converted by the fixed offset.
            inside_code += 65248
        rstring += unichr(inside_code)
    return rstring

b = strq2b("MN123ABC Blog Park".decode('cp936'))
print b

c = strb2q("MN123ABC Blog Park".decode('cp936'))
print c
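
For Python 3, the same rules could be expressed with str.translate and a lookup table instead of a per-character loop. This is only a sketch of one possible approach, not the method used above; the names OFFSET, FULL_TO_HALF, HALF_TO_FULL, to_halfwidth and to_fullwidth are all illustrative.

```python
# Python 3 sketch; all names below are illustrative.
OFFSET = 0xFEE0  # 65248

# Map each full-width code point to its half-width counterpart.
FULL_TO_HALF = {cp + OFFSET: cp for cp in range(0x21, 0x7F)}
FULL_TO_HALF[0x3000] = 0x20   # the full-width space is a special case

# Invert the table for the opposite direction.
HALF_TO_FULL = {half: full for full, half in FULL_TO_HALF.items()}

def to_halfwidth(text):
    """Convert full-width characters in text to half-width."""
    return text.translate(FULL_TO_HALF)

def to_fullwidth(text):
    """Convert half-width characters in text to full-width."""
    return text.translate(HALF_TO_FULL)

# Full-width "mn", half-width "123abc", a full-width space, a Chinese character.
sample = "\uff4d\uff4e123abc\u3000\u4e2d"
print(to_halfwidth(sample))
print(to_fullwidth("abc 1"))
```

Characters that are not in the table, such as Chinese characters, pass through translate unchanged, which matches the behaviour of the loop version.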


Library function descriptions

In Python 2, the chr() function takes an integer in range(256) (that is, 0 to 255) and returns the corresponding character.
unichr() works the same way, except that it returns a Unicode character.

The ord() function is the counterpart of chr() (for 8-bit ASCII strings) and of unichr() (for Unicode objects): it takes a single character (a string of length 1) and returns the corresponding ASCII value or Unicode code point.
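
A short sketch of these functions in Python 3, where unichr() no longer exists and chr() accepts any Unicode code point. The variable name code is only illustrative.

```python
# Python 3: chr() and ord() are inverses over the whole Unicode range.
code = ord("\u3000")              # the full-width space
print(code)                       # 12288
print(chr(65313) == "\uff21")     # True: 65313 is 0xFF21, full-width 'A'

# Round trip for every code point in the full-width range.
assert all(ord(chr(cp)) == cp for cp in range(0xFF01, 0xFF5F))

# chr() raises ValueError for values outside range(0x110000).
try:
    chr(0x110000)
except ValueError as exc:
    print("ValueError:", exc)
```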



