Converting between full-width and half-width characters in Python

Source: Internet
Author: User
Tags: ord, ustring

In natural language processing, text that mixes full-width and half-width characters leads to inconsistent information extraction, so it needs to be normalized to a single form first.

Full-width/half-width conversion rules

General rule (excluding the space character):

Full-width characters occupy Unicode code points 65281 to 65374 (hex 0xFF01 to 0xFF5E)
Half-width characters occupy Unicode code points 33 to 126 (hex 0x21 to 0x7E)

Exception:

The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).

Apart from the space, full-width and half-width characters appear in the same order in Unicode, at a fixed distance from each other (half-width + 0xFEE0 = full-width, that is, an offset of 65248). Non-space characters can therefore be converted directly by adding or subtracting that offset.
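The offset can be derived from the two ranges listed above (0xFF01 - 0x21). The following is a minimal Python 3 sketch of the per-character rule; the names OFFSET and to_fullwidth_char are hypothetical and chosen only for this illustration.

```python
# Distance between a half-width ASCII character and its full-width form,
# computed from the first code point of each range.
OFFSET = 0xFF01 - 0x21  # 0xFEE0 == 65248


def to_fullwidth_char(ch):
    """Map one half-width character to full-width (hypothetical helper)."""
    code = ord(ch)
    if code == 0x20:                 # the space is the one exception
        return chr(0x3000)
    if 0x21 <= code <= 0x7E:         # every other printable ASCII character
        return chr(code + OFFSET)
    return ch                        # anything else is left unchanged


print(hex(OFFSET))                          # 0xfee0
print(to_fullwidth_char("A") == "\uff21")   # True
```

Characters outside the half-width range, such as Chinese characters, pass through untouched.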

Note:

1. Chinese characters are always full-width. Only English letters, digits, and symbol keys have both a full-width and a half-width form: a letter or digit that occupies the width of one Chinese character is full-width, and one that occupies half that width is half-width.

2. Quotation marks differ between Chinese and English, and between their full-width and half-width forms.
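The second note matters in practice because Chinese curly quotation marks (U+201C, U+201D, U+2018, U+2019) lie outside the 0xFF01 to 0xFF5E range, so the offset rule never touches them. The Python 3 sketch below shows this, together with one possible way to handle them through an explicit table. EXTRA_PUNCT and in_fullwidth_block are hypothetical names, and mapping curly quotes to straight quotes is an assumption about what a given pipeline wants, not part of the original rule.

```python
# Hypothetical extra table for quotation marks that the offset rule misses.
EXTRA_PUNCT = {
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
}


def in_fullwidth_block(ch):
    """True if ch falls inside the range covered by the offset rule."""
    return 0xFF01 <= ord(ch) <= 0xFF5E


# Curly quotes around two Chinese characters, then a full-width quotation mark.
text = "\u201c\u4e2d\u6587\u201d\uff02"

# Only the last character (U+FF02) is inside the offset-rule range.
print([in_fullwidth_block(ch) for ch in text])

# str.translate accepts a dict keyed by code point.
print(text.translate(EXTRA_PUNCT))
```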

 

Reference Code

    # -*- coding: cp936 -*-
    # Python 2 code: it relies on unichr(), str.decode() and the print statement.

    def strQ2B(ustring):
        """Convert full-width characters to half-width."""
        rstring = u""
        for uchar in ustring:
            inside_code = ord(uchar)
            if inside_code == 12288:
                # The full-width space is converted directly.
                inside_code = 32
            elif 65281 <= inside_code <= 65374:
                # Other full-width characters are converted by the fixed offset.
                inside_code -= 65248
            rstring += unichr(inside_code)
        return rstring

    def strB2Q(ustring):
        """Convert half-width characters to full-width."""
        rstring = u""
        for uchar in ustring:
            inside_code = ord(uchar)
            if inside_code == 32:
                # The half-width space is converted directly.
                inside_code = 12288
            elif 33 <= inside_code <= 126:
                # Other half-width characters are converted by the fixed offset.
                inside_code += 65248
            rstring += unichr(inside_code)
        return rstring

    b = strQ2B("MN123ABC Blog Park".decode('cp936'))
    print b

    c = strB2Q("MN123ABC Blog Park".decode('cp936'))
    print c
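The reference code targets Python 2 (it relies on unichr(), str.decode() and the print statement). As a rough Python 3 equivalent, the same rule can be expressed with translation tables and str.translate. This is a sketch rather than the author's method: the names OFFSET, Q2B_TABLE, B2Q_TABLE, str_q2b and str_b2q are hypothetical, and the sample string is made up for illustration.

```python
OFFSET = 65248  # 0xFEE0, distance between the half-width and full-width ranges

# Full-width code point -> half-width code point, for 0xFF01..0xFF5E.
Q2B_TABLE = {code: code - OFFSET for code in range(65281, 65375)}
Q2B_TABLE[12288] = 32  # the full-width space is the one exception

# Inverse table: half-width code point -> full-width code point.
B2Q_TABLE = {half: full for full, half in Q2B_TABLE.items()}


def str_q2b(text):
    """Convert full-width characters in text to half-width."""
    return text.translate(Q2B_TABLE)


def str_b2q(text):
    """Convert half-width characters in text to full-width."""
    return text.translate(B2Q_TABLE)


# Made-up sample: full-width letters and digits, a full-width space, ASCII text.
sample = "ｍｎ１２３ａｂｃ\u3000blog"
print(str_q2b(sample))           # mn123abc blog
print(str_b2q("mn123abc blog"))
```

Building the tables once avoids repeating the range checks for every character. The standard library call unicodedata.normalize("NFKC", text) also folds full-width forms to ASCII, but it rewrites other compatibility characters as well, so it is not a drop-in replacement for the rule described here.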

Execution results

 

Library function Description

In Python 2, chr() takes an integer in range(256), that is, 0 to 255, and returns the corresponding character.
unichr() works the same way, except that it returns a Unicode character.

ord() is the inverse of chr() (for 8-bit ASCII strings) and of unichr() (for Unicode objects): it takes a single character (a string of length 1) as its argument and returns the corresponding ASCII value or Unicode code point.
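For readers on Python 3, where unichr() no longer exists and chr() accepts any Unicode code point up to 0x10FFFF, the relationship between the two functions can be checked with a short sketch. The helper name round_trips is hypothetical.

```python
def round_trips(ch):
    """Check that chr() undoes ord() for a single character (hypothetical helper)."""
    return chr(ord(ch)) == ch


# A half-width letter, its full-width form, and the full-width space.
for ch in ("A", "\uff21", "\u3000"):
    print(repr(ch), ord(ch), hex(ord(ch)), round_trips(ch))

# chr() rejects values beyond the Unicode range.
try:
    chr(0x110000)
except ValueError as exc:
    print("out of range:", exc)
```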

