Implementing full-width and half-width character conversion in Python

Last Update:2016-12-02 Source: Internet

Author: User


Preface

Anyone who processes text has run into inconsistent use of full-width and half-width characters, so a program needs to be able to convert quickly between the two. Because full-width and half-width characters have a fixed mapping relationship, the conversion is not complicated.

The specific rules are as follows:

Full-width characters have Unicode code points 65281 to 65374 (hexadecimal 0xFF01 to 0xFF5E).

Half-width characters have Unicode code points 33 to 126 (hexadecimal 0x21 to 0x7E).

The space is special: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).

Apart from the space, full-width and half-width characters appear in the same order in Unicode, so half-width + 65248 (0xFEE0) = full-width.

Therefore, non-space characters can be converted by adding or subtracting this offset, and the space is handled separately.
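The rules above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming Python 3 (where chr() covers the whole Unicode range); the names OFFSET, half and full are illustrative and do not come from the original article.

```python
import unicodedata

# Python 3 sketch; OFFSET, half and full are illustrative names.
OFFSET = 0xFF01 - 0x21                 # distance between the two ranges
half = range(0x21, 0x7F)               # 33..126 inclusive
full = range(0xFF01, 0xFF5F)           # 65281..65374 inclusive

assert OFFSET == 65248 == 0xFEE0
assert len(half) == len(full) == 94
# Every non-space character obeys half-width + OFFSET = full-width.
assert all(h + OFFSET == f for h, f in zip(half, full))

# The space is the one exception: 0x20 + OFFSET is 0xFF00, not 0x3000.
assert 0x20 + OFFSET != 0x3000

print(unicodedata.name(chr(ord("A") + OFFSET)))   # FULLWIDTH LATIN CAPITAL LETTER A
print(unicodedata.name("\u3000"))                 # IDEOGRAPHIC SPACE
```

The last two lines use the standard unicodedata module to confirm that the computed code point really is the full-width form, and that the full-width space is a separate character outside the 0xFF01 to 0xFF5E block.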

Some functions used

chr() takes an integer in the range 0 to 255 as its parameter and returns the corresponding character.

unichr() works the same way, but returns a Unicode character.

ord() is the counterpart of chr() and unichr(). It takes a character (a string of length 1) as its parameter and returns the corresponding ASCII or Unicode value.
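The descriptions above apply to Python 2. In Python 3, unichr() no longer exists and chr() accepts any code point from 0 to 0x10FFFF, so the same round trip can be sketched as follows (the variable name code is only illustrative):

```python
# Python 3: chr() covers all of Unicode, so unichr() is not needed.
code = ord("！")                  # full-width exclamation mark
print(code, hex(code))

assert chr(code) == "！"          # ord() and chr() are inverses
assert chr(code - 0xFEE0) == "!"  # subtracting the offset gives the half-width form

try:
    chr(0x110000)                 # one past the last valid code point
except ValueError as exc:
    print("out of range:", exc)
```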

Print the mapping first:

for i in xrange(33, 127):
    print i, chr(i), i + 65248, unichr(i + 65248)

Returned results

33 ! 65281 ！
34 " 65282 ＂
35 # 65283 ＃
36 $ 65284 ＄
37 % 65285 ％
38 & 65286 ＆
39 ' 65287 ＇
40 ( 65288 （
41 ) 65289 ）
42 * 65290 ＊
43 + 65291 ＋
44 , 65292 ，
45 - 65293 －
46 . 65294 ．
47 / 65295 ／
48 0 65296 ０
49 1 65297 １
50 2 65298 ２
51 3 65299 ３
52 4 65300 ４
53 5 65301 ５
54 6 65302 ６
55 7 65303 ７
56 8 65304 ８
57 9 65305 ９
58 : 65306 ：
59 ; 65307 ；
60 < 65308 ＜
61 = 65309 ＝
62 > 65310 ＞
63 ? 65311 ？
64 @ 65312 ＠
65 A 65313 Ａ
66 B 65314 Ｂ
67 C 65315 Ｃ
68 D 65316 Ｄ
69 E 65317 Ｅ
70 F 65318 Ｆ
71 G 65319 Ｇ
72 H 65320 Ｈ
73 I 65321 Ｉ
74 J 65322 Ｊ
75 K 65323 Ｋ
76 L 65324 Ｌ
77 M 65325 Ｍ
78 N 65326 Ｎ
79 O 65327 Ｏ
80 P 65328 Ｐ
81 Q 65329 Ｑ
82 R 65330 Ｒ
83 S 65331 Ｓ
84 T 65332 Ｔ
85 U 65333 Ｕ
86 V 65334 Ｖ
87 W 65335 Ｗ
88 X 65336 Ｘ
89 Y 65337 Ｙ
90 Z 65338 Ｚ
91 [ 65339 ［
92 \ 65340 ＼
93 ] 65341 ］
94 ^ 65342 ＾
95 _ 65343 ＿
96 ` 65344 ｀
97 a 65345 ａ
98 b 65346 ｂ
99 c 65347 ｃ
100 d 65348 ｄ
101 e 65349 ｅ
102 f 65350 ｆ
103 g 65351 ｇ
104 h 65352 ｈ
105 i 65353 ｉ
106 j 65354 ｊ
107 k 65355 ｋ
108 l 65356 ｌ
109 m 65357 ｍ
110 n 65358 ｎ
111 o 65359 ｏ
112 p 65360 ｐ
113 q 65361 ｑ
114 r 65362 ｒ
115 s 65363 ｓ
116 t 65364 ｔ
117 u 65365 ｕ
118 v 65366 ｖ
119 w 65367 ｗ
120 x 65368 ｘ
121 y 65369 ｙ
122 z 65370 ｚ
123 { 65371 ｛
124 | 65372 ｜
125 } 65373 ｝
126 ~ 65374 ～

Convert the fullwidth to halfwidth:

def full2half(s):
    n = []
    s = s.decode('utf-8')
    for char in s:
        num = ord(char)
        if num == 0x3000:
            num = 32
        elif 0xFF01 <= num <= 0xFF5E:
            num -= 0xfee0
        num = unichr(num)
        n.append(num)
    return ''.join(n)

Convert the halfwidth to fullwidth:

def half2full(s):
    n = []
    s = s.decode('utf-8')
    for char in s:
        num = ord(char)
        if num == 32:                  # half-width space
            num = 0x3000
        elif 0x21 <= num <= 0x7E:      # other half-width characters
            num += 0xfee0
        num = unichr(num)
        n.append(num)
    return ''.join(n)
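The two functions above are written for Python 2 (they call decode() and unichr()). As a minimal sketch of the same idea in Python 3, the per-character loop can be replaced by a lookup table passed to str.translate(). The table names FULL_TO_HALF and HALF_TO_FULL are assumptions made for this example, not part of the original article.

```python
# Python 3 sketch; table names are illustrative assumptions.
# Keys and values are code points, which is what str.translate() expects.
FULL_TO_HALF = {full: full - 0xFEE0 for full in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x20            # the space is handled separately
HALF_TO_FULL = {half: full for full, half in FULL_TO_HALF.items()}


def full2half(s):
    """Convert full-width characters in a str to half-width."""
    return s.translate(FULL_TO_HALF)


def half2full(s):
    """Convert half-width characters in a str to full-width."""
    return s.translate(HALF_TO_FULL)


print(full2half("Ｐｙｔｈｏｎ　３"))   # Python 3
print(half2full("Python 3"))
```

Characters that are not in the table, such as Chinese text, pass through unchanged, which matches the behaviour of the loop-based versions.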

The implementations above are very simple, but in practice we may not want to convert every character. For example, in a Chinese document we may want all letters and digits converted to half-width while common punctuation marks are unified as full-width. The conversions above are not suitable for that.

The solution is a custom dictionary.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Each FH_* table holds (full-width, half-width) pairs.
FH_SPACE = FHS = ((u"　", u" "),)
FH_NUM = FHN = (
    (u"０", u"0"), (u"１", u"1"), (u"２", u"2"), (u"３", u"3"), (u"４", u"4"),
    (u"５", u"5"), (u"６", u"6"), (u"７", u"7"), (u"８", u"8"), (u"９", u"9"),
)
FH_ALPHA = FHA = (
    (u"ａ", u"a"), (u"ｂ", u"b"), (u"ｃ", u"c"), (u"ｄ", u"d"), (u"ｅ", u"e"),
    (u"ｆ", u"f"), (u"ｇ", u"g"), (u"ｈ", u"h"), (u"ｉ", u"i"), (u"ｊ", u"j"),
    (u"ｋ", u"k"), (u"ｌ", u"l"), (u"ｍ", u"m"), (u"ｎ", u"n"), (u"ｏ", u"o"),
    (u"ｐ", u"p"), (u"ｑ", u"q"), (u"ｒ", u"r"), (u"ｓ", u"s"), (u"ｔ", u"t"),
    (u"ｕ", u"u"), (u"ｖ", u"v"), (u"ｗ", u"w"), (u"ｘ", u"x"), (u"ｙ", u"y"),
    (u"ｚ", u"z"),
    (u"Ａ", u"A"), (u"Ｂ", u"B"), (u"Ｃ", u"C"), (u"Ｄ", u"D"), (u"Ｅ", u"E"),
    (u"Ｆ", u"F"), (u"Ｇ", u"G"), (u"Ｈ", u"H"), (u"Ｉ", u"I"), (u"Ｊ", u"J"),
    (u"Ｋ", u"K"), (u"Ｌ", u"L"), (u"Ｍ", u"M"), (u"Ｎ", u"N"), (u"Ｏ", u"O"),
    (u"Ｐ", u"P"), (u"Ｑ", u"Q"), (u"Ｒ", u"R"), (u"Ｓ", u"S"), (u"Ｔ", u"T"),
    (u"Ｕ", u"U"), (u"Ｖ", u"V"), (u"Ｗ", u"W"), (u"Ｘ", u"X"), (u"Ｙ", u"Y"),
    (u"Ｚ", u"Z"),
)
FH_PUNCTUATION = FHP = (
    (u"．", u"."), (u"，", u","), (u"！", u"!"), (u"？", u"?"), (u"”", u'"'),
    (u"’", u"'"), (u"‘", u"'"), (u"＠", u"@"), (u"＿", u"_"), (u"：", u":"),
    (u"；", u";"), (u"＃", u"#"), (u"＄", u"$"), (u"％", u"%"), (u"＆", u"&"),
    (u"（", u"("), (u"）", u")"), (u"‐", u"-"), (u"＝", u"="), (u"＊", u"*"),
    (u"＋", u"+"), (u"－", u"-"), (u"／", u"/"), (u"＜", u"<"), (u"＞", u">"),
    (u"［", u"["), (u"￥", u"\\"), (u"］", u"]"), (u"＾", u"^"), (u"｛", u"{"),
    (u"｜", u"|"), (u"｝", u"}"), (u"～", u"~"),
)
FH_ASCII = HAC = lambda: ((fr, to) for m in (FH_ALPHA, FH_NUM, FH_PUNCTUATION) for fr, to in m)

# The HF_* tables are the same pairs reversed: (half-width, full-width).
HF_SPACE = HFS = ((u" ", u"　"),)
HF_NUM = HFN = lambda: ((h, z) for z, h in FH_NUM)
HF_ALPHA = HFA = lambda: ((h, z) for z, h in FH_ALPHA)
HF_PUNCTUATION = HFP = lambda: ((h, z) for z, h in FH_PUNCTUATION)
HF_ASCII = ZAC = lambda: ((h, z) for z, h in FH_ASCII())


def convert(text, *maps, **ops):
    """Full-width/half-width conversion.
    args:
        text: unicode string to convert
        maps: conversion maps
        skip: characters to leave unconverted, as a tuple or string
    return: converted unicode string
    """
    if "skip" in ops:
        skip = ops["skip"]
        if isinstance(skip, basestring):
            skip = tuple(skip)

        def replace(text, fr, to):
            return text if fr in skip else text.replace(fr, to)
    else:
        def replace(text, fr, to):
            return text.replace(fr, to)

    for m in maps:
        if callable(m):
            m = m()
        elif isinstance(m, dict):
            m = m.items()
        for fr, to in m:
            text = replace(text, fr, to)
    return text


if __name__ == '__main__':
    text = u"Narita Airport-【ＪＲ Narita，2 stops】-Tokyo-【ＪＲ,6 stops】-Shin-Aomori-【ＪＲ，4 stops】-"
    print convert(text, FH_ASCII,
                  {u"【": u"[", u"】": u"]", u",": u"，", u".": u"。", u"?": u"？", u"!": u"！"},
                  skip=u"，。？！“”")

Note: In English, there is no distinction between opening and closing quotation marks.
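For the selective case described earlier (letters and digits to half-width, common punctuation to full-width), a single translation table is one possible alternative to chained replace() calls. The following is only a sketch, assuming Python 3; normalize_cjk, ALNUM and TABLE are hypothetical names introduced for this example.

```python
import string

# Python 3 sketch; ALNUM, TABLE and normalize_cjk are hypothetical names.
ALNUM = string.ascii_letters + string.digits

# Full-width letters and digits -> half-width (subtract the 0xFEE0 offset).
TABLE = {ord(c) + 0xFEE0: ord(c) for c in ALNUM}
# A chosen set of half-width punctuation -> full-width.
TABLE.update({ord(h): ord(f) for h, f in zip(",?!", "，？！")})


def normalize_cjk(text):
    """Apply both directions in one pass over the string."""
    return text.translate(TABLE)


print(normalize_cjk("Ｐｙｔｈｏｎ３ is fun!"))   # Python3 is fun！
```

Because str.translate() looks at each input character exactly once, a character produced by one rule is never rewritten by another, so the order of the entries does not matter. The period is deliberately left out of the punctuation set here, since mapping it to a full-width form would also rewrite decimal points such as the one in 3.14.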

Summary

This article has described how to implement full-width and half-width conversion in Python. I hope it helps you in your study or work. If you have any questions, please leave a message.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
