Converting between full-width and half-width characters in Python

Source: Internet
Author: User
Preface

Anyone who processes text regularly runs into input that mixes full-width and half-width characters, so a program needs a quick way to convert between the two. Because the mapping between full-width and half-width characters is regular, the conversion is not hard to implement.

The specific rules are:

Full-width characters occupy Unicode code points 65281 to 65374 (hex 0xFF01 to 0xFF5E).

Half-width characters occupy Unicode code points 33 to 126 (hex 0x21 to 0x7E).

The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).

Apart from the space, full-width and half-width characters appear in the same order in Unicode, so half-width + 65248 (0xFEE0) = full-width.

Non-space characters can therefore be converted simply by adding or subtracting 65248, and the space is handled separately.
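As a quick sanity check of these rules, here is a minimal Python 3 sketch. The names OFFSET and to_full are illustrative and do not come from the original article; the sketch also assumes Python 3, where chr() accepts any Unicode code point.

```python
# Offset between a half-width ASCII character and its full-width form.
OFFSET = 0xFF01 - 0x21  # 65248 == 0xFEE0


def to_full(ch):
    """Return the full-width form of one character (hypothetical helper)."""
    code = ord(ch)
    if code == 0x20:              # the space is the one exception
        return chr(0x3000)
    if 0x21 <= code <= 0x7E:      # printable ASCII shifts by a fixed offset
        return chr(code + OFFSET)
    return ch                     # anything else is left untouched


print(OFFSET, hex(OFFSET))        # 65248 0xfee0
print(to_full("A") == "\uff21")   # True
print(to_full(" ") == "\u3000")   # True
```

The reverse direction subtracts the same offset and maps 0x3000 back to 0x20.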

Some of the functions used

The chr() function takes an integer in range(256) (that is, 0 to 255) and returns the corresponding character.

unichr() works the same way, except that it returns a Unicode character. It exists only in Python 2, which the code in this article targets.

ord() is the inverse of chr() and unichr(): it takes a character (a string of length 1) and returns its ASCII or Unicode value.

First, print the mapping:

for i in xrange(33, 127):
    print i, chr(i), i + 65248, unichr(i + 65248)

Output:

33 ! 65281 !
34 " 65282 "
35 # 65283 #
36 $ 65284 $
37 % 65285 %
38 & 65286 &
39 ' 65287 '
40 ( 65288 (
41 ) 65289 )
42 * 65290 *
43 + 65291 +
44 , 65292 ,
45 - 65293 -
46 . 65294 .
47 / 65295 /
48 0 65296 0
49 1 65297 1
50 2 65298 2
51 3 65299 3
52 4 65300 4
53 5 65301 5
54 6 65302 6
55 7 65303 7
56 8 65304 8
57 9 65305 9
58 : 65306 :
59 ; 65307 ;
60 < 65308 <
61 = 65309 =
62 > 65310 >
63 ? 65311 ?
64 @ 65312 @
65 A 65313 A
66 B 65314 B
67 C 65315 C
68 D 65316 D
69 E 65317 E
70 F 65318 F
71 G 65319 G
72 H 65320 H
73 I 65321 I
74 J 65322 J
75 K 65323 K
76 L 65324 L
77 M 65325 M
78 N 65326 N
79 O 65327 O
80 P 65328 P
81 Q 65329 Q
82 R 65330 R
83 S 65331 S
84 T 65332 T
85 U 65333 U
86 V 65334 V
87 W 65335 W
88 X 65336 X
89 Y 65337 Y
90 Z 65338 Z
91 [ 65339 [
92 \ 65340 \
93 ] 65341 ]
94 ^ 65342 ^
95 _ 65343 _
96 ` 65344 `
97 a 65345 a
98 b 65346 b
99 c 65347 c
100 d 65348 d
101 e 65349 e
102 f 65350 f
103 g 65351 g
104 h 65352 h
105 i 65353 i
106 j 65354 j
107 k 65355 k
108 l 65356 l
109 m 65357 m
110 n 65358 n
111 o 65359 o
112 p 65360 p
113 q 65361 q
114 r 65362 r
115 s 65363 s
116 t 65364 t
117 u 65365 u
118 v 65366 v
119 w 65367 w
120 x 65368 x
121 y 65369 y
122 z 65370 z
123 { 65371 {
124 | 65372 |
125 } 65373 }
126 ~ 65374 ~
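On Python 3, xrange() and unichr() no longer exist and print is a function, while chr() accepts any Unicode code point. Under that assumption, the same table could be produced roughly as follows; mapping_rows is a hypothetical helper name, not part of the original article.

```python
def mapping_rows(start=33, stop=127):
    """Yield (half code, half char, full code, full char) for each code point."""
    for i in range(start, stop):
        yield i, chr(i), i + 65248, chr(i + 65248)


rows = list(mapping_rows())

# Print only the first few rows; drop the slice to print the whole table.
for row in rows[:3]:
    print(*row)
```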

Converting full-width to half-width:

def full2half(s):
    n = []
    s = s.decode('utf-8')              # UTF-8 byte string -> unicode
    for char in s:
        num = ord(char)
        if num == 0x3000:              # full-width space
            num = 32
        elif 0xFF01 <= num <= 0xFF5E:  # full-width ASCII range
            num -= 0xfee0
        num = unichr(num)
        n.append(num)
    return ''.join(n)

Converting half-width to full-width:

def half2full(s):
    n = []
    s = s.decode('utf-8')              # UTF-8 byte string -> unicode
    for char in s:
        num = ord(char)
        if num == 32:                  # half-width space
            num = 0x3000
        elif 0x21 <= num <= 0x7E:      # printable ASCII range
            num += 0xfee0
        num = unichr(num)
        n.append(num)
    return ''.join(n)
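For readers on Python 3, the two functions can be sketched with str.translate() and a pair of lookup tables instead of a per-character loop. This is a swapped-in technique rather than the article's own code: it assumes the input is already a str (so there is no decode step), and the table names FULL_TO_HALF and HALF_TO_FULL are illustrative.

```python
# Translation tables keyed by code point, as str.translate() expects.
FULL_TO_HALF = {code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x20  # full-width space -> ASCII space

# Invert the table for the opposite direction.
HALF_TO_FULL = {half: full for full, half in FULL_TO_HALF.items()}


def full2half(s):
    """Convert full-width characters in a str to half-width."""
    return s.translate(FULL_TO_HALF)


def half2full(s):
    """Convert half-width characters in a str to full-width."""
    return s.translate(HALF_TO_FULL)


# Full-width "ABC", a full-width space, then full-width "123".
sample = "\uff21\uff22\uff23\u3000\uff11\uff12\uff13"
print(full2half(sample))  # ABC 123
```

Characters outside the two tables, such as Chinese text, pass through unchanged in both directions.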

The implementations above are simple, but real-world requirements are often not a blanket conversion of every character. In Chinese articles, for example, we usually want all letters and digits converted to half-width while common punctuation is kept uniformly full-width, and the functions above are not suitable for that.

The solution is a custom mapping table.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Each pair is (full-width, half-width).
FH_SPACE = FHS = ((u" ", u" "),)
FH_NUM = FHN = (
    (u"0", u"0"), (u"1", u"1"), (u"2", u"2"), (u"3", u"3"), (u"4", u"4"),
    (u"5", u"5"), (u"6", u"6"), (u"7", u"7"), (u"8", u"8"), (u"9", u"9"),
)
FH_ALPHA = FHA = (
    (u"a", u"a"), (u"b", u"b"), (u"c", u"c"), (u"d", u"d"), (u"e", u"e"),
    (u"f", u"f"), (u"g", u"g"), (u"h", u"h"), (u"i", u"i"), (u"j", u"j"),
    (u"k", u"k"), (u"l", u"l"), (u"m", u"m"), (u"n", u"n"), (u"o", u"o"),
    (u"p", u"p"), (u"q", u"q"), (u"r", u"r"), (u"s", u"s"), (u"t", u"t"),
    (u"u", u"u"), (u"v", u"v"), (u"w", u"w"), (u"x", u"x"), (u"y", u"y"),
    (u"z", u"z"),
    (u"A", u"A"), (u"B", u"B"), (u"C", u"C"), (u"D", u"D"), (u"E", u"E"),
    (u"F", u"F"), (u"G", u"G"), (u"H", u"H"), (u"I", u"I"), (u"J", u"J"),
    (u"K", u"K"), (u"L", u"L"), (u"M", u"M"), (u"N", u"N"), (u"O", u"O"),
    (u"P", u"P"), (u"Q", u"Q"), (u"R", u"R"), (u"S", u"S"), (u"T", u"T"),
    (u"U", u"U"), (u"V", u"V"), (u"W", u"W"), (u"X", u"X"), (u"Y", u"Y"),
    (u"Z", u"Z"),
)
FH_PUNCTUATION = FHP = (
    (u".", u"."), (u",", u","), (u"!", u"!"), (u"?", u"?"), (u"”", u'"'),
    (u"’", u"'"), (u"‘", u"`"), (u"@", u"@"), (u"_", u"_"), (u":", u":"),
    (u";", u";"), (u"#", u"#"), (u"$", u"$"), (u"%", u"%"), (u"&", u"&"),
    (u"(", u"("), (u")", u")"), (u"‐", u"-"), (u"=", u"="), (u"*", u"*"),
    (u"+", u"+"), (u"-", u"-"), (u"/", u"/"), (u"<", u"<"), (u">", u">"),
    (u"[", u"["), (u"¥", u"\\"), (u"]", u"]"), (u"^", u"^"), (u"{", u"{"),
    (u"|", u"|"), (u"}", u"}"), (u"~", u"~"),
)
FH_ASCII = HAC = lambda: ((fr, to) for m in (FH_ALPHA, FH_NUM, FH_PUNCTUATION) for fr, to in m)

# Half-width to full-width maps are the reversed pairs.
HF_SPACE = HFS = ((u" ", u" "),)
HF_NUM = HFN = lambda: ((h, z) for z, h in FH_NUM)
HF_ALPHA = HFA = lambda: ((h, z) for z, h in FH_ALPHA)
HF_PUNCTUATION = HFP = lambda: ((h, z) for z, h in FH_PUNCTUATION)
HF_ASCII = ZAC = lambda: ((h, z) for z, h in FH_ASCII())


def convert(text, *maps, **ops):
    """Full-width/half-width conversion.

    Args:
        text: unicode string to convert
        maps: conversion maps
        skip: characters to leave unconverted, as a tuple or string
    Return:
        converted unicode string
    """
    if "skip" in ops:
        skip = ops["skip"]
        if isinstance(skip, basestring):
            skip = tuple(skip)

        def replace(text, fr, to):
            return text if fr in skip else text.replace(fr, to)
    else:
        def replace(text, fr, to):
            return text.replace(fr, to)

    for m in maps:
        if callable(m):          # lazily built map (the lambdas above)
            m = m()
        elif isinstance(m, dict):
            m = m.items()
        for fr, to in m:
            text = replace(text, fr, to)
    return text


if __name__ == '__main__':
    text = u"Narita Airport-【JR Limited Express Narita エクスプレス for Yokohama,2 stops】-Tokyo-【JR Shinkansen はやぶさ for Shin-Aomori,6 stops】-Shin-Aomori-【JR Limited Express スーパー白鳥 for Hakodate,4 stops】-Hakodate"
    # The keyword must be "skip" (the source had "spit"), and the value must
    # be a unicode string so that its characters compare equal to the map keys.
    print convert(text, FH_ASCII,
                  {u"【": u"[", u"】": u"]", u",": u",", u".": u"。", u"?": u"?", u"!": u"!"},
                  skip=u",。?!“”")

Note: ASCII quotation marks do not distinguish between opening and closing quotes.
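The Chinese-article policy described earlier (letters and digits in half-width, common punctuation in full-width) can also be sketched in Python 3 with a single translation table. This is only an illustration under stated assumptions: the function name normalize_cjk and the small punctuation set are hypothetical choices, and the period is deliberately left out because it also appears in decimals such as 3.14.

```python
import string

# Full-width letters and digits -> half-width.
ALNUM_TO_HALF = {
    ord(c) + 0xFEE0: ord(c) for c in string.ascii_letters + string.digits
}

# A small, illustrative set of punctuation: half-width -> full-width.
PUNCT_TO_FULL = {ord(c): ord(c) + 0xFEE0 for c in ",?!:;"}

# Merge both directions into one table; the key sets do not overlap.
CJK_TABLE = {**ALNUM_TO_HALF, **PUNCT_TO_FULL}


def normalize_cjk(text):
    """Apply the mixed policy to a str (hypothetical helper)."""
    return text.translate(CJK_TABLE)


# Full-width "Py4" followed by half-width ",ok!"
print(normalize_cjk("\uff30\uff59\uff14,ok!") == "Py4\uff0cok\uff01")  # True
```

Because the table is an ordinary dict, individual entries can be added or removed to match a particular house style.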

Summary

That covers how to convert between full-width and half-width characters in Python. I hope this article is useful for your study or work; if you have any questions, feel free to leave a comment.
