This article shows how to calculate the display width of characters in Python. The details are as follows:
I recently wrote a small command-line tool in Python that needs to calculate character widths. The goal is to cut a long string cleanly into fragments of equal display width.
For a Unicode string, Python's len function accurately counts the characters it contains, but that count does not represent the display width. For example:
>>> len(u'你好a')
3
The string occupies five columns in a terminal (two for each Chinese character and one for the letter), so len cannot be used directly to calculate the width.
GBK Decode
The first thing that came to mind was GBK encoding. In GBK, characters in the 00–7F range are encoded as a single byte and all others as two bytes, which happens to match the display width of most characters. That suggests the following shortcut (assuming a target width of 8):
>>> a = u'你好a你好'
>>> b = a.encode('gbk')  # 9 bytes: 2 + 2 + 1 + 2 + 2
>>> try:
...     print(b[:8].decode('gbk'))
... except UnicodeDecodeError:
...     print(b[:7].decode('gbk'))
...
你好a你
As the code shows, the Unicode string is first encoded as GBK, and the first 8 bytes are sliced off and decoded as GBK. If decoding fails because the cut fell in the middle of a double-byte character, one byte is dropped and the first 7 bytes are decoded instead.
This solved the problem at first, but the drawbacks are obvious. First, the trial-and-error code is inelegant. Second, GBK can represent only a limited set of characters, so the many characters outside GBK are not supported at all.
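The trial-and-error approach above can be wrapped in a function. The following is only a sketch under stated assumptions: it targets Python 3, and the name truncate_gbk is hypothetical rather than taken from the original tool. It also makes the second drawback visible, because encoding raises UnicodeEncodeError for any character GBK cannot represent.

```python
def truncate_gbk(text, width):
    """Cut text to at most `width` columns, counting one GBK byte as one column.

    Hypothetical helper for illustration only.
    """
    if width <= 0:
        return ""
    # Raises UnicodeEncodeError for characters outside GBK.
    data = text.encode("gbk")
    # A cut can only land on a character boundary or in the middle of a
    # double-byte character, so backing off by one byte is always enough.
    for end in (width, width - 1):
        try:
            return data[:end].decode("gbk")
        except UnicodeDecodeError:
            continue
    return ""


print(truncate_gbk("你好a你好", 8))

try:
    truncate_gbk("\U0001F600 smile", 4)  # an emoji is not representable in GBK
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

The byte count is only a proxy for the width, so the function inherits every mismatch between GBK byte length and actual column width.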
East_Asian_Width
After searching for a long time, I stumbled upon the East_Asian_Width property in the Unicode Character Database, which has the following possible values:
# East_Asian_Width (ea)
ea ; A  ; Ambiguous   (width uncertain)
ea ; F  ; Fullwidth   (full width)
ea ; H  ; Halfwidth   (half width)
ea ; N  ; Neutral     (neutral)
ea ; Na ; Narrow      (narrow)
ea ; W  ; Wide        (wide)
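To see these values in practice, the property can be queried with unicodedata.east_asian_width from the standard library. The sketch below assumes Python 3; the sample characters and the helper name ea_classes are chosen here for illustration and do not come from the original article.

```python
import unicodedata

# Sample characters chosen for illustration (not from the original article).
samples = [
    ("a", "Latin small letter a"),
    ("\u4f60", "CJK ideograph (你)"),
    ("\u201c", "left double quotation mark"),
    ("\uff21", "fullwidth Latin capital A"),
    ("\uff71", "halfwidth katakana A"),
    ("\u05d0", "Hebrew letter alef"),
]


def ea_classes(chars):
    """Map each character to its East_Asian_Width property value."""
    return {ch: unicodedata.east_asian_width(ch) for ch in chars}


classes = ea_classes(ch for ch, _ in samples)
for ch, label in samples:
    print("U+%04X %-28s %s" % (ord(ch), label, classes[ch]))
```

The values come from the Unicode data bundled with the running Python interpreter.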
Apart from A (Ambiguous), the values F, H, N, Na, and W each determine the width clearly. Taking the conservative choice of treating A as width 2, it is easy to compute the width of a single character:
>>> import unicodedata
>>> def chr_width(c):
...     # Fullwidth, Wide and (conservatively) Ambiguous count as 2 columns
...     if unicodedata.east_asian_width(c) in ('F', 'W', 'A'):
...         return 2
...     else:
...         return 1
...
>>> chr_width(u'你')
2
>>> chr_width(u'a')
1
This seems to meet the requirement, but in practice characters whose property is A turn out to be very common. The most typical are the Chinese double quotation marks:
>>> chr_width(u'“')
2
In most monospaced fonts a Chinese double quotation mark is only one column wide. When several of them appear in a string, the accumulated misjudged width degrades the truncation considerably, so this is clearly not the best approach either.
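One possible middle ground, which is not part of the original article, is to make the width of Ambiguous characters a parameter so the caller can match the terminal or font in use. The sketch below assumes Python 3; the ambiguous_width parameter and the str_width helper are hypothetical names introduced for illustration.

```python
import unicodedata


def chr_width(c, ambiguous_width=1):
    """Width of one character; Ambiguous ('A') characters use ambiguous_width."""
    ea = unicodedata.east_asian_width(c)
    if ea in ("F", "W"):
        return 2
    if ea == "A":
        return ambiguous_width
    return 1


def str_width(s, ambiguous_width=1):
    """Sum of the per-character widths (combining marks are not special-cased)."""
    return sum(chr_width(c, ambiguous_width) for c in s)


quoted = "\u201c你好\u201d"  # Chinese quotation marks around two ideographs
print(str_width(quoted))                     # quotes counted as 1 column each
print(str_width(quoted, ambiguous_width=2))  # quotes counted as 2 columns each
```

This still treats zero-width characters such as combining marks as one column wide, so it remains an approximation.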
The urwid Solution
urwid is a mature Python terminal UI library that wraps curses with HTML-like widgets for displaying text. If you need to build such an interface, it is far more convenient than using the curses library directly. What pleasantly surprised me is how accurately it truncates Unicode text by width, so I opened its source code to investigate. The core of its text width calculation is as follows:
widths = [
    (126, 1), (159, 0), (687, 1), (710, 0), (711, 1),
    (727, 0), (733, 1), (879, 0), (1154, 1), (1161, 0),
    (4347, 1), (4447, 2), (7467, 1), (7521, 0), (8369, 1),
    (8426, 0), (9000, 1), (9002, 2), (11021, 1), (12350, 2),
    (12351, 1), (12438, 2), (12442, 0), (19893, 2), (19967, 1),
    (55203, 2), (63743, 1), (64106, 2), (65039, 1), (65059, 0),
    (65131, 2), (65279, 1), (65376, 2), (65500, 1), (65510, 2),
    (120831, 1), (262141, 2), (1114109, 1),
]

def get_width(o):
    """Return the screen column width for unicode ordinal o."""
    global widths
    if o == 0xe or o == 0xf:
        return 0
    for num, wid in widths:
        if o <= num:
            return wid
    return 1
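As the code shows, the character widths are first organized into a range table based on the official Unicode EastAsianWidth document. Each entry gives the last code point of a range and the width of the characters in that range, and the width of a character is found by looking up its code point in the table. Testing it with the earlier examples: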
>>> get_width(ord(u'a'))
1
>>> get_width(ord(u'你'))
2
>>> get_width(ord(u'“'))
1
The results are entirely accurate, and performance in practice is also reasonably good, so this is an ideal solution. For more details, see the source of old_str_util.py in urwid.
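To connect the table lookup back to the original goal of cutting a string to a fixed width, here is a minimal sketch that assumes Python 3. The table and get_width are copied from the listing above; truncate_to_width is a hypothetical helper written for this example, not a urwid function.

```python
# Range table copied from the listing above: (last code point of range, width).
WIDTHS = [
    (126, 1), (159, 0), (687, 1), (710, 0), (711, 1),
    (727, 0), (733, 1), (879, 0), (1154, 1), (1161, 0),
    (4347, 1), (4447, 2), (7467, 1), (7521, 0), (8369, 1),
    (8426, 0), (9000, 1), (9002, 2), (11021, 1), (12350, 2),
    (12351, 1), (12438, 2), (12442, 0), (19893, 2), (19967, 1),
    (55203, 2), (63743, 1), (64106, 2), (65039, 1), (65059, 0),
    (65131, 2), (65279, 1), (65376, 2), (65500, 1), (65510, 2),
    (120831, 1), (262141, 2), (1114109, 1),
]


def get_width(o):
    """Return the screen column width for Unicode ordinal o."""
    if o == 0xE or o == 0xF:
        return 0
    for num, wid in WIDTHS:
        if o <= num:
            return wid
    return 1


def truncate_to_width(text, max_width):
    """Return the longest prefix of text that fits in max_width columns.

    Hypothetical helper for illustration; not part of urwid.
    """
    used = 0
    for i, ch in enumerate(text):
        w = get_width(ord(ch))
        if used + w > max_width:
            return text[:i]
        used += w
    return text


print(truncate_to_width("你好a你好", 8))
print(truncate_to_width("\u201c你好\u201d", 4))
```

Unlike the GBK approach, no trial decoding is needed, and characters outside GBK are handled by the same lookup.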
I hope this article is helpful for your Python programming.