Python method for calculating the width of a character

Last Update:2016-06-16 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The example in this article describes how Python calculates the width of a character. Share to everyone for your reference, as follows:

A CLI applet has recently been written in Python, which involves calculating the width of a character, with the goal of intercepting a long string in a friendly manner as a fragment of equal width.

For Unicode characters, the Len function of Python can accurately calculate the number of characters it contains, but the number does not represent the width, such as:

>>>len (U ' Hello A ') 3

Therefore, it is not easy to use this method to calculate the width.

GBK Decode

First I think of GBK encoding, the 00–7f range of characters is a byte encoding, the rest is a double-byte encoding, exactly the width of the character is roughly the same, so there is such an opportunistic approach (assuming 8 width):

>>> a = U ' Hello Hi ' >>> b=a.encode (' GBK ') >>> try:  ... Print B[:8].decode (' GBK ') ... except:  ... Print B[:7].decode (' GBK ') ... hello you

As shown in the code, the Unicode string is GBK encoded first, then the width of the 8 bytes is intercepted and then attempted to decode with GBK, if the decoding fails, then a width is truncated, and a decoding of 7 bytes is used GBK.

Although the problem was initially solved, the mishap was obvious. First, the code is not elegant, in a trial-and-error manner, followed by GBK can represent a limited number of characters, for a lot of characters other than GBK can not be supported.

East_asian_width

After wandering for a long time, I stumbled upon the East_asian_width attribute in the Unicode Character Database standard with the following possible values:

# East_asian_width (EA) EA; A     ; Ambiguous  not sure EA; F     ; Fullwidth  full width ea; H     ; Halfwidth  half-width ea; N     ; Neutral   neutral ea; Na    ; Narrow    Narrow ea; W     ; Wide     Width

In addition to a uncertainty, f/h/n/na/w can clearly know the width, if conservative, a as a width of 2, it is easy to give the width of a single character:

>>> Import unicodedata>>> def chr_width (c): ...  if (Unicodedata.east_asian_width (c) in (' F ', ' W ', ' A ')):   ... Return 2  ... else:   ... return 1>>> chr_width (U ' you ') 2>>> chr_width (U ' a ') 1

Now seems to be able to meet the requirements, but the actual use of the attribute is found to be a character is very much see, the most typical is the Chinese double quotation marks:

>>> chr_width (U ' "') 2

In most of the equal-width font, the Chinese double quotation marks are only one wide, if there are multiple Chinese double quotation marks in a row, then the accumulated false judgment width will make the interception effect greatly discounted, no doubt this is not the best way.

Urwid Solutions

Urwid is a mature Python terminal UI library that wraps HTML-like controls on curses to display textual content, and if there is a development requirement, this library is much more convenient than using the curses library directly. Very good is it on the Unicode text width interception is very accurate, let me greatly surprised, so opened its source code to explore, the text width calculation of its core codes are as follows:

widths = [  (126,  1), (159,  0), (687,   1), (710,  0)  , (711, 1), (727, 0), (733, 1), ( 879,   0), (1154, 1), (1161, 0), (  4347,  1),  (4447, 2), (7467, 1), (7521, 0), (8369, 1), (  8426 ,  0), (9000,  1), (9002,  2), (11021, 1), (12350, 2), (12351, 1), (12438, 2), (12442, 0), (  19893, 2 ), (19967, 1),  (55203, 2), (63743, 1), (64106,  2), (65039, 1), (65059, 0), (65131, 2), (65279, 1), (  65376,< c23/>2), (65500, 1), (65510, 2), (  120831, 1), (262141, 2), (1114109, 1),]def get_width (o): "" "  Return the Scree n column width for Unicode ordinal o.  "" " Global widths  if o = = 0xe or o = = 0xf:    return 0  for num, wid in widths:    if o <= num:      return WI D  return 1

As shown in the code, first sort out the range table of the character widths according to the official Unicode Eastasianwidth document, and then use the Unicode code to look up the table. Use the previous example to test:

>>> Get_width (Ord (U ' A ')) 1>>> get_width (Ord (U ' You ')) 2>>> get_width (Ord (U ' "")) 1

Completely accurate, and in the actual application of the performance is also relatively good, is an ideal solution, more tips please refer to Urwid old_str_util.py source code.

More interested in Python related content readers can view this site topic: "Python Picture Operation skills Summary", "Python data structure and algorithm tutorial", "Python Socket Programming Skills Summary", "Python function Use Tips", " Python string manipulation Tips Summary, Python Introductory and Advanced classic tutorials, and Python file and directory operations tips

I hope this article is helpful for Python program design.



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Python method for calculating the width of a character

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Python method for calculating the width of a character

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support