Using the ASCII flag with regular expressions in Python
ASCII
ASCII (American Standard Code for Information Interchange) is a single-byte encoding. Early computing used only English, and a single byte can represent 256 different values, which is more than enough for all English characters and many control symbols. ASCII itself uses only the lower half of that range (values below \x80). The unused upper half is what multi-byte character sets (MBCS) build on.
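As a small illustration of the "below \x80" rule, the following sketch checks whether a string is pure ASCII. The helper name is_ascii is hypothetical and exists only for this example; it is written by hand so that it also runs on Python 3.6.

```python
def is_ascii(text):
    """Return True if every character has a code point below 0x80.

    Hypothetical helper for illustration only.
    """
    return all(ord(ch) < 0x80 for ch in text)


print(is_ascii("hello"))         # plain English letters
print(is_ascii("Österreich"))    # 'Ö' lies outside the ASCII range

# An ASCII character fits in one byte; 'ł' needs more than one byte in UTF-8.
print("A".encode("ascii"))
print(len("ł".encode("utf-8")))
```

Any character at or above \x80, such as 'Ö' or 'ł', makes the check fail, and these are exactly the characters that the ASCII flag excludes from \w in the regular expression example in this article.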
Most development today uses Python 3, but sometimes a regular expression has to behave the way it did in old Python 2 code. The difference comes from how strings are represented: in Python 3, strings and regular expression patterns are Unicode by default, while in Python 2, patterns matched ASCII by default. So how can Python 3 be made to match the old way? Pass the re.ASCII flag when compiling the pattern, as shown in the following example:
# Python 3.6
# Cai junsheng
# http://blog.csdn.net/caimouse/article/details/51749579
#
import re

text = u'Français złoty Österreich'
pattern = r'\w+'

# With re.ASCII, \w matches only ASCII letters, digits and underscore.
ascii_pattern = re.compile(pattern, re.ASCII)
# Without the flag, \w matches any Unicode word character.
unicode_pattern = re.compile(pattern)

print('Text    :', text)
print('Pattern :', pattern)
print('ASCII   :', list(ascii_pattern.findall(text)))
print('Unicode :', list(unicode_pattern.findall(text)))
The output is as follows:
Text    : Français złoty Österreich
Pattern : \w+
ASCII   : ['Fran', 'ais', 'z', 'oty', 'sterreich']
Unicode : ['Français', 'złoty', 'Österreich']
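The flag affects more than \w: it also restricts \W, \b, \B, \d, \D, \s and \S to ASCII-only matching. The sketch below shows the effect on \d and also shows the inline form (?a), which sets the same flag from inside the pattern. The sample text is made up for illustration, and the non-ASCII digits are written as escapes so the source stays plain ASCII.

```python
import re

# "42" in ASCII digits, then the same number in Arabic-Indic digits
# (U+0664 and U+0662). Sample text is an assumption for illustration.
text = "Room 42 and \u0664\u0662"

# Default in Python 3: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \d matches only the characters 0-9.
ascii_digits = re.findall(r"\d+", text, re.ASCII)

# The inline flag (?a) at the start of the pattern does the same thing.
inline_digits = re.findall(r"(?a)\d+", text)

print(len(unicode_digits), len(ascii_digits), len(inline_digits))
```

The inline form can be handy when the pattern is stored as a plain string, for example in a configuration file, and the calling code does not pass any flags.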
Summary
This article described the ASCII mode for regular expressions in Python: compiling a pattern with re.ASCII makes \w match only ASCII characters, as in Python 2, instead of the full Unicode range that Python 3 uses by default.