Explanation of Python3 Regular Expressions (3)
Previous Article: Explanation of Python3 Regular Expression (2)
https://docs.python.org/3.4/howto/regex.html
The blogger has added some comments and changes to the original text ^_^
Flag                          Description
ASCII, A                      Makes escapes such as \w, \b, \s and \d match only ASCII characters
DOTALL, S                     Makes . match any character, including newlines
IGNORECASE, I                 Makes matching case-insensitive
LOCALE, L                     Makes matching depend on the current locale (language/region) settings
MULTILINE, M                  Multi-line matching, affecting ^ and $
VERBOSE, X (for 'extended')   Enables verbose regular expressions, which can be laid out more readably
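As a quick check of the table above, here is a small sketch showing that each short name is just an alias for the long one, so either spelling can be passed to the re functions. The dictionary name alias_pairs is only for illustration.

```python
import re

# Short name -> long name, as listed in the table above.
alias_pairs = {
    "A": "ASCII",
    "S": "DOTALL",
    "I": "IGNORECASE",
    "L": "LOCALE",
    "M": "MULTILINE",
    "X": "VERBOSE",
}

# Each short attribute of the re module equals its long counterpart.
aliases_ok = all(
    getattr(re, short) == getattr(re, long_name)
    for short, long_name in alias_pairs.items()
)

# The short form works anywhere a flag is accepted.
found = re.findall(r"fanfan", "FANFAN", re.I)

print(aliases_ok)  # True
print(found)       # ['FANFAN']
```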
The following describes their meanings in detail:
A
ASCII
Makes \w, \W, \b, \B, \s and \S match only ASCII characters rather than the full Unicode range. This flag is only meaningful for Unicode (str) patterns and is ignored for bytes patterns.
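A minimal sketch of the difference, assuming a str pattern; the sample text is made up for illustration. The é is written as an escape (\u00e9) so the example does not depend on how the file is encoded.

```python
import re

text = "caf\u00e9 au lait"  # "café au lait"

# Default for str patterns: \w follows the Unicode database, so é counts as a word character.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is restricted to [a-zA-Z0-9_], so the match stops before é.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['café', 'au', 'lait']
print(ascii_words)    # ['caf', 'au', 'lait']
```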
S
DOTALL
Makes . match any character at all, including a newline. Without this flag, . matches any character except a newline.
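A small sketch of the effect on a two-line string (the sample text is made up for illustration):

```python
import re

text = "first line\nsecond line"

# Without DOTALL, . stops at the newline.
without_flag = re.match(r".+", text).group()

# With DOTALL, . also consumes the newline, so the whole string matches.
with_flag = re.match(r".+", text, re.DOTALL).group()

print(repr(without_flag))  # 'first line'
print(repr(with_flag))     # 'first line\nsecond line'
```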
I
IGNORECASE
Character classes and literal strings are matched case-insensitively. For example, the regular expression [A-Z] will also match lowercase letters, and Fanfan will match Fanfan, fanfan, or FANFAN. Unless the LOCALE flag is also set, the case conversion does not take the current locale (language/region) settings into account.
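Both cases mentioned above can be tried out with a short sketch like this one; the candidate strings are made up for illustration.

```python
import re

# A literal pattern compiled with IGNORECASE matches any mix of upper and lower case.
pattern = re.compile(r"fanfan", re.IGNORECASE)
candidates = ("Fanfan", "fanfan", "FANFAN", "fan")
matches = [s for s in candidates if pattern.fullmatch(s)]

# A character class such as [A-Z] also matches lowercase letters under the flag.
class_hits = re.findall(r"[A-Z]+", "abc DEF", re.IGNORECASE)

print(matches)     # ['Fanfan', 'fanfan', 'FANFAN']
print(class_hits)  # ['abc', 'DEF']
```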
L
LOCALE
Makes \w, \W, \b and \B depend on the current locale (language/region) settings rather than on the Unicode database.
Locales are a feature of the C library, intended to help programs cope with differences between languages. For example, if you are processing French text, you would like to use \w+ to match words, but without locale support \w only recognizes the letters in [A-Za-z], so it does not match special French characters such as é or ç. If your system is correctly configured with a French locale, the C library functions tell the program that these characters should also be treated as letters. When the LOCALE flag is set while compiling a regular expression, \w+ can recognize French words, at the cost of some speed.
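One caveat when trying this out: the article follows the Python 3.4 documentation, but in Python 3.6 and later re.LOCALE is accepted only for bytes patterns, and using it with a str pattern raises ValueError. The sketch below only demonstrates that restriction; what \w matches beyond ASCII depends on the locale configured on your machine, so no locale-specific result is shown.

```python
import re

# A bytes pattern may be compiled with LOCALE.
bytes_pattern = re.compile(rb"\w+", re.LOCALE)

# A str pattern with LOCALE is rejected on Python 3.6+.
try:
    re.compile(r"\w+", re.LOCALE)
    str_pattern_allowed = True
except ValueError:
    str_pattern_allowed = False

# Plain ASCII letters are word characters whatever the locale is.
ascii_match = bytes_pattern.match(b"word").group()

print(str_pattern_allowed)  # False on Python 3.6+
print(ascii_match)          # b'word'
```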
M
MULTILINE
(We haven't covered ^ and $ yet. Don't worry, we'll get to them later ...)
Normally, ^ matches only at the beginning of the string, and $ matches only at the end of the string (or just before a newline at the very end of the string). When this flag is set, ^ matches at the beginning of the string and also at the beginning of each line, and $ matches at the end of the string and also at the end of each line.
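A short sketch of how the flag changes ^ and $ on a three-line string (the sample text is made up for illustration):

```python
import re

text = "one\ntwo\nthree"

# Without MULTILINE, ^ only matches at the very start of the string.
default_starts = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Without MULTILINE, $ only matches at the end of the string.
default_ends = re.findall(r"\w+$", text)
# With MULTILINE, $ also matches just before each newline.
multiline_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(default_starts)    # ['one']
print(multiline_starts)  # ['one', 'two', 'three']
print(default_ends)      # ['three']
print(multiline_ends)    # ['one', 'two', 'three']
```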
X
VERBOSE
This flag lets you write regular expressions that are easier to read and better organized. When it is set, whitespace in the pattern is ignored (except whitespace that appears inside a character class or is escaped with a backslash). It also lets you put comments inside the pattern: everything from a # to the end of the line is a comment and is not passed to the matching engine (except a # that appears inside a character class or is escaped with a backslash).
Below is an example that uses re.VERBOSE. See for yourself how much more readable the regular expression becomes:
import re

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)
If the VERBOSE flag is not set, the same regular expression is written as follows:
charref = re.compile("&#(0[0-7]+|[0-9]+|x[0-9a-fA-F]+);")
Note: Which one is more readable? I'm sure you already know the answer.
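To confirm that the two spellings really describe the same pattern, here is a small sketch that runs both against a few strings. The names verbose and compact and the sample strings are made up for illustration.

```python
import re

verbose = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)

compact = re.compile("&#(0[0-7]+|[0-9]+|x[0-9a-fA-F]+);")

samples = ["&#065;", "&#65;", "&#x41;", "&#;", "&amp;"]

# Keep only the samples that each pattern matches in full.
verbose_hits = [s for s in samples if verbose.fullmatch(s)]
compact_hits = [s for s in samples if compact.fullmatch(s)]

print(verbose_hits)                  # ['&#065;', '&#65;', '&#x41;']
print(verbose_hits == compact_hits)  # True
```

The # inside [#] is not treated as a comment, because it appears inside a character class.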
(This article is complete)
Next article: Explanation of Python3 Regular Expressions (4)