Python 3: How to Use Regular Expressions Gracefully (Part 3)


Module-level functions

Using a regular expression does not require you to first compile a pattern object and then call its matching methods. The re module also provides module-level functions such as match(), search(), findall(), sub(), and so on. The first argument of each of these functions is the regular expression string. The remaining arguments are the same as those of the pattern-object method with the same name, and so is the return value: either None or a match object.

    >>> import re
    >>> print(re.match(r'from\s+', 'from_fishc.com'))
    None
    >>> re.match(r'from\s+', 'from fishc.com')
    <re.Match object; span=(0, 5), match='from '>


In fact, these functions simply create a pattern object for you and call the corresponding method on it (we covered these methods in the previous article, remember?). They also store the compiled pattern objects in a cache, so later calls with the same expression can reuse them without recompiling.
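To make this concrete, here is a small sketch that calls a few module-level functions and then the same-named methods of a compiled pattern, and checks that the results agree. The pattern and the sample string are invented for this illustration.

```python
import re

text = "from fishc.com to fishc.net"  # made-up sample input
pattern = r"fishc\.\w+"

compiled = re.compile(pattern)

# Module-level functions: the pattern string comes first,
# followed by the same arguments the pattern method takes.
print(re.search(pattern, text).group())     # fishc.com
print(re.findall(pattern, text))            # ['fishc.com', 'fishc.net']
print(re.sub(pattern, "example.org", text))  # from example.org to example.org

# The compiled pattern's methods return the same results.
assert re.search(pattern, text).group() == compiled.search(text).group()
assert re.findall(pattern, text) == compiled.findall(text)
assert re.sub(pattern, "example.org", text) == compiled.sub("example.org", text)
```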


So should we use these module-level functions, or compile a pattern object and call its methods? It depends on how often the regular expression is used. If your program uses regular expressions only occasionally, the module-level functions are more convenient. If it uses them heavily (for example, inside a loop), precompiling is recommended, because it saves some function calls. Outside of loops, however, the two approaches are about equally efficient, thanks to the internal cache.
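A minimal sketch of both styles, assuming a made-up list of log lines; the function names are hypothetical. The first function reuses a pattern compiled once outside the loop, while the second calls re.match() on every iteration and relies on the module's internal cache. Both return the same count; precompiling simply avoids repeating the cache lookup on each call.

```python
import re

# Hypothetical log lines, used only for this illustration.
lines = ["ERROR disk full", "INFO started", "ERROR network down", "WARN low memory"]

# Style 1: compile once, outside the loop, and reuse the pattern object.
error_re = re.compile(r"ERROR\b")

def count_errors_compiled(lines):
    """Count lines that begin with the word ERROR, using a precompiled pattern."""
    return sum(1 for line in lines if error_re.match(line))

# Style 2: call the module-level function each time; re looks up the
# compiled pattern in its internal cache on every call.
def count_errors_module_level(lines):
    """Same logic, relying on re.match and the module's pattern cache."""
    return sum(1 for line in lines if re.match(r"ERROR\b", line))

print(count_errors_compiled(lines))      # 2
print(count_errors_module_level(lines))  # 2
```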


Compilation flags

Compilation flags let you modify how a regular expression works. In the re module, each flag has two names: a full name and a short form. For example, IGNORECASE can also be written as I. (If you are a Perl fan, you're in luck: the short forms are the same as Perl's modifiers; for example, the short form of re.VERBOSE is re.X.) Multiple flags can be combined with the bitwise OR operator "|"; for example, re.I | re.M sets both the I and M flags.
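As a quick illustration of combining flags with "|", the sketch below sets IGNORECASE and MULTILINE together. The sample text is made up; ^ is explained in more detail later in the article.

```python
import re

text = "From: alice\nfrom: bob\nTo: carol"

# re.I | re.M: match case-insensitively, and let ^ match at the start of every line.
pattern = re.compile(r"^from: (\w+)", re.I | re.M)
print(pattern.findall(text))  # ['alice', 'bob']

# The full names and the short forms refer to the same flag values.
assert re.IGNORECASE == re.I and re.MULTILINE == re.M
```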

Below is a list of supported compilation flags:

Flag            Meaning
ASCII, A        Makes escapes such as \w, \b, \s and \d match only ASCII characters
DOTALL, S       Makes . match any character, including newlines
IGNORECASE, I   Performs case-insensitive matching
LOCALE, L       Performs locale-aware matching (uses the current locale settings)
MULTILINE, M    Enables multi-line matching, which affects ^ and $
VERBOSE, X      (for "extended") Enables verbose regular expressions



Let's take a look at what they mean in detail:

A
ASCII
Makes \w, \W, \b, \B, \s and \S match only ASCII characters instead of the full range of Unicode characters. This flag is only meaningful for Unicode (str) patterns and is ignored for byte patterns.
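A small sketch of the difference, using a made-up string with accented letters (written with escapes so the characters are unambiguous):

```python
import re

text = "caf\u00e9 na\u00efve"  # "café naïve"

# By default, \w in a str pattern matches Unicode word characters.
print(re.findall(r"\w+", text))            # ['café', 'naïve']

# With re.ASCII, \w only matches [a-zA-Z0-9_], so accented letters split the words.
print(re.findall(r"\w+", text, re.ASCII))  # ['caf', 'na', 've']
```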

S
DOTALL
Makes . match any character, including a newline. Without this flag, . matches any character except a newline.
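A short sketch showing the effect on a made-up string that contains a newline:

```python
import re

text = "<b>bold\ntext</b>"

# Without DOTALL, . stops at the newline, so the pattern cannot reach </b>.
print(re.search(r"<b>(.*)</b>", text))  # None

# With DOTALL, . also matches the newline.
m = re.search(r"<b>(.*)</b>", text, re.DOTALL)
print(repr(m.group(1)))  # 'bold\ntext'
```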

I
IGNORECASE
Character classes and literal strings match case-insensitively. For example, the regular expression [A-Z] will also match the corresponding lowercase letters, and FishC will match 'FishC', 'fishc', 'FISHC', and so on. This case folding does not take the current locale into account unless you also set the LOCALE flag.
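The following sketch checks both claims above: a character class and a literal string, each with and without the flag. The sample strings are made up.

```python
import re

# Without re.I, [A-Z] matches only uppercase letters.
print(re.findall(r"[A-Z]+", "FishC fishc"))        # ['F', 'C']
# With re.I, the class also matches the corresponding lowercase letters.
print(re.findall(r"[A-Z]+", "FishC fishc", re.I))  # ['FishC', 'fishc']

# A literal string also matches regardless of case.
pattern = re.compile(r"FishC", re.IGNORECASE)
print([bool(pattern.fullmatch(s)) for s in ["FishC", "fishc", "FISHC"]])  # [True, True, True]
```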

L
LOCALE
Makes \w, \W, \b and \B depend on the current locale instead of the Unicode database.

Locales are a feature of the C library intended to help programs deal with differences between languages. For example, if you are processing French text, you might want to match words with \w+, but in bytes patterns \w only matches the character class [a-zA-Z0-9_]; it won't match 'é' or 'ç'. If your system has a correctly configured French locale, the C library functions will tell the program that 'é' and 'ç' should also be treated as letters. Setting the LOCALE flag when compiling the regular expression lets \w+ recognize French words, at some cost in speed.
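One detail worth knowing: in Python 3, str patterns already match letters such as 'é' with \w, and since Python 3.6 the LOCALE flag may only be combined with bytes patterns. The sketch below assumes Python 3.6 or later; it checks that a str pattern rejects the flag and shows a bytes pattern that accepts it. The helper function name is made up for this example.

```python
import re

def str_pattern_accepts_locale():
    """Return True if re.LOCALE can be combined with a str pattern."""
    try:
        re.compile(r"\w+", re.LOCALE)
    except ValueError:
        return False
    return True

# On Python 3.6+ this prints False: LOCALE is only allowed with bytes patterns.
print(str_pattern_accepts_locale())

# With a bytes pattern, \w follows the current C locale
# (which a program can change with locale.setlocale()).
word_re = re.compile(rb"\w+", re.LOCALE)
print(word_re.findall(b"bonjour le monde"))  # [b'bonjour', b'le', b'monde']
```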

M
MULTILINE
(We haven't covered ^ and $ yet. Don't worry, they will be explained in detail later...)

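Normally, ^ matches only at the beginning of the string, and $ matches only at the end of the string (and immediately before a newline at the end of the string, if there is one). When this flag is set, ^ matches not only at the beginning of the string but also at the beginning of each line (immediately after each newline), and $ matches not only at the end of the string but also at the end of each line (immediately before each newline).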
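A small sketch on a made-up three-line string, comparing ^ and $ with and without the flag:

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ and $ anchor only at the start and end of the whole string.
print(re.findall(r"^\w+", text))        # ['first']
print(re.findall(r"\w+$", text))        # ['line']

# With MULTILINE, they also anchor at the start and end of every line.
print(re.findall(r"^\w+", text, re.M))  # ['first', 'second', 'third']
print(re.findall(r"\w+$", text, re.M))  # ['line', 'line', 'line']
```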

X
VERBOSE
This flag lets you write regular expressions that are more readable and better organized. When it is set, whitespace in the pattern is ignored, except when it appears inside a character class or is escaped with a backslash. The flag also lets you put comments inside the pattern: everything after a # is a comment and is not passed to the matching engine, unless the # appears inside a character class or is escaped with a backslash.
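The rules above can be checked with a few tiny patterns. This is a sketch with made-up patterns and inputs: unescaped spaces and comments disappear, while a space inside a character class, an escaped space, and a # inside a character class are kept.

```python
import re

# Plain whitespace and "# ..." comments are dropped: this is just \d+-\d+.
digits_re = re.compile(r"""
    \d+     # one or more digits
    -       # a literal hyphen
    \d+     # more digits
""", re.VERBOSE)
print(bool(digits_re.fullmatch("123-456")))  # True

# A space inside [ ] or escaped with a backslash is kept;
# the unescaped spaces around | are ignored.
greet_re = re.compile(r"hello[ ]world | hello\ there", re.VERBOSE)
print(bool(greet_re.fullmatch("hello world")), bool(greet_re.fullmatch("hello there")))  # True True
print(bool(greet_re.fullmatch("helloworld")))  # False

# A # inside a character class is a literal, not the start of a comment.
hash_re = re.compile(r"[#]\d+   # a hash sign followed by digits", re.VERBOSE)
print(bool(hash_re.fullmatch("#42")))  # True
```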

Here is an example that uses re.VERBOSE; notice how much more readable the regular expression becomes:

    import re

    charref = re.compile(r"""
     &[#]                # start of a numeric entity reference
     (
         0[0-7]+         # octal form
       | [0-9]+          # decimal form
       | x[0-9a-fA-F]+   # hexadecimal form
     )
     ;                   # trailing semicolon
    """, re.VERBOSE)


Without the VERBOSE flag, the same regular expression would have to be written like this:

    import re
    charref = re.compile("&#(0[0-7]+|[0-9]+|x[0-9a-fA-F]+);")


Which one is easier to read? I think you already know the answer.
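To see the character-reference pattern at work, here is a short usage sketch with the compact form and a made-up line of HTML. Because the pattern has one group, findall() returns the captured number part of each reference.

```python
import re

# Compact form of the numeric character-reference pattern.
charref = re.compile("&#(0[0-7]+|[0-9]+|x[0-9a-fA-F]+);")

html = "A is &#65;, B is &#x42;, and &amp; is not numeric."
print(charref.findall(html))  # ['65', 'x42']
```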
