Using compile to speed up matching
compile(rule[, flag])
- Compiles a regular expression rule into a Pattern object for later use.
- The first parameter is the regular expression rule; the second, optional parameter is the rule option (flag).
- Returns a Pattern object.
- Calling findall(rule, target) directly is fine when you match a string only once or twice. For repeated use it is inefficient, because the regular expression engine has to interpret the rule on every call, and interpreting a rule is time-consuming. If you want to match with the same rule many times, precompile it with the re.compile function and search with the returned Regular Expression Object (the Pattern).
- Example:
- >>> import re
- >>> s = '111,222,aaa,bbb,ccc333,444ddd'
- >>> rule = r'\b\d+\b'
- >>> compiled_rule = re.compile(rule)
- >>> compiled_rule.findall(s)
- ['111', '222']
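- For comparison, here is a minimal sketch that runs the same rule through the module-level re.findall and through a precompiled Pattern, then times both with timeit. The string and rule come from the example above; the repeat count and the variable names are arbitrary choices for illustration. Timings depend on the machine and Python version, and because the re module caches recently compiled rules, the gap may be small.

```python
import re
import timeit

s = '111,222,aaa,bbb,ccc333,444ddd'
rule = r'\b\d+\b'
compiled_rule = re.compile(rule)

# Both forms return the same matches.
direct = re.findall(rule, s)
precompiled = compiled_rule.findall(s)
print(direct == precompiled)

# Rough timing only; the repeat count (10000) is an arbitrary assumption.
t_direct = timeit.timeit(lambda: re.findall(rule, s), number=10000)
t_compiled = timeit.timeit(lambda: compiled_rule.findall(s), number=10000)
print('re.findall      :', t_direct)
print('compiled.findall:', t_compiled)
```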
- As you can see, using a compiled rule is much like using an uncompiled one. The compile function also accepts rule flags that set special options. Combine multiple options with '|' (bitwise OR).
- I IGNORECASE: ignores case differences.
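- A small sketch of the I option. The sample string and variable names are made up for illustration.

```python
import re

s = 'Python python PYTHON'
rc = re.compile(r'python')         # case-sensitive by default
rci = re.compile(r'python', re.I)  # re.I is the short form of re.IGNORECASE

plain = rc.findall(s)
ignore_case = rci.findall(s)
print(plain)        # ['python']
print(ignore_case)  # ['Python', 'python', 'PYTHON']
```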
- L LOCALE: character-set localization. This feature is meant to support environments that use the character sets of different languages. For example, the escape \w stands for [a-zA-Z0-9_] in an English environment, that is, all English letters, digits, and the underscore. In a French environment the default setting does not match "é" or "ç"; add the L option and they match. However, this does not seem to work in a Chinese environment, where Chinese characters still do not match.
- M MULTILINE: multi-line matching. In this mode '^' (which normally marks the beginning of the string) and '$' (which normally marks the end of the string) also match at line boundaries, becoming markers for the beginning and end of each line. For example:
- >>> s = '123 456\n789 012\n345 678'
- >>> rc = re.compile(r'^\d+')  # match a number at the beginning, without the M option
- >>> rc.findall(s)
- ['123']  # only the '123' at the very beginning is found
- >>> rcm = re.compile(r'^\d+', re.M)  # using the M option
- >>> rcm.findall(s)
- ['123', '789', '345']  # the number at the beginning of each of the three lines is found
- Similarly, without the M option '$' matches only the number at the end of the last line, i.e. '678'. With the M option it matches the numbers at the end of all three lines: '456', '012' and '678'.
- >>> rc = re.compile(r'\d+$')
- >>> rcm = re.compile(r'\d+$', re.M)
- >>> rc.findall(s)
- ['678']
- >>> rcm.findall(s)
- ['456', '012', '678']
- S DOTALL: '.' matches every character. By default '.' matches any character except the newline '\n'; with this option '.' matches any character, including '\n'.
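- A small sketch of the S option. The two-line sample string is made up for illustration.

```python
import re

s = 'abc\ndef'
rc = re.compile(r'abc.def')         # '.' does not match '\n' by default
rcs = re.compile(r'abc.def', re.S)  # re.S is the short form of re.DOTALL

no_dotall = rc.findall(s)
with_dotall = rcs.findall(s)
print(no_dotall)    # []
print(with_dotall)  # ['abc\ndef']
```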
- U UNICODE: \w, \W, \b, \B, \d, \D, \s and \S follow the Unicode character properties.
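- A small sketch of the U option, assuming Python 3. There, str rules already follow Unicode by default, so re.U is accepted but changes nothing. The sample string is made up for illustration and written with escapes to avoid encoding problems.

```python
import re

s = 'caf\u00e9 \u4e2d\u6587 42'  # the text 'café 中文 42'
rcu = re.compile(r'\w+', re.U)   # re.U is the short form of re.UNICODE

words = rcu.findall(s)
print(words)  # \w matches the accented letter and the Chinese characters
```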
- X VERBOSE: this option ignores whitespace in the regular expression and lets you use '#' to introduce a comment, so rules can be laid out more readably. For example, the rule
- >>> rc = re.compile(r"\d+|[a-zA-Z]+")  # match a number or a word
- can be written with the X option as:
- >>> rc = re.compile(r"""  # start a rule
\d+          # number
| [a-zA-Z]+  # word
""", re.VERBOSE)
- In this mode, a space that should be matched literally must be written as '\ ' (a backslash followed by a space).
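- A small sketch that puts two of the points above together: an escaped space inside a VERBOSE rule, and two options joined with '|'. The rule, sample string, and variable names are made up for illustration.

```python
import re

rc = re.compile(r"""
    \d+      # a number
    \        # a literal space: backslash followed by a space
    [a-z]+   # a word; re.I below makes this match upper case too
""", re.X | re.I)  # two options combined with bitwise OR

s = '10 Apples, 20pears, 30 KIWI'
matches = rc.findall(s)
print(matches)  # ['10 Apples', '30 KIWI']
```

- '20pears' is skipped because there is no space between the number and the word.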