I. Basic concepts of regular expressions
1. Ordinary string operations
Test file imooc.txt:
    imooc java
    imooc html
    imooc python imooc
    c
    c#
    go
Finding lines that contain a specific string
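    # string_find.py
    def find_start_imooc(fname):
        """Print every line that starts with 'imooc'."""
        with open(fname) as f:
            for line in f:
                if line.startswith('imooc'):
                    print(line, end='')  # line already ends with a newline

    find_start_imooc('imooc.txt')

    def find_in_imooc(fname):
        """Print every line that both starts and ends with 'imooc'."""
        with open(fname) as f:
            for line in f:
                # line[:-1] drops the trailing newline before the endswith check
                if line.startswith('imooc') and line[:-1].endswith('imooc'):
                    print(line, end='')

    find_in_imooc('imooc.txt')
Output
    imooc java
    imooc html
    imooc python imooc
    imooc python imooc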
2. Using regular expressions
As shown above, writing a new function every time a string has to be parsed is tedious, so consider defining a simple rule instead. A regular expression is a single string that describes all the strings conforming to a given syntax rule.
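As a rough illustration of that idea, the sketch below expresses "starts with imooc and ends with a second imooc" as one pattern rather than a dedicated function. The sample lines and the helper name `starts_and_ends_with_imooc` are made up for this example.

```python
import re

# One pattern describes the whole rule: "imooc" at the start, anything in
# between, and "imooc" again at the end of the line.
PATTERN = re.compile(r'^imooc.*imooc$')

def starts_and_ends_with_imooc(line):
    """Hypothetical helper: True if the line satisfies the rule above."""
    return PATTERN.match(line.rstrip('\n')) is not None

sample_lines = ['imooc java\n', 'imooc python imooc\n', 'go\n']
matching = [line for line in sample_lines if starts_and_ends_with_imooc(line)]
print(matching)
```

Changing the rule now means changing only the pattern string, not the surrounding code.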
II. Using regular expressions
1. Flowchart
Using regular expressions
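    # re_find.py
    import re

    # Compile the pattern once; re.I makes the match case-insensitive.
    pa = re.compile(r'imooc', re.I)
    with open('imooc.txt') as f:
        for line in f:
            # match() only succeeds if the pattern matches at the start of the line
            if pa.match(line):
                print(line, end='')
Output
    imooc java
    imooc html
    imooc python imooc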
Note: You can also call re.match directly, without compiling the pattern first.
    ma = re.match(r'imooc', 'imooc python', re.I)
    print(ma.group())
Output
    imooc
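One detail worth keeping in mind: `re.match` returns `None` when the pattern does not match, so calling `group()` on the result without a check raises an `AttributeError`. A minimal sketch, where the helper name `first_match` is hypothetical:

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: return the matched text, or None if no match."""
    ma = re.match(pattern, text, re.I)
    # re.match returns None on failure, so check before calling group().
    return ma.group() if ma else None

print(first_match(r'imooc', 'IMOOC python'))
print(first_match(r'imooc', 'python imooc'))
```

The second call finds nothing because `match` only looks at the beginning of the string.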
III. Syntax of regular expressions (ref. http://www.runoob.com/python/python-reg-expressions.html)
Patterns
| Pattern | Description |
| --- | --- |
| ^ | Matches the beginning of the string. |
| $ | Matches the end of the string. |
| . | Matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline. |
| [...] | Represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'. |
| [^...] | Characters not in the set: [^abc] matches any character other than a, b, or c. |
| re* | Matches 0 or more occurrences of the preceding expression. |
| re+ | Matches 1 or more occurrences of the preceding expression. |
| re? | Matches 0 or 1 occurrence of the preceding expression (greedy; append another ? for the non-greedy form). |
| re{n} | Matches exactly n occurrences of the preceding expression. |
| re{n,} | Matches n or more occurrences of the preceding expression. |
| re{n,m} | Matches n to m occurrences of the preceding expression, greedily. |
| a\|b | Matches a or b. |
| (re) | Matches the expression in parentheses and remembers it as a group. |
| (?imx) | Turns on the optional i, m, or x flags inline. In Python's re module these flags apply to the whole pattern and belong at its start. |
| (?-imx) | Turns off the optional i, m, or x flags. Python's re module supports this only in the scoped form (?-imx:re). |
| (?:re) | Like (...), but does not create a group. |
| (?imx:re) | Turns on the i, m, or x flags within the parentheses only. |
| (?-imx:re) | Turns off the i, m, or x flags within the parentheses only. |
| (?#...) | Comment. |
| (?=re) | Positive lookahead assertion. Succeeds if re matches at the current position, without consuming any characters; the rest of the pattern is then tried from that same position. |
| (?!re) | Negative lookahead assertion. The opposite of the positive form: succeeds if re does not match at the current position. |
| (?>re) | Atomic (independent) group: matches re without backtracking into it. Available in Python's re module from version 3.11. |
| \w | Matches letters, digits, and the underscore. |
| \W | Matches any character that is not a letter, digit, or underscore. |
| \s | Matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v]. |
| \S | Matches any non-whitespace character. |
| \d | Matches any digit; for ASCII text, equivalent to [0-9]. |
| \D | Matches any non-digit. |
| \A | Matches the start of the string. |
| \Z | Matches the end of the string. In some engines it also matches just before a trailing newline; Python's re module matches only at the very end. |
| \z | Matches the end of the string. Not accepted by every version of Python's re module; \Z is the usual spelling there. |
| \G | Matches the position where the last match ended. Not supported by Python's re module. |
| \b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'. |
| \B | Matches a non-word boundary. 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'. |
| \n, \t, etc. | Match a newline, a tab, and so on. |
| \1...\9 | Matches the contents of the nth group. |
| \10 | Matches the contents of group 10 if that group exists. Some engines otherwise treat it as an octal character code. |
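To make a few of these entries concrete, here is a small sketch that exercises anchors, counted repetition, alternation, lookahead, word boundaries, and a backreference. All sample strings are invented for illustration.

```python
import re

# ^ and $ anchor the pattern; \d{4} means exactly four digits.
year = re.match(r'^\d{4}$', '2024')

# a|b alternation inside a non-capturing group (?:...), then an optional s.
pet = re.match(r'(?:cat|dog)s?', 'dogs')

# (?=...) is a lookahead: it checks what follows without consuming it.
price = re.match(r'\d+(?= USD)', '100 USD')

# \b marks a word boundary, \B a non-boundary.
boundary = re.search(r'er\b', 'never')
non_boundary = re.search(r'er\B', 'verb')

# \1 refers back to whatever the first group captured.
doubled = re.search(r'(\w)\1', 'hello')

print(year.group(), pet.group(), price.group())
print(boundary.span(), non_boundary.span(), doubled.group())
```

Note that the lookahead leaves " USD" out of the matched text, which is why `price.group()` contains only the digits.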
Modifiers
| Modifier | Description |
| --- | --- |
| re.I | Makes matching case-insensitive. |
| re.L | Makes matching locale-aware. |
| re.M | Multiline matching; affects ^ and $. |
| re.S | Makes . match every character, including newlines. |
| re.U | Interprets characters according to the Unicode character set. Affects \w, \W, \b, and \B. |
| re.X | Allows a more flexible pattern layout (whitespace and comments), so that regular expressions are easier to read. |
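The effect of the most common modifiers can be seen in a short sketch like the one below. The sample text and the phone-number-like format are assumptions made up for the example.

```python
import re

text = 'Imooc\nimooc python'

# re.I: ignore case.
ignore_case = re.findall(r'imooc', text, re.I)

# re.M: ^ also matches right after each newline, not only at the very start.
single_line = re.findall(r'^imooc', text, re.I)
multi_line = re.findall(r'^imooc', text, re.I | re.M)

# re.S: . also matches the newline, so the match can span both lines.
without_dotall = re.match(r'Imooc.imooc', text)
with_dotall = re.match(r'Imooc.imooc', text, re.S)

# re.X: whitespace and comments inside the pattern are ignored.
verbose = re.compile(r'''
    (\d{3})   # first part (made-up format)
    -
    (\d{4})   # second part
''', re.X)
parts = verbose.match('555-1234').groups()

print(ignore_case, single_line, multi_line)
print(without_dotall, with_dotall is not None, parts)
```

Flags are combined with the `|` operator, as in `re.I | re.M`.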
IV. Examples of regular expressions
1. Character Matching
| Example | Description |
| --- | --- |
| python | Matches "python". |
2. Character classes
| Example | Description |
| --- | --- |
| [Pp]ython | Matches "Python" or "python". |
| rub[ye] | Matches "ruby" or "rube". |
| [aeiou] | Matches any one of the letters within the brackets. |
| [0-9] | Matches any digit. Same as [0123456789]. |
| [a-z] | Matches any lowercase letter. |
| [A-Z] | Matches any uppercase letter. |
| [a-zA-Z0-9] | Matches any letter or digit. |
| [^aeiou] | Matches any character except the letters a, e, i, o, u. |
| [^0-9] | Matches any character except digits. |
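The character classes in this table can be tried out quickly with `re.findall`. This is only a sketch; the test strings are invented.

```python
import re

# Either capitalization of the first letter.
pythons = re.findall(r'[Pp]ython', 'Python and python')

# The last letter must be y or e.
rubs = re.findall(r'rub[ye]', 'ruby rube rubx')

# Any single vowel.
vowels = re.findall(r'[aeiou]', 'regex')

# Anything that is not a digit.
non_digits = re.findall(r'[^0-9]', 'a1b2')

# Letters and digits only; the underscore and hyphen are skipped.
alnum = re.findall(r'[a-zA-Z0-9]', 'x_Y-7')

print(pythons, rubs, vowels, non_digits, alnum)
```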
3. Special character classes
| Example | Description |
| --- | --- |
| . | Matches any single character except "\n". To match any character including "\n", use a pattern like '[\s\S]' or the re.S flag. |
| \d | Matches a digit character. For ASCII text, equivalent to [0-9]. |
| \D | Matches a non-digit character. For ASCII text, equivalent to [^0-9]. |
| \s | Matches any whitespace character, including spaces, tabs, form feeds, and so on. For ASCII text, equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-whitespace character. For ASCII text, equivalent to [^ \f\n\r\t\v]. |
| \w | Matches any word character, including the underscore. For ASCII text, equivalent to '[A-Za-z0-9_]'. |
| \W | Matches any non-word character. For ASCII text, equivalent to '[^A-Za-z0-9_]'. |
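A short sketch of the shorthand classes on an invented sample string, including the difference between `.` with and without `re.S`:

```python
import re

sample = 'id_42\tok\n'

digits = re.findall(r'\d', sample)      # digit characters
words = re.findall(r'\w+', sample)      # runs of word characters
spaces = re.findall(r'\s', sample)      # whitespace characters
non_words = re.findall(r'\W', sample)   # everything that is not a word character

# . stops at a newline unless re.S is given; [\s\S] matches any character.
dot_default = re.match(r'a.b', 'a\nb')
dot_all = re.match(r'a.b', 'a\nb', re.S)
any_char = re.match(r'a[\s\S]b', 'a\nb')

print(digits, words, spaces, non_words)
print(dot_default, dot_all is not None, any_char is not None)
```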
V. Greedy mode and grouping in regular expressions
Greedy mode: '*', '+', and '?' match greedily by default; appending '?' to any of them turns it into a non-greedy match.
Grouping: use () to capture a group and \number to refer back to what that group matched.
    # re_group.py
    import re

    # Greedy: .* runs on to the last closing tag that matches \1.
    ma = re.match(r'<(\w+)>.*</\1>', '<book>python</book><book>java</book>')
    print(ma.group())
    # Non-greedy: .*? stops at the first closing tag that matches \1.
    ma = re.match(r'<(\w+)>.*?</\1>', '<book>python</book><book>java</book>')
    print(ma.group())
Output
    <book>python</book><book>java</book>
    <book>python</book>
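Beyond `group()` with no arguments, a match object also exposes the individual groups. The following is a minimal sketch using the same tag-like string; the second capturing group around `.*?` is an addition made for this example.

```python
import re

html = '<book>python</book><book>java</book>'

# Non-greedy .*? stops at the first closing tag; findall returns the
# captured groups of every non-overlapping match as tuples.
pairs = re.findall(r'<(\w+)>(.*?)</\1>', html)

ma = re.match(r'<(\w+)>(.*?)</\1>', html)
whole = ma.group(0)      # the entire matched text
tag = ma.group(1)        # first group: the tag name
captured = ma.groups()   # all groups as a tuple

print(pairs)
print(whole, tag, captured)
```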
VI. Other methods of the re module
The match method only matches at the beginning of the string. If you need to look anywhere in the string, match is not a good fit.
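For looking anywhere in a string, the re module also offers functions such as `re.search`, `re.findall`, and `re.sub`. They are not covered above, so the following is only a minimal sketch with a made-up sample string.

```python
import re

text = 'python imooc, java imooc'

# match: fails here because the text does not start with "imooc".
at_start = re.match(r'imooc', text)

# search: finds the first occurrence anywhere in the text.
first = re.search(r'imooc', text)

# findall: every word that is followed by " imooc".
courses = re.findall(r'\w+(?= imooc)', text)

# sub: replace every occurrence.
replaced = re.sub(r'imooc', 'IMOOC', text)

print(at_start, first.span())
print(courses, replaced)
```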