Regular expressions are a powerful tool for handling strings, and they are not specific to Python.
Other programming languages provide regular expressions as well; they differ mainly in how much of the syntax each one supports.
Regular expressions have their own syntax and a separate processing engine, and the core syntax is largely the same in every language that provides them.
The following illustration shows a process that uses regular expressions to match:
1.1 Introduction
Regular expressions are not specific to Python. They are a powerful tool for handling strings, with their own syntax and an independent processing engine. That engine may be less efficient than the built-in str methods, but it is far more powerful. Because of this, the syntax of regular expressions is largely the same in every language that provides them; what differs is how much of the syntax each language supports. Don't worry, though: the unsupported parts are usually the less commonly used ones.
A regular expression is a special sequence of characters that helps you check whether a string matches a pattern. Python has included the re module since version 1.5, and it provides Perl-style regular expression patterns. The re module gives Python full regular expression functionality.
1.2 Pattern syntax to know
A pattern string uses a special syntax to represent a regular expression:
Letters and digits stand for themselves: a letter or digit in a pattern matches that same character. Most letters and digits take on a different meaning when preceded by a backslash. Punctuation characters match themselves only when escaped; otherwise they have a special meaning. The backslash itself must be escaped with another backslash.
Because regular expressions usually contain backslashes, it is best to write them as raw strings. Pattern elements (such as r'\t', equivalent to '\\t') match the corresponding special characters.
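To see the raw-string point in practice, here is a small sketch. It assumes nothing beyond the standard re module; the sample strings are made up for illustration.

```python
import re

# '\\t' and r'\t' are the same two-character string: a backslash followed by 't'.
escaped = '\\t'
raw = r'\t'
same_pattern = (escaped == raw)

# The regex engine reads those two characters as "match a tab".
tab_match = re.search(raw, 'col1\tcol2')

# Matching a literal backslash needs two backslashes in the pattern;
# a raw string keeps that readable (r'\\' instead of '\\\\').
backslash_match = re.search(r'\\', 'C:\\temp')

print(same_pattern)             # True
print(tab_match.span())         # (4, 5)
print(backslash_match.group())  # \
```

Without the raw-string prefix, every backslash meant for the regex engine would have to be doubled, which quickly becomes hard to read.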
The following table lists the special elements of the regular expression pattern syntax. If you supply optional flag arguments along with a pattern, the meaning of some pattern elements changes.
There are many of these elements; the table lists the commonly used ones, so try to get familiar with as many as you can.
Pattern | Description |
^ | Matches the beginning of a string. |
$ | Matches the end of a string. |
. | Matches any character except a line break. When the re.DOTALL flag is specified, it matches any character, including a line break. |
[...] | Represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'. |
[^...] | Matches characters not in the brackets: [^abc] matches any character other than a, b, or c. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. Placed after another quantifier (as in re*?), it makes that quantifier non-greedy. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches n to m occurrences of the preceding expression, greedily. |
a|b | Matches a or b. |
(re) | Matches the expression in parentheses and also captures it as a group. |
(?imx) | Turns on the optional i, m, or x flags from within the pattern. In Python's re module, flags set this way apply to the whole pattern. |
(?-imx) | Turns off the optional i, m, or x flags. Python's re module accepts this only in the scoped form (?-imx:re). |
(?:re) | Like (...), but does not create a group. |
(?imx:re) | Uses the i, m, or x flags within the parentheses. |
(?-imx:re) | Does not use the i, m, or x flags within the parentheses. |
(?#...) | A comment. |
(?=re) | Positive lookahead assertion. Succeeds if the contained expression matches at the current position, and fails otherwise. The matching engine does not advance after trying it; the rest of the pattern is tried from the position where the assertion began. |
(?!re) | Negative lookahead assertion. The opposite of the positive assertion: succeeds when the contained expression cannot match at the current position of the string. |
(?>re) | Independent (atomic) pattern; the engine does not backtrack into it. |
\w | Matches a word character (letter, digit, or underscore). |
\W | Matches a non-word character. |
\s | Matches any whitespace character, equivalent to [ \t\n\r\f\v]. |
\S | Matches any non-whitespace character. |
\d | Matches any digit, equivalent to [0-9]. |
\D | Matches any non-digit. |
\A | Matches the start of the string. |
\Z | Matches the end of the string. In Perl-style engines it also matches just before a trailing newline; Python's re module matches only at the very end. |
\z | Matches the very end of the string. In Python's re module, \Z has this meaning. |
\G | Matches the position where the previous match ended. Python's re module does not support it. |
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'. |
\B | Matches a non-word boundary. 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'. |
\n, \t, etc. | Match a newline, a tab, and so on. |
\1...\9 | Matches the content of the nth group. |
\10 | Matches the content of the 10th group if that group exists. Otherwise it is read as an octal character code. |
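A few of the table entries are easier to grasp when run. The following is a minimal sketch using only the standard re module; all input strings are invented for illustration.

```python
import re

# Anchors and quantifiers: one or more digits spanning the whole string.
all_digits = re.match(r'^\d+$', '2024') is not None

# Greedy vs. non-greedy: a ? after a quantifier matches as little as possible.
greedy = re.search(r'<.+>', '<a><b>').group()
lazy = re.search(r'<.+?>', '<a><b>').group()

# Capturing group with a backreference: \1 must repeat what group 1 matched.
repeated = re.search(r'(\w+) \1', 'it is is fine').group(1)

# Non-capturing group: groups the alternation without creating a group.
no_groups = re.match(r'(?:ab|cd)+', 'abcdab').groups()

# Lookahead assertions test what follows without consuming it.
before_px = re.findall(r'\d+(?=px)', '10px 20em 30px')
not_bar = re.search(r'foo(?!bar)', 'foobar foobaz').span()

print(all_digits)     # True
print(greedy, lazy)   # <a><b> <a>
print(repeated)       # is
print(no_groups)      # ()
print(before_px)      # ['10', '30']
print(not_bar)        # (7, 10)
```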
Character classes
Example | Description |
[Pp]ython | Matches "Python" or "python". |
rub[ye] | Matches "ruby" or "rube". |
[aeiou] | Matches any one of the letters in the brackets. |
[0-9] | Matches any digit. Same as [0123456789]. |
[a-z] | Matches any lowercase letter. |
[A-Z] | Matches any uppercase letter. |
[a-zA-Z0-9] | Matches any letter or digit. |
[^aeiou] | Matches any character except the letters a, e, i, o, u. |
[^0-9] | Matches any character except a digit. |
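As a quick check of the character classes above, here is a short sketch. The sample text is made up; only re.findall from the standard library is used.

```python
import re

text = 'Python and python, Ruby and rube, 42 vowels'

# [Pp]ython accepts either capitalisation of the first letter.
pythons = re.findall(r'[Pp]ython', text)

# Two classes in one pattern: R or r, then "ub", then y or e.
rubies = re.findall(r'[Rr]ub[ye]', text)

# A range inside brackets: runs of digits.
numbers = re.findall(r'[0-9]+', text)

# A negated class: runs of characters that are not lowercase vowels.
consonant_runs = re.findall(r'[^aeiou]+', 'education')

# Combined ranges: letters and digits only.
tokens = re.findall(r'[a-zA-Z0-9]+', 'id_7, x-ray')

print(pythons)         # ['Python', 'python']
print(rubies)          # ['Ruby', 'rube']
print(numbers)         # ['42']
print(consonant_runs)  # ['d', 'c', 't', 'n']
print(tokens)          # ['id', '7', 'x', 'ray']
```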
Special character classes
Example | Description |
. | Matches any single character except "\n". To match any character including "\n", use the re.DOTALL flag or a pattern like '[\s\S]'. |
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\s | Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\w | Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'. |
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'. |
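The shorthand classes can be tried out as follows. This is only a sketch: the log line is a hypothetical input, and the calls are standard re functions.

```python
import re

log_line = 'user_42 logged in\tat 09:15'

digits = re.findall(r'\d+', log_line)   # runs of digits
words = re.findall(r'\w+', log_line)    # runs of word characters (underscore included)
fields = re.split(r'\s+', log_line)     # split on any run of whitespace, tabs too

# '.' stops at a newline unless re.DOTALL is given.
two_lines = 'first\nsecond'
default_dot = re.match(r'.+', two_lines).group()
dotall_dot = re.match(r'.+', two_lines, re.DOTALL).group()

print(digits)             # ['42', '09', '15']
print(words)              # ['user_42', 'logged', 'in', 'at', '09', '15']
print(fields)             # ['user_42', 'logged', 'in', 'at', '09:15']
print(repr(default_dot))  # 'first'
print(repr(dotall_dot))   # 'first\nsecond'
```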
1.3 re.match function
re.match attempts to match a pattern at the start of the string; if the pattern does not match there, match() returns None.
re.match(pattern, string, flags=0)
pattern: the regular expression to match
string: the string to match against
flags: flag bits that control how matching is done, described below
Straight to the program:
import re
r = "abc"  # regular expression
if re.match(r, "abc"):  # match
    print('Done')
else:
    print('No match')  # assumed: the source leaves this branch empty
Results:
Done
Using the table above, practice some more:
import re
r = "a.c"  # regular expression; . matches any character except a line break (any character at all when re.DOTALL is specified)
if re.match(r, "abc"):
    print(re.match(r, "abc"))
    print('Done')
else:
    print('No match')  # assumed: the source leaves this branch empty
Results:
<re.Match object; span=(0, 3), match='abc'>
Done
Note that a successful match does not print the matched string: re.match() returns a match object on success and None on failure.
We can get the matched text with the match object methods group(num) and groups().
Match object method | Description |
group(num=0) | Returns the string matched by the entire expression. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups. |
groups() | Returns a tuple containing the strings of all groups, from group 1 up to however many groups the pattern contains. |
Program:
import re
r = "a.c"
line = re.match(r, "abc")
if line:
    print(line.group())
else:
    print('No match')  # assumed: the source leaves this branch empty
Results:
abc
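group() and groups() become more useful once the pattern contains capturing groups. Here is a minimal sketch; the date string and the variable names are hypothetical, chosen only to illustrate the two methods.

```python
import re

# Three capturing groups: year, month, day.
m = re.match(r'(\d{4})-(\d{2})-(\d{2})', '2024-03-15 release')

whole = m.group()     # entire match (same as m.group(0))
year = m.group(1)     # first group only
pair = m.group(2, 3)  # several group numbers at once give a tuple
parts = m.groups()    # all groups, starting from group 1

print(whole)  # 2024-03-15
print(year)   # 2024
print(pair)   # ('03', '15')
print(parts)  # ('2024', '03', '15')
```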
1.3 re.search function
re.search() scans the entire string and returns the first successful match.
re.search(pattern, string, flags=0)
pattern: the regular expression to match
string: the string to search
flags: flag bits that control how matching is done
As with re.match(), re.search() returns a match object on success; otherwise it returns None.
Straight to the program:
import re
r = "abc"
s = 'aacawcabc'
line = re.search(r, s)
if line:
    print(line.group())
Results:
abc
Note:
The difference between re.match() and re.search():
re.match() matches only at the beginning of the string: if the string does not start with something the pattern matches, the match fails and the function returns None. re.search() scans the whole string until it finds a match.
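The difference can be shown side by side on the same string. This sketch reuses the sample string from the program above; the variable names are only illustrative.

```python
import re

s = 'aacawcabc'

at_start = re.match('abc', s)    # s does not begin with 'abc', so this is None
anywhere = re.search('abc', s)   # scans forward and finds 'abc' near the end
anchored = re.search('^abc', s)  # with ^, search is tied to the start as well

print(at_start)         # None
print(anywhere.span())  # (6, 9)
print(anchored)         # None
```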
1.4 re.sub function
The re.sub() function replaces matches.
re.sub(pattern, repl, string, count=0)
pattern: the regular expression to match
repl: the replacement
string: the string to search
count: the maximum number of substitutions; the default of 0 replaces all matches
The return value is the string obtained by replacing the leftmost non-overlapping matches of the pattern in string with repl. If the pattern is not found, the string is returned unchanged.
Program:
import re
pattern = r'\d'
repl = "!"
s = '123456789abcdefg'  # assumed input: the source lost the nine digits before the letters
line = re.sub(pattern, repl, s)
print(line)
Results:
!!!!!!!!!abcdefg
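The count parameter and the "returned unchanged" case can be checked with a short sketch. The input string and the helper function double are hypothetical. Passing a function as repl is a standard feature of re.sub that the text above does not cover, so treat that part as an optional extra.

```python
import re

s = 'room 101, floor 7'

all_replaced = re.sub(r'\d', '!', s)          # count defaults to 0: replace every match
first_only = re.sub(r'\d', '!', s, count=1)   # stop after the first replacement
unchanged = re.sub(r'\d', '!', 'no digits')   # no match: string comes back as is

def double(match):
    """Hypothetical helper: replace a number with twice its value."""
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, s)           # repl may be a function of the match

print(all_replaced)  # room !!!, floor !
print(first_only)    # room !01, floor 7
print(unchanged)     # no digits
print(doubled)       # room 202, floor 14
```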
1.5 Regular expression modifiers: optional flags
First, what the flags are:
A regular expression can take optional flag modifiers that control how matching is done. Modifiers are passed as an optional flags argument. Multiple flags can be combined with a bitwise OR (|). For example, re.I | re.M sets both the I and M flags:
Modifier | Description |
re.I | Makes matching case-insensitive. |
re.L | Performs locale-aware matching. |
re.M | Multiline matching; affects ^ and $. |
re.S | Makes . match any character, including a newline. |
re.U | Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, and \B. |
re.X | Allows a more flexible layout so that you can write regular expressions that are easier to read. |
Program:
import re
pattern = '[Aa][Bb][Cc][Dd]'
s = 'AbCd'
line = re.match(pattern, s)
if line:
    print(line.group())
Results:
AbCd
The same result can be achieved with a flag:
import re
pattern = 'abcd'
s = 'AbCd'
line = re.match(pattern, s, re.I)
if line:
    print(line.group())
Results:
AbCd
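The other flags in the table can be tried the same way. The following is a sketch with made-up input text; re.M, re.S, re.I, and re.X are the standard flag names, and time_pattern is just an illustrative variable.

```python
import re

text = 'first line\nSecond line'

# re.M lets ^ match at the start of every line, not only the start of the string.
line_starts = re.findall(r'^\w+', text, re.M)

# Flags combine with |: case-insensitive and multiline together.
second = re.findall(r'^second', text, re.I | re.M)

# re.S lets . cross the newline.
spans_lines = re.search(r'first.+line', text, re.S).group()

# re.X ignores whitespace in the pattern and allows # comments.
time_pattern = re.compile(r'''
    (\d{2})   # hours
    :         # separator
    (\d{2})   # minutes
''', re.X)
hours_minutes = time_pattern.search('starts at 09:15').groups()

print(line_starts)        # ['first', 'Second']
print(second)             # ['Second']
print(repr(spans_lines))  # 'first line\nSecond line'
print(hours_minutes)      # ('09', '15')
```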
1.6 re.compile function
The usual way to use re is to first compile the string form of a regular expression into a pattern instance with re.compile(), then use the pattern instance to process text and obtain a match result (a match instance), and finally use the match instance to extract information or carry out other operations.
Program:
import re
pattern = re.compile(r'\d+')
s = '11223344AABBCCDD'  # assumed input: the source lost the leading digits
line = pattern.match(s)
if line:
    print(line.group())
Results:
11223344
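A compiled pattern can be reused across several calls, which is the main reason to compile it once. Here is a minimal sketch; the input strings are invented, and the methods shown (match, search, findall, and the pattern attribute) belong to the standard compiled pattern object.

```python
import re

number = re.compile(r'\d+')

first = number.match('11223344aabbccdd').group()  # digits at the start of the string
no_match = number.match('aabbccdd11')             # match() still only looks at the start
found = number.search('aabbccdd11').group()       # search() scans the whole string
all_numbers = number.findall('a1b22c333')         # every non-overlapping match

print(first)           # 11223344
print(no_match)        # None
print(found)           # 11
print(all_numbers)     # ['1', '22', '333']
print(number.pattern)  # \d+
```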