What is a regular expression?
Regular expressions are not unique to Python; almost every other programming language supports them too. Bash, for example, has the regular expression command grep, and I personally find that searching with a regular expression in Python feels much like running grep -o.
Regular expressions are a powerful tool for working with strings. They may not always be the most efficient option, but they are powerful!
A regular expression defines a set of grammar rules: for example, "." represents any single character and \d represents a single digit. By combining these rules we form a pattern. Any string that conforms to the pattern is considered a match; otherwise the string does not match.
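As a minimal sketch of what "conforming to a pattern" means, the snippet below combines the two rules just mentioned. It uses Python's built-in re module, and the pattern and test strings are made up for illustration.

```python
import re

# "." stands for any single character and "\d" for a single digit,
# so this pattern reads: "a", any character, "c", one digit.
pattern = r"a.c\d"

# re.fullmatch returns a match object only if the whole string fits the pattern.
conforms = re.fullmatch(pattern, "abc1") is not None
rejected = re.fullmatch(pattern, "abcd") is not None  # "d" is not a digit

print(conforms, rejected)  # True False
```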
Syntax of regular expressions:
Single character | Description
. | Matches any character (except the line break \n)
[] | Matches a character set: [0-9] for digits, [a-z] for lowercase letters
\ | Escape character; changes the original meaning of the character that follows
[^] | Negated character set: matches any character not listed in the brackets
\d | Matches a digit; same as [0-9] for ASCII text
\D | Non-digit, i.e. [^\d]
\s | Whitespace character: [<space>\t\r\f\n\v]
\S | Non-whitespace character, i.e. [^\s]
\w | Word character: [A-Za-z0-9_]
\W | Non-word character, i.e. [^\w]
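The single-character rules can be tried out one at a time. The following is only an illustrative sketch: the sample text is invented, and re.findall (from Python's built-in re module) is used simply because it lists every character that a rule matches.

```python
import re

sample = "Room 42, floor_3!"  # hypothetical sample text

digits = re.findall(r"\d", sample)             # every digit
spaces = re.findall(r"\s", sample)             # every whitespace character
non_word = re.findall(r"\W", sample)           # everything that is not [A-Za-z0-9_]
not_lower_or_space = re.findall(r"[^a-z\s]", sample)  # negated character set
dot_skips_newline = re.findall(r".", "a\nb")   # "." does not match the line break

print(digits)              # ['4', '2', '3']
print(spaces)              # [' ', ' ']
print(non_word)            # [' ', ',', ' ', '!']
print(not_lower_or_space)  # ['R', '4', '2', ',', '_', '3', '!']
print(dot_skips_newline)   # ['a', 'b']
```

Note that the underscore counts as a word character, so \W skips it.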
The table above covers single characters. When we want to match one letter, we can write [a-z]. But what if we want to match two characters?
You might think of [a-z][a-z], and that works. But what if we need to match 10 characters, hundreds of characters, or an unspecified number of characters? For that we use quantifiers.
Quantifier | Description
* | Matches the preceding character any number of times (0 or more)
+ | Matches the preceding character one or more times
? | Matches the preceding character 0 or 1 times
{m,n} | Matches the preceding character at least m times and at most n times
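The quantifiers can be checked with a few small cases. This is a sketch using Python's built-in re module; the helper name fits is hypothetical and only wraps re.fullmatch so that each result is a plain True or False.

```python
import re

def fits(pattern, text):
    """Return True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

star_zero = fits(r"ab*", "a")                    # "*" allows zero "b"
star_many = fits(r"ab*", "abbb")                 # ... or many
plus_zero = fits(r"ab+", "a")                    # "+" needs at least one "b"
plus_one = fits(r"ab+", "ab")
optional_two = fits(r"ab?", "abb")               # "?" allows at most one "b"
range_ok = fits(r"[a-z]{2,4}", "abc")            # between 2 and 4 letters
range_long = fits(r"[a-z]{2,4}", "abcde")        # 5 letters is too many
exact_ten = fits(r"[a-z]{10}", "abcdefghij")     # {10} means exactly 10

print(star_zero, star_many, plus_zero, plus_one)        # True True False True
print(optional_two, range_ok, range_long, exact_ten)    # False True False True
```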
When we want to match, for example, a string that starts with a, we need boundary matchers:
^ | Matches the beginning of the string; in multiline mode, also matches the beginning of each line
$ | Matches the end of the string; in multiline mode, also matches the end of each line
\A | Matches only at the start of the string
\Z | Matches only at the end of the string
\b | Matches a word boundary, i.e. the position between a word character and a non-word character (or the start or end of the string). For example, 'er\b' can match the 'er' in 'never', but not the 'er' in 'verb'.
\B | Matches a non-word boundary. 'er\B' can match the 'er' in 'verb', but not the 'er' in 'never'.
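The boundary rules are easier to see in action. The sketch below uses Python's built-in re module to replay the never/verb example and then compares ^ with \A in multiline mode; the three-line text is invented for illustration.

```python
import re

# Word boundary versus non-word boundary
er_at_word_end = re.search(r"er\b", "never") is not None    # "er" ends the word
er_inside_word = re.search(r"er\b", "verb") is not None     # "er" is followed by "b"
er_not_at_end = re.search(r"er\B", "verb") is not None
er_B_in_never = re.search(r"er\B", "never") is not None

# "^" versus "\A" in multiline mode
text = "apple\navocado\nbanana"  # hypothetical sample text
first_only = re.findall(r"^a\w*", text)                   # "^" = start of string only
each_line = re.findall(r"^a\w*", text, re.MULTILINE)      # "^" = start of every line
start_only = re.findall(r"\Aa\w*", text, re.MULTILINE)    # "\A" ignores multiline mode

print(er_at_word_end, er_inside_word, er_not_at_end, er_B_in_never)  # True False True False
print(first_only)   # ['apple']
print(each_line)    # ['apple', 'avocado']
print(start_only)   # ['apple']
```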
Logical operators and grouping:
| | Alternation, similar to "or": matches the expression on either side
() | Matches the expression inside the parentheses and also forms a numbered group
(?P<name>...) | A group that gets the alias name in addition to its number
\number | References the text matched by the group with that number
(?P=name) | References the text matched by the group with the alias name
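Groups and back-references are the least intuitive entries in this table, so here is a small sketch using Python's built-in re module. The sample strings and the group names year, month and ch are made up for illustration.

```python
import re

# Alternation: either side of "|" may match
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Named groups can be read by alias or by number
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
year = m.group("year")
month_by_number = m.group(2)

# Back-references match the same text that a group already matched
doubled_letter = re.search(r"(\w)\1", "hello").group()
doubled_named = re.search(r"(?P<ch>\w)(?P=ch)", "bookkeeper").group()

print(pets)                    # ['cat', 'dog']
print(year, month_by_number)   # 2024 05
print(doubled_letter)          # ll
print(doubled_named)           # oo
```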
In addition, Python string literals use \ as an escape character, and regular expressions use \ as an escape character too. So to match a literal "\" in the text with an ordinary string literal, you need four backslashes, "\\\\": Python first turns each pair into a single backslash, and the regular expression engine then reads the remaining two as one escaped backslash. To avoid this complication, Python provides the r prefix (raw string). In r'pattern', every character keeps its literal meaning: \ is just \ and Python does not treat it as an escape character, so the same pattern can be written as r"\\".
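The two spellings can be compared directly. This is a minimal sketch using Python's built-in re module; the sample path is invented and contains exactly one backslash.

```python
import re

text = "C:\\temp"        # the text itself contains ONE backslash: C:\temp

plain_pattern = "\\\\"   # four typed backslashes -> a two-character string
raw_pattern = r"\\"      # raw string: two typed backslashes -> the same two characters

same_pattern = plain_pattern == raw_pattern
found = re.search(raw_pattern, text).group()   # the single backslash in the text

print(len(plain_pattern))   # 2
print(same_pattern)         # True
print(found == "\\")        # True
```

Both spellings hand the regular expression engine the same two characters; the raw string just avoids the extra layer of escaping.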
Python provides the re module, which contains all the regular expression functions, so to use regular expressions we first need:
import re
Five commonly used regular expression operations in Python:
1 re.match('pattern', 'string'): matches from the beginning of the string. It returns a match object if the beginning conforms to the pattern; otherwise it returns None.
import re
ret = re.match('[a-z]', '1shdkfasdf')
print(ret)
Result: None
import re
ret = re.match('[a-z]', 'shdkfasdf')
print(ret)
Result: <_sre.SRE_Match object; span=(0, 1), match='s'>
This method is commonly used to test whether a string matches:
import re
input_str = 'ashdfjhakdsf'
if re.match('[a-z]', input_str):
    print('ok!!')
else:
    print('not found!!')
2 re.search('pattern', 'string'): scans through the whole string until it finds a match. If a match is found, a match object is returned; otherwise None is returned.
print(re.search('[0-9]{3}', 'sdfkj123'))
Result: <_sre.SRE_Match object; span=(5, 8), match='123'>
# This is generally used as follows
if re.search('[0-9]{3}', 'sdfkj123'):
    print('ok!!')
else:
    print('not found!!')
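To make the difference between the first two operations concrete, the sketch below runs both on the same string and reads the result through the match object's group() and span() methods.

```python
import re

text = "sdfkj123"

at_start = re.match(r"[0-9]{3}", text)    # the string does not START with three digits
anywhere = re.search(r"[0-9]{3}", text)   # but three digits occur later on

print(at_start)            # None
print(anywhere.group())    # 123
print(anywhere.span())     # (5, 8)
```

In short, re.match only tries the pattern at position 0, while re.search tries every position until one succeeds.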
3 re.split(): splits a string
Some readers will ask: we already have str.split() for splitting strings, so why use this?
Take a look at the example:
a = 'a s d    g'
print(a.split(' '))
Result: ['a', 's', 'd', '', '', '', 'g']
# When the string contains several consecutive spaces, splitting on a single space leaves empty strings in the result!
a = 'a s d    g'
print(re.split(r'\s+', a))
Result: ['a', 's', 'd', 'g']
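Beyond runs of spaces, re.split can also split on several different delimiters at once, which str.split cannot do in a single call. The following is a sketch; the input line is invented for illustration.

```python
import re

line = "a, b;c  d"  # hypothetical input mixing commas, semicolons and spaces

# One or more commas, semicolons or whitespace characters act as a single separator
parts = re.split(r"[,;\s]+", line)

# maxsplit limits how many splits are made
first_split_only = re.split(r"[,;\s]+", line, maxsplit=1)

# For whitespace alone, str.split() with no argument also collapses runs of spaces
plain = "a s d    g".split()

print(parts)             # ['a', 'b', 'c', 'd']
print(first_split_only)  # ['a', 'b;c  d']
print(plain)             # ['a', 's', 'd', 'g']
```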
4 re.findall(): finds all non-overlapping matches of the pattern and returns them as a list.
ret = re.findall('a', 'askdjflas')
print(ret)
Result: ['a', 'a']
5 re.sub('pattern', 'replacement', 'string', count=0): replaces the matched text. count limits the number of replacements; the default of 0 replaces every match.
ret = re.sub('a', '1', 'askdjflas')
print(ret)
Result: 1skdjfl1s
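A few variations on re.sub, as a sketch: limiting the number of replacements with count, referring to groups in the replacement text, and passing a function instead of a replacement string. The helper name double and the sample strings are hypothetical.

```python
import re

# count=1 replaces only the first match
once = re.sub("a", "1", "askdjflas", count=1)

# \1 and \2 in the replacement refer to the groups matched by the pattern
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

def double(match):
    """Hypothetical helper: replace a number with twice its value."""
    return str(int(match.group()) * 2)

# The replacement may also be a function that receives each match object
doubled = re.sub(r"\d+", double, "3 apples, 10 pears")

print(once)     # 1skdjflas
print(swapped)  # value=key
print(doubled)  # 6 apples, 20 pears
```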
Application examples of regular expressions:
Match a phone number:
info = 'my phone is 18766666666'
ret = re.search(r'1[358]\d{9}', info)
print(ret.group())
Result: 18766666666
Match an IPv4 address:
info = 'my phone is 18766666666,my IP is 2.187.5.6'
ret = re.search(r'(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}', info)
print(ret.group())
Result: 2.187.5.6
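One caveat when reusing this pattern: re.search is meant for finding an address inside longer text, so it can also pick out a valid-looking piece of an invalid address. For validating a whole string, re.fullmatch may be the safer choice. The sketch below assumes the same octet pattern as above; the names OCTET, IPV4 and is_ipv4 are hypothetical.

```python
import re

# Same octet pattern as in the example above, split up for readability
OCTET = r"(25[0-5]|2[0-4]\d|[0-1]?\d?\d)"
IPV4 = OCTET + r"(\." + OCTET + r"){3}"

def is_ipv4(text):
    """Return True only if the WHOLE string matches the IPv4 pattern."""
    return re.fullmatch(IPV4, text) is not None

# search happily finds a match inside an invalid address
partial = re.search(IPV4, "999.1.1.1").group()

print(partial)                  # 99.1.1.1
print(is_ipv4("2.187.5.6"))     # True
print(is_ipv4("192.168.0.1"))   # True
print(is_ipv4("999.1.1.1"))     # False
print(is_ipv4("256.1.1.1"))     # False
```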