Day 4: regular expressions
Syntax:
Regular expressions are patterns for matching and processing strings. Excel has many comparable formulas among its functions, and since I have already learned some Excel statements, let's look at how the approach differs in Python.
import re  # import the re module, which handles regular expressions
p = re.compile("^[0-9]")  # build the pattern object to match with. ^ anchors the match at the start of the string and [0-9] matches any digit from 0 to 9, so the pattern matches when the first character of the string is a digit.
m = p.match("12344abc")  # apply the pattern object to the string. If the match succeeds, m is a match object; otherwise m is None.
print(m.group())  # m.group() returns the matched text. Here it is 1, because the pattern matches only the first character.
The 2nd and 3rd lines above can also be merged into a single line:
m = re.match("^[0-9]", '14534Abc')
The results are the same. The difference is that the first form compiles (parses) the pattern once in advance, so no compilation work is needed at match time. The second, shorthand form passes the pattern string on every call, so the module has to compile it, or at least look it up in its cache of compiled patterns, each time. Therefore, if you need to find every line that starts with a digit in a file with 5 million lines, compile the regular expression first and then match with it. This is faster.
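The compile-once approach can be sketched as follows. This is a minimal illustration, not a benchmark; the list `lines` is hypothetical sample data standing in for the rows of a large file.

```python
import re

# Hypothetical sample data standing in for the lines of a large file.
lines = ["12344abc", "abc123", "7 days", "", "x9"]

# Compile once, outside the loop, then reuse the pattern object.
starts_with_digit = re.compile("^[0-9]")

matching = [line for line in lines if starts_with_digit.match(line)]
print(matching)  # ['12344abc', '7 days']
```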
Characters:
. matches any character except a line feed
\w matches letters, digits, underscores, and Chinese characters
\s matches any whitespace character
\d matches a digit
\b matches the start or end of a word
^ matches the start of the string
$ matches the end of the string
Repetition:
* repeat zero or more times
+ repeat one or more times
? repeat zero or one time
{n} repeat exactly n times
{n,} repeat n or more times
{n,m} repeat n to m times
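A short sketch of the counted forms, using a hypothetical sample string. Each call asks findall for runs of digits of a given length.

```python
import re

text = "a1 b22 c333 d4444"  # hypothetical sample string

exactly_two = re.findall(r"\d{2}", text)     # {n}: exactly 2 digits
three_or_more = re.findall(r"\d{3,}", text)  # {n,}: 3 or more digits
two_to_three = re.findall(r"\d{2,3}", text)  # {n,m}: 2 to 3 digits, as many as possible

print(exactly_two)    # ['22', '33', '44', '44']
print(three_or_more)  # ['333', '4444']
print(two_to_three)   # ['22', '333', '444']
```

Note that a repetition takes as many characters as it is allowed to, which is why "4444" yields '444' in the last call and the leftover '4' is not reported.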
Methods in the re module:
1. match(pattern, string, flags=0)
def match(pattern, string, flags=0):
    # match: matching starts at the beginning of the string. If it succeeds, a match object is returned. If it fails, None is returned.
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)
match(pattern, string, flags=0)
# pattern: the regular expression
# string: the string to be matched
# flags: matching mode
2. fullmatch(pattern, string, flags=0)
def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)
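To make the difference between match and fullmatch concrete, here is a small sketch with hypothetical sample strings. match only needs the pattern to fit at the start, while fullmatch needs it to cover the whole string.

```python
import re

pattern = "[0-9]+"

at_start = re.match(pattern, "123abc")        # succeeds: the string starts with digits
whole_fail = re.fullmatch(pattern, "123abc")  # None: "abc" is left over
whole_ok = re.fullmatch(pattern, "123")       # succeeds: the whole string is digits

print(at_start.group(), whole_fail, whole_ok.group())  # 123 None 123
```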
3. search(pattern, string, flags=0)
def search(pattern, string, flags=0):
    # search: scans the entire string and returns the first match. If nothing matches, None is returned.
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
4. sub(pattern, repl, string, count=0, flags=0)
def sub(pattern, repl, string, count=0, flags=0):
    # sub: replace the text at the positions where the pattern matches
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
sub replaces the parts of a string that match the pattern; count limits how many matches are replaced.
sub(pattern, repl, string, count=0, flags=0)
# pattern: the regular expression
# repl: the replacement, either a string or a callable
# string: the string to be matched
# count: maximum number of matches to replace (0 replaces all of them)
# flags: matching mode
In the following example, the first two digits in the string are replaced with "|":
>>> m = re.sub("[0-9]", "|", "Alex1is2sb6dese8", count=2)
>>> m
'Alex|is|sb6dese8'
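The docstring also allows repl to be a callable. A minimal sketch of that form follows; the helper name `double_digit` and the sample string are made up for illustration.

```python
import re

def double_digit(match):
    # Called once per match; must return the replacement string.
    return str(int(match.group()) * 2)

result = re.sub("[0-9]", double_digit, "a1b2c6d8")
print(result)  # a2b4c12d16
```

A callable is useful when the replacement depends on what was matched, which a fixed replacement string cannot express.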
5. subn(pattern, repl, string, count=0, flags=0)
def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)
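As a quick sketch, subn can be run on the same sample string used for sub above. It behaves like sub but also reports how many replacements were made.

```python
import re

new_string, number = re.subn("[0-9]", "|", "Alex1is2sb6dese8")
print(new_string, number)  # Alex|is|sb|dese| 4

# With count=2 only the first two digits are replaced.
limited = re.subn("[0-9]", "|", "Alex1is2sb6dese8", count=2)
print(limited)  # ('Alex|is|sb6dese8', 2)
```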
6. split(pattern, string, maxsplit=0, flags=0)
def split(pattern, string, maxsplit=0, flags=0):
    # split: splits the string wherever the pattern matches.
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)
split(pattern, string, maxsplit=0, flags=0)
# pattern: the regular expression
# string: the string to be matched
# maxsplit: maximum number of splits (0 means no limit)
# flags: matching mode
Examples follow. Each one splits a string on digits and returns the pieces as a list:
>>> import re
>>> m = re.split("[0-9]", "alex1rain2jack3helen rachel8")
>>> m
['alex', 'rain', 'jack', 'helen rachel', '']
>>> m = re.split("[0-9]", "alex1is2sb4heeh")
>>> m
['alex', 'is', 'sb', 'heeh']
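The docstring mentions two further behaviours, maxsplit and capturing parentheses. A small sketch with a hypothetical sample string:

```python
import re

text = "alex1rain2jack3helen"  # hypothetical sample string

# maxsplit=1: split once, the rest of the string is the last element.
first_only = re.split("[0-9]", text, maxsplit=1)
print(first_only)  # ['alex', 'rain2jack3helen']

# Capturing parentheses: the separators are kept in the result.
with_separators = re.split("([0-9])", text)
print(with_separators)  # ['alex', '1', 'rain', '2', 'jack', '3', 'helen']
```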
7. findall(pattern, string, flags=0)
def findall(pattern, string, flags=0):
    # findall: returns a list of all non-overlapping matches. If the pattern has one group, each item in the list is the string captured by that group. If the pattern has several groups, each item is a tuple;
    # empty matches are also included in the result
    """Return a list of all non-overlapping matches in the string.
    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)
findall(pattern, string, flags) collects every part of the string that matches the pattern and returns them as a list. The following example gets all the digits in the string:
>>> m = re.findall("[0-9]", "alex11rain2jack3helan rache8")
>>> m
['1', '1', '2', '3', '8']    (1)
>>> m = re.findall("[0-9]+", "alex11rain2jack3helan rache8")
>>> m
['11', '2', '3', '8']    (2)
In the code above, at (1) two adjacent digits are returned one at a time, because the pattern matches a single digit. At (2) the "+" matches one or more digits, so adjacent digits are returned together.
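How groups change the shape of the findall result can be sketched like this, with a hypothetical sample string:

```python
import re

text = "alex11rain2jack3"  # hypothetical sample string

no_group = re.findall("[a-z]+[0-9]+", text)        # whole matches
one_group = re.findall("[a-z]+([0-9]+)", text)     # only the group's text
two_groups = re.findall("([a-z]+)([0-9]+)", text)  # one tuple per match

print(no_group)    # ['alex11', 'rain2', 'jack3']
print(one_group)   # ['11', '2', '3']
print(two_groups)  # [('alex', '11'), ('rain', '2'), ('jack', '3')]
```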
8. finditer(pattern, string, flags=0)
def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.
    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)
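Unlike findall, finditer yields match objects, so positions are available as well as text. A minimal sketch with a hypothetical sample string:

```python
import re

text = "alex11rain2jack3"  # hypothetical sample string

# Collect the matched text and its (start, end) span for every match.
found = [(m.group(), m.span()) for m in re.finditer("[0-9]+", text)]
print(found)  # [('11', (4, 6)), ('2', (10, 11)), ('3', (15, 16))]
```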
9. compile(pattern, flags=0)
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
10. purge()
def purge():
    "Clear the regular expression caches"
    _cache.clear()
    _cache_repl.clear()
11. template(pattern, flags=0)
def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)
12. escape(pattern)
def escape(pattern):
    """
    Escape all the characters in pattern except ASCII letters, numbers and '_'.
    """
Character classes
Examples:
Regular expression (pattern) | Description
[Pp]ython | Match "Python" or "python"
rub[ye] | Match "ruby" or "rube"
[aeiou] | Match any one letter in the brackets
[0-9] | Match any digit. Same as [0123456789]
[a-z] | Match any lowercase letter
[A-Z] | Match any uppercase letter
[a-zA-Z0-9] | Match any letter or digit
[^aeiou] | Match any character except the letters a, e, i, o, u
[^0-9] | Match any character other than a digit
Special character classes
Regular expression (pattern) | Description
. | Match any single character except "\n". To match any character including "\n", use a pattern like "[\s\S]" or pass the re.DOTALL flag (inside brackets a dot is only a literal dot).
\d | Match a digit. For ASCII text, equivalent to [0-9]
\D | Match a non-digit character. For ASCII text, equivalent to [^0-9]
\s | Match any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v]
\S | Match any non-whitespace character. Equivalent to [^ \f\n\r\t\v]
\w | Match any word character, including the underscore. For ASCII text, equivalent to [A-Za-z0-9_]
\W | Match any non-word character. For ASCII text, equivalent to [^A-Za-z0-9_]
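The behaviour of the dot around line feeds can be checked with a short sketch. The two-line string is hypothetical sample data; `[\s\S]` and `re.DOTALL` are two standard ways to let a match run across a line feed.

```python
import re

text = "line1\nline2"  # hypothetical two-line sample

per_line = re.findall(".+", text)                    # the dot stops at "\n"
with_class = re.findall(r"[\s\S]+", text)            # whitespace or non-whitespace: everything
with_flag = re.findall(".+", text, flags=re.DOTALL)  # DOTALL lets the dot match "\n" too

print(per_line)    # ['line1', 'line2']
print(with_class)  # ['line1\nline2']
print(with_flag)   # ['line1\nline2']
```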
The difference between re.match and re.search:
re.match only matches at the start of the string. If the start of the string does not match the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match.
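A minimal sketch of that difference, using a hypothetical string whose digits are not at the start:

```python
import re

text = "abc123"  # hypothetical sample string

from_start = re.match("[0-9]+", text)  # None: the string starts with a letter
anywhere = re.search("[0-9]+", text)   # finds the digits further along

print(from_start)        # None
print(anywhere.group())  # 123
```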
Several common regular expressions:
1. Match a mobile phone number
phone_str = "hey my name is Jersey, and my phone number is 13651054607, please call me if you are pretty!"
phone_str2 = "hey my name is alex, and my phone number is 18651054604, please call me if you are pretty!"
m = re.search(r"(1)([358]\d{9})", phone_str2)
if m:
    print(m.group())
2. Match an IPv4 address
ip_addr = "inet 192.168.60.223 netmask oxfffff00 broadcast 192.168.60.255"
m = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", ip_addr)  # \. matches a literal dot; a bare . would match any character
print(m.group())
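Whether the dots in the pattern are escaped matters, because a bare dot matches any character. The sketch below compares the two spellings; the names `loose` and `strict` and the string `not_an_ip` are made up for illustration.

```python
import re

loose = re.compile(r"\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}")      # bare dots: any character
strict = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")  # escaped dots: literal "."

not_an_ip = "1234567"  # hypothetical input with no dots at all
print(bool(loose.search(not_an_ip)))   # True: digits stand in for the dots
print(bool(strict.search(not_an_ip)))  # False

ip_addr = "inet 192.168.60.223 netmask oxfffff00 broadcast 192.168.60.255"
all_ips = strict.findall(ip_addr)
print(all_ips)  # ['192.168.60.223', '192.168.60.255']
```

Even the strict pattern only checks the shape of the address; it does not verify that each part is at most 255.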
3. Group matching on contact information
contactInfo = "Doe, John: 555-1212"
match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)  # numbered groups (method 1)
match.group(1)  # 'Doe'
match.group(2)  # 'John'
match.group(3)  # '555-1212'
match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)  # named groups (method 2)
>>> match.group('last')
'Doe'
>>> match.group('first')
'John'
>>> match.group('phone')
'555-1212'
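Besides group(), a match object can hand back all groups at once. A short sketch, assuming a contact string in the "last, first: phone" shape that the pattern expects:

```python
import re

contact = "Doe, John: 555-1212"  # sample string in the shape the pattern expects

m = re.search(r"(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)", contact)

all_groups = m.groups()    # every group, in order, as a tuple
by_name = m.groupdict()    # named groups as a dictionary

print(all_groups)  # ('Doe', 'John', '555-1212')
print(by_name)     # {'last': 'Doe', 'first': 'John', 'phone': '555-1212'}
```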
4. Match an email address
email = "alex.li@126.com http://www.oldboyedu.com"
m = re.search(r"[0-9.a-z]{0,26}@[0-9.a-z]{0,20}\.[0-9a-z]{0,8}", email)  # the dot before the last part is escaped so that it matches only a literal "."
print(m.group())
Let's look at an example that uses a regular expression:
import re
# import the re module to apply regular expressions to strings
m = re.match("abc", "abcdef")
print(m)
m = re.match("abc", "abcdef")
print(m.group())
m = re.match("abc", "bcdef")
print(m)
The output is as follows:
<_sre.SRE_Match object; span=(0, 3), match='abc'>
abc
None
In the code above, we match the pattern "abc" against a string with match() and store the result in m. match() always starts at the beginning of the string; if the pattern is not found there, None is returned. You can use group() to see the text that was matched.
m = re.match("[0-9]{0,10}", "15d6afdgd")
if m:
    print(m.group())
In the pattern [0-9]{0,10}, [0-9] matches a digit from 0 to 9 and {0,10} repeats it 0 to 10 times.
To match all the numbers in a string, use findall(pattern, string, flags):
m = re.findall("[0-9]{1,10}", "15d6afd2334d2dgd3")
print(m)
The output is as follows:
['15', '6', '2334', '2', '3']
The code above matches the numbers in the string and returns them as a list. Next we match all the letters in the string:
m = re.findall("[A-Za-z]{1,10}", "15d6afd2334d2dgd3")
print(m)
The output is as follows:
['d', 'afd', 'd', 'dgd']
The code above matches every run of characters in the string that belong to [A-Za-z].
The dot (.) matches any single character except "\n". To match any character including "\n", use a pattern like [\s\S] or the re.DOTALL flag. For example:
m = re.findall(".", "15d6afd2334d2dgd3")
print(m)
['1', '5', 'd', '6', 'a', 'f', 'd', '2', '3', '3', '4', 'd', '2', 'd', 'g', 'd', '3']
In the code above, the dot (.) matches any single character, so we get a list of single characters.
Next we use dot-star (.*) for matching. We know that * repeats the preceding item zero or more times. The code is as follows:
m = re.findall(".*", "15d6afd2334d2dgd3")
print(m)
The output is as follows:
['15d6afd2334d2dgd3', '']
The dot (.) matches any character and the asterisk (*) sets the number of repetitions, so (.*) matches any run of zero or more characters (except "\n"). The empty string at the end of the result is the zero-length match found at the end of the string.
Next we use dot-plus (.+). The dot (.) is what gets matched, and the plus sign (+) sets the number of repetitions to one or more, so (.+) matches a run of one or more characters. The code is as follows:
m = re.findall(".+", "15d6afd2334d2dgd3")
print(m)
The output is as follows:
['15d6afd2334d2dgd3']
Next let's look at dot-question-mark (.?). The ? means zero or one time. Here are the code and its output:
m = re.findall(".?", "15d6afd2334d2dgd3")
print(m)
The output is as follows:
['1', '5', 'd', '6', 'a', 'f', 'd', '2', '3', '3', '4', 'd', '2', 'd', 'g', 'd', '3', '']
The question mark (?) means the item appears at most once (zero or one time), the plus sign (+) means at least once, and the asterisk (*) means at least zero times.
^ matches at the start of the string and $ matches at the end.
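Used together, the two anchors make a pattern cover the whole string. A minimal sketch; the name `digits_only` and the sample inputs are made up for illustration.

```python
import re

# ^ and $ together: the whole string must consist of digits.
digits_only = re.compile("^[0-9]+$")

all_digits = bool(digits_only.match("12345"))
trailing_letters = bool(digits_only.match("123abc"))
leading_letters = bool(digits_only.match("abc123"))

print(all_digits, trailing_letters, leading_letters)  # True False False
```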