Regular expressions (regex)
Motivation: manipulating strings (text) is one of the main tasks of a computer.
A common need is to find strings of a particular kind within a text or a larger string.
Regular expressions were designed to solve this problem.
Definition: a regular expression is a string of ordinary characters and special symbols that describes a class of strings following certain rules.
Advantages and use:
1. It is an independent technology.
2. It can be used in many programming languages; in Python it is provided by the re module.
re.findall(regex, string)
Function: match a string against a regular expression
Parameters: regex: the regular expression
string: the target string
Return value: a list of all matching items
Regular expression metacharacters
* Ordinary characters
Match rule: an ordinary character matches itself
e.g. ab matches ab
In [4]: re.findall('ab', 'abcdefabg')
Out[4]: ['ab', 'ab']
* Match any single character
Metacharacter: .
Match rule: matches any character except '\n'
e.g. f.o ----> foo fao fbo
In [6]: re.findall('f.o', 'faosfaafbo')
Out[6]: ['fao', 'fbo']
* Match the start of the string
Metacharacter: ^
Match rule: the position of ^ must be the start of the string; it is usually used together with other metacharacters
e.g. ^abc matches the start of abcdef
In [8]: re.findall('^abc', 'abcdefgh')
Out[8]: ['abc']
* Match the end of the string
Metacharacter: $
Match rule: matches the end position of the string
e.g. py$ matches the end of a.py
In [11]: re.findall('py$', 'a.py')
Out[11]: ['py']
* Match repetition
Metacharacter: *
Match rule: matches the preceding expression 0 or more times
e.g. ab* ----> a ab abbbbb abbbbbb
In [17]: re.findall('ab*', 'aabbabbljlk')
Out[17]: ['a', 'abb', 'abb']
* Match repetition
Metacharacter: +
Match rule: matches the preceding expression 1 or more times
In [18]: re.findall('ab+', 'aabbabbljlk')
Out[18]: ['abb', 'abb']
* Match repetition
Metacharacter: ?
Match rule: matches the preceding expression 0 or 1 times
In [20]: re.findall('ab?', 'aabbabbljlk')
Out[20]: ['a', 'ab', 'ab']
* Match repetition
Metacharacter: {n}
Match rule: matches the preceding expression exactly n times
In [22]: re.findall('ab{2}', 'aabbabbljlk')
Out[22]: ['abb', 'abb']
* Match repetition
Metacharacter: {m,n}
Match rule: matches the preceding expression m to n times
In [26]: re.findall('ab{2,5}', 'aabbabbbbbbljlk')
Out[26]: ['abb', 'abbbbb']
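The repetition metacharacters above can be compared side by side on one string. This is a minimal sketch; the sample text is made up for illustration.

```python
import re

text = "a ab abb abbb"  # illustrative sample, not from the notes

star = re.findall(r"ab*", text)        # a followed by zero or more b
plus = re.findall(r"ab+", text)        # a followed by one or more b
opt = re.findall(r"ab?", text)         # a followed by zero or one b
exact = re.findall(r"ab{2}", text)     # a followed by exactly two b
ranged = re.findall(r"ab{2,3}", text)  # a followed by two to three b

print(star)    # ['a', 'ab', 'abb', 'abbb']
print(plus)    # ['ab', 'abb', 'abbb']
print(opt)     # ['a', 'ab', 'ab', 'ab']
print(exact)   # ['abb', 'abb']
print(ranged)  # ['abb', 'abbb']
```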
* Match a character set
Metacharacter: [abcd]
Match rule: matches any one character in the set
* Match a character range
Metacharacters: [0-9] [A-Z] [a-z]
Match rule: matches any one character in the range; several ranges can be written together, and other characters can be added to the set
e.g. [_3-9a-z]
In [30]: re.findall('[_0-9a-zA-Z]+', 'Hello World hello py_2')
Out[30]: ['Hello', 'World', 'hello', 'py_2']
* Negated character set
Metacharacter: [^...]
Match rule: matches any one character that is not in the set
e.g. [^0-9] ----> a g & $
In [31]: re.findall('[^0-9]+', 'Hello World hello py_2')
Out[31]: ['Hello World hello py_']
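As a rough illustration of sets, ranges, and negated sets, the sketch below runs three patterns. The sample strings are invented for this example.

```python
import re

text = "user_42 logged in at 09:30"  # illustrative sample

# Letters, digits, and underscore combined in one set
words = re.findall(r"[_0-9a-zA-Z]+", text)
# Negated set: runs of anything that is not a digit
non_digits = re.findall(r"[^0-9]+", text)
# Plain set: any single vowel
vowels = re.findall(r"[aeiou]", "regex")

print(words)       # ['user_42', 'logged', 'in', 'at', '09', '30']
print(non_digits)  # ['user_', ' logged in at ', ':']
print(vowels)      # ['e', 'e']
```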
Metacharacters: \d \D
Match rule: \d matches any one digit character; \D matches any one non-digit character
In [32]: re.findall(r'\d{8}', '12345678')
Out[32]: ['12345678']
Metacharacters: \w \W
Match rule: \w matches any one letter, digit, or underscore; \W matches any one character that is not a letter, digit, or underscore
For ASCII text: \w is [_0-9a-zA-Z] and \W is [^_0-9a-zA-Z]
Metacharacters: \s \S
Match rule: \s matches any one whitespace character; \S matches any one non-whitespace character
For ASCII text: \s is [ \t\n\r\f\v]
In [41]: re.findall(r'\s\S+', 'hello world nihao China')
Out[41]: [' world', ' nihao', ' China']
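The shorthand classes \d, \w, \s and their uppercase opposites can be tried together. The following is a small sketch on a made-up string containing a tab and a newline.

```python
import re

text = "id 42\tname_x\n"  # illustrative sample with a tab and a newline

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of non-digits
words = re.findall(r"\w+", text)       # runs of letters, digits, underscore
spaces = re.findall(r"\s", text)       # single whitespace characters
chunks = re.findall(r"\S+", text)      # runs of non-whitespace

print(digits)      # ['42']
print(non_digits)  # ['id ', '\tname_x\n']
print(words)       # ['id', '42', 'name_x']
print(spaces)      # [' ', '\t', '\n']
print(chunks)      # ['id', '42', 'name_x']
```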
Metacharacters: \A \Z
Match rule: \A matches the start of the string (like ^); \Z matches the end of the string (like $)
In [52]: re.findall(r'\Afoo\Z', 'foo')
Out[52]: ['foo']
Metacharacters: \b \B
Match rule: \b matches a word boundary position; \B matches a non-word-boundary position
Word boundary: a position where a letter, digit, or underscore meets any other character (or the start or end of the string)
In [59]: re.findall(r'foo\b', 'foo food foot')
Out[59]: ['foo']
In [60]: re.findall(r'foo\B', 'foo food foot')
Out[60]: ['foo', 'foo']
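Position metacharacters match a location rather than a character, which is easiest to see by combining them with \w*. This is a sketch with an invented sample string.

```python
import re

text = "cat concat cats"  # illustrative sample

whole = re.findall(r"\bcat\b", text)      # cat as a complete word
starts = re.findall(r"\bcat\w*", text)    # words that start with cat
ends = re.findall(r"\w*cat\b", text)      # words that end with cat
at_start = re.findall(r"\Acat", text)     # cat only at the very start
at_end = re.findall(r"cats\Z", text)      # cats only at the very end

print(whole)     # ['cat']
print(starts)    # ['cat', 'cats']
print(ends)      # ['cat', 'concat']
print(at_start)  # ['cat']
print(at_end)    # ['cats']
```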
Metacharacter: |
Match rule: joins several regular expressions in an "or" relationship
In [64]: re.findall('abc|bcd', 'abcdefbcdef')
Out[64]: ['abc', 'bcd']
Escaping
Special characters: * $ ? + \d \s
1. Regular expressions use many special characters as metacharacters. To match one of these characters literally, escape it with a backslash.
e.g. \ ----> \\    * ----> \*    \d ----> \\d
2. In a programming language, regular expressions are usually passed in as strings, and the language's own string literals also process escapes.
Python str ----> raw str: a raw string (r'...') is taken as written, with no escape processing
e.g. the str '\\*' and the raw str r'\*' are the same pattern
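The two levels of escaping can be checked directly in Python. This is a minimal sketch; the sample string is made up, and re.escape is a standard-library helper that the notes do not mention.

```python
import re

price = "cost: 3*4 = $12"  # illustrative sample

# A raw string and a doubly escaped normal string are the same pattern
same = ("\\*" == r"\*")

stars = re.findall(r"\*", price)       # literal asterisk
dollars = re.findall(r"\$\d+", price)  # literal dollar sign, then digits

# re.escape adds the backslashes needed to match text literally
literal = re.findall(re.escape("3*4"), price)

print(same)     # True
print(stars)    # ['*']
print(dollars)  # ['$12']
print(literal)  # ['3*4']
```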
Match a single character: ordinary characters . \d \D \w \W \s \S [...] [^...]
Match a position: ^ $ \A \Z \b \B
Match repetition: * + ? {n} {m,n}
Other: |
Greedy and non-greedy matching
Greedy mode: when the number of repetitions is not fixed, the regular expression always matches as many characters as possible
* + ? {m,n}
Non-greedy mode: add ? after the repetition metacharacter, so that it matches as few characters as possible
In [94]: re.findall('ab*?', 'abbbbbbba')
Out[94]: ['a', 'a']
In [95]: re.findall('ab+?', 'abbbbbbba')
Out[95]: ['ab']
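A common place where the difference matters is matching delimited pieces of text. The sketch below uses an invented tag-like string to contrast the two modes.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # illustrative sample

# Greedy: .+ runs to the last > in the string
greedy = re.findall(r"<.+>", html)
# Non-greedy: .+? stops at the first > it can
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```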
Regular expression subgroups
In a regular expression, () can be used to make part of the expression a subgroup
e.g. (ab)*cdef
What a subgroup can do
* A subgroup is treated as a single unit, which changes the scope that a repetition metacharacter applies to
* Many programming-language functions can extract the content matched by a subgroup separately
* It makes patterns more convenient to use and refer to
In principle, a regular expression can contain many subgroups. Counted from the outside in and from left to right, they are called the first subgroup, the second subgroup, and so on. Subgroups may be nested but do not cross each other.
Capturing groups
Naming a subgroup: (?P<name>abcd)
Referring back to a named subgroup: (?P=name)
e.g. (?P<dog>ab)cdef(?P=dog)
Non-capturing group
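The group forms can be sketched as follows. The (?:...) syntax for a non-capturing group is standard in Python's re module but is not spelled out in these notes, and the sample strings are invented.

```python
import re

# The group makes * apply to "ab" as a unit
m = re.fullmatch(r"(ab)*cdef", "ababcdef")
whole, last_ab = m.group(), m.group(1)

# Named group plus a backreference to whatever it matched
m2 = re.search(r"(?P<dog>ab)cdef(?P=dog)", "xxabcdefabyy")
named_whole, dog = m2.group(), m2.group("dog")

# A capturing group changes what findall returns; (?:...) does not
capturing = re.findall(r"(ab)+cd", "ababcd abcd")
non_capturing = re.findall(r"(?:ab)+cd", "ababcd abcd")

print(whole, last_ab)     # ababcdef ab
print(named_whole, dog)   # abcdefab ab
print(capturing)          # ['ab', 'ab']
print(non_capturing)      # ['ababcd', 'abcd']
```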
Exercises:
Match a password 8 to 10 characters long that starts with a letter and consists of letters, digits, and underscores
In [2]: re.findall(r'^[a-zA-Z]\w{7,9}$', 'abc123_a')
Out[2]: ['abc123_a']
Match an ID number
In [6]: re.search(r'\d{17}(\d|x)', '123123123123123123').group()
Out[6]: '123123123123123123'
Match the words in a text that begin with an uppercase letter
In [14]: re.findall(r'\b[A-Z]\w*\b', data)
Out[14]: ['Python', 'Hello', 'World']
Python ----> regex
Module: re
compile(pattern, flags=0)
Function: create a regular expression object
Parameters: pattern: the regular expression
flags: extension flags; the default 0 means no extensions
Return value: a regular expression object
obj = re.compile('abc')
The functions between the two rows of asterisks below can be called either directly through re or as methods of a compiled object
**************************************************
re.findall(pattern, string, flags=0)
Function: match the target string against a regular expression
Parameters: pattern: the regular expression
string: the target string
flags: regular expression extension flags
Return value: all matches, returned as a list
If the pattern contains subgroups, only the content matched by the subgroups is returned.
obj.findall(string[, pos[, endpos]])
Function: match the target string against the compiled regular expression
Parameters: string: the target string
pos: the position in the target string where matching starts (default 0)
endpos: the position in the target string where matching stops (default: the end of the string)
Return value: all matches, returned as a list
If the pattern contains subgroups, only the content matched by the subgroups is returned.
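The effect of subgroups on the return value, and of pos and endpos on a compiled object, can be sketched like this. The sample text and variable names are illustrative only.

```python
import re

text = "x=1, y=22, z=333"  # illustrative sample

# No group: the whole match is returned
whole = re.findall(r"\w=\d+", text)
# One group: only the group's content is returned
one_group = re.findall(r"\w=(\d+)", text)
# Several groups: a list of tuples, one item per group
two_groups = re.findall(r"(\w)=(\d+)", text)

# Compiled object: only text[5:10] ('y=22,') is searched
obj = re.compile(r"\d+")
windowed = obj.findall(text, 5, 10)

print(whole)       # ['x=1', 'y=22', 'z=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('x', '1'), ('y', '22'), ('z', '333')]
print(windowed)    # ['22']
```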
finditer()
Function: same as findall
Parameters: same as findall
Return value: an iterator that yields a match object for each match
* Match object: returned by finditer, match, fullmatch, and search
These functions return the result of a match as a match object, which makes it easy to work with the details of the match.
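Because finditer yields match objects, each hit carries its position as well as its text. A short sketch with a made-up string:

```python
import re

text = "a1 b22 c333"  # illustrative sample

# Collect the matched text and its (start, end) span for every match
hits = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(hits)  # [('1', (1, 2)), ('22', (4, 6)), ('333', (8, 11))]
```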
fullmatch()
Function: match the entire string against the regular expression
Parameters: the target string
Return value: a match object if the match succeeds, otherwise None
match()
Function: match at the beginning of the string
Parameters: the target string
Return value: a match object if the match succeeds, otherwise None
search()
Function: find the first place in the string where the regular expression matches
Parameters: the target string
Return value: a match object if the match succeeds, otherwise None
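The three functions differ only in where the match is allowed to sit. The sketch below applies one pattern to invented strings to show when each returns None.

```python
import re

pattern = re.compile(r"\d+")

# match: the pattern must match at the start of the string
at_start = pattern.match("abc123")           # None, string starts with letters
prefix = pattern.match("123abc").group()     # '123'

# search: the first match anywhere in the string
anywhere = pattern.search("abc123").group()  # '123'

# fullmatch: the pattern must cover the whole string
partial = pattern.fullmatch("abc123")        # None
entire = pattern.fullmatch("123").group()    # '123'

print(at_start, prefix, anywhere, partial, entire)
```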
split()
Function: split the string wherever the regular expression matches
Parameters: the target string
Return value: a list of the pieces
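A quick sketch of split with made-up input; maxsplit is the standard re.split keyword for limiting the number of cuts, which the notes do not list.

```python
import re

# Split on any run of commas, semicolons, or whitespace
parts = re.split(r"[,;\s]+", "a, b;c  d")

# maxsplit=1 cuts only at the first separator
limited = re.split(r"\s*,\s*", "x , y,z", maxsplit=1)

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['x', 'y,z']
```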
sub(re_str, string, max)
Function: replace the parts that the regular expression matches with the specified string
Parameters: re_str: the replacement string
string: the target string
max: the maximum number of replacements
Return value: the string after replacement
Note: in the re module these parameters are named repl, string, and count, and the module-level form takes the pattern first: re.sub(pattern, repl, string, count=0)
subn()
Function: replace the parts that the regular expression matches with the specified string
Parameters: re_str: the replacement string
string: the target string
max: the maximum number of replacements
Return value: a 2-tuple; the first item is the string after replacement and the second is the number of replacements actually made
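The following sketch shows sub and subn through the module-level functions, where the pattern is the first argument and the limit is passed as count. The sample string is invented.

```python
import re

text = "a1 b22 c333"  # illustrative sample

replaced_all = re.sub(r"\d+", "#", text)             # replace every match
replaced_one = re.sub(r"\d+", "#", text, count=1)    # replace at most one
with_count = re.subn(r"\d+", "#", text)              # also report how many

print(replaced_all)  # a# b# c#
print(replaced_one)  # a# b22 c333
print(with_count)    # ('a# b# c#', 3)
```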
****************************************************
Attributes of the object returned by compile
flags: the flags, as an integer
pattern: the regular expression string
groupindex: a dictionary whose keys are the names of the named capturing groups and whose values are their group numbers
groups: the number of subgroups in the regular expression
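These attributes can be inspected on any compiled object. The date-like pattern below is a made-up example; the exact integer in flags is not shown because it may include default flags in addition to the ones passed in.

```python
import re

# Two named groups and one unnamed group, compiled with re.I
obj = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})", re.I)

pattern_text = obj.pattern          # the original pattern string
group_count = obj.groups            # number of subgroups
names = dict(obj.groupindex)        # group name -> group number
has_ignorecase = bool(obj.flags & re.I)

print(pattern_text)
print(group_count)     # 3
print(names)           # {'year': 1, 'month': 2}
print(has_ignorecase)  # True
```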
match search fullmatch finditer
Match object attributes and methods
Attributes:
pos: the position in the target string where the search started
endpos: the position in the target string where the search stopped
lastgroup: the name of the last matched subgroup, or None if it has no name
lastindex: the number of the last matched subgroup
re: the regular expression object used for the match
regs: the spans of the whole match and of each subgroup
string: the target string of the match
Methods:
start()
Gets the start position of the matched content in the string
end()
Gets the end position of the matched content in the string (the index just after the last matched character)
span()
Gets the start and end positions of the matched content in the string
group(n)
Function: gets the content matched by the match object
Parameter: n defaults to 0, meaning the content matched by the whole regular expression
When n is a positive integer, it means the content matched by the nth subgroup
Return value: the matched string
groups()
Function: gets the content matched by all subgroups
groupdict()
Function: returns a dictionary whose keys are the names of the named capturing groups and whose values are the content they matched
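The attributes and methods listed above can be explored on a single match object. This is a sketch; the mail-like sample string and the group name user are invented for illustration.

```python
import re

text = "mail: bob@example.com now"  # illustrative sample
m = re.search(r"(?P<user>\w+)@(\w+)\.com", text)

whole = m.group()          # entire match
first = m.group(1)         # first subgroup
second = m.group(2)        # second subgroup
all_groups = m.groups()    # every subgroup as a tuple
named = m.groupdict()      # named subgroups only
position = (m.start(), m.end(), m.span())

print(whole)       # bob@example.com
print(first)       # bob
print(second)      # example
print(all_groups)  # ('bob', 'example')
print(named)       # {'user': 'bob'}
print(position)    # (6, 21, (6, 21))
print(m.lastindex, m.lastgroup)  # 2 None (the last group has no name)
print(m.pos, m.endpos == len(text), m.string is text)
```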
Flags (accepted by re.compile, re.findall, re.match, re.search, ...)
'A', 'ASCII'
'S', 'DOTALL': lets . match line breaks as well
'I', 'IGNORECASE': ignores uppercase and lowercase
'L', 'LOCALE'
'M', 'MULTILINE': makes ^ and $ match at the start and end of each line
'T', 'TEMPLATE'
'U', 'UNICODE'
'X', 'VERBOSE': allows comments starting with # inside the pattern
When several flags are used together, join them with a vertical bar
re.I | re.S
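The annotated flags can be tried out as below. This is a minimal sketch on made-up text, and the phone-like pattern is only an illustration of VERBOSE comments.

```python
import re

text = "Hello\nhello world\nHELLO"  # illustrative sample, three lines

any_case = re.findall(r"hello", text, re.I)             # ignore case
first_line = re.findall(r"^hello", text, re.I)          # ^ = start of string
each_line = re.findall(r"^hello", text, re.I | re.M)    # ^ = start of each line
no_dotall = re.findall(r"o.h", text)                    # . stops at newline
dotall = re.findall(r"o.h", text, re.S)                 # . matches newline

# VERBOSE: whitespace in the pattern is ignored and # starts a comment
phone = re.compile(r"""
    \d{3}   # first block of digits
    -       # separator
    \d{4}   # second block of digits
""", re.X)
numbers = phone.findall("call 555-1234")

print(any_case)    # ['Hello', 'hello', 'HELLO']
print(first_line)  # ['Hello']
print(each_line)   # ['Hello', 'hello', 'HELLO']
print(no_dotall)   # []
print(dotall)      # ['o\nh']
print(numbers)     # ['555-1234']
```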