Regular expressions: A regular expression is a logical formula for operating on strings. It is a "rule string", built from a predefined set of special characters and combinations of those characters, that expresses a filtering logic for strings.
To use regular expressions, import the built-in module re.
Cons: Less efficient than plain string methods; whenever a string method can do the job, use the string method.
Advantages: Concise code
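As a small illustration of the trade-off above, the sketch below performs the same two checks once with string methods and once with re. The sample text and variable names are made up for this example.

```python
import re

text = "hello world"

# Plain string methods cover simple checks without any pattern.
starts_with_hello = text.startswith("hello")
has_world = "world" in text

# The same checks written with the re module.
re_starts = re.match(r"hello", text) is not None
re_has = re.search(r"world", text) is not None

print(starts_with_hello, has_world, re_starts, re_has)
```

Both approaches agree here; the regular expression only starts to pay off once the rule is more than a fixed substring.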
Atom Meaning
\d Any one of the characters 0-9
\D Any character except 0-9
\s Any one whitespace character, such as \t, \n, \r, or a space
\S Any character not matched by \s
\w A digit, letter, or underscore
\W Any character not matched by \w
[] A custom atom table (character set)
[^] Excludes a custom atom table
^ Beginning of the string (beginning of each line with re.M)
$ End of the string (end of each line with re.M)
. Any character other than \n
\b Word boundary
\B Non-word boundary
\A Start of string
\Z End of string
- Inside [], represents a range, such as [1-9], any one character from 1 to 9
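The atoms in the table can be tried out with re.findall. This is a minimal sketch; the sample string and variable names are invented for illustration.

```python
import re

sample = "Room 42, floor_3\tEast"

digits = re.findall(r"\d", sample)        # each character 0-9
spaces = re.findall(r"\s", sample)        # spaces and the tab
non_word = re.findall(r"\W", sample)      # not a letter, digit, or underscore
vowels = re.findall(r"[aeiou]", sample)   # custom atom table
low_digits = re.findall(r"[1-3]", sample) # range inside a custom table
not_word_or_space = re.findall(r"[^\w\s]", sample)  # excluded table

# \b is a word boundary, \A and \Z anchor to the whole string.
f_words = re.findall(r"\bf\w+", sample)
starts_with_room = re.search(r"\ARoom", sample) is not None
ends_with_east = re.search(r"East\Z", sample) is not None

print(digits, spaces, non_word, vowels, low_digits, not_word_or_space)
print(f_words, starts_with_room, ends_with_east)
```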
-----------------------------------------------
Meta-character Meaning
{} Indicates how many times the preceding atom repeats
{m} The atom repeats exactly m times
{m,n} The atom repeats at least m times and at most n times
{m,} The atom repeats at least m times, with no upper limit
() Changes the precedence and captures a sub-element (group)
* Repeats 0 or more times, so the atom may be absent; greedy
+ Repeats 1 or more times; greedy
? Occurs 0 or 1 times
*?, +? Non-greedy forms of * and +
| Or
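The difference between greedy and non-greedy repetition is easiest to see side by side. The sketch below also exercises the {m}, {m,n}, {m,} forms, ?, |, and grouping; all sample strings are made up for this example.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so both tags land in one match.
greedy = re.findall(r"<b>.*</b>", html)
# Non-greedy: .*? stops at the first closing tag.
lazy = re.findall(r"<b>.*?</b>", html)

numbers = "12 345 6789"
exactly_three = re.findall(r"\d{3}", numbers)
two_to_three = re.findall(r"\d{2,3}", numbers)
three_or_more = re.findall(r"\d{3,}", numbers)

optional_u = re.findall(r"colou?r", "color colour")
either = re.findall(r"cat|dog", "cat dog bird")

# Parentheses change precedence: (ab)+ repeats "ab", while ab+ repeats only "b".
grouped = re.fullmatch(r"(ab)+", "ababab") is not None
ungrouped = re.fullmatch(r"ab+", "ababab") is not None

print(greedy, lazy)
print(exactly_three, two_to_three, three_or_more)
print(optional_u, either, grouped, ungrouped)
```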
----------------------------------------------------------------------
Pattern modifier
Modifier Meaning
re.S Makes . match all characters, including \n
re.L Locale-dependent matching (in Python 3, only for bytes patterns)
re.U Matches according to Unicode character properties, which affects \b, \B, \w, \W (the default for str patterns in Python 3)
re.I Case insensitive
re.M Multi-line matching
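Modifiers are passed as the flags argument and can be combined with |. A short sketch with an invented two-line string:

```python
import re

text = "First line\nsecond LINE"

# Without re.S the dot refuses to cross the newline.
dot_default = re.search(r"line.second", text)
dot_all = re.search(r"line.second", text, re.S)

# re.I ignores case.
any_case = re.findall(r"line", text, re.I)

# re.M lets ^ and $ match at every line, not only at the ends of the string.
first_words_default = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, re.M)

# Flags combine with the | operator.
combined = re.findall(r"^second line$", text, re.I | re.M)

print(dot_default, dot_all is not None, any_case)
print(first_words_default, first_words_multiline, combined)
```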
------------------------------------------------------------------------
The pattern string should be written as a raw string, that is, a string literal prefixed with r, for example r'www'.
Matching is strictly case-sensitive unless re.I is given.
To match any of {, }, [, ], -, ?, *, |, ^, $ literally in a regular expression, escape it with a backslash so that it is treated as an ordinary character.
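The sketch below shows why the r prefix matters and how escaping works. The sample strings are made up; re.escape is a standard-library helper not mentioned in the notes above, included here as one convenient way to escape a whole literal.

```python
import re

# Without the r prefix, Python turns "\b" into a backspace character
# before re ever sees it, so the word-boundary rule is lost.
not_raw = re.findall("\bcat\b", "cat concat")
raw = re.findall(r"\bcat\b", "cat concat")

# $ normally means "end"; escaped, it is an ordinary dollar sign.
prices = re.findall(r"\$\d+", "cost: $15 or $20")

# re.escape adds the backslashes for a whole literal string.
literal = "a*b?"
matches_literal = re.fullmatch(re.escape(literal), "a*b?") is not None

# Matching is case-sensitive by default.
lower_vs_upper = re.search(r"python", "Python")

print(not_raw, raw, prices, matches_literal, lower_vs_upper)
```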
Common regular expression functions
re.match() Matches starting at the beginning of the string; it matches only at the beginning
re.search() Scans from the beginning, but the match may occur anywhere in the string, not only at the beginning
re.findall() Finds all matches and returns them as a list
re.split() Splits the string according to the pattern
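A minimal sketch of the four functions above on one invented string:

```python
import re

text = "2023-01-15 and 2024-12-31"

# match only looks at the beginning of the string.
year_at_start = re.match(r"\d{4}", text)
and_at_start = re.match(r"and", text)

# search finds the first match anywhere in the string.
and_anywhere = re.search(r"and", text)

# findall returns every match as a list of strings.
years = re.findall(r"\d{4}", text)

# split cuts the string wherever the pattern matches.
dates = re.split(r"\s+and\s+", text)

print(year_at_start.group(), and_at_start, and_anywhere.start())
print(years, dates)
```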
re.sub(), re.subn(): Match substrings of the target string against the regular expression and replace them with the specified string. You can specify the number of replacements.
If the number is not specified, all matching substrings are replaced.
re.sub() returns the substituted string; re.subn() returns a tuple whose first element is the substituted string and whose second element is the number of replacements made.
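A short sketch of sub and subn, including the count argument; the input string is invented for this example.

```python
import re

text = "a1b22c333"

# Replace every run of digits.
all_replaced = re.sub(r"\d+", "#", text)

# Limit the number of replacements with count.
first_two = re.sub(r"\d+", "#", text, count=2)

# subn returns (new_string, number_of_replacements).
replaced, how_many = re.subn(r"\d+", "#", text)

print(all_replaced, first_two, replaced, how_many)
```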
group() and groups(): Used to extract sub-elements; each pair of parentheses in the pattern is one sub-element. They are methods of the match object, not of the re module, so among the functions above they can be used only on the result of match and search.
The sub-elements are read from the returned match object. You can use \1, \2, \3 in the pattern string, and in the replacement string of sub and subn, to refer to sub-elements.
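The following sketch extracts sub-elements from a match object and then uses \1-style references both inside a pattern and inside a replacement string. The date format and sample strings are assumptions made for illustration.

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Date: 2024-03-09")

whole = m.group()      # the entire match
year = m.group(1)      # first pair of parentheses
parts = m.groups()     # all sub-elements as a tuple

# \1 inside the pattern: the same character twice in a row.
doubled = re.search(r"(\w)\1", "hello").group()

# \1, \2, \3 inside the replacement string: reorder the sub-elements.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-03-09")

print(whole, year, parts, doubled, reordered)
```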
re.compile(): Stores the compiled rule in a pattern object, so the pattern does not have to be parsed again on every call, which improves speed when it is reused.
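A minimal sketch of reusing a compiled pattern across several strings; word_re and the sample lines are hypothetical names and data for this example.

```python
import re

# Compile once, then call the pattern object's own methods.
word_re = re.compile(r"\b\w+\b")

lines = ["alpha beta", "gamma"]
counts = [len(word_re.findall(line)) for line in lines]

# Flags are given at compile time instead of on each call.
hello_re = re.compile(r"hello", re.I)
found = hello_re.search("Say HELLO") is not None

print(counts, found)
```

The pattern object offers the same operations as the module-level functions (match, search, findall, split, sub, subn), minus the pattern argument.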