Regular expression: a pattern that matches fragments of text.
- Wildcard: a pattern can match more than one string. For example, '.' matches any character except a newline, and it matches exactly one character.
- Escaping special characters: suppose you want to match the string 'python.org'. If you use 'python.org' directly as the pattern, it matches not only 'python.org' but also 'pythoniorg' and similar strings, because '.' acts as a wildcard. The '.' has to be escaped: use 'python\\.org' or the raw string r'python\.org'.
- Character set: use square brackets to create a character set containing all the characters you want to match. For example, '[pj]ython' matches 'python' and 'jython', and '[a-zA-Z0-9]' matches any uppercase or lowercase letter or digit. A character set matches exactly one character. An inverted character set such as '[^abc]' matches any character except a, b, and c.
- Alternation and sub-patterns: the pipe symbol '|' expresses a choice. To match only 'python' and 'jython', use 'python|jython'. To apply the choice to just part of the pattern, enclose that part in parentheses, as in '(p|j)ython'. The parenthesized part is called a sub-pattern.
- Optional and repeated sub-patterns: adding a question mark after a sub-pattern makes it optional, meaning the sub-pattern may appear zero or one time. For example, r'(http://)?(www\.)?python\.org' matches 'http://www.python.org', 'http://python.org', 'www.python.org', and 'python.org'. The other repetition operators are:
(pattern)*: the sub-pattern may appear zero or more times
(pattern)+: the sub-pattern may appear one or more times
(pattern){m,n}: the sub-pattern may repeat from m to n times
- Start and end of the string: the patterns so far are matched against the whole string. To match only at the start of the string, use the '^' marker. For example, '^ht+p' matches 'ht+p' only at the beginning of the string, so it matches 'http://python.org' but not 'www.http.org'. Likewise, the '$' marker ties the match to the end of the string.
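The rules in the list above can be tried out with a short script. This is only a sketch: the test strings (such as 'pythonzorg' and 'cython') are made up for illustration, and re.fullmatch is used here just to check that a whole string fits a pattern.

```python
import re

# Wildcard: an unescaped '.' matches any single character except a newline.
dot_any = bool(re.search('python.org', 'pythonzorg'))        # True

# Escaped dot: only a literal '.' is accepted.
dot_literal = bool(re.search(r'python\.org', 'pythonzorg'))  # False

# Character set: exactly one character out of the listed ones.
charset = [bool(re.fullmatch('[pj]ython', s))
           for s in ('python', 'jython', 'cython')]

# Alternation inside a sub-pattern, then optional sub-patterns.
alt = bool(re.fullmatch('(p|j)ython', 'jython'))
url = re.compile(r'(http://)?(www\.)?python\.org')
urls = [bool(url.fullmatch(s)) for s in
        ('http://www.python.org', 'http://python.org',
         'www.python.org', 'python.org')]

# Repetition with {m,n}: between 2 and 3 copies of 'ab'.
repeat = [bool(re.fullmatch('(ab){2,3}', s))
          for s in ('ab', 'abab', 'ababab', 'abababab')]

# Anchors: '^' pins the match to the start, '$' to the end.
start = [bool(re.search('^ht+p', s))
         for s in ('http://python.org', 'www.http.org')]
end = bool(re.search('org$', 'www.python.org'))

print(dot_any, dot_literal, charset, alt, urls, repeat, start, end)
```

Each variable holds True or False depending on whether the pattern matched, so changing a test string is a quick way to see where a rule stops applying.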
Common functions of the re module
Function | Description
compile(pattern[, flags]) | Create a pattern object from a string containing a regular expression
search(pattern, string[, flags]) | Search for the pattern in the string
match(pattern, string[, flags]) | Match the pattern at the start of the string
split(pattern, string[, maxsplit=0]) | Split the string at occurrences of the pattern
findall(pattern, string) | Return a list of all occurrences of the pattern in the string
sub(pat, repl, string[, count=0]) | Replace all matches of pat in the string with repl
escape(string) | Escape all special regular expression characters in the string
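To see how compile, search, and match differ, here is a minimal sketch. The sample sentence and the pattern are assumptions chosen for illustration.

```python
import re

text = 'Visit www.python.org today'   # made-up sample text

# compile() builds a reusable pattern object.
pat = re.compile(r'www\.\w+\.org')

# search() scans the whole string; match() only looks at the very start.
found = pat.search(text)
at_start = pat.match(text)

print(found.group(0))   # www.python.org
print(at_start)         # None, because the text does not start with 'www.'

# The module-level functions accept the pattern string directly.
same = re.search(r'www\.\w+\.org', text).group(0)
```

Compiling first is mostly a convenience when the same pattern is used many times; the module-level call gives the same result.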
For the matching functions in the re module, a successful match returns a MatchObject. It contains information about the substring that matched the pattern, and about which part of the pattern matched which part of the substring. These "parts" are called groups; a group is simply a sub-pattern enclosed in parentheses. Groups are numbered by their left parenthesis, and group 0 is the entire pattern.
The pattern 'There (was a (wee) (cooper)) who (lived in Fyfe)' contains the following groups:
0 There was a wee cooper who lived in Fyfe
1 was a wee cooper
2 wee
3 cooper
4 lived in Fyfe
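The group numbering can be checked directly. A small sketch, matching the pattern against the sentence it describes (m.groups() is used in addition to m.group() to get groups 1 to 4 as a tuple):

```python
import re

pattern = r'There (was a (wee) (cooper)) who (lived in Fyfe)'
m = re.match(pattern, 'There was a wee cooper who lived in Fyfe')

# Groups are numbered by the position of their opening parenthesis;
# group 0 is the whole match.
for i in range(5):
    print(i, m.group(i))

all_groups = m.groups()   # groups 1..4 as a tuple
```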
Important methods of re match objects:
Method | Description
group([group1, ...]) | Get the matches for the given sub-patterns (groups)
start([group]) | Return the starting position of the match for the given group
end([group]) | Return the end position of the match for the given group
span([group]) | Return the start and end positions of the match for the given group
>>> import re
>>> m = re.match(r'www\.(.*)\..{3}', 'www.python.org')
>>> m.group(1)
'python'
>>> m.start(1)
4
>>> m.end(1)
10
>>> m.span(1)
(4, 10)
>>> m.group(0)
'www.python.org'
Adding a '?' after a repetition operator turns it into its non-greedy version.
When re.split() splits a string and the pattern contains parentheses, the text matched by the parenthesized groups is kept in the result, between the substrings.
re.split(pattern, string[, maxsplit=0])
The split function also takes a maxsplit parameter, which limits the number of splits.
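A short sketch of the three behaviours of re.split described here. The sample string is made up for illustration.

```python
import re

text = 'alpha, beta,,,,gamma   delta'   # made-up sample text

# Split on runs of commas and/or spaces.
plain = re.split('[, ]+', text)
print(plain)       # ['alpha', 'beta', 'gamma', 'delta']

# With a group in the pattern, the separators are kept in the result.
with_seps = re.split('([, ]+)', text)
print(with_seps)   # ['alpha', ', ', 'beta', ',,,,', 'gamma', '   ', 'delta']

# maxsplit limits the number of splits; the remainder is left intact.
limited = re.split('[, ]+', text, maxsplit=2)
print(limited)     # ['alpha', 'beta', 'gamma   delta']
```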
The re.findall function returns all occurrences of the given pattern as a list.
re.findall(pattern, string)
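For instance, re.findall can pull every word or every number out of a string. A minimal sketch with made-up input strings:

```python
import re

text = '"Hm... Err -- are you sure?" he said, sounding insecure.'

# Every run of ASCII letters counts as one word.
words = re.findall('[a-zA-Z]+', text)
print(words)

# Every run of digits, returned as strings.
numbers = re.findall(r'\d+', 'a1 b22 c333')
print(numbers)   # ['1', '22', '333']
```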
The re.sub() function replaces the leftmost non-overlapping matches of the pattern with the specified content.
re.sub(pat, repl, string[, count=0])
The replacement in sub() can refer to groups: any escape sequence of the form '\\n' in the replacement string is replaced with the string matched by group n of the pattern.
For example, to replace *something* in a text with <em>something</em>:
>>> pat = r'\*([^\*]+)\*'
>>> re.sub(pat, r'<em>\1</em>', 'Hello *world*!')
'Hello <em>world</em>!'
>>> pat = re.compile(r'\*([^\*]+)\*')
>>> re.sub(pat, r'<em>\1</em>', 'Hello *world*!')
'Hello <em>world</em>!'
The repetition operators are greedy: they match as much as possible.
>>> pat = r'\*(.+)\*'
>>> re.sub(pat, r'<em>\1</em>', 'Hello *world*!')
'Hello <em>world</em>!'
>>> re.sub(pat, r'<em>\1</em>', '*hello* *world*!')
'<em>hello* *world</em>!'
In this case you need the non-greedy version, which you get by adding a '?' after the repetition operator.
>>> pat = r'\*(.+?)\*'
>>> re.sub(pat, r'<em>\1</em>', 'Hello *world*!')
'Hello <em>world</em>!'
>>> re.sub(pat, r'<em>\1</em>', '*hello* *world*!')
'<em>hello</em> <em>world</em>!'
The re.escape function escapes all characters in a string that might be interpreted as regular expression operators.
>>> re.escape('hello.python')
'hello\\.python'
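One typical use is searching for a piece of text literally when it happens to contain special characters. A small sketch; the variable names and the sample strings are assumptions for illustration:

```python
import re

literal = 'www.python.org'                      # text to find as-is
haystack = 'wwwxpythonxorg and www.python.org'  # made-up sample text

# Used directly as a pattern, each '.' acts as a wildcard.
unescaped = re.findall(literal, haystack)
print(unescaped)   # ['wwwxpythonxorg', 'www.python.org']

# After re.escape(), each '.' only matches a literal dot.
escaped = re.findall(re.escape(literal), haystack)
print(escaped)     # ['www.python.org']
```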
I have been studying regular expressions on and off since the summer. Don't give up!
Python learning notes: regular expressions