1. Regular expressions
Regular expressions are the basis for advanced text pattern matching, extraction, and search-and-replace. Put simply, a regular expression is a string of ordinary characters and special symbols that describes a pattern. Python supports regular expressions through the re module in the standard library.
2. Special symbols and characters
First, the most common special symbols and characters, the so-called metacharacters:
| Notation | Description | Example |
| literal | Matches the literal text of the string | re.findall("chen", "sadchen21") |
| . | Matches any single character (except \n) | re.findall("c..n", "sadchen21") |
| ^ | Matches at the start of the string | re.findall("^sad", "sadchen21") |
| $ | Matches at the end of the string | re.findall("21$", "sadddchen21") |
| * | Matches 0 or more occurrences of the preceding regular expression | re.findall("sad*", "saddddchen21") |
| + | Matches 1 or more occurrences of the preceding regular expression | re.findall("sad+", "sadddchen21") |
| ? | Matches 0 or 1 occurrences of the preceding regular expression | re.findall("sad?", "sadddchen21") |
| {N} | Matches exactly N occurrences of the preceding regular expression | re.findall("sad{3}", "sadddchen21") |
| {M,N} | Matches from M to N occurrences of the preceding regular expression | re.findall("sad{1,3}", "sadddcen21") |
| [...] | Matches any single character from the character set | re.findall("[a-z]", "sadddcen21") |
| [^...] | Matches any single character that is not in the character set, including any range of characters in it | re.findall("[^a-z]", "sadddcen21") |
| (...) | Matches the enclosed regular expression and saves it as a subgroup | re.findall("(sad+)c", "sadddcen21") |
| \d | Matches any decimal digit, same as [0-9] (\D is the opposite and matches any non-digit) | re.findall(r"\d", "sa2dcen21") |
| \w | Matches any alphanumeric character or underscore, same as [A-Za-z0-9_] (\W is the opposite) | re.findall(r"\w", "sa2dcen21") |
| \s | Matches any whitespace character, same as [ \t\n\r\f\v] (\S is the opposite) | re.findall(r"\s", "sa2dce\nn 21") |
| \b | Matches a word boundary (\B is the opposite) | re.findall(r"I\b", "I am Czp") |
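A few rows of the table can be checked directly in an interpreter. The sketch below assumes the sample string "sadddchen21", written in lowercase throughout because patterns are case-sensitive by default. The variable names are only illustrative.

```python
import re

text = "sadddchen21"  # illustrative sample string

literal = re.findall("chen", text)      # literal text
any_char = re.findall("c..n", text)     # "." stands for any one character
at_start = re.findall("^sad", text)     # anchored at the start of the string
at_end = re.findall("21$", text)        # anchored at the end of the string
exactly_3 = re.findall("sad{3}", text)  # "sa" followed by exactly three d's
not_lower = re.findall("[^a-z]", text)  # anything except a lowercase letter
digits = re.findall(r"\d", text)        # same as [0-9]

print(literal, any_char, at_start, at_end)  # ['chen'] ['chen'] ['sad'] ['21']
print(exactly_3, not_lower, digits)         # ['saddd'] ['2', '1'] ['2', '1']
```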
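Why does the \b example above use the r prefix?
The r prefix marks a raw string, in which backslashes are not treated as escape characters. Without it, Python would read "\b" as a backspace character before the regex engine ever saw it. The same effect can be achieved without r by escaping the backslash itself:
re.findall("I\\b", "I am Czp")  # doubling the backslash escapes it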
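The difference between the three spellings is easiest to see by comparing the strings themselves. This is a small sketch; the variable names are made up for illustration.

```python
import re

plain = "I\b"      # Python turns \b into one backspace character
raw = r"I\b"       # raw string: the backslash and b stay as two characters
escaped = "I\\b"   # doubled backslash gives the same string as the raw form

print(len(plain), len(raw), raw == escaped)  # 2 3 True
print(re.findall(raw, "I am Czp"))           # ['I']
print(re.findall(plain, "I am Czp"))         # []
```

The last call finds nothing because the pattern now asks for "I" followed by a literal backspace, which the text does not contain.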
3. Detailed usage of the important symbols
- The pipe symbol "|" means "choose one of several patterns". It separates alternative regular expressions, and this choice is sometimes called union or logical OR.
re.findall("ka|b", "sdkakb11")  # matches ka or b
re.findall("ka|b", "sdka11")  # only ka is present here
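One point worth checking by hand is how far each alternative extends: "ka|b" means "ka" or "b", not "k" followed by "a or b". The sketch below contrasts the two; the third pattern and its sample text are illustrative additions.

```python
import re

both = re.findall("ka|b", "sdkakb11")      # whole alternatives: "ka" or "b"
only_ka = re.findall("ka|b", "sdka11")     # no b in this string
k_then = re.findall("k[ab]", "sdkakb11")   # "k" followed by a or b
words = re.findall("cat|dog|bird", "a dog chased a cat")

print(both)     # ['ka', 'b']
print(only_ka)  # ['ka']
print(k_then)   # ['ka', 'kb']
print(words)    # ['dog', 'cat']
```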
- The dot "." matches any single character except the newline \n.
re.findall("s..a", "sdka11")  # matches any two characters between s and a
re.findall("..", "sdka11")  # matches any two characters
re.findall(".a", "sdka112")  # matches any one character before a
To match a literal period, escape it with a backslash: "\.".
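The following sketch runs the dot examples and then contrasts an unescaped dot with an escaped one. The "3.14" strings are illustrative additions.

```python
import re

between = re.findall("s..a", "sdka11")         # any two characters between s and a
pairs = re.findall("..", "sdka11")             # consecutive pairs of characters
before_a = re.findall(".a", "sdka112")         # one character, then a
no_newline = re.findall(".", "a\nb")           # the newline is skipped
any_dot = re.findall("3.14", "3.14 3x14")      # "." also accepts the x
real_dot = re.findall(r"3\.14", "3.14 3x14")   # "\." accepts only a period

print(between, pairs, before_a)  # ['sdka'] ['sd', 'ka', '11'] ['ka']
print(no_newline)                # ['a', 'b']
print(any_dot, real_dot)         # ['3.14', '3x14'] ['3.14']
```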
- "^", "$" and "\b" match at the start of the string, at the end of the string, and at a word boundary.
re.findall("^sd", "sdka112")  # matches sd only at the start of the string
re.findall("12$", "sdka112")  # matches 12 only at the end of the string
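Anchors do not consume characters; they only restrict where a match may occur. As a quick check, a pattern such as "^sa" finds nothing in "sdka112", because that string starts with "sd". The variable names below are illustrative.

```python
import re

text = "sdka112"

starts_sd = re.findall("^sd", text)       # sd is at the very start
starts_sa = re.findall("^sa", text)       # the string does not start with sa
ends_12 = re.findall("12$", text)         # 12 is at the very end
ends_11 = re.findall("11$", text)         # 11 occurs, but not at the end
whole = re.findall("^sdka112$", text)     # both anchors: the entire string

print(starts_sd, starts_sa)  # ['sd'] []
print(ends_12, ends_11)      # ['12'] []
print(whole)                 # ['sdka112']
```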
Matching word boundaries with "\b" and "\B"
\b matches at a word boundary, so a pattern anchored with \b must begin or end at the edge of a word, no matter where that word sits in the string (including the middle). \B is the opposite: it matches only where there is no word boundary, that is, inside a word.
re.findall("er", "never")  # matches er anywhere in the string
re.findall(r"er\b", "never")  # matches er only at the end of a word
re.findall(r"er\B", "evern")  # matches er only when it is not at the end of a word
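The boundary behaviour can be verified with the same words plus one extra sentence. The "the other then the" text is an illustrative addition, chosen because "the" appears both as a whole word and inside longer words.

```python
import re

anywhere = re.findall("er", "never")            # er anywhere
at_word_end = re.findall(r"er\b", "never")      # er followed by a boundary
inside_fail = re.findall(r"er\B", "never")      # er at the end, so \B fails
inside_ok = re.findall(r"er\B", "evern")        # er followed by n, inside the word
whole_word = re.findall(r"\bthe\b", "the other then the")

print(anywhere, at_word_end)   # ['er'] ['er']
print(inside_fail, inside_ok)  # [] ['er']
print(whole_word)              # ['the', 'the']
```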
- Creating a character set with "[]"
"." matches any character, but sometimes only certain characters should match. Square brackets match any single character listed inside them: b[ner]t matches bnt, bet and brt. The brackets act as a logical OR for a single character, so [ab] matches either a or b. A character set only ever matches one character; to choose between multi-character alternatives, use alternation with "|" instead.
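A short sketch of the character-set behaviour described above. The sample strings are illustrative.

```python
import re

bxt = re.findall("b[ner]t", "bnt bet brt bat bit")  # only n, e or r in the middle
a_or_b = re.findall("[ab]", "cabbage")              # one character at a time
multi = re.findall("ab|cd", "abxcd")                # multi-character choice needs |

print(bxt)     # ['bnt', 'bet', 'brt']
print(a_or_b)  # ['a', 'b', 'b', 'a']
print(multi)   # ['ab', 'cd']
```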
- The most common repetition symbols "*", "+" and "?"
"*", "+" and "?" control how many times the expression to their left may occur: "*" matches zero or more occurrences, "+" matches one or more, and "?" matches zero or one. {N} matches exactly N occurrences and {M,N} matches anywhere from M to N.
When a pattern uses these repetition operators, the regular expression engine tries to match as many characters as possible. This is called greedy matching.
re.findall("sad*", "saddddsds")  # d may appear zero or more times
re.findall("sad+", "saddddsds")  # d must appear one or more times
re.findall("sad?", "saddddsds")  # d may appear zero times or once
re.findall(r"\d{2}", "sad123dsds")  # matches two consecutive digits
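The sketch below runs the repetition examples and makes the greedy behaviour visible. The non-greedy forms "*?" and "+?" are not covered in the text above; they are included only as a contrast, since appending "?" to a repetition operator asks the engine for the shortest possible match.

```python
import re

text = "saddddsds"

star = re.findall("sad*", text)         # greedy: takes all four d's
plus = re.findall("sad+", text)         # greedy: at least one d, takes all four
optional = re.findall("sad?", text)     # at most one d
ranged = re.findall("sad{2,3}", text)   # between two and three d's, prefers three
two_digits = re.findall(r"\d{2}", "sad123dsds")

# Contrast (an addition to the text above): non-greedy variants
lazy_star = re.findall("sad*?", text)   # as few d's as possible: none
lazy_plus = re.findall("sad+?", text)   # as few d's as possible: one

print(star, plus, optional)  # ['sadddd'] ['sadddd'] ['sad']
print(ranged, two_digits)    # ['saddd'] ['12']
print(lazy_star, lazy_plus)  # ['sa'] ['sad']
```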
To extract a specific substring from a successful match, wrap that part of the regular expression in "()".
A pair of parentheses serves either of two purposes:
- Grouping regular expressions
- Matching subgroups
For example, parentheses let you place two alternative regular expressions side by side and test them against the same string, and they let a repetition operator apply to a whole group. In both cases the matched substrings are saved for later use.
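Both uses of parentheses can be sketched in a few lines. The phone-style numbers and other sample strings are invented for illustration.

```python
import re

# Matching subgroups: each pair of parentheses saves part of the match
m = re.search(r"(\d+)-(\d+)", "call 555-1234 today")
whole = m.group()     # the entire match
first = m.group(1)    # first subgroup
second = m.group(2)   # second subgroup

# With groups in the pattern, findall returns the saved groups, not the whole match
pairs = re.findall(r"(\d+)-(\d+)", "555-1234 and 777-8888")

# Grouping: the repetition operator applies to the whole group "ab"
repeated = re.search("(ab)+", "xxababab").group()

# Grouping two alternatives that are tested at the same position
either = re.findall("s(ad|ds)", "sadsds")

print(whole, first, second)  # 555-1234 555 1234
print(pairs)                 # [('555', '1234'), ('777', '8888')]
print(repeated)              # ababab
print(either)                # ['ad', 'ds']
```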
Extended notation starts with a question mark, as in (?...). One of the important forms is (?P
4. Core functions and methods of the re module
Common regular expression functions and methods
compile() | Compiles a regular expression pattern, with optional flags, and returns a regex object
match() | Tries to match the pattern at the start of the string; returns a match object on success, None on failure
search() | Finds the first occurrence of the pattern anywhere in the string; returns a match object on success, None on failure
findall() | Finds all non-overlapping occurrences of the pattern in a string and returns them as a list
finditer() | Same as findall(), but returns an iterator of match objects instead of a list
split() | Splits a string at each occurrence of the pattern and returns the pieces as a list
sub() | Replaces every occurrence of the pattern in the string with a replacement
purge() | Clears the cache of implicitly compiled regular expression patterns
group() | Match object method: returns the entire match, or the specific subgroup numbered num
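The functions listed above can be tried together on one string. This is a minimal sketch; the sample text and variable names are illustrative, and the compiled object's methods are used where a module-level call such as re.findall(pattern, text) would work equally well.

```python
import re

text = "sad123chen45"
digits = re.compile(r"\d+")  # compile once, reuse the regex object

at_start = digits.match(text)              # None: the string starts with letters
first = digits.search(text).group()        # first run of digits anywhere
everything = digits.findall(text)          # all runs of digits as a list
via_iter = [m.group() for m in digits.finditer(text)]  # same, via match objects
pieces = digits.split(text)                # text between the digit runs
masked = digits.sub("#", text)             # every digit run replaced by #
prefix = re.match("sad", text).group()     # match succeeds at the start

re.purge()  # clear the module's cache of implicitly compiled patterns

print(at_start, first)       # None 123
print(everything, via_iter)  # ['123', '45'] ['123', '45']
print(pieces)                # ['sad', 'chen', '']
print(masked, prefix)        # sad#chen# sad
```

Note the trailing empty string from split(): the text ends with a match, so the piece after it is empty.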
That covers functions. Next I will write about the object-oriented part. If anything is missing, feel free to get in touch and I will fill in the gaps as soon as I can.
Python function article (7)-Regular expression