Use of metacharacters
re.findall(regex, string)
Function: finds every non-overlapping match of the regular expression regex in the string string and returns the matches as a list
* Ordinary string
Metacharacters: abc
Match rule: matches the literal string itself
Match example: abc
In [3]: re.findall('abc', 'abcdeabc')
Out[3]: ['abc', 'abc']
* Use "or" to match alternatives
Metacharacters: re1|re2
Match rule: matches anything that re1 matches, and also anything that re2 matches
Match example: ab|bc -> ab, bc
In [5]: re.findall('ab|de', 'abcdeabc')
Out[5]: ['ab', 'de', 'ab']
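A point that the rule above implies but does not spell out: "|" splits the whole expression, not just the neighbouring characters. The sketch below uses a made-up sample string and a non-capturing group (?:...) to limit the alternatives to part of the pattern.

```python
import re

text = 'gray grey'  # hypothetical sample string

# "|" has the lowest precedence: the alternatives here are "gra" and "ey".
loose = re.findall('gra|ey', text)

# Wrapping the alternatives in (?:...) limits "|" to "a" or "e".
grouped = re.findall('gr(?:a|e)y', text)

print(loose)    # ['gra', 'ey']
print(grouped)  # ['gray', 'grey']
```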
* The dot "."
Metacharacters: .
Match rule: matches any single character (except a newline, by default)
Match example: f.o -> foo, fao, f@o
In [6]: re.findall('f.o', 'foo,f@o')
Out[6]: ['foo', 'f@o']
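One detail worth checking when using ".": by default it does not match a newline. The sketch below shows the difference with the standard re.DOTALL flag; the sample string is made up for illustration.

```python
import re

text = 'foo,f\no'  # hypothetical sample: the second candidate has a newline in the middle

# By default "." matches any character except a newline.
default_matches = re.findall('f.o', text)

# With re.DOTALL, "." matches the newline as well.
dotall_matches = re.findall('f.o', text, re.DOTALL)

print(default_matches)  # ['foo']
print(dotall_matches)   # ['foo', 'f\no']
```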
* Match the beginning of a string
Metacharacters: ^
Match rule: matches the beginning of a string
Match example: ^from matches "from" only when it is at the start of the string
In [9]: re.findall('^from', 'from China')
Out[9]: ['from']
In [10]: re.findall('^from', 'I come from China')
Out[10]: []
* Match the end of a string
Metacharacters: $
Match rule: matches the end of a string
Match example: py$ -> matches "py" at the end of any string ending in py
In [17]: re.findall('py$', 'test.py')
Out[17]: ['py']
In [18]: re.findall('py$', 'python')
Out[18]: []
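The two anchors above apply to the whole string. For text with several lines, the standard re.MULTILINE flag makes ^ and $ work per line. A small sketch, with an invented two-line sample:

```python
import re

text = 'from China\nfrom Japan'  # hypothetical two-line sample

# Without flags, ^ anchors only at the very start of the string.
single = re.findall('^from', text)

# With re.MULTILINE, ^ also matches right after each newline,
# and $ also matches right before each newline.
multi_start = re.findall('^from', text, re.MULTILINE)
multi_end = re.findall('[a-zA-Z]+$', text, re.MULTILINE)

print(single)       # ['from']
print(multi_start)  # ['from', 'from']
print(multi_end)    # ['China', 'Japan']
```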
* Match 0 or more repetitions
Metacharacters: *
Match rule: matches the preceding character or regular expression 0 or more times
Match example: ab* -> a, ab, abbbbbbbb
In [23]: re.findall('.*', 'askjdfh89w4234')
Out[23]: ['askjdfh89w4234', '']
In [24]: re.findall('.*', 'askjdfh89w4234sdfhhg')
Out[24]: ['askjdfh89w4234sdfhhg', '']
In [25]: re.findall('ab*', 'a')
Out[25]: ['a']
In [26]: re.findall('ab*', 'abbbb')
Out[26]: ['abbbb']
The trailing '' in Out[23] and Out[24] appears because .* can match zero characters, so it also matches the empty string at the very end.
* Match 1 or more repetitions
Metacharacters: +
Match rule: matches the preceding character or regular expression 1 or more times
Match example: ab+ -> ab, abbbbbbbb
In [28]: re.findall('ab+', 'abbbb')
Out[28]: ['abbbb']
In [29]: re.findall('ab+', 'a')
Out[29]: []
* Match 0 or 1 repetitions
Metacharacters: ?
Match rule: matches the preceding character or regular expression 0 or 1 times
Match example: ab? -> a or ab
In [31]: re.findall('ab?', 'a')
Out[31]: ['a']
In [32]: re.findall('ab?', 'ab')
Out[32]: ['ab']
* Match the preceding character or regular expression an exact number of times
Metacharacters: {n}, where n is a number
Match rule: matches the preceding character or regular expression exactly n times
Match example: ab{3} -> abbb
In [34]: re.findall('ab{3}', 'abbbbbb')
Out[34]: ['abbb']
In [35]: re.findall('ab{3}', 'abb')
Out[35]: []
* Match the preceding character or regular expression a range of times
Metacharacters: {m,n}, where m and n are numbers
Match rule: matches the preceding character or regular expression m to n times
Match example: ab{3,8} -> abbb, abbbbbbbb
In [36]: re.findall('ab{3,8}', 'abbb')
Out[36]: ['abbb']
In [37]: re.findall('ab{3,8}', 'abbbbbbbbbbb')
Out[37]: ['abbbbbbbb']
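Either bound of {m,n} may be left out: {m,} means "at least m" and {,n} means "at most n". The sketch below is only an illustration on a made-up string.

```python
import re

text = 'a ab abb abbb abbbb'  # hypothetical sample string

# {2,}: two or more b's after the a.
at_least_two = re.findall('ab{2,}', text)

# {,2}: zero to two b's after the a (the match stops after the second b).
at_most_two = re.findall('ab{,2}', text)

print(at_least_two)  # ['abb', 'abbb', 'abbbb']
print(at_most_two)   # ['a', 'ab', 'abb', 'abb', 'abb']
```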
* Character set matching
Metacharacters: [abcd]
Match rule: matches any one of the characters in the brackets
Match example: b[abcd]t -> bat, bbt, bct, bdt
In [40]: re.findall('b[abc123]t', 'bat,b1tba3t')
Out[40]: ['bat', 'b1t']
In [41]: re.findall('[ab][cd]', 'acadbcbd')
Out[41]: ['ac', 'ad', 'bc', 'bd']
* Character range matching
Metacharacters: [a-zA-Z0-9] [a-z] [0-9] [a-zA-Z] [3-8] [b-x]
Match rule: matches any one character that falls in one of the ranges inside the brackets
Match example: [a-zA-Z0-9]+ matches any non-empty string made up of letters and digits
In [43]: re.findall('[a-zA-Z0-9]+', 'safd1324')
Out[43]: ['safd1324']
In [44]: re.findall('[a-zA-Z0-9]+', 'adf$&^%123')
Out[44]: ['adf', '123']
* Negated character set
Metacharacters: [^...], where ... is anything allowed in the two items above
Match rule: matches any one character that is not in the set inside the brackets
Match example: [^aeiou] matches any character that is not one of a, e, i, o, u
[^a-z] matches any character that is not a lowercase letter
In [46]: re.findall('[^a-z]', 'abc1j2^&d')
Out[46]: ['1', '2', '^', '&']
In [47]: re.findall('[^aeiou]', 'Hello World')
Out[47]: ['H', 'l', 'l', ' ', 'W', 'r', 'l', 'd']
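Inside brackets most metacharacters lose their special meaning, and ^ negates the set only when it is the first character. A short sketch on an invented sample string:

```python
import re

text = 'a.b*c^d'  # hypothetical sample string

# Inside [], ".", "*" and a non-leading "^" are ordinary characters.
specials = re.findall('[.*^]', text)

# A leading "^" negates the set: everything except ".", "*" and "^".
others = re.findall('[^.*^]', text)

print(specials)  # ['.', '*', '^']
print(others)    # ['a', 'b', 'c', 'd']
```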
* Match (non-)digit characters
Metacharacters: \d [0-9]   \D [^0-9]
Match rule: \d matches any digit character
\D matches any non-digit character
Match example: \d{3} -> '123'
In [49]: re.findall(r'\d{3}', 'Hello 1234')
Out[49]: ['123']
In [50]: re.findall(r'\D{3}', 'Hello 1234')
Out[50]: ['Hel', 'lo ']
* Match (non-)word characters
Metacharacters: \w [a-zA-Z0-9_]   \W [^a-zA-Z0-9_]
Match rule: \w matches any one letter, digit, or underscore
\W matches any character that is not a letter, digit, or underscore
Match example: \w{3} -> 'a23'
In [51]: re.findall(r'[A-Z]\w*', 'Hello World')
Out[51]: ['Hello', 'World']
In [52]: re.findall(r'\w+-\d+', 'xiaoming-56')
Out[52]: ['xiaoming-56']
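Two properties of \w are easy to miss: it includes the underscore, and for str patterns in Python 3 it is Unicode-aware unless the standard re.ASCII flag is given. The sample text below is made up for illustration.

```python
import re

text = 'user_name-42 caf\u00e9'  # hypothetical sample; \u00e9 is "e" with an acute accent

# \w includes the underscore, so 'user_name' stays in one piece.
# By default \w also matches non-ASCII letters.
words = re.findall(r'\w+', text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the accented letter is dropped.
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(words)        # ['user_name', '42', 'café']
print(ascii_words)  # ['user_name', '42', 'caf']
```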
* Match (non-)whitespace characters
Metacharacters: \s (space, \t, \n, \r and similar)   \S
Match rule: \s matches any one whitespace character
\S matches any one non-whitespace character
Match example: Hello\s+World -> "Hello World", with any amount of whitespace between the two words
In [58]: re.findall(r'Hello\s+World', 'Hello World')
Out[58]: ['Hello World']
In [60]: re.findall(r'\S*', 'helloworld&* ask')
Out[60]: ['helloworld&*', '', 'ask', '']
In [61]: re.findall(r'\s', 'a b c\n')
Out[61]: [' ', ' ', '\n']
* Match string start and end
Metacharacters: \A (like ^)   \Z (like $)
Match rule: \A matches the beginning of a string
\Z matches the end of a string
Match example: \Aabc\Z, ^abc$ -> abc
In [70]: re.findall(r'\Aabc\Z', 'abcabc')
Out[70]: []
In [66]: re.findall(r'\Aabc\Z', 'abc')
Out[66]: ['abc']
In [68]: re.findall(r'efg\Z', 'hi,abcdefg')
Out[68]: ['efg']
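\Z and $ are close but not identical: $ also matches just before a newline that ends the string, while \Z matches only at the very end. A minimal sketch with an invented input line:

```python
import re

line = 'abcdefg\n'  # hypothetical input with a trailing newline

# $ also matches just before a newline at the end of the string.
dollar = re.findall(r'efg$', line)

# \Z matches only at the very end, which here is after the newline.
strict = re.findall(r'efg\Z', line)

print(dollar)  # ['efg']
print(strict)  # []
```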
* Match (non-)word boundary
Metacharacters: \b \B
Match rule: \b matches the position between a word character and a non-word character (or the start or end of the string); \B matches any position that is not a word boundary
A run of consecutive letters, digits, and underscores counts as one word; other characters are not part of a word
Match example: "This is a%test%"
In [74]: re.findall(r'\btest\b', 'This is a%test%')
Out[74]: ['test']
In [75]: re.findall(r'\bThis\b', 'This is a%test%')
Out[75]: ['This']
In [76]: re.findall(r'\bis\b', 'This is a%test%')
Out[76]: ['is']
In [77]: re.findall(r'\Bis\b', 'This is a%test%')
Out[77]: ['is']
In [78]: re.findall(r'is\b', 'This is a%test%')
Out[78]: ['is', 'is']
In [76] matches only the standalone word "is", In [77] matches only the "is" at the end of "This", and In [78] matches both.
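The \b patterns above are written as raw strings for a reason: in an ordinary Python string literal, '\b' is the backspace character, so the regex engine never sees a word-boundary marker. A small sketch of the difference:

```python
import re

text = 'This is a%test%'

# In a normal string literal, '\b' is the single backspace character (\x08),
# so this pattern looks for backspaces around "is" and finds nothing.
plain = re.findall('\bis\b', text)

# In a raw string the backslash is kept, and the regex engine reads \b
# as a word boundary.
raw = re.findall(r'\bis\b', text)

print(len('\b'), len(r'\b'))  # 1 2
print(plain)                  # []
print(raw)                    # ['is']
```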
Metacharacters summary
Ordinary characters: match the literal characters
Match a single character: . [] \d \D \w \W \s \S
Match repetitions: * + ? {}
Match beginning and end: ^ $ \A \Z \b \B
Other: | [^ ]
Raw strings and escaping
r"Hello World" is a raw string
Feature of raw strings: Python does not interpret escape sequences in them
"Hello \n World": \n means a line break
r"Hello \n World": \n is two characters, "\" and "n"
When to add r
A raw string stops Python from interpreting escape sequences in the string, so it is best to add r whenever the regular expression itself contains "\"
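The difference between the two forms can be checked directly by comparing lengths. The sketch below also shows that the regex engine understands \n on its own, so a raw pattern r'\n' still matches a real line break.

```python
import re

normal = "Hello \n World"   # \n is one character: a line break
raw = r"Hello \n World"     # \n is two characters: a backslash and the letter n

print(len(normal))  # 13
print(len(raw))     # 14

# The regex engine interprets \n itself, so r'\n' matches the line break.
newline_hits = re.findall(r'\n', normal)

# To match the two literal characters "\" and "n", escape the backslash.
literal_hits = re.findall(r'\\n', raw)

print(newline_hits)  # ['\n']
print(literal_hits)  # ['\\n']
```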
Escaping in regular expressions
To match a character that is special inside a regular expression, the character must be escaped in the regular expression itself: to match a * in the string, the regular expression should be "\*".
The special characters include:
\ * + . ? ^ $ | () [] {}
Quotes ("" '') are not special to the regular expression, but they still need escaping inside a Python string literal that is delimited by the same quote character.
Match the * in the string
In [86]: re.findall(r'\*', '* is not \\\\ is not ?')
Out[86]: ['*']
In [87]: re.findall('\\*', '* is not \\\\ is not ?')
Out[87]: ['*']
Match "\" in the string
In [89]: re.findall('\\\\', '* is not \\\\ is not ?')
Out[89]: ['\\', '\\']
In [90]: re.findall(r'\\', '* is not \\\\ is not ?')
Out[90]: ['\\', '\\']
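When the text to match comes from a variable, escaping every special character by hand is error-prone. The standard library function re.escape does this automatically. The sketch below uses made-up sample text.

```python
import re

text = 'price: 3.50 (approx.) * 2'  # hypothetical sample string
literal = '(approx.)'               # text to find exactly as written

# re.escape puts a backslash in front of each special character.
escaped = re.escape(literal)
found = re.findall(escaped, text)

print(escaped)  # \(approx\.\)
print(found)    # ['(approx.)']
```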
Greedy and non-greedy matching
Greedy mode: unless told otherwise, regular expressions are greedy by default. That is, * + ? {m,n} match as many characters as possible.
e.g.
ab* can match a, ab, abbb, ... When many b's are available, it matches as many of them as possible.
In [96]: re.findall(r'ab*', 'abbbbbbb')
Out[96]: ['abbbbbbb']
Non-greedy mode: match as little as possible while still satisfying the regular expression
Greedy mode -> non-greedy mode: add "?" after the quantifier
That is: *? +? ?? {m,n}?
In [100]: re.findall(r'ab*?', 'abbbbbbb')
Out[100]: ['a']
In [101]: re.findall(r'ab+?', 'abbbbbbb')
Out[101]: ['ab']
In [102]: re.findall(r'ab??', 'abbbbbbb')
Out[102]: ['a']
In [103]: re.findall(r'ab{2,4}?', 'abbbbbbb')
Out[103]: ['abb']
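The difference between the two modes matters most when a delimiter appears several times in the text. The sketch below uses an invented HTML-like string for illustration only; it is not a recommendation to parse HTML with regular expressions.

```python
import re

html = '<b>bold</b> and <i>italic</i>'  # hypothetical sample string

# Greedy: .* runs to the last ">" in the string.
greedy = re.findall(r'<.*>', html)

# Non-greedy: .*? stops at the first ">" it can.
lazy = re.findall(r'<.*?>', html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```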
Regular expression grouping
((ab)*(cd))
Regular expression: (ab)*cd
1. Regular expressions can be grouped. Groups are marked with parentheses (); each pair of parentheses is a subgroup of the regular expression. Each subgroup is part of the overall regular expression and is also a small regular expression in its own right
2. When there are several subgroups, they are numbered first subgroup, second subgroup, and so on, from the outermost layer inward. Subgroups at the same level are counted from left to right
3. Grouping changes what * + ? {} repeat: each group is treated as a whole, and the repetition applies to the whole group
4. When a subgroup matches several parts of the target string, only one of them (the last repetition) is returned
In [113]: re.findall(r'(ab)+cd', 'ababcdef')
Out[113]: ['ab']
5. Each group can be given a name, and the group can then be identified by that name
Format: (?P<word>hello)
This names the subgroup (hello) "word"
A named subgroup is referenced with (?P=word), which matches the same text that the subgroup matched
In [123]: re.findall(r'((?P<word>hello)\s+(?P=word))', 'hello hello')
Out[123]: [('hello hello', 'hello')]
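The findall results above contain the groups rather than the whole match, which can be surprising. The sketch below shows two standard ways around this: a non-capturing group (?:...), and re.search, which returns a match object whose groups can be read by number or by name.

```python
import re

text = 'ababcdef'

# With one capturing group, findall returns only that group
# (its last repetition).
captured = re.findall(r'(ab)+cd', text)

# A non-capturing group (?:...) still repeats as a unit,
# but findall now returns the whole match.
whole = re.findall(r'(?:ab)+cd', text)

print(captured)  # ['ab']
print(whole)     # ['ababcd']

# re.search returns a match object; group(0) is the whole match,
# and a named group can be read by its name.
m = re.search(r'(?P<word>hello)\s+(?P=word)', 'hello hello')
print(m.group(0))       # hello hello
print(m.group('word'))  # hello
```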