Simple matching with Python regular expressions

Last Update:2018-04-02 Source: Internet

Author: User

Use of metacharacters

re.findall(regex, string)

Function: finds every non-overlapping match of the regular expression regex in string and returns the matches as a list

* Ordinary strings

Metacharacters: abc

Match rule: matches the literal string itself

Match example: abc

In [3]: re.findall('abc', 'abcdeabc')

Out[3]: ['abc', 'abc']

* Use "or" to match several alternatives

Metacharacters: re1|re2

Match rule: matches whatever the regular expression re1 matches, and also whatever re2 matches

Match example: ab|bc -> ab, bc

In [5]: re.findall('ab|de', 'abcdeabc')

Out[5]: ['ab', 'de', 'ab']
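
The two rules above can be checked with a short script. This is only a sketch: the sample string is the one used above, and the variable names are illustrative.

```python
import re

text = "abcdeabc"

# A literal pattern matches exactly that sequence of characters.
literal_hits = re.findall("abc", text)

# "ab|de" matches either "ab" or "de" while scanning left to right.
either_hits = re.findall("ab|de", text)

print(literal_hits)  # ['abc', 'abc']
print(either_hits)   # ['ab', 'de', 'ab']
```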

* The dot "."

Metacharacters: .

Match rule: matches any one character (except a newline, by default)

Match example: f.o -> foo, fao, f@o

In [6]: re.findall('f.o', 'foo,f@o')

Out[6]: ['foo', 'f@o']

* Match the beginning of a string

Metacharacters: ^

Match rule: matches the beginning of a string

Match example: ^from matches "from" only when the string starts with it

In [9]: re.findall('^from', 'from China')

Out[9]: ['from']

In [10]: re.findall('^from', 'I come from China')

Out[10]: []

* Match the end of a string

Metacharacters: $

Match rule: matches the end of a string

Match example: py$ -> matches any string ending in py

In [17]: re.findall('py$', 'test.py')

Out[17]: ['py']

In [18]: re.findall('py$', 'python')

Out[18]: []
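
A common use of the two anchors is filtering a list of strings. The sketch below assumes a made-up list of file names and relies on the fact that an empty result list is falsy in Python.

```python
import re

filenames = ["test.py", "python", "copy.pyc", "setup.py"]

# "py$" only matches when "py" is the very end of the string.
ends_with_py = [name for name in filenames if re.findall("py$", name)]

sentences = ["from China", "I come from China"]

# "^from" only matches when the string begins with "from".
starts_with_from = [s for s in sentences if re.findall("^from", s)]

print(ends_with_py)      # ['test.py', 'setup.py']
print(starts_with_from)  # ['from China']
```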

* Match 0 or more repetitions

Metacharacters: *

Match rule: matches the preceding character or regular expression 0 or more times

Match example: ab* -> abbbbbbbb

In [23]: re.findall('.*', 'askjdfh89w4234')

Out[23]: ['askjdfh89w4234', '']

In [24]: re.findall('.*', 'askjdfh89w4234sdfhhg')

Out[24]: ['askjdfh89w4234sdfhhg', '']

In [25]: re.findall('ab*', 'a')

Out[25]: ['a']

In [26]: re.findall('ab*', 'abbbb')

Out[26]: ['abbbb']

* Match 1 or more repetitions

Metacharacters: +

Match rule: matches the preceding character or regular expression 1 or more times

Match example: ab+ -> abbbbbbbb

In [28]: re.findall('ab+', 'abbbb')

Out[28]: ['abbbb']

In [29]: re.findall('ab+', 'a')

Out[29]: []

* Match 0 or 1 repetitions

Metacharacters: ?

Match rule: matches the preceding character or regular expression 0 or 1 times

Match example: ab? -> a or ab

In [31]: re.findall('ab?', 'a')

Out[31]: ['a']

In [32]: re.findall('ab?', 'ab')

Out[32]: ['ab']
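
To see the three quantifiers side by side, the following sketch runs each pattern over the same small set of sample strings. The samples and the name `table` are illustrative only.

```python
import re

samples = ["a", "ab", "abbb"]
patterns = ["ab*", "ab+", "ab?"]

# For every pattern, record what re.findall returns on every sample.
table = {p: {s: re.findall(p, s) for s in samples} for p in patterns}

for p in patterns:
    print(p, table[p])
# ab* {'a': ['a'], 'ab': ['ab'], 'abbb': ['abbb']}
# ab+ {'a': [], 'ab': ['ab'], 'abbb': ['abbb']}
# ab? {'a': ['a'], 'ab': ['ab'], 'abbb': ['ab']}
```

Note that `ab?` stops after one b even when more follow, and `ab+` finds nothing when there is no b at all.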

* Match the preceding character or expression an exact number of times

Metacharacters: {n}, where n is a number

Match rule: matches the preceding character or regular expression exactly n times

Match example: ab{3} -> abbb

In [34]: re.findall('ab{3}', 'abbbbbb')

Out[34]: ['abbb']

In [35]: re.findall('ab{3}', 'abb')

Out[35]: []

* Match the preceding character or expression a range of times

Metacharacters: {m,n}, where m and n are numbers

Match rule: matches the preceding character or regular expression from m to n times

Match example: ab{3,8} -> abbb, abbbbbbbb

In [36]: re.findall('ab{3,8}', 'abbb')

Out[36]: ['abbb']

In [37]: re.findall('ab{3,8}', 'abbbbbbbbbbb')

Out[37]: ['abbbbbbbb']
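
The counted forms can be compared on one string that contains runs of different lengths. This is a sketch with a made-up sample string. Because the patterns are not anchored, a longer run still yields a match on its first few characters.

```python
import re

text = "ab abb abbb abbbb abbbbb"

# Exactly three b's after the a; longer runs match on their first three b's.
exactly_three = re.findall("ab{3}", text)

# Between two and four b's; as many as possible are taken, up to four.
two_to_four = re.findall("ab{2,4}", text)

print(exactly_three)  # ['abbb', 'abbb', 'abbb']
print(two_to_four)    # ['abb', 'abbb', 'abbbb', 'abbbb']
```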

* Character set matching

Metacharacters: [abcd]

Match rule: matches any one of the characters in the brackets

Match example: b[abcd]t -> bat, bbt, bct, bdt

In [40]: re.findall('b[abc123]t', 'bat,b1tba3t')

Out[40]: ['bat', 'b1t']

In [41]: re.findall('[ab][cd]', 'acadbcbd')

Out[41]: ['ac', 'ad', 'bc', 'bd']

* Character set matching with ranges

Metacharacters: [A-Za-z0-9] [a-z] [0-9] [a-zA-Z] [3-8] [b-x]

Match rule: matches any one character that falls in any of the ranges in the brackets

Match example: [a-zA-Z0-9]+ matches any non-empty string made up of letters and digits

In [43]: re.findall('[a-zA-Z0-9]+', 'safd1324')

Out[43]: ['safd1324']

In [44]: re.findall('[a-zA-Z0-9]+', 'adf$&^%123')

Out[44]: ['adf', '123']

* Negated character sets

Metacharacters: [^...], where ... is anything allowed in the two character set forms above

Match rule: matches any one character that is not in the bracketed set

Match example: [^aeiou] matches any one character that is not a, e, i, o or u

[^a-z] matches any one character that is not a lowercase letter

In [46]: re.findall('[^a-z]', 'abc1j2^&d')

Out[46]: ['1', '2', '^', '&']

In [47]: re.findall('[^aeiou]', 'Hello World')

Out[47]: ['H', 'l', 'l', ' ', 'W', 'r', 'l', 'd']
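
A set and its negation split a string into two complementary groups of characters. The sketch below uses a made-up sample string to show this with the alphanumeric set.

```python
import re

text = "adf$&^%123 Bat b1t"

# Runs of one or more letters or digits.
alnum_runs = re.findall("[a-zA-Z0-9]+", text)

# Every single character that is NOT a letter or digit.
# The ^ negates the set only because it is the first character inside [].
non_alnum = re.findall("[^a-zA-Z0-9]", text)

print(alnum_runs)  # ['adf', '123', 'Bat', 'b1t']
print(non_alnum)   # ['$', '&', '^', '%', ' ', ' ']
```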

* Match (non-)digit characters

Metacharacters: \d (same as [0-9]), \D (same as [^0-9])

Match rule: \d matches any one digit character

\D matches any one non-digit character

Match example: \d{3} -> '123'

In [49]: re.findall(r'\d{3}', 'Hello 1234')

Out[49]: ['123']

In [50]: re.findall(r'\D{3}', 'Hello 1234')

Out[50]: ['Hel', 'lo ']

* Match (non-)word characters

Metacharacters: \w (roughly [a-zA-Z0-9_]), \W (roughly [^a-zA-Z0-9_])

Match rule: \w matches any one letter, digit or underscore

\W matches any one character that is not a letter, digit or underscore

Match example: \w{3} -> 'a23'

In [51]: re.findall(r'[A-Z]\w*', 'Hello World')

Out[51]: ['Hello', 'World']

In [52]: re.findall(r'\w+-\d+', 'xiaoming-56')

Out[52]: ['xiaoming-56']
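
The digit and word classes combine naturally with the quantifiers from earlier. The sketch below uses a made-up sample string; the entry `li_lei-7` is there only to show that \w also accepts the underscore.

```python
import re

text = "xiaoming-56 li_lei-7 Hello 1234"

# A run of word characters, a hyphen, then a run of digits.
name_number = re.findall(r"\w+-\d+", text)

# Exactly three digits in a row; "56" and "7" are too short.
three_digits = re.findall(r"\d{3}", text)

# Every single character that is not a letter, digit or underscore.
non_word = re.findall(r"\W", text)

print(name_number)   # ['xiaoming-56', 'li_lei-7']
print(three_digits)  # ['123']
print(non_word)      # ['-', ' ', '-', ' ', ' ']
```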

* Match (non-)whitespace characters

Metacharacters: \s (space, \t, \n, \r), \S

Match rule: \s matches any one whitespace character

\S matches any one non-whitespace character

Match example: hello\s+world -> 'hello world', 'hello   world'

In [58]: re.findall(r'hello\s+world', 'hello world')

Out[58]: ['hello world']

In [60]: re.findall(r'\S*', 'helloworld&* ask')

Out[60]: ['helloworld&*', '', 'ask', '']

In [61]: re.findall(r'\s', 'a b c\n')

Out[61]: [' ', ' ', '\n']
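
One practical use of \S is splitting text into whitespace-separated tokens. The sketch below uses `\S+` rather than `\S*` so that no empty strings appear in the result; the sample string is made up.

```python
import re

text = "hello \t world\nbye"

# \s+ accepts any mix of spaces, tabs and newlines between the two words.
joined = re.findall(r"hello\s+world", text)

# \S+ picks out each run of non-whitespace characters.
words = re.findall(r"\S+", text)

# Each individual whitespace character, in order of appearance.
whitespace = re.findall(r"\s", text)

print(joined)      # ['hello \t world']
print(words)       # ['hello', 'world', 'bye']
print(whitespace)  # [' ', '\t', ' ', '\n']
```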

* Match string start and end

Metacharacters: \A (like ^), \Z (like $)

Match rule: \A matches the beginning of a string

\Z matches the end of a string

Match example: \Aabc\Z, ^abc$ -> abc

In [70]: re.findall(r'\Aabc\Z', 'abcabc')

Out[70]: []

In [66]: re.findall(r'\Aabc\Z', 'abc')

Out[66]: ['abc']

In [68]: re.findall(r'efg\Z', 'hi,abcdefg')

Out[68]: ['efg']
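
On single-line strings like the ones above, \A and \Z give the same results as ^ and $. The difference shows up with the `re.MULTILINE` flag, which the text above does not cover: with that flag ^ and $ also match at every line break, while \A and \Z still refer only to the whole string. A minimal sketch on a made-up two-line string:

```python
import re

text = "abc\nabc"

# Default mode: ^ matches only at the very start of the string.
caret_default = re.findall(r"^abc", text)

# MULTILINE mode: ^ and $ also match at the start and end of each line.
caret_multiline = re.findall(r"^abc", text, flags=re.MULTILINE)
dollar_multiline = re.findall(r"abc$", text, flags=re.MULTILINE)

# \A and \Z ignore the flag and stay tied to the whole string.
start_only = re.findall(r"\Aabc", text, flags=re.MULTILINE)
end_only = re.findall(r"abc\Z", text, flags=re.MULTILINE)

print(caret_default)     # ['abc']
print(caret_multiline)   # ['abc', 'abc']
print(dollar_multiline)  # ['abc', 'abc']
print(start_only)        # ['abc']
print(end_only)          # ['abc']
```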

* Match (non-)word boundaries

Metacharacters: \b \B

Match rule: \b matches at a word boundary, and \B matches at any position that is not a word boundary

A run of consecutive letters (more precisely, \w characters) counts as a word; other characters are not part of a word

Match example: 'This is a%test%'

In [74]: re.findall(r'\btest\b', 'This is a%test%')

Out[74]: ['test']

In [75]: re.findall(r'\bThis\b', 'This is a%test%')

Out[75]: ['This']

In [76]: re.findall(r'\bis\b', 'This is a%test%')

Out[76]: ['is']

In [77]: re.findall(r'\Bis\b', 'This is a%test%')

Out[77]: ['is']

In [78]: re.findall(r'is\b', 'This is a%test%')

Out[78]: ['is', 'is']
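
Because `re.findall` returns only the matched text, it cannot show which "is" was found. The sketch below uses `re.finditer` to report start positions instead. The helper `starts` is a hypothetical name introduced for this example.

```python
import re

text = "This is a%test%"

def starts(pattern, string):
    """Return the start index of every match (illustrative helper)."""
    return [m.start() for m in re.finditer(pattern, string)]

whole_word = starts(r"\bis\b", text)   # the standalone word "is"
inside_word = starts(r"\Bis\b", text)  # the "is" at the end of "This"
either = starts(r"is\b", text)         # both of them

# Without the r prefix, "\b" is a backspace character, so nothing matches.
no_raw = starts("\bis\b", text)

print(whole_word)   # [5]
print(inside_word)  # [2]
print(either)       # [2, 5]
print(no_raw)       # []
```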

Metacharacter summary

Ordinary characters: match themselves

Match a single character: . [] \d \D \w \W \s \S

Match repetitions: * + ? {}

Match beginning and end: ^ $ \A \Z \b \B

Other: | [^ ]

Raw strings and escaping

r"hello world" is a raw string

Raw string feature: backslash escapes are not processed

"hello \n world": \n means a line break

r"hello \n world": \n is two characters, a backslash and the letter n

When to add r

A raw string stops Python from interpreting backslash escapes in the string literal, so it is best to add r whenever the regular expression itself contains "\"
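
The difference is easy to verify by comparing string lengths. The sketch below also shows that a raw pattern and a pattern with doubled backslashes are the same string once Python has parsed them; the variable names are illustrative.

```python
import re

plain = "hello \n world"   # \n is one newline character
raw = r"hello \n world"    # \n stays as two characters: backslash and n

print(len(plain))  # 13
print(len(raw))    # 14

# These two literals produce the identical pattern string.
same_pattern = (r"\d+" == "\\d+")
digits = re.findall(r"\d+", "a1b22")

print(same_pattern)  # True
print(digits)        # ['1', '22']
```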

Escaping in regular expressions

To match a special character literally, that character must be escaped inside the regular expression itself. For example, to match a * in the string, the regular expression should be "\*".

Special characters include:

\ * . ? + ^ $ | () [] {}

Quotation marks ("" and '') are not special to the regular expression, but may need escaping in the Python string literal itself.

Match the * in the string

In [86]: re.findall(r'\*', '* is not \\ \\ is not?')

Out[86]: ['*']

In [87]: re.findall('\\*', '* is not \\ \\ is not?')

Out[87]: ['*']

Match "\" in the string

In [89]: re.findall('\\\\', '* is not \\ \\ is not?')

Out[89]: ['\\', '\\']

In [90]: re.findall(r'\\', '* is not \\ \\ is not?')

Out[90]: ['\\', '\\']
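
When the text to be matched comes from a variable, escaping it by hand is error-prone. The standard library function `re.escape` does it automatically. It is not mentioned in the text above and is shown here only as a convenience; the sample string is made up.

```python
import re

text = "2*3 = 6? (yes) C:\\temp"   # contains one real backslash

# Hand-written escapes, using raw strings.
stars = re.findall(r"\*", text)
backslashes = re.findall(r"\\", text)

# re.escape adds the backslashes needed to match a string literally.
literal = "(yes)"
found = re.findall(re.escape(literal), text)

print(stars)        # ['*']
print(backslashes)  # ['\\']
print(found)        # ['(yes)']
```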

Greedy and non-greedy matching

Greedy mode: by default, with no special handling, regular expressions are greedy. That is, when * + ? {m,n} are used, they match as many characters as possible.

For example:

ab* can match a, ab, abbb and so on. When there are plenty of b's, it matches as many of them as it can.

In [96]: re.findall(r'ab*', 'abbbbbbb')

Out[96]: ['abbbbbbb']

Non-greedy mode: match as little as possible while still satisfying the regular expression

To turn greedy mode into non-greedy mode, add "?" after the quantifier

That is: *? +? ?? {m,n}?

In [100]: re.findall(r'ab*?', 'abbbbbbb')

Out[100]: ['a']

In [101]: re.findall(r'ab+?', 'abbbbbbb')

Out[101]: ['ab']

In [102]: re.findall(r'ab??', 'abbbbbbb')

Out[102]: ['a']

In [103]: re.findall(r'ab{2,4}?', 'abbbbbbb')

Out[103]: ['abb']
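
The difference matters most when a pattern has a closing delimiter that can occur more than once. The sketch below uses a made-up string with angle-bracket tags to contrast the two modes.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST ">" in the string, giving one long match.
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the FIRST ">" it can, giving one match per tag.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```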

Regular expression grouping

((ab)*(cd))

Regular expression: (ab)*cd

1. Regular expressions can be grouped. Groups are marked with parentheses (): each pair of parentheses is a subgroup, which is part of the overall regular expression and is itself a small regular expression

2. When there are multiple subgroups, they are numbered first, second and so on from the outermost inward. Subgroups at the same level are counted from left to right

3. Grouping changes what * + ? {} repeat: each group is treated as a unit, and the repetition applies to the whole group

4. When a subgroup matches several times in the target string, only one match is returned (the last one)

In [113]: re.findall(r'(ab)+cd', 'ababcdef')

Out[113]: ['ab']

5. Each group can be given a name, and the group can then be referred to by that name.

Format: (?P<word>hello)

This gives the subgroup (hello) the name "word"

A subgroup is referenced by name with (?P=word), which matches the same text that the named subgroup matched

In [123]: re.findall(r'((?P<word>hello)\s+(?P=word))', 'hello hello')

Out[123]: [('hello hello', 'hello')]
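
Because `re.findall` returns group contents rather than the whole match once a pattern contains groups, two standard `re` features that the text above does not cover are often useful: the non-capturing group `(?:...)` and the match object returned by `re.search`. The sketch below uses a made-up sample string.

```python
import re

text = "ababcd hello hello"

# With a capturing group, findall returns only the group's last repetition.
captured = re.findall(r"(ab)+cd", text)

# (?:...) groups for repetition without capturing, so findall returns
# the whole match instead.
whole = re.findall(r"(?:ab)+cd", text)

# re.search returns a match object; named groups are read with .group(name).
m = re.search(r"(?P<word>hello)\s+(?P=word)", text)
full_match = m.group(0)
named = m.group("word")

print(captured)    # ['ab']
print(whole)       # ['ababcd']
print(full_match)  # hello hello
print(named)       # hello
```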

This article is an English version of an article originally published in Chinese on aliyun.com.
