Python regular expressions: some simple matches

Source: Internet
Author: User


Use of metacharacters


re.findall(regex, string)

Function: searches string for every match of the regular expression regex and returns the matches as a list
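A minimal sketch of the call described above, using only the standard re module. The sample strings are illustrative. It shows two details of findall worth knowing early: matches never overlap, and "no match" gives an empty list.

```python
import re

# findall scans left to right and collects every match into a list.
print(re.findall(r"ab", "abcdeabc"))  # ['ab', 'ab']

# Matches never overlap: after "aa" is consumed at index 0,
# scanning resumes at index 2, so "aaa" yields only one match.
print(re.findall(r"aa", "aaa"))   # ['aa']
print(re.findall(r"aa", "aaaa"))  # ['aa', 'aa']

# No match gives an empty list, not None.
print(re.findall(r"xyz", "abcdeabc"))  # []
```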


* Ordinary characters


Metacharacters: abc

Matching rule: matches the literal characters themselves

Matching example: abc


In [3]: re.findall('abc', 'abcdeabc')

Out[3]: ['abc', 'abc']
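Because ordinary characters match only themselves, matching is case-sensitive. The sketch below (illustrative input string) shows this and the standard re.IGNORECASE flag that relaxes it.

```python
import re

text = "ABCabcAbc"

# Ordinary characters match themselves exactly, so case matters.
print(re.findall(r"abc", text))  # ['abc']

# The re.IGNORECASE flag (short form re.I) ignores case.
print(re.findall(r"abc", text, re.IGNORECASE))  # ['ABC', 'abc', 'Abc']
```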



* Use "or" to match alternatives


Metacharacters: re1|re2

Matching rule: matches whatever the regular expression re1 matches, and also whatever re2 matches

Matching example: ab|bc -> ab, bc

In [5]: re.findall('ab|de', 'abcdeabc')

Out[5]: ['ab', 'de', 'ab']
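One detail the rule above leaves implicit: at each position the alternatives are tried from left to right, and the first one that succeeds is used. A short sketch with illustrative strings:

```python
import re

print(re.findall(r"ab|de", "abcdeabc"))  # ['ab', 'de', 'ab']

# The leftmost alternative that succeeds wins, even if a later
# alternative would have matched more text.
print(re.findall(r"a|ab", "ab"))  # ['a']
print(re.findall(r"ab|a", "ab"))  # ['ab']
```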


* The dot "."


Metacharacters: .

Matching rule: matches any single character except a newline

Matching example: f.o -> foo, fao, f@o

In [6]: re.findall('f.o', 'foo,f@o')

Out[6]: ['foo', 'f@o']
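A sketch of the newline exception mentioned in the rule above, using the standard re.DOTALL flag; the input strings are illustrative.

```python
import re

# "." matches any single character, including punctuation such as "@".
print(re.findall(r"f.o", "foo,f@o"))  # ['foo', 'f@o']

# By default "." does not match a newline ...
print(re.findall(r"f.o", "f\no"))  # []
# ... unless the re.DOTALL (re.S) flag is given.
print(re.findall(r"f.o", "f\no", re.DOTALL))  # ['f\no']
```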


* Match the beginning of a string


Metacharacters: ^

Matching rule: matches the beginning of a string

Matching example: ^from matches "from" only at the start of a string

In [9]: re.findall('^from', 'from China')

Out[9]: ['from']


In [10]: re.findall('^from', 'I come from China')

Out[10]: []
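"Beginning of a string" is literal: with multi-line text, ^ still matches only once unless the standard re.MULTILINE flag is used. A minimal sketch with an illustrative two-line string:

```python
import re

text = "from Beijing\nfrom Shanghai"

# Without flags, ^ only matches at the very start of the string.
print(re.findall(r"^from", text))  # ['from']

# With re.MULTILINE (re.M), ^ also matches right after each newline.
print(re.findall(r"^from", text, re.MULTILINE))  # ['from', 'from']
```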


* Match the end of a string


Metacharacters: $

Matching rule: matches the end of a string; use $ to require that the string end with the pattern

Matching example: py$ -> matches any string ending in py

In [17]: re.findall('py$', 'test.py')

Out[17]: ['py']


In [18]: re.findall('py$', 'python')

Out[18]: []
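A typical use of $ is checking a file extension. The helper name is_python_file is hypothetical, written only for this sketch; it uses the standard re.search, which returns None when there is no match. The dot is escaped so it matches a literal "." (escaping is covered later in this article).

```python
import re

def is_python_file(name):
    # Hypothetical helper: True when name ends in ".py".
    # "\." matches a literal dot; "$" anchors the match at the end.
    return re.search(r"\.py$", name) is not None

print(is_python_file("test.py"))  # True
print(is_python_file("python"))   # False
print(is_python_file("testpy"))   # False (no literal dot)
```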


* Match the preceding item 0 or more times


Metacharacters: *

Matching rule: matches the preceding character or regular expression 0 or more times

Matching example: ab* -> a, ab, abbbbbbbb

In [23]: re.findall('.*', 'askjdfh89w4234')

Out[23]: ['askjdfh89w4234', '']


In [24]: re.findall('.*', 'askjdfh89w4234sdfhhg')

Out[24]: ['askjdfh89w4234sdfhhg', '']


In [25]: re.findall('ab*', 'a')

Out[25]: ['a']


In [26]: re.findall('ab*', 'abbbb')

Out[26]: ['abbbb']
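The trailing '' in Out[23] and Out[24] comes from the fact that * allows zero repetitions, so ".*" can also match the empty string. A short sketch (illustrative inputs) of where that empty match appears:

```python
import re

# Once ".*" has consumed all of "abc", findall tries again at the end
# position, where an empty match succeeds, so '' is appended.
print(re.findall(r".*", "abc"))  # ['abc', '']

# On an empty string the only possible match is the empty one.
print(re.findall(r".*", ""))  # ['']
```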


* Match the preceding item 1 or more times


Metacharacters: +

Matching rule: matches the preceding character or regular expression 1 or more times

Matching example: ab+ -> ab, abbbbbbbb

In [28]: re.findall('ab+', 'abbbb')

Out[28]: ['abbbb']


In [29]: re.findall('ab+', 'a')

Out[29]: []
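Putting * and + side by side on the same illustrative string makes the "at least one" difference visible:

```python
import re

text = "a ab abbb"

# * allows zero repetitions, so a bare "a" still matches.
print(re.findall(r"ab*", text))  # ['a', 'ab', 'abbb']

# + requires at least one "b", so the bare "a" is skipped.
print(re.findall(r"ab+", text))  # ['ab', 'abbb']
```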


* Match the preceding item 0 or 1 times


Metacharacters: ?

Matching rule: matches the preceding character or regular expression 0 or 1 times

Matching example: ab? -> a or ab

In [31]: re.findall('ab?', 'a')

Out[31]: ['a']


In [32]: re.findall('ab?', 'ab')

Out[32]: ['ab']
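A common use of ? is an optional letter. The word list below is illustrative:

```python
import re

# "u?" makes the "u" optional, so both spellings match; "uu" does not.
print(re.findall(r"colou?r", "color colour colouur"))  # ['color', 'colour']

# "?" applies only to the single item before it: at most one "b".
print(re.findall(r"ab?", "abbb"))  # ['ab']
```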


* Match the preceding item an exact number of times


Metacharacters: {n}, where n is a number

Matching rule: matches the preceding character or regular expression n times

Matching example: ab{3} -> abbb

In [34]: re.findall('ab{3}', 'abbbbbb')

Out[34]: ['abbb']


In [35]: re.findall('ab{3}', 'abb')

Out[35]: []
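A sketch of {n} on illustrative strings. Each match takes exactly n repetitions, and because matches do not overlap, leftovers that cannot form another full group are skipped.

```python
import re

# "ab" has too few b's; "abbbbb" yields "abbb" and leaves "bb" behind.
print(re.findall(r"ab{3}", "ab abbb abbbbb"))  # ['abbb', 'abbb']

# Five x's give two non-overlapping pairs; the fifth x is left over.
print(re.findall(r"x{2}", "xxxxx"))  # ['xx', 'xx']
```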


* Match the preceding item a range of times


Metacharacters: {m,n}, where m and n are numbers

Matching rule: matches the preceding character or regular expression m to n times

Matching example: ab{3,8} -> abbb ... abbbbbbbb

In [36]: re.findall('ab{3,8}', 'abbb')

Out[36]: ['abbb']


In [37]: re.findall('ab{3,8}', 'abbbbbbbbbbb')

Out[37]: ['abbbbbbbb']
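Out[37] shows that findall happily returns a match inside a longer run. When the whole string must satisfy the bound, the standard re.fullmatch (Python 3.4+) enforces the upper limit strictly. Inputs are illustrative:

```python
import re

# {2,3} takes three b's when available, and never more.
print(re.findall(r"ab{2,3}", "ab abb abbb abbbb"))  # ['abb', 'abbb', 'abbb']

# fullmatch requires the entire string to match the pattern.
print(re.fullmatch(r"ab{2,3}", "abbb") is not None)   # True
print(re.fullmatch(r"ab{2,3}", "abbbb") is not None)  # False
```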


* Character set matching


Metacharacters: [abcd]

Matching rule: matches any one of the characters in the brackets

Matching example: b[abcd]t -> bat, bbt, bct, bdt

In [40]: re.findall('b[abc123]t', 'bat,b1tba3t')

Out[40]: ['bat', 'b1t']


In [41]: re.findall('[ab][cd]', 'acadbcbd')

Out[41]: ['ac', 'ad', 'bc', 'bd']
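A bracket set always consumes exactly one character. A small sketch with illustrative strings; it also shows that most metacharacters, such as "." and "?", are ordinary characters inside brackets.

```python
import re

# [ae] matches one character, either "a" or "e".
print(re.findall(r"gr[ae]y", "gray grey griy"))  # ['gray', 'grey']

# Inside brackets "." and "?" are matched literally.
print(re.findall(r"[.?]", "a.b?c"))  # ['.', '?']
```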


* Character set matching with ranges


Metacharacters: [A-Za-z0-9]  [A-Z]  [0-9]  [a-zA-Z]  [3-8]  [b-x]

Matching rule: matches any one character that falls in any of the ranges inside the brackets

Matching example: [A-Za-z0-9]+ matches any run of letters and digits as one non-empty string

In [43]: re.findall('[A-Za-z0-9]+', 'safd1324')

Out[43]: ['safd1324']


In [44]: re.findall('[A-Za-z0-9]+', 'adf$&^%123')

Out[44]: ['adf', '123']
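A range is shorthand for listing every character in it, and several ranges can share one set. The hexadecimal example below is illustrative:

```python
import re

# [3-8] is shorthand for [345678].
print(re.findall(r"[3-8]", "1357924"))  # ['3', '5', '7', '4']

# Three ranges in one set: hexadecimal digits in either case.
print(re.findall(r"[0-9a-fA-F]+", "x=3fA9; y=7g"))  # ['3fA9', '7']
```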


* Negated character sets


Metacharacters: [^...], where ... stands for any set contents like those in the two items above

Matching rule: matches any single character that is not in the brackets

Matching example: [^aeiou] matches any one character other than a, e, i, o, u

[^a-z] matches any character that is not a lowercase letter

In [46]: re.findall('[^a-z]', 'abc1j2^&d')

Out[46]: ['1', '2', '^', '&']


In [47]: re.findall('[^aeiou]', 'hello world')

Out[47]: ['h', 'l', 'l', ' ', 'w', 'r', 'l', 'd']
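Two details of negated sets, shown on illustrative strings: adding + collects whole runs of non-matching characters, and ^ only negates when it is the first character inside the brackets.

```python
import re

# [^0-9]+ collects each run of characters that are not digits.
print(re.findall(r"[^0-9]+", "tel:010-1234"))  # ['tel:', '-']

# "^" anywhere but first is an ordinary character.
print(re.findall(r"[a^]", "a^b"))  # ['a', '^']
```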



* Match (non-)digit characters


Metacharacters: \d [0-9]   \D [^0-9]

Matching rule: \d matches any digit character

\D matches any non-digit character

Matching example: \d{3} -> '123'

In [49]: re.findall(r'\d{3}', 'hello 1234')

Out[49]: ['123']


In [50]: re.findall(r'\D{3}', 'hello 1234')

Out[50]: ['hel', 'lo ']
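The "\d = [0-9]" equivalence above holds for ASCII text. For Python 3 str patterns, \d matches any Unicode decimal digit; the sketch uses U+0663 (ARABIC-INDIC DIGIT THREE) as an illustrative example, and the standard re.ASCII flag to restrict \d to 0-9.

```python
import re

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
text = "12\u0663"

# \d matches all three characters ...
print(len(re.findall(r"\d", text)))  # 3

# ... so use [0-9] or re.ASCII when only ASCII digits are wanted.
print(re.findall(r"[0-9]", text))         # ['1', '2']
print(re.findall(r"\d", text, re.ASCII))  # ['1', '2']
```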


* Match (non-)word characters


Metacharacters: \w [a-zA-Z0-9_]   \W [^a-zA-Z0-9_]

Matching rule: \w matches any one letter, digit, or underscore

\W matches any character that is not a letter, digit, or underscore

Matching example: \w{3} -> 'a23'

In [51]: re.findall(r'[A-Z]\w*', 'Hello World')

Out[51]: ['Hello', 'World']


In [52]: re.findall(r'\w+-\d+', 'xiaoming-56')

Out[52]: ['xiaoming-56']
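A quick check of which characters count as "word" characters, using an illustrative string. The underscore belongs to \w, while "," and "-" belong to \W.

```python
import re

text = "user_name, id-42"

# The underscore keeps "user_name" whole; "," and "-" split the words.
print(re.findall(r"\w+", text))  # ['user_name', 'id', '42']
print(re.findall(r"\W+", text))  # [', ', '-']
```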


* Match (non-)whitespace characters


Metacharacters: \s (space, \t, \n, \r, ...)   \S

Matching rule: \s matches any one whitespace character

\S matches any non-whitespace character

Matching example: hello\s+world -> hello world


In [58]: re.findall(r'hello\s+world', 'hello world')

Out[58]: ['hello world']


In [60]: re.findall(r'\S*', 'helloworld&* ask')

Out[60]: ['helloworld&*', '', 'ask', '']


In [61]: re.findall(r'\s', 'a b c\n')

Out[61]: [' ', ' ', '\n']
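Because \s covers spaces, tabs, and newlines alike, \S+ is a handy way to pull out the "words" of messy text. The input below is illustrative:

```python
import re

text = "a\tb  c\n"

# \S+ picks out the runs between any kind of whitespace.
print(re.findall(r"\S+", text))  # ['a', 'b', 'c']

# \s+ shows the separators themselves.
print(re.findall(r"\s+", text))  # ['\t', '  ', '\n']
```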


* Match the start and end of a string


Metacharacters: \A (like ^)   \Z (like $)

Matching rule: \A matches the beginning of a string

\Z matches the end of a string


Matching example: \Aabc\Z and ^abc$ -> abc


In [70]: re.findall(r'\Aabc\Z', 'abcabc')

Out[70]: []

In [66]: re.findall(r'\Aabc\Z', 'abc')

Out[66]: ['abc']

In [68]: re.findall(r'efg\Z', 'hi,abcdefg')

Out[68]: ['efg']
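The "like ^ / like $" above is approximate. A sketch of the two differences, on illustrative strings: $ also matches just before a single trailing newline while \Z does not, and under re.MULTILINE ^ matches at every line start while \A still matches only at the start of the whole string.

```python
import re

# $ tolerates one trailing newline; \Z requires the true end.
print(re.findall(r"abc$", "abc\n"))   # ['abc']
print(re.findall(r"abc\Z", "abc\n"))  # []

text = "one\ntwo"
# With re.MULTILINE, ^ matches at each line start ...
print(re.findall(r"^\w+", text, re.MULTILINE))   # ['one', 'two']
# ... but \A still matches only at the start of the string.
print(re.findall(r"\A\w+", text, re.MULTILINE))  # ['one']
```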


* Match (non-)word boundaries


Metacharacters: \b \B

Matching rule: \b matches at a word boundary; \B matches at any position that is not a word boundary. Non-word characters are not considered part of a word

A run of consecutive word characters is treated as one word

Matching example: "This is a%test%"


In [74]: re.findall(r'\btest\b', 'This is a%test%')

Out[74]: ['test']


In [75]: re.findall(r'\bThis\b', 'This is a%test%')

Out[75]: ['This']


In [76]: re.findall(r'\bis\b', 'This is a%test%')

Out[76]: ['is']


In [77]: re.findall(r'\Bis\b', 'This is a%test%')

Out[77]: ['is']


In [78]: re.findall(r'is\b', 'This is a%test%')

Out[78]: ['is', 'is']
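A whole-word search is the usual application of \b. The sketch (illustrative text) also shows why the r prefix matters here: in a normal string literal "\b" is the backspace character, so the pattern silently finds nothing.

```python
import re

text = "cat concat cat_1 cat."

# "concat" has a word character before "cat", and in "cat_1" the
# underscore is a word character after it, so neither counts.
print(re.findall(r"\bcat\b", text))  # ['cat', 'cat']

# Without r, "\b" is Python's backspace escape (\x08), so the pattern
# looks for backspace characters and matches nothing.
print(re.findall("\bcat\b", text))  # []
```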

Metacharacter summary


Ordinary characters: match themselves

Match a single character: . [] \d \D \w \W \s \S

Match repetitions: * + ? {}

Match positions (start, end, boundary): ^ $ \A \Z \b \B

Other: | [^ ]


Raw strings and escaping


r"hello world" is a raw string


Raw string feature: backslash escape sequences are not processed


"hello\nworld": \n is a line break (one character)

r"hello\nworld": \n is two characters, a backslash and n


When to add r


A raw string stops Python from interpreting escape sequences in the string literal, so it is best to add r whenever the regular expression itself contains a backslash "\".
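A minimal check of the difference described above. No re call is needed: the r prefix only changes how Python reads the string literal, not the type of the resulting value.

```python
# "\n" is one character (a newline); r"\n" is a backslash plus "n".
print(len("\n"), len(r"\n"))  # 1 2

# A raw string is just another way to write the same str value.
print(r"\d+" == "\\d+")  # True
```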


Escaping in regular expressions

To match a character that has a special meaning in regular expressions, the pattern must escape it as well: to match a * in the string, the regular expression must be \*.

Characters that need escaping include:

\   *   .   ?   +   ^   $   |   ()   []   {}


Matching a * in the string

In [86]: re.findall(r'\*', '* is not \\\\ is not?')

Out[86]: ['*']


In [87]: re.findall('\\*', '* is not \\\\ is not?')

Out[87]: ['*']


Matching a "\" in the string

In [89]: re.findall('\\\\', '* is not \\\\ is not?')

Out[89]: ['\\', '\\']


In [90]: re.findall(r'\\', '* is not \\\\ is not?')

Out[90]: ['\\', '\\']
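When the text to search for comes from a variable, escaping by hand is error-prone. The standard re.escape function backslash-escapes special characters for you. The needle string below is illustrative; the exact escaped form differs slightly between Python versions, so only the match result is shown.

```python
import re

needle = "1+1=2?"

# Unescaped, "+" and "?" act as quantifiers, so the literal text is not found.
print(re.findall(needle, "is 1+1=2?"))  # []

# re.escape turns the text into a pattern that matches it literally.
pattern = re.escape(needle)
print(re.findall(pattern, "is 1+1=2?"))  # ['1+1=2?']
```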



Greedy and non-greedy matching


Greedy mode: by default, a regular expression is greedy: the quantifiers * + ? {m,n} match as many characters as possible.

For example:

ab* can match a, ab, abbb, ...; when there are many b's, it matches as many of them as possible.


In [96]: re.findall(r'ab*', 'abbbbbbb')

Out[96]: ['abbbbbbb']


Non-greedy mode: matches as little text as possible while still satisfying the pattern


Converting greedy to non-greedy: append "?" to the quantifier

That is: *?  +?  ??  {m,n}?


In [100]: re.findall(r'ab*?', 'abbbbbbb')

Out[100]: ['a']


In [101]: re.findall(r'ab+?', 'abbbbbbb')

Out[101]: ['ab']


In [102]: re.findall(r'ab??', 'abbbbbbb')

Out[102]: ['a']


In [103]: re.findall(r'ab{2,4}?', 'abbbbbbb')

Out[103]: ['abb']
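The classic place where greediness bites is matching delimited text such as tags. A sketch on an illustrative HTML fragment:

```python
import re

html = "<b>bold</b>"

# Greedy: ".*" runs to the last ">" it can reach.
print(re.findall(r"<.*>", html))  # ['<b>bold</b>']

# Non-greedy: ".*?" stops at the first ">" that completes a match.
print(re.findall(r"<.*?>", html))  # ['<b>', '</b>']
```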


Regular expression grouping


((ab)*(cd))

Regular expression: (ab)*cd


1. A regular expression can be divided into groups, marked with parentheses (). Each pair of parentheses forms a subgroup; every subgroup is part of the overall regular expression and is itself a small regular expression.


2. When there are several subgroups, they are numbered first, second, and so on from the outermost to the innermost; subgroups at the same level are counted from left to right. (Put precisely, groups are numbered by the position of their opening parenthesis, left to right.)

3. Grouping changes how * + ? {} repeat: each group is treated as a whole and repeated as a unit.

4. When a subgroup matches several times, only its last match is returned; also, when a pattern contains groups, findall returns the group contents rather than the whole match.


In [113]: re.findall(r'(ab)+cd', 'ababcdef')

Out[113]: ['ab']


5. Each group can be given a name, and the group can then be identified by that name.


Format: (?P<word>hello)

This gives the subgroup (hello) the name "word".

A named group is referenced by name with (?P=word), which matches the same text the group captured.



In [123]: re.findall(r'((?P<word>hello)\s+(?P=word))', 'hello hello')

Out[123]: [('hello hello', 'hello')]
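The numbering, last-repetition, and naming rules above can be inspected directly on a match object, using the standard re.search together with group(), groups(), and groupdict(). The input strings are illustrative.

```python
import re

m = re.search(r"((ab)*(cd))", "xxababcdyy")

# Groups are numbered by their opening parenthesis, left to right:
# 1 = the outer group, 2 = (ab), 3 = (cd).
print(m.groups())  # ('ababcd', 'ab', 'cd')

# (ab)* repeated twice, but the group keeps only its last repetition.
print(m.group(2))  # ab

# A named group can be read back by name.
m2 = re.search(r"(?P<word>hello)\s+(?P=word)", "say hello hello")
print(m2.group("word"))  # hello
print(m2.group(0))       # hello hello
print(m2.groupdict())    # {'word': 'hello'}

# findall with two or more groups returns one tuple per match.
print(re.findall(r"(a)(b)", "abab"))  # [('a', 'b'), ('a', 'b')]
```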

