Python---re module

Source: Internet
Author: User
Tags: locale, object

1. The metacharacters of regular expressions are: . \ * + ? ^ $ {} [] () |

. Matches any character except a line break

\ Escape character; it changes the meaning of the character that follows it

* Matches the preceding character 0 or more times

+ Matches the preceding character 1 or more times

? Matches the preceding character 0 or 1 times

^ Matches the start of the string

$ Matches the end of the string

{} {m} matches the preceding character exactly m times; {m,n} matches it m to n times, and if n is omitted ({m,}) it matches m or more times

[] Character set. The corresponding position can be any one character from the set. Characters in a set can be listed individually or given as a range, such as [abc] or [a-c]. [^abc] negates the set, that is, it matches any character other than a, b or c.

Inside a character set, metacharacters such as . * + ? lose their special meaning and match themselves. The characters that are special inside a set (], \, ^ at the start, - between two characters) can be matched literally by escaping them with a backslash, and predefined sets such as \d still work inside a set.

() The enclosed expression is treated as a group. Groups are numbered starting from 1, counting each opening parenthesis "(" from left to right.
A group is handled as a whole, so it can be followed by a quantifier. A | inside a group only applies within that group.
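The metacharacters listed above can be tried out with a short script. This is only a sketch: the sample strings and variable names are made up for illustration, and re.findall and re.finditer are used simply because they show every match at once.

```python
import re

# '.' matches any character except a line break
dot = re.findall(r'a.c', 'abc a\nc a-c')

# '*' is 0 or more, '+' is 1 or more, '?' is 0 or 1 of the preceding character
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
optional = re.findall(r'colou?r', 'color colour')

# '^' anchors at the start of the string, '$' at the end
first_word = re.findall(r'^[a-z]+', 'hello world')
last_word = re.findall(r'[a-z]+$', 'hello world')

# {m}, {m,n} and {m,} give explicit repeat counts
exact = re.findall(r'a{3}', 'aa aaa')
ranged = re.findall(r'a{2,3}', 'a aa aaa aaaa')
open_ended = re.findall(r'a{2,}', 'a aa aaaa')

# character sets: a range and its negation
in_set = re.findall(r'[a-c]', 'abcdef')
not_in_set = re.findall(r'[^a-c]', 'abcdef')

# groups are numbered by the position of their opening parenthesis
numbered = re.search(r'((a)(b))(c)', 'abc').groups()

# '|' inside a group only chooses between the alternatives of that group
alternatives = [m.group() for m in re.finditer(r'gr(a|e)y', 'gray grey griy')]

for name in ('dot', 'star', 'plus', 'optional', 'first_word', 'last_word',
             'exact', 'ranged', 'open_ended', 'in_set', 'not_in_set',
             'numbered', 'alternatives'):
    print(name, '=', globals()[name])
```

Note that `numbered` shows the outer group first: the group whose "(" comes first gets number 1, even though it contains groups 2 and 3.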

Here we need to highlight the effect of the backslash:

A backslash followed by a metacharacter removes its special function (the metacharacter is escaped into an ordinary character)

A backslash followed by certain ordinary characters gives them a special function (these are the predefined character sets)

A backslash followed by a number refers back to the string matched by the group with that number

    import re
    a = re.search(r'(tina)(fei)haha\2', 'tinafeihahafei tinafeihahatina').group()
    print(a)

Result: tinafeihahafei
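The three roles of the backslash described above can be compared side by side. The following is a small sketch with made-up sample strings; it also shows why patterns are normally written as raw strings (the r prefix), since otherwise Python interprets some backslash sequences before the re module ever sees them.

```python
import re

# 1. Backslash + metacharacter: the metacharacter becomes an ordinary character
literal_plus = re.findall(r'1\+1', '1+1=2 and 11')    # looks for the text "1+1"
unescaped_plus = re.findall(r'1+1', '1+1=2 and 11')   # '+' acts as a quantifier

# 2. Backslash + ordinary character: a predefined character set such as \d
digits = re.findall(r'\d', 'a1b22')

# 3. Backslash + number: refers back to the text matched by that group
doubled = re.search(r'(is) \1', 'it is is fine').group()

# Without the r prefix, Python itself turns '\2' into a single control character
raw_length = len(r'\2')    # backslash and '2' are kept as two characters
plain_length = len('\2')   # one character with code 2

print(literal_plus, unescaped_plus, digits, doubled, raw_length, plain_length)
```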

2. Predefined character sets:

\d: Matches any digit 0-9

\D: Matches any non-digit character

\s: Matches any whitespace character [<space>\t\r\n\f\v]

\S: Matches any non-whitespace character

\w: Matches any word character, that is, a letter, digit or underscore [a-zA-Z0-9_]

\W: Matches any non-word character, that is, anything other than a letter, digit or underscore

\A: Matches only at the start of the string; the same as ^ when re.M is not set

\Z: Matches only at the end of the string; like $, except that $ also matches just before a trailing line break

\b: Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example, r'er\b' matches the 'er' in 'never', but not the 'er' in 'verb'.

\B: The opposite of \b; matches a position that is not a word boundary

Here we need to emphasize how the \b word boundary behaves:

    import re
    w = re.findall('\btina', 'tian tinaaaa')      # no r prefix: '\b' is a backspace character
    print(w)
    s = re.findall(r'\btina', 'tian tinaaaa')
    print(s)
    v = re.findall(r'\btina', 'tian#tinaaaa')
    print(v)
    a = re.findall(r'\btina\b', 'tian#tina@aaa')
    print(a)

The results are as follows:

    []
    ['tina']
    ['tina']
    ['tina']
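The predefined character sets in this section can be checked against one sample string. The snippet below is a minimal sketch; the text and variable names are invented for illustration. It also shows where \A and \Z differ from ^ and $ once re.M is set, and the difference between \b and \B.

```python
import re

text = 'Room 42,\tfloor_7!'

digits = re.findall(r'\d', text)        # digits only
non_digits = re.findall(r'\D+', text)   # runs of non-digit characters
spaces = re.findall(r'\s', text)        # the space and the tab
words = re.findall(r'\w+', text)        # letters, digits and underscore
non_words = re.findall(r'\W', text)     # everything else

# With re.M, ^ and $ match at every line; \A and \Z still match only at the
# very start and very end of the string
lines = 'one\ntwo'
caret = re.findall(r'^\w+', lines, re.M)
start_only = re.findall(r'\A\w+', lines, re.M)
dollar = re.findall(r'\w+$', lines, re.M)
end_only = re.findall(r'\w+\Z', lines, re.M)

# \b needs a word boundary after 'er'; \B needs the absence of one
er_at_boundary = [m.start() for m in re.finditer(r'er\b', 'never verb')]
er_inside_word = [m.start() for m in re.finditer(r'er\B', 'never verb')]

print(digits, non_digits, spaces, words, non_words)
print(caret, start_only, dollar, end_only)
print(er_at_boundary, er_inside_word)
```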

3. Functions commonly used in the re module

1. compile()

Compiles a regular expression pattern and returns a pattern object. (Regular expressions that are used often can be compiled into pattern objects, which can be somewhat more efficient.)

Format:

re.compile(pattern, flags=0)

pattern: the expression string to compile.

flags: compilation flags that modify how the regular expression matches, such as case sensitivity and multiline matching. The commonly used flags are:

Flag                Meaning
re.S (DOTALL)       Makes . match every character, including line breaks
re.I (IGNORECASE)   Makes matching case-insensitive
re.L (LOCALE)       Makes matching locale-aware (for example, for French text)
re.M (MULTILINE)    Multiline matching; affects ^ and $
re.X (VERBOSE)      Allows whitespace and comments inside the pattern, so it can be laid out in a more readable way
re.U (UNICODE)      Interprets characters according to the Unicode character set; affects \w, \W, \b, \B

    import re
    tt = "Tina is a good girl, she's cool, clever, and so on ..."
    rr = re.compile(r'\w*oo\w*')
    print(rr.findall(tt))   # find all words that contain 'oo'

The results are as follows: ['good', 'cool']
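To see what the flags in the table actually change, each one can be switched on against the same text. This is a sketch only; the sample text, the phone-like pattern and the variable names are assumptions made for the example.

```python
import re

text = 'First line\nsecond LINE'

# re.S: '.' also matches the line break between the two lines
without_dotall = re.search(r'line.second', text)
with_dotall = re.search(r'line.second', text, re.S)

# re.I: letter case is ignored
ignore_case = re.findall(r'line', text, re.I)

# re.M: '^' matches at the start of every line, not only at the start of the string
single_line = re.findall(r'^\w+', text)
multi_line = re.findall(r'^\w+', text, re.M)

# re.X: whitespace and comments inside the pattern are ignored
number = re.compile(r'''
    (\d{3})   # first block of digits
    -         # separator
    (\d{4})   # second block of digits
''', re.X)
parts = number.search('call 555-1234').groups()

# Several flags are combined with |
combined = re.findall(r'^\w+ line$', text, re.I | re.M)

print(without_dotall, with_dotall.group() == 'line\nsecond')
print(ignore_case, single_line, multi_line, parts, combined)
```

A compiled pattern object such as `number` keeps its flags, so they do not have to be repeated on every call.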

2. match()

Determines whether the RE matches at the beginning of the string. Note: this is not an exact (whole-string) match. If characters remain in the string after the pattern has matched, the match still counts as successful. For an exact match, add the boundary anchor '$' at the end of the expression.

Format:

re.match(pattern, string, flags=0)

    print(re.match('com', 'comwww.runcomoob').group())
    print(re.match('com', 'comwww.runcomoob', re.I).group())   # re.I makes the match case-insensitive

The results are as follows:

    com
    com
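The two points in the description of match(), that it only tries the start of the string and that it is not an exact match, can be illustrated as follows. This is a hedged sketch with invented sample strings; re.fullmatch is a standard-library function not mentioned in the text above and is shown only as an alternative to appending '$'.

```python
import re

# match() only tries the pattern at position 0
at_start = re.match(r'com', 'comwww.runcomoob')
not_at_start = re.match(r'run', 'comwww.runcomoob')

# Trailing characters after the pattern do not prevent a match ...
loose = re.match(r'\d+', '123abc')
# ... unless the pattern is anchored with '$'
anchored = re.match(r'\d+$', '123abc')

# re.fullmatch requires the whole string to match
whole = re.fullmatch(r'\d+', '123')
not_whole = re.fullmatch(r'\d+', '123abc')

# re.I lets a lowercase pattern match uppercase text
case_insensitive = re.match(r'com', 'COMwww', re.I).group()

print(at_start.group(), not_at_start, loose.group(), anchored)
print(whole.group(), not_whole, case_insensitive)
```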

3. search()

Format:

re.search(pattern, string, flags=0)

The re.search function scans the string for a match of the pattern and returns as soon as the first match is found; it returns None if nothing in the string matches.

    print(re.search(r'\dcom', 'www.4comrunoob.5com').group())

The result is as follows: 4com
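Because search() returns None when nothing matches, calling .group() directly on the result raises an AttributeError in that case. One way to guard against this is sketched below; `first_match` is a hypothetical helper written for this example, not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the first matching substring, or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    m = re.search(pattern, text)
    return m.group() if m else None

found = first_match(r'\dcom', 'www.4comrunoob.5com')
missing = first_match(r'\dcom', 'www.runoob.com')

# match() would not find this, because the pattern does not occur at position 0
from_start = re.match(r'\dcom', 'www.4comrunoob.5com')

print(found, missing, from_start)
```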

* Note: when match() and search() succeed, they return a match object, which has the following methods:

    • group() returns the string matched by the RE; it can take several group numbers at once, in which case it returns the strings matched by those groups
    • start() returns the position where the match starts
    • end() returns the position where the match ends
    • span() returns a tuple containing the (start, end) positions of the match

a. group() returns the whole string matched by the RE.
b. group(n, m) returns a tuple of the strings matched by groups n and m; it raises an IndexError exception if a group number does not exist.
c. groups() returns a tuple containing the strings of all groups in the regular expression, from group 1 up to the last group. groups() is usually called without arguments.

    import re
    a = "123abc456"
    print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))   # 123abc456, the whole match
    print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))   # 123
    print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))   # abc
    print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3))   # 456
    # group(1) returns the part matched by the first pair of parentheses,
    # group(2) the second, and group(3) the third.
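The match object methods listed above can all be called on a single match. The following sketch uses an invented sample string in which the match does not start at position 0, so that start(), end() and span() return something more interesting than the string length.

```python
import re

m = re.search(r'([0-9]+)([a-z]+)([0-9]+)', 'xx123abc456yy')

whole = m.group()          # same as m.group(0)
start, end = m.start(), m.end()
span = m.span()

first_and_third = m.group(1, 3)   # several group numbers give a tuple
all_groups = m.groups()
second_span = m.span(2)           # start/end/span also accept a group number

# A group number that does not exist raises IndexError
try:
    m.group(4)
    missing_group_error = False
except IndexError:
    missing_group_error = True

print(whole, start, end, span)
print(first_and_third, all_groups, second_span, missing_group_error)
```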

4. findall()

re.findall scans the whole string, collects every non-overlapping match, and returns them as a list.

Format:

re.findall(pattern, string, flags=0)

    import re
    p = re.compile(r'\d+')
    print(p.findall('o1n2m3k4'))

The results are as follows: ['1', '2', '3', '4']

    import re
    tt = "Tina is a good girl, she's cool, clever, and so on ..."
    rr = re.compile(r'\w*oo\w*')
    print(rr.findall(tt))
    print(re.findall(r'(\w)*oo(\w)', tt))   # () denotes a sub-expression (group)

The results are as follows:

    ['good', 'cool']
    [('g', 'd'), ('c', 'l')]
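The second result above is a list of tuples because the pattern contains groups. What findall() returns depends on how many groups the pattern has, as this small sketch shows (the sample text and names are made up for illustration):

```python
import re

text = 'width=10, height=20'

# No group: each item is the whole match
no_group = re.findall(r'\w+=\d+', text)

# One group: each item is the text of that group only
one_group = re.findall(r'\w+=(\d+)', text)

# Two or more groups: each item is a tuple with one entry per group
two_groups = re.findall(r'(\w+)=(\d+)', text)

print(no_group)
print(one_group)
print(two_groups)
```

The list of tuples converts directly into a dictionary with `dict(two_groups)` when the first group is meant as a key.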

5. finditer()

Searches the string and returns an iterator that yields each match result (a match object) in order. It finds all the substrings that the RE matches and returns them through an iterator.

Format:

re.finditer(pattern, string, flags=0)

    import re
    matches = re.finditer(r'\d+', '12 drumm44ers drumming, 11 ... 10 ...')
    for i in matches:
        print(i)
        print(i.group())
        print(i.span())

The results are as follows (newer Python 3 versions print re.Match instead of _sre.SRE_Match):

    <_sre.SRE_Match object; span=(0, 2), match='12'>
    12
    (0, 2)
    <_sre.SRE_Match object; span=(8, 10), match='44'>
    44
    (8, 10)
    <_sre.SRE_Match object; span=(24, 26), match='11'>
    11
    (24, 26)
    <_sre.SRE_Match object; span=(31, 33), match='10'>
    10
    (31, 33)
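Compared with findall(), finditer() is useful when the position of each match is needed as well as its text, since every item is a full match object. A minimal sketch, with an invented sample string:

```python
import re

text = 'a1bb22ccc333'

# Collect the matched text together with its (start, end) position
found = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

# The iterator is consumed lazily: matches are produced one at a time
it = re.finditer(r'\d+', text)
first = next(it)
remaining = [m.group() for m in it]

print(found)
print(first.group(), remaining)
```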

6. split()

Splits the string at each substring that the pattern matches and returns the pieces as a list.

You can use re.split to split a string; for example, re.split(r'\s+', text) splits the string into a list of words on whitespace.

Format:

re.split(pattern, string, maxsplit=0, flags=0)

maxsplit specifies the maximum number of splits; if it is omitted (or 0), the string is split at every match.

    print(re.split(r'\d+', 'one1two2three3four4five5'))

The result is as follows: ['one', 'two', 'three', 'four', 'five', '']
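The effect of maxsplit, and of putting the separator pattern in a group, can be seen in the following sketch. The variable names are made up; the input reuses the string from the example above.

```python
import re

text = 'one1two2three3four4five5'

# Split at every run of digits; the trailing '' appears because the
# string ends with a separator
all_parts = re.split(r'\d+', text)

# maxsplit=2 stops after two splits and leaves the rest untouched
limited = re.split(r'\d+', text, maxsplit=2)

# If the pattern contains a group, the separators are kept in the result
with_separators = re.split(r'(\d+)', 'one1two2three')

# Splitting on whitespace gives a list of words
words = re.split(r'\s+', 'split  on   whitespace')

print(all_parts)
print(limited)
print(with_separators)
print(words)
```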
