python-day6-Regular Expressions

Source: Internet
Author: User


In essence, a regular expression (or RE) is a small, highly specialized programming language.
In Python it is embedded in the language and made available through the re module. A regular expression pattern is
compiled into a sequence of bytecodes, which is then executed by a matching engine written in C.
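
As a minimal sketch of the compile step described above, the snippet below compiles a pattern once with re.compile and reuses the resulting pattern object. The pattern and the sample strings are assumptions chosen for illustration; they do not come from the original article.

```python
import re

# Compile the pattern once; the pattern object can then be reused.
# The pattern and sample text are made up for illustration.
digits = re.compile(r"\d+")

first = digits.search("order 66 shipped")   # first run of digits
all_runs = digits.findall("a1b22c333")      # every run of digits

print(first.group())   # 66
print(all_runs)        # ['1', '22', '333']
```

Calling the module-level functions such as re.findall(pattern, string) gives the same results; compiling explicitly is mainly a convenience when one pattern is used many times.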

Metacharacters

.   ^   $   *   +   ?   { }   [ ]   | ( ) \

*           Matches the preceding item 0 or more times
+           Matches the preceding item 1 or more times
?           Matches the preceding item 0 or 1 times
{m}         Matches the preceding item exactly m times
{m,n}       Matches the preceding item m to n times
*?, +?, ??, {m,n}?   The plain quantifiers *, +, ? and {m,n} are greedy, that is, they match as much as possible; adding a ? after them makes them lazy (non-greedy)
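
The quantifiers listed above can be tried out with a short sketch like the following. The sample strings are hypothetical, and re.fullmatch is used here only so that each pattern has to cover the entire string.

```python
import re

# Hypothetical sample strings: zero to four "a" characters followed by "b".
samples = ["b", "ab", "aab", "aaab", "aaaab"]

one_or_more = [s for s in samples if re.fullmatch(r"a+b", s)]
zero_or_one = [s for s in samples if re.fullmatch(r"a?b", s)]
exactly_two = [s for s in samples if re.fullmatch(r"a{2}b", s)]
two_to_three = [s for s in samples if re.fullmatch(r"a{2,3}b", s)]

print(one_or_more)    # ['ab', 'aab', 'aaab', 'aaaab']
print(zero_or_one)    # ['b', 'ab']
print(exactly_two)    # ['aab']
print(two_to_three)   # ['aab', 'aaab']

# Greedy versus lazy: ".+" grabs as much as it can, ".+?" as little as it can.
greedy = re.findall(r"<.+>", "<a><b>")
lazy = re.findall(r"<.+?>", "<a><b>")
print(greedy)         # ['<a><b>']
print(lazy)           # ['<a>', '<b>']
```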

As described above, '*', '+' and '?' are greedy, but this may not be what we want.
Adding a question mark after them switches to the non-greedy strategy, which matches as few characters as possible. Example:

import re
print(re.findall(r"a(\d+?)", "a23b"))  # non-greedy mode
# Output: ['2']
print(re.findall(r"a(\d+)", "a23b"))   # greedy mode
# Output: ['23']

A backslash followed by a metacharacter removes that character's special meaning.
A backslash followed by certain ordinary characters gives them a special meaning.
A backslash followed by a number refers to the string matched by the group with that number.

# \2 refers back to the text matched by the second group, (eric)
m = re.search(r"(alex)(eric)com\2", "alexericcomeric")
print(m.span())
# Output: (0, 15)
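
The two main uses of the backslash described above, escaping a metacharacter and referring back to a group, can be sketched as follows. The sample strings are assumptions made up for this illustration.

```python
import re

# "\." matches a literal dot; an unescaped "." matches any character.
literal_dot = re.findall(r"a\.b", "a.b axb")
any_char = re.findall(r"a.b", "a.b axb")
print(literal_dot)   # ['a.b']
print(any_char)      # ['a.b', 'axb']

# "\1" refers back to whatever group 1 matched, so this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
print(doubled.group())    # is is
print(doubled.group(1))   # is
```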

\d matches any decimal digit; it is equivalent to the class [0-9].
\D matches any non-digit character; it is equivalent to the class [^0-9].
\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].
\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].
\w matches any alphanumeric character or underscore; it is equivalent to the class [a-zA-Z0-9_].
\W matches any non-alphanumeric character; it is equivalent to the class [^a-zA-Z0-9_].
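
A short sketch of these six classes on a hypothetical sample string follows. One caveat worth knowing: in Python 3, with str patterns, \d, \s and \w also match non-ASCII digits, whitespace and letters, so the ASCII classes shown above are exact equivalents only when the re.ASCII flag is passed.

```python
import re

text = "Room 42, floor_3!"   # hypothetical sample string

digits = re.findall(r"\d", text)
non_digits = re.findall(r"\D+", text)
spaces = re.findall(r"\s", text)
non_spaces = re.findall(r"\S+", text)
words = re.findall(r"\w+", text)
non_words = re.findall(r"\W+", text)

print(digits)       # ['4', '2', '3']
print(non_digits)   # ['Room ', ', floor_', '!']
print(spaces)       # [' ', ' ']
print(non_spaces)   # ['Room', '42,', 'floor_3!']
print(words)        # ['Room', '42', 'floor_3']
print(non_words)    # [' ', ', ', '!']

# With a str pattern, \w is Unicode-aware unless re.ASCII is given.
unicode_word = re.fullmatch(r"\w", "é") is not None
ascii_word = re.fullmatch(r"\w", "é", re.ASCII) is not None
print(unicode_word, ascii_word)   # True False
```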
\b matches a word boundary, that is, the position between a word and a non-word character such as a space.
Word boundaries include the start and end of the string. Here a "word" is a run of consecutive letters, digits, and
underscores. More precisely, \b is defined as the junction between a \w character and a \W character.
It is a zero-width assertion: it consumes no characters and matches only at the beginning or end of a word.
Because a word is a sequence of alphanumeric characters, its end is marked by whitespace or any other non-alphanumeric character.
For example:

print(re.findall(r"abc\b", "dzx &abc sdsadasabcasdsadasdabcasdsa"))
# Output: ['abc']
print(re.findall(r"\babc\b", "dzx &abc sdsadasabcasdsadasdabcasdsa"))
# Output: ['abc']
print(re.findall(r"\babc\b", "dzx sabc sdsadasabcasdsadasdabcasdsa"))
# Output: []

re.match('com', 'comwww.runcomoob')
re.search(r'\dcom', 'www.4comrunoob.5com')
If the match succeeds, the result is a match object, which has the following methods:
group() returns the string matched by the RE
start() returns the position where the match starts
end() returns the position where the match ends
span() returns a tuple containing the (start, end) positions of the match
group() can also take one or more group numbers, in which case it returns the strings matched by those groups.
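
The match object methods listed above can be sketched as follows. The sample string is an assumption for illustration; the snippet also shows that re.match only matches at the start of the string, while re.search scans the whole string.

```python
import re

m = re.search(r"\d+", "abc123def")   # hypothetical sample string

print(m.group())   # 123
print(m.start())   # 3
print(m.end())     # 6
print(m.span())    # (3, 6)

# re.match anchors at the start of the string, so there is no match here.
no_match = re.match(r"\d+", "abc123def")
print(no_match)    # None
```

Because a failed match returns None, it is usually worth checking the result before calling group() on it.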

1. group() returns the whole string matched by the RE.
2. group(n, m) returns a tuple of the strings matched by groups n and m, and raises an IndexError exception if a group number does not exist.
3. groups() returns a tuple containing the strings matched by all groups defined in the regular expression,
from group 1 up to the highest group number. It is usually called without arguments.

import re
a = "123abc456"
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))  # the whole match
# Output: 123abc456
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))
# Output: 123
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))
# Output: abc
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3))
# Output: 456

group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second pair,
and group(3) the part matched by the third pair.
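
As a minimal sketch of the remaining cases from the numbered list above, the snippet below calls groups(), passes several group numbers to group() at once, and triggers the IndexError raised for a group number that does not exist. It reuses the pattern and string from the example above; the variable names are illustrative.

```python
import re

m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", "123abc456")

all_groups = m.groups()    # every group, from 1 to the highest number
picked = m.group(1, 3)     # several group numbers at once give a tuple
print(all_groups)          # ('123', 'abc', '456')
print(picked)              # ('123', '456')

# The pattern defines only three groups, so asking for group 4 fails.
raised = False
try:
    m.group(4)
except IndexError:
    raised = True
print(raised)              # True
```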

# re.findall returns all matching substrings of a string as a list. For example:
p = re.compile(r'\d+')
print(p.findall('one1two2three3four4'))
# Output: ['1', '2', '3', '4']
# get all words in the string that contain 'oo'
text = 'djks#dooljsdj('
print(re.findall(r'\w*oo\w*', text))
# Output: ['dooljsdj']
# split a string on runs of digits
p = re.compile(r'\d+')
print(p.split('one1two2three3four4'))
# Output: ['one', 'two', 'three', 'four', '']
# equivalent to
print(re.split(r'\d+', 'one1two2three3four4'))
# Output: ['one', 'two', 'three', 'four', '']
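a = 'abc123abv23456'
# with a capturing group, findall returns only what the group matched
b = re.findall(r'23(a)?', a)
print(b)
# Output: ['a', '']
# match 23 optionally followed by 'a'; (?:...) groups without capturing
b = re.findall(r'23(?:a)?', a)
print(b)
# Output: ['23a', '23']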
