How to cure the "dizziness" caused by group, groups, and findall in Python's re module when they meet grouping parentheses "()" in regular expressions

Source: Internet
Author: User
Tags: regular expression, book, repetition

When reprinting, please cite the source: https://www.cnblogs.com/oceanicstar/p/9244783.html

Let's go straight to the first example:
>>> import re
>>> re.search('(book+)', 'mebookbookme').groups()
('book',)
>>> re.search('(book+)', 'mebookbookme').group()
'book'
>>> re.search('(book)+', 'mebookbookme').groups()
('book',)
>>> re.search('(book)+', 'mebookbookme').group()
'bookbook'
>>> re.findall('(book)+', 'mebookbookme')
['book']
>>> re.findall('(book+)', 'mebookbookme')
['book', 'book']
Feeling dizzy? Time for a little theory to recover.

1. The first thing to understand: how search, match, and findall differ in the number of matches they return.
search and match (match only tries at the beginning of the string) return only the first place in the string that satisfies the regular expression pattern.
findall returns every non-overlapping place in the string that satisfies the pattern.

2. The second thing to understand:
(1) group and groups are two different methods of the match object returned by search and match, so they describe only the first match of the pattern in the string.
(2) group and groups exist because in regular expressions we use parentheses () to group, either to repeat content (a + after the parentheses) or to pick out specific content (by group number, or with groups()).
(3) group and groups differ in usage as follows:

"m.group()" (here m is the match object returned by search or match)
m.group() can be called with empty parentheses, or with a number n, i.e. m.group(n). The cases are:

<n not passed, or n = 0>
m.group() == m.group(0) == the entire text matched by the regular expression pattern (all matched characters) for the first match,
regardless of parentheses. This is the API's rule: for example, the regular expression '(book)+' matched against 'yourbookbook' gives 'bookbook'.

<n > 0>
Returns the text matched by the nth pair of parentheses (a regular expression has as many groups as it has capturing parentheses). m.group(1), m.group(2), ... show only what the grouping parentheses themselves matched. For example, m.group(1) for '(book)+' matched against 'yourbookbook' shows only one 'book' (there is only 1 pair of parentheses, so only 1 group, and the + repetition outside it is not included).

"m.groups()"
m.groups() returns the text matched by every pair of parentheses (as many strings as there are capturing parentheses in the regular expression), packed in a tuple:
m.groups() == (m.group(1), m.group(2), ...)
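The rules above can be checked with a short script. This is only a minimal sketch: the string 'yourbookbook' comes from the text, while the two-group pattern '(your)(book)' and all variable names are assumptions made up for illustration.

```python
import re

text = "yourbookbook"

# match() only tries at the beginning of the string, so it finds nothing here.
at_start = re.match("(book)+", text)

# search() scans forward and returns the first match only.
m = re.search("(book)+", text)

whole = m.group()          # same as m.group(0): the entire matched text
first_group = m.group(1)   # text captured by the first pair of parentheses
all_groups = m.groups()    # tuple of every capturing group, starting at group 1

# Illustrative pattern with two capturing groups (not from the article).
m2 = re.search("(your)(book)", text)
pair = m2.groups()

print(at_start, whole, first_group, all_groups, pair)
```

Note that groups() never includes group 0; it starts at group 1.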
What, still not clear? Then let's analyze each case individually:
    1. First, the regular expressions book+ and (book+)
Searching the string 'mebookbookme' with book+ or (book+), the entire match is 'book' (the + repeats only the letter k, not the whole word; do not confuse the two)
>>> re.search("book+", "mebookbookme")
<_sre.SRE_Match object; span=(2, 6), match='book'>
>>> re.search("(book+)", "mebookbookme")
<_sre.SRE_Match object; span=(2, 6), match='book'>
So for (book+), group() returns 'book' and groups() returns ('book',): a single 'book' either way.
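A small sketch of the point about + applying only to the letter k. The string 'bookkk' is an assumed example, not from the article, chosen so that the repetition becomes visible.

```python
import re

text = "mebookbookme"

no_parens = re.search("book+", text)
with_parens = re.search("(book+)", text)

# Both patterns match the same text: '+' applies only to the preceding 'k'.
same_text = (no_parens.group(), with_parens.group())

# Without parentheses there are no capturing groups, so groups() is empty.
groups_without = no_parens.groups()
groups_with = with_parens.groups()

# On a string with extra k's, 'book+' swallows all of them.
extra_k = re.search("book+", "bookkk").group()

print(same_text, groups_without, groups_with, extra_k)
```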
    2. Next, the regular expression (book)+
Searching with (book)+, the entire match is 'bookbook'
>>> re.search('(book)+', 'mebookbookme')
<_sre.SRE_Match object; span=(2, 10), match='bookbook'>
group() shows the entire match, i.e. it returns 'bookbook'
>>> re.search('(book)+', 'mebookbookme').group()
'bookbook'
group(1) shows what the first (and here the only) pair of parentheses matched, i.e. it returns 'book'.
groups() packs the content of every pair of parentheses into a tuple and returns it (here there is only one pair), i.e. it returns ('book',)
>>> re.search('(book)+', 'mebookbookme').groups()
('book',)
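One detail the text does not spell out: when a capturing group is repeated, the group keeps the text of its last repetition. With (book)+ every repetition is the same string, so this is invisible. The sketch below uses span(1) and an assumed pattern, (\w)+, to make it visible.

```python
import re

m = re.search("(book)+", "mebookbookme")
whole_span = m.span()     # position of the entire match
group_span = m.span(1)    # position of what group 1 finally captured

# Assumed illustrative pattern: each repetition captures a different letter,
# and only the last one survives in group 1.
last_letter = re.search(r"(\w)+", "abc").group(1)

print(whole_span, group_span, last_letter)
```

Here group 1 covers only the second 'book' (positions 6 to 10), while the whole match covers positions 2 to 10.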

When findall is used with a + after grouping parentheses (), it behaves like group(1) and groups(): the repetitions produced by + are not shown, only one match of the content inside the parentheses. In other words, for each match findall shows as many group contents as there are pairs of parentheses () in the regular expression.
>>> re.findall("(book)+", "mebookbookme")
['book']
>>> re.findall("(book+)", "mebookbookme")
['book', 'book']
    1. The first findall, with (book)+, merges the group repeated by + into a single unit (the same as what search(...).groups() shows for (book)+)
    2. The second findall, with (book+), finds two separate matches of the pattern and shows the group from each of them.
Still a little dizzy? Here is one more example to illustrate findall. No matter how many times 'ab' and 'c' repeat, each match shows only one 'ab' and one 'c'. Two tuples are returned because findall returns every match of the pattern in the string, and there are two.
>>> re.findall('(ab)+(c)+', 'abcc123ababcccc')
[('ab', 'c'), ('ab', 'c')]
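The shape of findall's result depends on how many capturing groups the pattern has. The sketch below reuses the string from the example above. The one-group pattern '(ab)+c+' and the use of re.finditer to recover whole matches are additions for illustration, not part of the original article.

```python
import re

text = "abcc123ababcccc"

# One capturing group: each item is that group's string.
one_group = re.findall("(ab)+c+", text)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall("(ab)+(c)+", text)

# finditer yields match objects, so the whole match is still available
# through group(0) even though the pattern contains capturing groups.
whole_matches = [m.group(0) for m in re.finditer("(ab)+(c)+", text)]

print(one_group, two_groups, whole_matches)
```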

What if we want to match 'bookbook' in 'mebookbookme'?

1. First decide whether to use match, search, or findall.
'bookbook' does not appear at the very beginning of 'mebookbookme', so match cannot be used.
The 'bookbook' pattern appears only once (so it is also the first occurrence), so search can find it.
findall returns every occurrence of the pattern, so it certainly works too.

2. Detailed analysis:
(1) Using search
Searching with '(book)+', the entire match is 'bookbook', and group() or group(0) returns that entire match.
>>> re.search('(book)+', 'mebookbookme').group()
'bookbook'
>>> re.search('(book)+', 'mebookbookme').group(0)
'bookbook'

If you want to use group(1) or groups()[0] instead, what should the regular expression be? You can use '((?:book)+)'. A non-capturing group (an unnumbered group) has the form (?:expression), and its parentheses do not receive a group number. The extra pair of parentheses on the outside is needed because group(1) and groups()[0] require a capturing group numbered 1. (PS: this is far too cumbersome for this example and nobody would really do it this way; it is only here to illustrate the usage.)
>>> re.search('((?:book)+)', 'mebookbookme').group(1)
'bookbook'
>>> re.search('((?:book)+)', 'mebookbookme').groups()[0]
'bookbook'
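To see which parentheses actually receive a group number, a compiled pattern's groups attribute reports the number of capturing groups. A minimal sketch, with variable names chosen only for illustration:

```python
import re

capturing = re.compile("(book)+")
non_capturing = re.compile("(?:book)+")
wrapped = re.compile("((?:book)+)")

# Number of capturing groups in each pattern.
counts = (capturing.groups, non_capturing.groups, wrapped.groups)

text = "mebookbookme"
group_one = wrapped.search(text).group(1)

# Asking for a group that does not exist raises IndexError.
try:
    non_capturing.search(text).group(1)
    missing_group_error = False
except IndexError:
    missing_group_error = True

print(counts, group_one, missing_group_error)
```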

(2) Using findall
With findall, if you group with parentheses you cannot use the '(book)+' regular expression, because findall then returns only the content of the capturing parentheses. You can use a non-capturing group (an unnumbered group) instead, since findall then returns the whole match.
>>> re.findall('(?:book)+', 'mebookbookme')
['bookbook']
>>> re.findall('(?:book)+', 'mebookbookme')[0]
'bookbook'
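The same idea can be wrapped in a small helper. This is a sketch only: find_runs is a hypothetical name, and the use of re.escape to handle words containing regex metacharacters is an assumption that goes beyond the article.

```python
import re

def find_runs(word, text):
    """Return every run of one or more consecutive copies of `word` in `text`."""
    # re.escape guards against regex metacharacters inside `word`;
    # (?:...) keeps findall returning whole matches instead of a group.
    pattern = "(?:" + re.escape(word) + ")+"
    return re.findall(pattern, text)

runs = find_runs("book", "mebookbookme and a book")
dotted = find_runs("a.b", "a.ba.b axb")

print(runs, dotted)
```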
