Using findall in the Python re module

Source: Internet
Author: User

I ran into a strange problem while practicing with the re module today: the same regular expression gave different results with re.search() and with re.compile().findall().

It was odd enough that I am recording it here, so that I am not stuck if something similar comes up again.

    #!/usr/bin/env python3
    # Author: taoke
    import re

    # Sample HTML tag to search (named text to avoid shadowing the built-in str)
    text = '<link rel="icon" sizes="any" mask href="http://www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg.com">'
    # Raw string so the backslashes reach the regex engine unchanged
    pat = r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.(com|cn)'
    p = re.search(pat, text)
    print(p)
    p = re.compile(pat).findall(text)
    print(len(p), p)

Output:

    D:\Code\WebCrawler\venv\Scripts\python.exe D:/Code/WebCrawler/retest/retest1.py
    <_sre.SRE_Match object; span=(40, 60), match='http://www.baidu.com'>
    1 ['com']

    Process finished with exit code 0

The two results did not agree. After searching around and asking people, I found a blog post, a brief look at the usage of the findall function in the Python re module.

It describes how re.compile().findall() behaves.

    >>> import re
    >>> s = "adfad asdfasdf asdfas asdfawef asd adsfas"
    >>> reObj1 = re.compile(r'((\w+)\s+\w+)')
    >>> reObj1.findall(s)
    [('adfad asdfasdf', 'adfad'), ('asdfas asdfawef', 'asdfas'), ('asd adsfas', 'asd')]
    >>> reObj2 = re.compile(r'(\w+)\s+\w+')
    >>> reObj2.findall(s)
    ['adfad', 'asdfas', 'asd']
    >>> reObj3 = re.compile(r'\w+\s+\w+')
    >>> reObj3.findall(s)
    ['adfad asdfasdf', 'asdfas asdfawef', 'asd adsfas']

From the code example above:

findall returns a list of all non-overlapping matches of the regular expression in the string. What each element of the list contains depends on the parentheses (capture groups) in the pattern.

@1. When the regular expression contains more than one pair of parentheses, each element of the list is a tuple. The tuple holds one string per pair of parentheses, each string is the text matched by the corresponding parenthesized sub-expression, and the strings are ordered by where each opening parenthesis appears.

@2. When the regular expression contains exactly one pair of parentheses, each element of the list is a string, and that string is the text matched by the parenthesized sub-expression (not the match for the entire regular expression).

@3. When the regular expression contains no parentheses, each element of the list is a string holding the match for the entire regular expression.
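Applied to the script at the top, these rules explain the mismatch. The sketch below is only an illustration: it reuses the article's string and pattern, and the variable names (whole, group1, found, full_matches) are made up for this example. It shows that re.search() still holds the whole match in group 0, while findall() hands back only group 1, and that re.finditer() is one way to get the whole match for every hit.

```python
import re

# String and pattern taken from the article's first script.
text = ('<link rel="icon" sizes="any" mask '
        'href="http://www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg.com">')
pat = r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.(com|cn)'

m = re.search(pat, text)
whole = m.group(0)    # text matched by the entire pattern
group1 = m.group(1)   # text matched by the single pair of parentheses

# One capture group, so findall returns the group-1 text of each match (rule @2).
found = re.compile(pat).findall(text)

# finditer yields match objects, so group(0) is still available for each match.
full_matches = [mo.group(0) for mo in re.finditer(pat, text)]

print(whole)         # http://www.baidu.com
print(group1)        # com
print(found)         # ['com']
print(full_matches)  # ['http://www.baidu.com']
```

So the two calls were matching the same text all along; they just report different parts of the match.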

So I removed the parentheses from my regular expression:

    #!/usr/bin/env python3
    # Author: taoke
    import re

    # Sample HTML tag to search (named text to avoid shadowing the built-in str)
    text = '<link rel="icon" sizes="any" mask href="http://www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg.com">'
    # Same pattern as before, with the parentheses around com|cn removed
    pat = r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.com|cn'
    p = re.search(pat, text)
    print(p)
    p = re.compile(pat).findall(text)
    print(len(p), p)

The output is as follows:

    D:\Code\WebCrawler\venv\Scripts\python.exe D:/Code/WebCrawler/retest/retest1.py
    <_sre.SRE_Match object; span=(40, 60), match='http://www.baidu.com'>
    1 ['http://www.baidu.com']

    Process finished with exit code 0

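Now the two calls give the same result, haha. Be aware that without the parentheses the | applies to the whole pattern, so this expression means "a URL ending in .com, or the bare text cn"; it works here only because the string contains no "cn".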
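Removing the parentheses has a side effect, because | has the lowest precedence in a regular expression: the pattern no longer says "ends in .com or .cn". One possible alternative, sketched below, is a non-capturing group (?:...), which keeps com|cn grouped without creating a capture group, so findall() still returns whole matches. The sample text and the names loose and strict are hypothetical and chosen only to make the difference visible.

```python
import re

# Hypothetical sample text: a stray "cn", one .cn URL and one .com URL.
text = 'cnblogs mirror: http://www.example.cn and http://www.baidu.com'

# No parentheses: "|" splits the WHOLE pattern into "<URL ending in .com>" OR "cn".
loose = r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.com|cn'

# (?:...) groups the two endings but does not capture them.
strict = r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.(?:com|cn)'

loose_result = re.findall(loose, text)
strict_result = re.findall(strict, text)

print(loose_result)   # ['cn', 'cn', 'http://www.baidu.com']
print(strict_result)  # ['http://www.example.cn', 'http://www.baidu.com']
```

The loose pattern picks up every bare "cn" and misses the .cn URL, while the non-capturing version returns both URLs in full.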
