While practicing with the re module today I ran into a strange problem: the same regular expression gave different results with re.search() and with re.compile().findall().
It was odd enough that I am writing it down, so that I can solve it quickly if a similar situation comes up again.
    #!/usr/bin/env python3
    # Author: taoke
    import re

    str = '<link rel="icon" sizes="any" mask href="http://www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg.com">'
    # Raw string so the backslashes reach the regex engine unchanged
    pat = r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.(com|cn)'
    p = re.search(pat, str)
    print(p)
    p = re.compile(pat).findall(str)
    print(len(p), p)
Output:

    D:\Code\WebCrawler\venv\Scripts\python.exe d:/code/webcrawler/retest/retest1.py
    <_sre.SRE_Match object; span=(...), match='http://www.baidu.com'>
    1 ['com']

    Process finished with exit code 0
The two results were not consistent. After searching around and asking people, I found a blog post, "Python re module findall function usage brief", which describes how re.compile().findall() behaves.
    >>> import re
    >>> s = "adfad asdfasdf asdfas asdfawef asd adsfas"
    >>> reObj1 = re.compile('((\w+)\s+\w+)')
    >>> reObj1.findall(s)
    [('adfad asdfasdf', 'adfad'), ('asdfas asdfawef', 'asdfas'), ('asd adsfas', 'asd')]
    >>> reObj2 = re.compile('(\w+)\s+\w+')
    >>> reObj2.findall(s)
    ['adfad', 'asdfas', 'asd']
    >>> reObj3 = re.compile('\w+\s+\w+')
    >>> reObj3.findall(s)
    ['adfad asdfasdf', 'asdfas asdfawef', 'asd adsfas']
From the code example above:
The findall function returns a list of all non-overlapping matches of the regular expression in the string. What each element of that list contains depends on the parentheses (capture groups) in the pattern:
@1. When the regular expression contains several pairs of parentheses, each element of the list is a tuple. The tuple holds one string per pair of parentheses, each string is the text matched by the corresponding group, and the strings are ordered by the position of each group's opening parenthesis.
@2. When the regular expression contains exactly one pair of parentheses, each element of the list is a string, and that string is the text matched by the group (not the match for the entire regular expression).
@3. When the regular expression contains no parentheses, each element of the list is a string holding the match for the entire regular expression.
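These rules also explain the original discrepancy: re.search() returns a match object whose printed form shows the whole match (group 0), while findall() reports only the captured group. Below is a minimal sketch of that difference, assuming a shortened test string; `html` and `pattern` are illustrative names and not part of the original script.

```python
import re

# Shortened, made-up input for illustration; not the original test string.
html = '<a href="http://www.baidu.com/img/logo.svg">'
pattern = re.compile(r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.(com|cn)')

m = pattern.search(html)
whole_match = m.group(0)   # text matched by the entire pattern
first_group = m.group(1)   # text matched by (com|cn) only

# With exactly one group, findall returns the group text, not the whole match.
found = pattern.findall(html)

# finditer yields match objects, so the whole match stays available.
full_matches = [mo.group(0) for mo in pattern.finditer(html)]

print(whole_match, first_group, found, full_matches)
```

If the group is still needed elsewhere, iterating with finditer() and calling group(0) is one way to get the full matches without editing the pattern.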
So I changed my regular expression as follows, removing the parentheses:
    #!/usr/bin/env python3
    # Author: taoke
    import re

    str = '<link rel="icon" sizes="any" mask href="http://www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg.com">'
    # Same pattern as before, but without the parentheses around com|cn
    pat = r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.com|cn'
    p = re.search(pat, str)
    print(p)
    p = re.compile(pat).findall(str)
    print(len(p), p)
The output is as follows:

    D:\Code\WebCrawler\venv\Scripts\python.exe d:/code/webcrawler/retest/retest1.py
    <_sre.SRE_Match object; span=(...), match='http://www.baidu.com'>
    1 ['http://www.baidu.com']

    Process finished with exit code 0
Now both calls return the same match, haha.
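One caveat about dropping the parentheses: `|` has the lowest precedence in a regular expression, so without a group the pattern reads as "a full URL ending in .com" or "the bare text cn". A non-capturing group, `(?:...)`, keeps the alternation limited to the suffix while leaving findall()'s return value as the whole match. A minimal sketch, using a made-up sample string (`text`, `no_group`, and `non_capturing` are illustrative names):

```python
import re

# Made-up sample text for illustration only.
text = 'see http://www.baidu.com and http://www.example.cn for docs'

# No parentheses: '|' splits the whole pattern into "<url>.com" OR "cn".
no_group = re.compile(r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.com|cn')

# Non-capturing group: '|' only chooses between "com" and "cn".
non_capturing = re.compile(r'[a-zA-Z0-9]+://[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.(?:com|cn)')

no_group_result = no_group.findall(text)
non_capturing_result = non_capturing.findall(text)

print(no_group_result)       # ['http://www.baidu.com', 'cn']
print(non_capturing_result)  # ['http://www.baidu.com', 'http://www.example.cn']
```

On the test string in this post the two patterns happen to agree, because it contains a .com URL and no stray "cn"; the difference only shows up on inputs like the sample above.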