(This is a summary of my own learning and use; if there is a mistake, corrections are welcome.)
The findall and finditer functions in Python's regular expression module re are similar, but there is one big difference between them.
Both return every match in the string, unlike search, which returns only the first match. The difference is in what they return: findall returns a list, while finditer returns an iterator of match objects.
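As a minimal sketch of that difference, the three calls below return a single match object, a list, and an iterator respectively. The sample string `text` is made up for this illustration and is not the data used in the rest of this post.

```python
import re

# Hypothetical sample text, used only for this illustration
text = "a1 b22 c333"

first = re.search(r"\d+", text)      # only the first match, as a match object (or None)
as_list = re.findall(r"\d+", text)   # every match, as a list of strings
as_iter = re.finditer(r"\d+", text)  # every match, as an iterator of match objects

# The iterator has to be walked to reach the matched text
from_iter = [m.group() for m in as_iter]

print(first.group())  # 1
print(as_list)        # ['1', '22', '333']
print(from_iter)      # ['1', '22', '333']
```

Note that the iterator from finditer can be consumed only once; looping over it a second time yields nothing.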
Suppose we have the following data, in which the digits before the @ represent a phone number and the part between the @ and .com represents the type of mailbox:
import re

content = '''email:12345678@163.com
email:2345678@163.com
email:345678@163.com
'''
Requirement (no groups): extract all of the mailbox addresses
result_finditer = re.finditer(r"\d+@\w+\.com", content)
# finditer returns an iterator of match objects, so we loop over it and call group() on each match
for i in result_finditer:
    print(i.group())
result_findall = re.findall(r"\d+@\w+\.com", content)
# findall returns a list, which can be printed directly or looped over
print(result_findall)
for i in result_findall:
    print(i)
Requirement (with groups): extract every phone number and mailbox type
result_finditer = re.finditer(r"(\d+)@(\w+)\.com", content)
# The pattern has two groups, which we read separately as group(1) and group(2).
# Group indexes start at 0: group() without an index defaults to 0, which is the entire match
for i in result_finditer:
    phone_no = i.group(1)
    email_type = i.group(2)
result_findall = re.findall(r"(\d+)@(\w+)\.com", content)
# The result is still a list, but now it is a list of tuples rather than a list of strings,
# e.g. [('12345678', '163'), ('2345678', '163'), ('345678', '163')]
for i in result_findall:
    phone_no = i[0]
    email_type = i[1]
The same applies to both named and unnamed groups.
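To illustrate the point about named groups, here is a small sketch using the same data. The group names phone_no and email_type are my own choice for the example; any valid identifier would work.

```python
import re

content = '''email:12345678@163.com
email:2345678@163.com
email:345678@163.com
'''

# Same pattern as above, but with named groups (the names are illustrative)
pattern = r"(?P<phone_no>\d+)@(?P<email_type>\w+)\.com"

# findall ignores the names and still returns a list of tuples
pairs = re.findall(pattern, content)
print(pairs)

# With finditer, each group can be read by name (or still by index)
by_name = [(m.group("phone_no"), m.group("email_type"))
           for m in re.finditer(pattern, content)]
print(by_name)
```

Both lists hold the same (phone number, mailbox type) pairs; naming the groups only changes how a match object can be queried, not what findall returns.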
Notes on findall:
1. When the pattern has no groups, findall returns the entire match for each result.
re.findall(r"\d+@\w+\.com", content)
['12345678@163.com', '2345678@163.com', '345678@163.com']
2. When the pattern has one group, findall returns what that group matched rather than the entire match.
re.findall(r"(\d+)@\w+\.com", content)
['12345678', '2345678', '345678']
3. When the pattern has multiple groups, findall packs the groups of each match into a tuple.
re.findall(r"(\d+)@(\w+)\.com", content)
[('12345678', '163'), ('2345678', '163'), ('345678', '163')]
So if we need both the entire match and the match of every group, with findall we have to wrap the whole pattern in a group of its own:
re.findall(r"((\d+)@(\w+)\.com)", content)
[('12345678@163.com', '12345678', '163'), ('2345678@163.com', '2345678', '163'), ('345678@163.com', '345678', '163')]
With finditer we don't have to wrap the whole pattern in parentheses by hand, because group() called without an argument already returns the entire match.
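A short sketch of that comparison, using the same data: the finditer loop below builds the same tuples as the findall call with the extra outer group. The variable names rows and wrapped are only illustrative.

```python
import re

content = '''email:12345678@163.com
email:2345678@163.com
email:345678@163.com
'''

# finditer: the pattern needs no extra outer group
rows = []
for m in re.finditer(r"(\d+)@(\w+)\.com", content):
    # group(0) is the entire match; groups() is the tuple of all numbered groups
    rows.append((m.group(0),) + m.groups())

# findall: the whole pattern has to be wrapped in its own group
wrapped = re.findall(r"((\d+)@(\w+)\.com)", content)

print(rows == wrapped)  # True
print(rows[0])          # ('12345678@163.com', '12345678', '163')
```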
In practice, we can choose whichever method fits the requirement.