import re
Regular Expressions:
Frequently used symbols: dot, question mark, asterisk, and parentheses
.: matches any character except the line break \n
-- the dot can be thought of as a placeholder; each dot matches exactly one character.
*: matches the previous character 0 or more times
?: matches the previous character 0 or 1 times
.*: greedy matching (matches as many characters as possible)
.*?: non-greedy matching (matches as few characters as possible, so it finds as many separate matches as the text allows)
(): only the data inside the parentheses is returned as the result.
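The symbols above can be tried out with a short sketch. The sample strings and variable names here are made up for illustration and are not from the original notes.

```python
import re

text = "xxAxx and xxBxx"  # illustrative sample string

# . matches one character, but not the line break \n
dot_matches = re.findall(r"a.c", "abc a\nc a-c")

# * repeats the previous character 0 or more times
star_matches = re.findall(r"ab*", "a ab abbb")

# ? makes the previous character optional (0 or 1 times)
optional_matches = re.findall(r"colou?r", "color colour")

# .* is greedy: it runs to the LAST xx it can reach
greedy = re.findall(r"xx.*xx", text)

# .*? is non-greedy: it stops at the FIRST xx it can reach
non_greedy = re.findall(r"xx.*?xx", text)

# () returns only the part inside the parentheses
grouped = re.findall(r"xx(.*?)xx", text)

print(dot_matches, star_matches, optional_matches)
print(greedy, non_greedy, grouped)
```

With this input the greedy pattern returns the whole string as a single match, while the non-greedy pattern returns the two short matches separately.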
Common methods: findall, search, sub
findall: returns all the content that matches the pattern, as a list
search: finds the first content that matches and returns a match object (or None if nothing matches)
sub: replaces the content that matches the pattern and returns the resulting string
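A minimal sketch of the three methods side by side. The HTML-like sample string and the variable names are assumptions chosen for illustration.

```python
import re

html = "<li>apple</li><li>pear</li>"  # hypothetical sample input

# findall: every match, returned as a list of the captured groups
all_items = re.findall(r"<li>(.*?)</li>", html)

# search: only the first match, returned as a match object (None if no match)
first = re.search(r"<li>(.*?)</li>", html)
first_item = first.group(1) if first else None

# sub: replace every match and return the new string
replaced = re.sub(r"<li>.*?</li>", "<li>X</li>", html)

print(all_items)   # ['apple', 'pear']
print(first_item)  # apple
print(replaced)    # <li>X</li><li>X</li>
```

Checking the result of search for None before calling group avoids an AttributeError when the pattern is not found.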
All-Purpose Expressions:
(.*?)
Matching across multiple lines
re.S (lets the dot match the line break \n as well)
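The effect of re.S can be seen by running the same pattern with and without the flag. The sample string is made up for illustration.

```python
import re

page = "xxhello\nworldxx"  # illustrative text with a line break between the markers

# Without re.S the dot cannot cross the line break, so nothing matches
without_flag = re.findall(r"xx(.*?)xx", page)

# With re.S the dot also matches \n, so the whole span is captured
with_flag = re.findall(r"xx(.*?)xx", page, re.S)

print(without_flag)  # []
print(with_flag)     # ['hello\nworld']
```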
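s = 'sdfhajkdxxluhuanxx lsdhfxxwangpiaoxxsjdkf'
sub = re.findall('xx(.*?)xx', s, re.S)
print(sub)  # ['luhuan', 'wangpiao']
# The difference between findall and search
# (s2 is an illustrative string added so that the pattern below has something to match)
s2 = 'xxIxxdsfaxxlovexx xxyouxxdsfaxxtooxx'
sub = re.search('xx(.*?)xxdsfaxx(.*?)xx', s2, re.S).group(1)  # first match only: 'I'
sub = re.findall('xx(.*?)xxdsfaxx(.*?)xx', s2, re.S)  # every match: [('I', 'love'), ('you', 'too')]
print(sub[0][1])  # 'love'; when several places in the string satisfy the rule, findall returns one tuple per match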
Tips:
Match numbers
a = 'sdfasd123415ksadfj2345kdsafj'
b = re.findall(r'(\d+)', a)  # ['123415', '2345']
Matching principle:
Use findall and search together
Grab the big block first, then the small pieces inside it
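One way to read this principle: use search to cut out the large block you care about, then run findall inside that block only. The sketch below assumes a made-up page fragment; the markup, class name, and variable names are hypothetical.

```python
import re

# Hypothetical page fragment: only the links inside the nav list are wanted
page = """<div>skip <a href="/other">x</a></div>
<ul class="nav">
<a href="/home">Home</a>
<a href="/about">About</a>
</ul>"""

# Step 1 (big): grab the whole block; re.S lets the dot cross line breaks
block = re.search(r'<ul class="nav">(.*?)</ul>', page, re.S)

# Step 2 (small): pull the individual values out of that block only
links = re.findall(r'<a href="(.*?)">', block.group(1)) if block else []

print(links)  # ['/home', '/about']
```

Narrowing the text first keeps the second pattern simple and avoids picking up look-alike matches elsewhere on the page, such as the /other link here.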
Basics of the so-called Python web crawler