I recently studied the re module in Python. I printed out some regular expression material, spent a midday reading it, and am sharing my notes here.
Plain text: a regular expression can match literal text. The i flag (re.I in Python) forces case-insensitive matching
c.t matches cat, cbt, and so on; . matches any single character
.. matches any two characters
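As a small illustration of the notes above, the following sketch tries literal matching, the case-insensitive flag, and the dot. The sample strings are made up for the example.

```python
import re

# Literal text matches itself; re.I makes the match case-insensitive.
exact = re.findall(r"cat", "Cat cat CAT")
ignore_case = re.findall(r"cat", "Cat cat CAT", flags=re.I)

# A dot matches any single character (except a newline, by default).
one_dot = re.findall(r"c.t", "cat cbt c9t ct")
two_dots = re.findall(r"c..t", "coat cat cart")

print(exact)        # ['cat']
print(ignore_case)  # ['Cat', 'cat', 'CAT']
print(one_dot)      # ['cat', 'cbt', 'c9t']
print(two_dots)     # ['coat', 'cart']
```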
[ab] matches only a or b
[A-Za-z0-9] matches any single letter or digit
Match a color: #[A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9]
Improved color match: #[A-Fa-f0-9]{6}
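A minimal sketch comparing the long and the shortened color pattern; the CSS-like sample text is invented for the example. Both forms should find the same six-digit hex colors.

```python
import re

# Hypothetical sample text with one valid and two invalid six-digit colors.
text = "background: #FFAA00; color: #09f; border: #12345g"

long_form = r"#[A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9]"
short_form = r"#[A-Fa-f0-9]{6}"

colors_long = re.findall(long_form, text)
colors_short = re.findall(short_form, text)
set_match = re.findall(r"[ab]", "cab")  # a set matches one character from it

print(colors_long)   # ['#FFAA00']
print(colors_short)  # ['#FFAA00']
print(set_match)     # ['a', 'b']
```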
[^0-9] negated set: matches any character that is not a digit
\[0-9\] backslash escapes the square brackets, so this matches the literal text [0-9]
\d any digit; \D any non-digit
\w is equivalent to [A-Za-z0-9_] for ASCII text; \W matches anything that \w does not
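The classes and escapes above can be tried out as follows. This is only a sketch on made-up strings.

```python
import re

sample = "room [0-9] is A_7!"

digits = re.findall(r"\d", sample)             # every digit
bracket_text = re.findall(r"\[0-9\]", sample)  # escaped brackets match literally
words = re.findall(r"\w+", sample)             # runs of letters, digits, underscore
non_digits = re.findall(r"[^0-9]", "a1b2")     # negated set
also_non_digits = re.findall(r"\D", "a1b2")    # same result with the shorthand
non_word = re.findall(r"\W", "a-b c!")         # everything \w does not match

print(digits)        # ['0', '9', '7']
print(bracket_text)  # ['[0-9]']
print(words)         # ['room', '0', '9', 'is', 'A_7']
print(non_digits)    # ['a', 'b']
print(non_word)      # ['-', ' ', '!']
```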
Match an email address: \w+@\w+\.\w+ (works)
Improved email match: [A-Za-z0-9]\w+@\w+\.\w+[A-Za-z] (works)
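A rough sketch of email matching, assuming the simple pattern is \w+@\w+\.\w+ with the dot escaped. The addresses are invented. Note that this pattern is deliberately simple and stops after the first dotted part of the domain.

```python
import re

# Hypothetical sample text; the addresses are placeholders.
text = "Contact alice@example.com or bob_smith@example.org today."

# Assumed simple pattern: word characters, @, word characters, a literal dot, word characters.
simple = re.findall(r"\w+@\w+\.\w+", text)

# Variant from the notes that also requires a leading letter or digit and a trailing letter.
stricter = re.findall(r"[A-Za-z0-9]\w+@\w+\.\w+[A-Za-z]", text)

print(simple)    # ['alice@example.com', 'bob_smith@example.org']
print(stricter)  # ['alice@example.com', 'bob_smith@example.org']
```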
+ matches one or more
* matches zero or more
? matches zero or one
http://[\w./]+ matches a website URL
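The three quantifiers and the URL pattern can be checked with a short sketch like this one. The input strings are made up.

```python
import re

plus = re.findall(r"ab+", "a ab abbb")          # b must appear at least once
star = re.findall(r"ab*", "a ab abbb")          # b may be missing
optional = re.findall(r"https?", "http https")  # the s is optional

# The set [\w./] allows word characters, dots, and slashes, so the comma ends the match.
urls = re.findall(r"http://[\w./]+", "see http://www.example.com/a/b.html, ok")

print(plus)      # ['ab', 'abbb']
print(star)      # ['a', 'ab', 'abbb']
print(optional)  # ['http', 'https']
print(urls)      # ['http://www.example.com/a/b.html']
```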
\s any whitespace character
\x hexadecimal character code (for example \x41 is A); octal character codes also work (for example \101 is A)
\w+ matches one or more consecutive word characters
(?m) multiline mode: newlines are treated as line separators, so ^ and $ match at the start and end of every line of the string
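A small sketch of the whitespace class, the hexadecimal escape, and multiline mode, using invented sample text.

```python
import re

text = "first line\nsecond line\nthird line"

# Without (?m), ^ only matches at the very start of the string.
without_m = re.findall(r"^\w+", text)
# With (?m), ^ and $ also match at the start and end of each line.
with_m = re.findall(r"(?m)^\w+", text)
line_ends = re.findall(r"(?m)\w+$", text)

# \s matches spaces, tabs, and newlines alike.
pieces = re.split(r"\s+", "a \t b\nc")
# \x41 is the hexadecimal code for the letter A.
hex_matches = re.findall(r"\x41", "ABA")

print(without_m)    # ['first']
print(with_m)       # ['first', 'second', 'third']
print(line_ends)    # ['line', 'line', 'line']
print(pieces)       # ['a', 'b', 'c']
print(hex_matches)  # ['A', 'A']
```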
{5} repeat exactly 5 times
{2,4} repeat 2 to 4 times
{2,} repeat at least 2 times
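The repetition counts can be compared side by side in a sketch like the following. The digit runs are made up, and \b (word boundary) is used only to keep each run separate.

```python
import re

text = "1 22 333 4444 55555 666666"

exactly_five = re.findall(r"\b\d{5}\b", text)
two_to_four = re.findall(r"\b\d{2,4}\b", text)
at_least_two = re.findall(r"\b\d{2,}\b", text)

print(exactly_five)  # ['55555']
print(two_to_four)   # ['22', '333', '4444']
print(at_least_two)  # ['22', '333', '4444', '55555', '666666']
```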
Greedy quantifiers: *, +, {n,}
Lazy quantifiers: *?, +?, {n,}?
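The difference between greedy and lazy matching shows up most clearly on text with repeated delimiters. A minimal sketch on an invented HTML fragment:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as possible, so the match runs to the last </b>.
greedy = re.findall(r"<b>.*</b>", html)
# Lazy: .*? grabs as little as possible, so each tag pair is matched separately.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```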
\b Word boundaries
\B non-word boundary
^ String start
$ End of string
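A short sketch of word boundaries and the string anchors, on a made-up sentence:

```python
import re

text = "cat concat cat"

anywhere = re.findall(r"cat", text)         # also finds the cat inside concat
whole_words = re.findall(r"\bcat\b", text)  # only standalone words

starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None
starts_with_con = re.search(r"^con", text) is not None

print(len(anywhere))     # 3
print(whole_words)       # ['cat', 'cat']
print(starts_with_cat)   # True
print(ends_with_cat)     # True
print(starts_with_con)   # False
```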
(\w+) \1 backreference: \1 matches the same text that group 1 just captured
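A common use of a backreference is finding accidentally doubled words. The sketch below assumes a made-up sentence and adds \b on both sides so that the "is" inside "This" is not counted.

```python
import re

text = "This is is a test of of the the parser"

# Group 1 captures a word; \1 then requires the same word again after a space.
doubled = re.findall(r"\b(\w+) \1\b", text)
# The same pattern can collapse each doubled word to a single copy.
cleaned = re.sub(r"\b(\w+) \1\b", r"\1", text)

print(doubled)  # ['is', 'of', 'the']
print(cleaned)  # This is a test of the parser
```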
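Lookahead: .+(?=:) matches the text in front of a colon; the colon itself is not included in the match
(?<=...) lookbehind: matches only where the preceding text fits the pattern, without including that text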
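Both lookarounds check the neighbouring text without consuming it. A minimal sketch on an invented line:

```python
import re

line = "Subject: price is $42"

# Lookahead: take everything that is followed by a colon, but leave the colon out.
before_colon = re.search(r".+(?=:)", line).group()
# Lookbehind: take digits only when a dollar sign comes right before them.
amount = re.search(r"(?<=\$)\d+", line).group()

print(before_colon)  # Subject
print(amount)        # 42
```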
(".*://.*") for crawling the links in a page (not verified)
import re
import requests

# Fetch a Baidu search results page to use as sample text.
r = requests.get("http://www.baidu.com/s?wd=qqwe%40163.com&rsv_spt=1&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&tn=baiduhome_pg&rsv_enter=1&inputT=5722&rsv_t=c848L7Xor4vFhEoVV9GPzZr2MuYMzFl1%2FETo9cY0rHjXNql5QbkRcKTrFd5hVllmdRaP&rsv_sug3=23&rsv_sug1=14&rsv_sug2=0&rsv_sug4=7428")
# Quoted http links, hex colors (long and short form), and email addresses.
urlres = re.findall(r'(http://.*?)"', r.text)
colorres = re.findall(r'#[a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9]', r.text)
res = re.findall(r'#[a-fA-F0-9]{6}', r.text)
mailres = re.findall(r'[a-zA-Z]\w+@\w+\.\w+[a-zA-Z]', r.text)
print(r.text)
# URLs with either scheme, then URL prefixes that end in a slash.
url2res = re.findall(r'https?://[\w./]+', r.text)
domainres = re.findall(r'https?://[\w./]+/', r.text)
# The image pattern is cut off in the original notes: imgres = re.findall(r'
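The same patterns can be tried without a network connection. The sketch below runs them on a small invented HTML string instead of a downloaded page; the markup and variable names are assumptions made for the example.

```python
import re

# Hypothetical stand-in for r.text.
html = (
    '<a href="http://www.example.com/a/b.html">x</a> '
    '<span style="color:#FF00aa">y</span> '
    '<a href="https://example.org/">z</a>'
)

quoted_http = re.findall(r'(http://.*?)"', html)       # lazy match up to the closing quote
all_urls = re.findall(r'https?://[\w./]+', html)       # http or https
colors = re.findall(r'#[a-fA-F0-9]{6}', html)          # six hex digits
slash_prefixes = re.findall(r'https?://[\w./]+/', html)  # longest prefix ending in a slash

print(quoted_http)     # ['http://www.example.com/a/b.html']
print(all_urls)        # ['http://www.example.com/a/b.html', 'https://example.org/']
print(colors)          # ['#FF00aa']
print(slash_prefixes)  # ['http://www.example.com/a/', 'https://example.org/']
```

Because [\w./]+ is greedy and the pattern must end in a slash, the last pattern returns the longest prefix that ends in a slash rather than just the domain.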
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
Collation of regular expression notes