Preface
Some time ago I was given a requirement, and after looking into it I found that regular expressions were the most appropriate tool. Until then I had only ever crammed regular expressions at the last minute, so this time I studied them systematically while completing the task. My main reference was a PyCon 2016 video on regular expressions.
I'll summarize regular expressions over a few articles.
Here is the first part, the basics:
Basic part
The most basic regular expression syntax is summarized here. Most of it is used often by me (and by most programmers), so I only list it and illustrate a few items with examples.
. Any character except a line break
^ Beginning of the line
$ End of the line
[abcd] Any one of the characters a, b, c, d
[^abcd] Any character other than a, b, c, d
[a-d] Equivalent to [abcd]
[a-dz] Equivalent to [abcdz]
\b Word boundary
\w Alphanumeric character or underscore, equivalent to [a-zA-Z0-9_] for ASCII text
\W The opposite of \w
\d Digit, equivalent to [0-9] for ASCII text
\D The opposite of \d
\s Whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text
\S The opposite of \s
{5} The preceding part of the regular expression (written ~ below) appears exactly 5 times
{2,5} ~ appears 2 to 5 times
{2,} ~ appears 2 or more times
{,5} ~ appears 0 to 5 times
* ~ appears 0 or more times
? ~ appears 0 or 1 times
+ ~ appears 1 or more times
abc|def Matches abc or def
\ Escape character; for example, \* matches a literal * and \$ matches a literal $
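To make the table above more concrete, here is a small sketch that exercises a few of the entries with Python's re module. The sample strings and variable names are made up for illustration and are not taken from the examples that follow.

```python
import re

# Character classes: [a-d] is shorthand for [abcd]; [^...] negates the set.
first_in_set = re.search(r'[a-d]', 'xyzc').group()       # 'c'
first_not_in_set = re.search(r'[^a-d]', 'abcx').group()  # 'x'

# Shorthand classes: \d is a digit, \w is a word character.
digits = re.findall(r'\d+', 'room 101, floor 7')  # ['101', '7']
words = re.findall(r'\w+', 'foo_bar baz-42')      # ['foo_bar', 'baz', '42']

# Quantifiers: {2,5} allows 2 to 5 repetitions and is greedy by default.
run_of_a = re.search(r'a{2,5}', 'caaaaaaat').group()  # 'aaaaa'
optional_u = re.findall(r'colou?r', 'color colour')   # ['color', 'colour']

# Anchors and alternation: ^ and $ pin the match to both ends of the line.
whole_line = re.search(r'^(abc|def)$', 'def')     # matches 'def'
partial = re.search(r'^(abc|def)$', 'abcdef')     # None: extra text remains

print(first_in_set, first_not_in_set, digits, words)
print(run_of_a, optional_u, whole_line is not None, partial is None)
```

Note that a{2,5} stops at five characters even though seven are available, which shows the upper bound at work.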
\b and \ are illustrated briefly with the following examples:
\b:
>>> re.search(r'\bhello\b', 'hello')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search(r'\bhello\b', 'hello world')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search(r'\bhello\b', 'hello,world')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search(r'\bhello\b', 'hello_world')
>>>
In fact, \b here behaves much like \W, but \b is zero-width: it consumes no character, so it can also match at the beginning and end of the line, where there is no character to match, whereas \W cannot.
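The difference between \b and \W can be seen in a short sketch like the following. The variable names are only for illustration; the point is what ends up inside the match.

```python
import re

text = 'hello,world'

# \b is zero-width: the match contains only 'hello'.
with_b = re.search(r'\bhello\b', text)
# \W consumes a character: the comma becomes part of the match.
with_W = re.search(r'hello\W', text)

# At the very end of the string there is no character left for \W to consume,
# but the position still counts as a word boundary for \b.
at_end_b = re.search(r'hello\b', 'hello')
at_end_W = re.search(r'hello\W', 'hello')

print(with_b.group())  # hello
print(with_W.group())  # hello,
print(at_end_b is not None, at_end_W is None)  # True True
```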
\:
>>> re.search(r'\$100', '$100')
<_sre.SRE_Match object; span=(0, 4), match='$100'>
>>> re.search(r'$100', '$100')
>>>
To match characters that have special meanings in regular expressions, such as $, ^, *, and so on, you need to escape with \.
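When the text to match comes from somewhere else, escaping each special character by hand is easy to get wrong. As a hedged sketch, the standard library function re.escape (not covered in this article) adds the backslashes for you. The sample strings below are made up for illustration.

```python
import re

sentence = 'so 1+1=2 holds'
literal = '1+1=2'

# Unescaped, + is a quantifier: the pattern means "one or more 1, then 1=2",
# which does not occur in the sentence.
unescaped = re.search(literal, sentence)

# Escaping by hand, as described above.
by_hand = re.search(r'1\+1=2', sentence)

# re.escape inserts the backslashes automatically.
by_helper = re.search(re.escape(literal), sentence)

print(unescaped)          # None
print(by_hand.group())    # 1+1=2
print(by_helper.group())  # 1+1=2
```

The exact string returned by re.escape varies slightly between Python versions, so it is safer to rely on what it matches than on how it looks.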
Raw string:
Also, in the preceding examples the pattern string is prefixed with r, which marks it as a raw string: a string literal in which the Python interpreter does not process backslash escapes. Because \ has a special meaning both in Python string literals and in regular expressions, without a raw string it takes four \ to match a single literal \ character. The Python interpreter first turns each pair of \ into one \, leaving two \, and the regular expression engine then reads those two as one literal \. For example:
>>> re.search(r'\bhello\b', 'hello')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search('\bhello\b', 'hello')
>>> re.search('\\bhello\\b', 'hello')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search('\\\\hello\\\\', '\\hello\\')
<_sre.SRE_Match object; span=(0, 7), match='\\hello\\'>
>>> re.search(r'\\hello\\', '\\hello\\')
<_sre.SRE_Match object; span=(0, 7), match='\\hello\\'>
>>> print('\\hello\\')
\hello\
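To see why the second call above finds nothing, it helps to look at the strings themselves before they reach the regular expression engine. The following is a minimal sketch; the variable names are only for illustration.

```python
import re

normal = '\bhello\b'   # Python turns each \b into a backspace character
raw = r'\bhello\b'     # backslashes are kept as written

print(len(normal))             # 7
print(len(raw))                # 9
print(normal[0] == '\x08')     # True: a backspace, not a word boundary
print('\\bhello\\b' == raw)    # True: doubling the backslashes also works

# Matching one literal backslash: four backslashes in a normal string
# are the same two-character pattern as r'\\'.
four = '\\\\'
raw_two = r'\\'
found = re.search(raw_two, 'a\\b')

print(four == raw_two)   # True
print(len(found.group()))  # 1: a single backslash was matched
```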