# This article does not explain regular expressions in detail; it serves only as an entry-level outline of the topic.
I had heard early on how powerful regular expressions are. Later, in an exam where the task was to write an HTML parser, their strengths were on full display. Without further ado, here is the material:
1. Metacharacters:
There are also negated character classes, mostly the uppercase counterparts of the lowercase ones, such as \D, which matches any non-digit character.
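As a small illustration of a metacharacter and its negated counterpart, the sketch below compares \d with \D and \w with \W using Python's re module. The sample strings and variable names are made up for the example.

```python
import re

text = "room 42, floor 7"  # illustrative sample text

# \d matches a single digit; \D matches a single non-digit character.
digits = re.findall(r"\d", text)
non_digits = re.findall(r"\D", text)

# \w matches a word character (letter, digit or underscore); \W is its negation.
word_chars = re.findall(r"\w", "a_1 !")
non_word_chars = re.findall(r"\W", "a_1 !")

print(digits)          # ['4', '2', '7']
print(word_chars)      # ['a', '_', '1']
print(non_word_chars)  # [' ', '!']
```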
2. Repetition (that is, matching variable-length strings):
A metacharacter matches a single character of a given type. To match a string of unknown length, or of a restricted length, add a quantifier after it.
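A minimal sketch of the common quantifiers (+, ?, {n}, {n,m}) follows. The sample inputs and variable names are assumptions chosen only to show each case.

```python
import re

# \d+ : one or more digits (length not known in advance)
numbers = re.findall(r"\d+", "order 7, item 123, qty 45678")

# \d{3} : exactly three digits; \d{2,4} : between two and four digits
exactly_three = re.fullmatch(r"\d{3}", "123") is not None
five_digits_ok = re.fullmatch(r"\d{2,4}", "12345") is not None

# colou?r : the "u" is optional (zero or one occurrence)
spellings = re.findall(r"colou?r", "color colour")

print(numbers)         # ['7', '123', '45678']
print(exactly_three)   # True
print(five_digits_ok)  # False
print(spellings)       # ['color', 'colour']
```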
3. Ranges and grouping:
Sometimes the metacharacters alone cannot express which characters to match, and [] is needed to define the set of allowed characters. For example, to match a word made up only of the letters a, s, d and f, you can use \b[asdf]+\b (without the +, the brackets match exactly one character). Parentheses in a regular expression denote grouping. For example, to repeat a sequence of several characters you can group it: (\d.\d){1,3} means that the content inside the parentheses is repeated one to three times. Parentheses can also be used for back-references and more, which are not covered here.
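The sketch below tries out the bracket and grouping patterns from this section. Note that a bracket expression on its own matches exactly one character, so \b[asdf]\b only finds one-letter words, while a + after the brackets extends it to whole words. The sample strings are illustrative assumptions.

```python
import re

# [asdf] alone matches one character, so only one-letter words are found.
single_letter_words = re.findall(r"\b[asdf]\b", "a sad dad f")

# [asdf]+ matches whole words built only from a, s, d and f.
words = re.findall(r"\b[asdf]+\b", "sad dads fade as")

# (\d.\d){1,3} repeats the group "digit, any character, digit" one to three times.
two_repeats = re.fullmatch(r"(\d.\d){1,3}", "1-23.4")
four_repeats = re.fullmatch(r"(\d.\d){1,3}", "1-21-21-21-2")

print(single_letter_words)   # ['a', 'f']
print(words)                 # ['sad', 'dads', 'as']
print(two_repeats.group(0))  # 1-23.4
print(four_repeats)          # None
```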
4. Zero-width assertions:
Zero-width assertions are used to find content that comes before or after something specified (the assertion), without including the asserted text in the match. For example, (?=exp) matches the position immediately before exp (exp itself is not consumed), and (?<=exp) matches the position immediately after exp. There are also negative zero-width assertions, which are not covered here.
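To make the two assertions concrete, here is a short sketch with a lookahead (?=exp) and a lookbehind (?<=exp). The sample text and the "USD" and "price: " markers are assumptions for illustration only.

```python
import re

text = "price: 100USD, 200EUR"  # illustrative sample text

# (?=USD) is a lookahead: the digits must be followed by "USD",
# but "USD" itself is not part of the match.
before_usd = re.findall(r"\d+(?=USD)", text)

# (?<=price: ) is a lookbehind: the digits must be preceded by "price: ",
# which is likewise excluded from the match.
after_label = re.findall(r"(?<=price: )\d+", text)

print(before_usd)   # ['100']
print(after_label)  # ['100']
```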
5. Greedy and lazy matching:
When a regular expression contains a quantifier that accepts repetition, the default behavior is to match as many characters as possible, as long as the whole expression can still match. This is greedy matching. For example, the expression a.*b applied to aabab matches the entire string aabab. Lazy matching matches as few characters as possible; to get it, add a ? after the quantifier. For example, a.*?b matches aab (the first to third characters) and ab (the fourth to fifth characters).
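The aabab example can be checked directly in Python. This is only a quick sketch using re.findall; the variable names are arbitrary.

```python
import re

text = "aabab"

# Greedy: .* grabs as much as possible while still ending in "b".
greedy = re.findall(r"a.*b", text)

# Lazy: .*? stops at the first "b" that lets the match succeed.
lazy = re.findall(r"a.*?b", text)

print(greedy)  # ['aabab']
print(lazy)    # ['aab', 'ab']
```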
6. Python's re module:
Python provides the re module, which offers full regular expression functionality. Only two methods and one caveat are covered here.
6.1 The match method:
The match() method determines whether the match succeeds: it returns a Match object on success, or None otherwise.
>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<_sre.SRE_Match object at 0x1026e18b8>
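Because match() returns None on failure, its result is commonly tested in a condition. The sketch below wraps the pattern from the example in a helper; the name is_phone and the second input string are hypothetical and chosen only for illustration.

```python
import re

# Same pattern as in the example above: three digits, a hyphen, three to eight digits.
PHONE = re.compile(r"^\d{3}-\d{3,8}$")

def is_phone(s):
    """Hypothetical helper: True if s matches the pattern, False otherwise."""
    return PHONE.match(s) is not None

print(is_phone("010-12345"))  # True
print(is_phone("010 12345"))  # False
```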
If groups are defined in a regular expression, you can extract the matched substrings from the Match object with the group() method. Note that group(0) is always the entire matched text, while group(1), group(2), ... are the 1st, 2nd, ... substrings. All substrings can also be obtained at once with the groups() method.
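As a sketch of group extraction, the phone-number pattern from 6.1 can be given two groups, one for each part of the number. The variable names below are illustrative.

```python
import re

# Two groups: (\d{3}) captures the first part, (\d{3,8}) the second.
m = re.match(r"^(\d{3})-(\d{3,8})$", "010-12345")

whole = m.group(0)   # the entire matched text
area = m.group(1)    # first group
number = m.group(2)  # second group
parts = m.groups()   # all groups as a tuple

print(whole)   # 010-12345
print(area)    # 010
print(number)  # 12345
print(parts)   # ('010', '12345')
```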
6.2 The split method:
The split() method can be used to split a string into substrings, as follows:
>>> re.split(r'[\s\,\;]+', 'a,b;; c d')
['a', 'b', 'c', 'd']
6.3 About escape characters:
Python string literals also use \ as the escape character. When a regular expression is written as an ordinary string, Python processes the backslash escapes first, so the regex engine may receive a different pattern than intended and the match goes wrong. Using Python's r prefix (raw strings) for patterns is recommended.
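The effect can be seen with \b: in an ordinary string literal it is the backspace character, while in a raw string it stays as a backslash followed by b, which re interprets as a word boundary. The following is a minimal sketch with a made-up sample sentence.

```python
import re

plain = "\bcat\b"   # "\b" becomes the backspace character: 5 characters in total
raw = r"\bcat\b"    # backslashes are kept: 7 characters in total

plain_hit = re.search(plain, "a cat sat")  # looks for backspace + "cat" + backspace
raw_hit = re.search(raw, "a cat sat")      # looks for the word "cat"

# Doubling the backslash in an ordinary string gives the same pattern as a raw string.
same_pattern = "\\d" == r"\d"

print(len(plain), len(raw))  # 5 7
print(plain_hit)             # None
print(raw_hit.group(0))      # cat
print(same_pattern)          # True
```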
PS:
This article is a summary of what I learned about regular expressions; most of the material comes from the Internet. Here are two links that I personally think are good:
1. http://www.jb51.net/tools/zhengze.html introduces regular expressions in detail. Personally, I think that to learn regular expressions in depth it is better to buy a book.
2. http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386832260566c26442c671fa489ebc6fe85badda25cd000 is by Liao Xuefeng and covers regular expressions together with Python; it is just a little short.
Python learning notes: getting started with regular expressions