Python learning notes: an introduction to regular expressions
# This article does not explain regular expressions in detail; it only serves as an entry-level outline of the topic.
I heard about the power of regular expressions early on. As a freshman I took a selection test in which the task was to write an HTML parser, and regular expressions were a clear advantage there. I won't dwell on that; let's get straight to the notes:
1. Metacharacters:
A metacharacter stands for a class of characters; for example, \d matches any digit. Most metacharacters also have a negated counterpart, usually formed by changing the lowercase letter to uppercase: \D, for example, matches any non-digit character.
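A minimal sketch of the lowercase/uppercase pairing, using Python's re module. The sample string and variable names are made up for illustration.

```python
import re

text = "room 42, floor 7"

# \d matches one digit; its uppercase counterpart \D matches one non-digit.
digits = re.findall(r"\d", text)
print(digits)  # ['4', '2', '7']

# Removing every non-digit character leaves only the digits.
only_digits = re.sub(r"\D", "", text)
print(only_digits)  # 427

# The same pairing holds for \w and \W (word characters) and \s and \S (whitespace).
words = re.findall(r"\w+", text)
print(words)  # ['room', '42', 'floor', '7']
```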
2. Repetition (matching a variable-length string):
A metacharacter matches a single character of a given type. To match a string of unknown or bounded length, append a quantifier to it.
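As a rough illustration, the common quantifiers are *, +, ?, {n} and {n,m}. The sketch below tries a few of them on a made-up sample string; the variable names are my own.

```python
import re

sample = "1 22 333 4444"

# + means "one or more": each run of digits is matched whole.
runs = re.findall(r"\d+", sample)
print(runs)  # ['1', '22', '333', '4444']

# {2,3} means "between two and three times"; \b keeps the match
# from stopping in the middle of a longer number.
bounded = re.findall(r"\b\d{2,3}\b", sample)
print(bounded)  # ['22', '333']

# ? means "zero or one": the u is optional.
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]
print(optional)  # [True, True, False]
```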
3. Character ranges and groups:
Sometimes metacharacters alone cannot express which characters to match. In that case, use [] to define the set of allowed characters. For example, to match a word made up only of the letters a, s, d, and f, you can use \b[asdf]+\b. Parentheses in a regular expression denote a group. A group lets you repeat several characters as a unit; for example, (\d.\d){1,3} matches the content in the parentheses one to three times. Parentheses can also be used for back-references.
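A small sketch of all three ideas, assuming the word-boundary form \b and a + quantifier so that whole words are matched. The sample strings are invented for the example.

```python
import re

# Character set: every letter of the word must be one of a, s, d, f.
words = re.findall(r"\b[asdf]+\b", "sad dad fads hello add")
print(words)  # ['sad', 'dad', 'fads', 'add']

# Group repetition: the three-character unit "digit, any character, digit"
# may repeat one to three times.
three_units = re.fullmatch(r"(\d.\d){1,3}", "1a23b45c6") is not None
four_units = re.fullmatch(r"(\d.\d){1,3}", "1a23b45c67d8") is not None
print(three_units, four_units)  # True False

# Back-reference: \1 must repeat exactly what group 1 captured,
# which finds a doubled word here.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(0), "|", doubled.group(1))  # is is | is
```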
4. Zero-width assertions:
Zero-width assertions are mainly used to find content that comes before or after some specified expression, without including that expression in the match. For example, (?=exp) asserts that exp follows the current position, so the pattern in front of it matches the content before exp but not exp itself; (?<=exp) asserts that exp precedes the current position, so the pattern after it matches the content that follows exp. Negative zero-width assertions are not covered here.
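The two assertions can be sketched like this. The sample text and variable names are hypothetical; the point is only that the asserted part ($ or :) never appears in the results.

```python
import re

text = "price: $100, discount: $20"

# Lookbehind (?<=\$): match digits only where a dollar sign comes
# immediately before them; the dollar sign is not part of the match.
amounts = re.findall(r"(?<=\$)\d+", text)
print(amounts)  # ['100', '20']

# Lookahead (?=:): match a word only where a colon follows it;
# the colon is not part of the match.
labels = re.findall(r"\w+(?=:)", text)
print(labels)  # ['price', 'discount']
```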
5. Greed and laziness:
When a regular expression contains a quantifier that allows repetition, the default behavior (provided the whole expression can still match) is to match as many characters as possible; this is greedy matching. For example, searching aabab with the expression a.*b matches the entire string aabab. Lazy matching matches as few characters as possible; you only need to add ? after the quantifier. For example, a.*?b matches aab (the first to third characters) and ab (the fourth to fifth characters).
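The aabab example can be checked directly in Python. This sketch assumes the greedy pattern is a.*b and the lazy one is a.*?b.

```python
import re

text = "aabab"

# Greedy: .* takes as much as it can, so one match covers the whole string.
greedy = re.findall(r"a.*b", text)
print(greedy)  # ['aabab']

# Lazy: .*? takes as little as it can, so the search stops at the first b
# each time and finds two shorter matches.
lazy = re.findall(r"a.*?b", text)
print(lazy)  # ['aab', 'ab']
```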
6. Python re module:
Python's re module provides all the regular expression functionality. Only two methods and one note are covered here.
6.1 match method:
The match() method checks whether the pattern matches at the beginning of a string. If the match succeeds, it returns a Match object; otherwise it returns None.
>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<_sre.SRE_Match object at 0x1026e18b8>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>
If the regular expression defines groups, you can call the group() method on the Match object to extract substrings. Note that group(0) is always the entire matched text, while group(1), group(2), ... are the 1st, 2nd, ... captured substrings. You can use the groups() method to obtain all captured substrings at once.
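A short sketch of group extraction, reusing the phone-number pattern from above with parentheses added around the two parts. The parentheses and the variable names are my additions.

```python
import re

m = re.match(r"^(\d{3})-(\d{3,8})$", "010-12345")

whole = m.group(0)      # the entire matched text
area = m.group(1)       # first parenthesized group
number = m.group(2)     # second parenthesized group
parts = m.groups()      # all captured groups as a tuple

print(whole)   # 010-12345
print(area)    # 010
print(number)  # 12345
print(parts)   # ('010', '12345')

# A failed match returns None, so test the result before calling group().
failed = re.match(r"^(\d{3})-(\d{3,8})$", "010 12345")
print(failed is None)  # True
```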
6.2 split method:
The split() method splits a string on every match of a pattern, as follows:
>>> re.split(r'[\s\,\;]+', 'a,b;; c d')
['a', 'b', 'c', 'd']
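For comparison, here is a rough sketch of how re.split() differs from the built-in str.split() when separators repeat, plus the optional maxsplit argument. The input strings are made up.

```python
import re

# str.split(' ') treats every single space as a separator,
# so consecutive spaces produce empty strings.
naive = "a b   c".split(" ")
print(naive)  # ['a', 'b', '', '', 'c']

# re.split with \s+ treats a whole run of whitespace as one separator.
by_pattern = re.split(r"\s+", "a b   c")
print(by_pattern)  # ['a', 'b', 'c']

# maxsplit limits the number of splits; the rest of the string is kept intact.
limited = re.split(r"[\s,;]+", "a,b;; c  d", maxsplit=2)
print(limited)  # ['a', 'b', 'c  d']
```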
6.3 escape characters:
Python string literals also use \ as the escape character. When a regular expression is written as an ordinary string, Python processes the backslash escapes first, so the pattern that reaches the re module may not be the one you intended, and the match goes wrong. For this reason, it is recommended to write patterns with Python's r prefix (raw strings).
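A minimal sketch of the difference, using \b as the example because it is a valid escape in both Python strings (backspace) and regular expressions (word boundary). The sample sentence is invented.

```python
import re

# In an ordinary literal the backslash must be doubled to survive;
# in a raw literal it is kept exactly as written.
same_pattern = ("\\d+" == r"\d+")
print(same_pattern)  # True

# "\b" in an ordinary literal is one backspace character,
# while r"\b" is the two characters backslash and b.
lengths = (len("\b"), len(r"\b"))
print(lengths)  # (1, 2)

# Without the r prefix the pattern looks for literal backspace characters
# and finds nothing; with it, \b works as a word boundary.
without_raw = re.findall("\bcat\b", "a cat sat")
with_raw = re.findall(r"\bcat\b", "a cat sat")
print(without_raw)  # []
print(with_raw)     # ['cat']
```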
PS:
This article summarizes what I have learned about regular expressions. Most of the material comes from the Internet. Below are the two links I found most useful:
1. http://www.jb51.net/tools/zhengze.html gives a more detailed introduction to regular expressions. That said, I think that to learn regular expressions well it is better to buy a book.
2. http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386832260566c26442c671fa489ebc6fe85badda25cd000 is from Liao Xuefeng's tutorial. It covers regular expressions together with Python, though it is a little brief.