Python Basics (Part 13): Regular Expressions
The re module provides support for regular expressions. This chapter introduces regular expressions and the main features of the re module.
What is a regular expression?
A regular expression is a pattern that can match a piece of text. The simplest regular expression is an ordinary string, which matches itself. In other words, the regular expression 'python' matches the string 'python'. You can use this matching behavior to search for patterns in text, to replace matches of a pattern with computed values, or to split text into pieces.
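As a small sketch of a plain string used as a pattern, the snippet below uses re.search (covered later in this post) on made-up sample strings. It is written in Python 3 syntax, while the post's own examples use Python 2's print statement.

```python
import re

# 'python' contains no special characters, so it matches exactly "python".
text = 'I write python code every day'      # sample text made up for this example
found = re.search('python', text)           # a match object, or None
not_found = re.search('python', 'Perl only')
wrong_case = re.search('python', 'Python')  # matching is case-sensitive

print(found.group())   # python
print(not_found)       # None
print(wrong_case)      # None
```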
** Wildcard
A regular expression can match more than one string. You create such patterns with special characters. For example, the period (.) matches any single character (except a newline), much as the question mark (?) matches any single character in a Windows file search. Such symbols are called wildcards.
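A quick way to see the wildcard at work is to test a few words against '.ython'. This sketch assumes Python 3.4 or later, because it uses re.fullmatch so that the whole word has to match the pattern; the word list is invented for illustration.

```python
import re

# '.' matches exactly one character, whatever it is (except a newline).
pattern = '.ython'
words = ['python', 'jython', ' ython', 'ython', 'pyython']
results = {w: re.fullmatch(pattern, w) is not None for w in words}
print(results)
# {'python': True, 'jython': True, ' ython': True, 'ython': False, 'pyython': False}
```

'ython' fails because the period needs one character to consume, and 'pyython' fails because the period stands for exactly one character, not two.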
** Escape special characters
With the wildcard above, if we want to match "python.org", can we simply use 'python.org'? That works, but it will also match "pythonzorg", which is not what we want.
So we need to escape the period by putting a backslash in front of it. In this example, the pattern 'python\.org' matches "python.org" but not "pythonzorg". (In Python source code, write it as the raw string r'python\.org' or as 'python\\.org', so that the backslash reaches the re module.)
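The difference between the escaped and unescaped patterns can be checked directly. This is a Python 3 sketch (re.fullmatch needs 3.4+) that also shows why raw strings are convenient for regular expressions.

```python
import re

unescaped = 'python.org'    # '.' is still a wildcard here
escaped = r'python\.org'    # raw string: the backslash is passed to re unchanged

loose = bool(re.fullmatch(unescaped, 'pythonzorg'))    # True: not what we want
strict = bool(re.fullmatch(escaped, 'pythonzorg'))     # False
correct = bool(re.fullmatch(escaped, 'python.org'))    # True
print(loose, strict, correct)

# Without the r prefix, the backslash itself has to be doubled in the literal.
print(escaped == 'python\\.org')   # True
```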
** Character Set
You can create a character set by enclosing characters in brackets ([]). A character set matches any one of the characters it contains. It can also contain ranges: for example, '[a-z]' matches any single character from a to z. You can combine several ranges, for example '[a-zA-Z0-9]' matches any uppercase letter, lowercase letter, or digit.
To invert a character set, put the ^ character at its beginning. For example, '[^abc]' matches any character except a, b, and c.
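The following sketch runs the character sets above through re.findall (described later in this post), which returns every single-character match. The input strings are made up for illustration; Python 3 syntax is assumed.

```python
import re

lower = re.findall('[a-z]', 'Py3')          # one range
mixed = re.findall('[a-zA-Z0-9]', 'Py 3!')  # several ranges combined
inverted = re.findall('[^abc]', 'abcdab')   # anything except a, b, c
case = re.findall('[^abc]', 'aBc')          # sets are case-sensitive

print(lower)     # ['y']
print(mixed)     # ['P', 'y', '3']
print(inverted)  # ['d']
print(case)      # ['B']
```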
** Alternatives
Sometimes you want to match either the string 'python' or the string 'perl'. For this, use the special alternation character, the pipe symbol (|). The pattern can then be written as 'python|perl'.
** Subpatterns
However, sometimes you want the alternation to apply to only part of the pattern rather than to the whole of it. In that case, enclose that part, called a subpattern, in parentheses. The previous example can then be written as 'p(ython|erl)'.
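Both forms of the alternation can be compared on a few sample words. This Python 3 sketch (re.fullmatch, 3.4+) also shows why the patterns are written without spaces around the pipe.

```python
import re

whole = 'python|perl'      # alternation over the entire pattern
grouped = 'p(ython|erl)'   # alternation inside a subpattern

words = ['python', 'perl', 'pearl', 'ruby']
by_whole = [w for w in words if re.fullmatch(whole, w)]
by_grouped = [w for w in words if re.fullmatch(grouped, w)]
print(by_whole)    # ['python', 'perl']
print(by_grouped)  # ['python', 'perl']

# Spaces are literal characters in a pattern: 'python | perl' means
# "python " or " perl", so it does not match 'perl' on its own.
spaced = re.fullmatch('python | perl', 'perl')
print(spaced)      # None
```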
** Optional
If you add a question mark after a subpattern, the subpattern becomes optional: it may appear in the matched string, but it is not required. For example, the pattern
r'(http://)?(www\.)?python\.org'
matches only the following strings:
'http://www.python.org'
'http://python.org'
'www.python.org'
'python.org'
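To check the claim that only these four strings match, the sketch below tests them together with two near misses. It assumes Python 3.4+ for re.fullmatch, which requires the pattern to cover the entire string (re.search would also find 'python.org' inside longer text).

```python
import re

pattern = r'(http://)?(www\.)?python\.org'
candidates = ['http://www.python.org', 'http://python.org', 'www.python.org',
              'python.org', 'https://python.org', 'http://www.www.python.org']
matched = [s for s in candidates if re.fullmatch(pattern, s)]
print(matched)
# ['http://www.python.org', 'http://python.org', 'www.python.org', 'python.org']
```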
** Repetition
(pattern)*: the pattern may be repeated zero or more times.
(pattern)+: the pattern may be repeated one or more times.
(pattern){m,n}: the pattern may be repeated from m to n times.
For example:
r'w*\.python\.org' matches 'www.python.org', '.python.org', and 'wwwwwww.python.org'.
r'w+\.python\.org' matches 'w.python.org', but not '.python.org'.
r'w{3,4}\.python\.org' matches only 'www.python.org' and 'wwww.python.org'.
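The three repetition examples can be verified against one list of host names. This Python 3 sketch uses re.fullmatch (3.4+) so that each host must match completely; note that the braces are written as {3,4} with no space inside.

```python
import re

hosts = ['www.python.org', 'wwww.python.org', '.python.org',
         'w.python.org', 'wwwwwww.python.org']

star = [h for h in hosts if re.fullmatch(r'w*\.python\.org', h)]        # zero or more
plus = [h for h in hosts if re.fullmatch(r'w+\.python\.org', h)]        # one or more
bounded = [h for h in hosts if re.fullmatch(r'w{3,4}\.python\.org', h)] # 3 to 4

print(star)     # all five hosts
print(plus)     # every host except '.python.org'
print(bounded)  # ['www.python.org', 'wwww.python.org']
```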
Contents of the re Module
Some important functions in the re module:
re.compile converts a regular expression, written as a string, into a pattern object, which allows more efficient matching when the same expression is used repeatedly.
re.search searches the given string for the first substring that matches the regular expression. If one is found, it returns a MatchObject (which counts as true); otherwise, it returns None (which counts as false). Because of this, the function can be used in conditional statements:
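Here is a short sketch of compiling a pattern once and reusing it; the URLs are sample data. The pattern object offers methods such as search, match, split, findall and sub, mirroring the module-level functions. Python 3.4+ is assumed for fullmatch.

```python
import re

# Compile once, then reuse the pattern object.
domain = re.compile(r'(www\.)?python\.org')

checks = [bool(domain.fullmatch(u)) for u in ['www.python.org', 'python.org', 'pythonzorg']]
found = domain.search('Visit www.python.org today').group()

print(checks)  # [True, True, False]
print(found)   # www.python.org
```

The module-level functions also accept plain pattern strings, so compiling is mainly a convenience and an efficiency gain when one pattern is applied many times.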
    import re
    # pat and string stand for a pattern and a text defined earlier
    if re.search(pat, string):
        print('Found it!')
re.match matches the regular expression only at the beginning of the given string. Therefore, re.match('p', 'python') returns a match object (true), while re.match('p', 'www.python') returns None (false).
re.split splits a string at every match of the pattern.
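The difference between re.search and re.match, and the use of a match object in an if statement, can be seen side by side in this Python 3 sketch; the value given to pat and string is made up for illustration.

```python
import re

anywhere = re.search('p', 'www.python')   # search scans the whole string
at_start = re.match('p', 'www.python')    # match only tries position 0
print(bool(anywhere), bool(at_start))     # True False
print(bool(re.match('p', 'python')))      # True

# A match object counts as true and None as false, so either works in an if.
pat, string = r'python\.org', 'see www.python.org'   # sample values
if re.search(pat, string):
    print('Found it!')
```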
    >>> import re
    >>> some_text = 'alpha , beta ,,,gamma delta '
    >>> re.split('[,]+', some_text)
    ['alpha ', ' beta ', 'gamma delta ']
re.findall returns a list of all matches of the pattern in the given string. For example, to find all the words in a string, do the following:
    >>> import re
    >>> pat = '[a-zA-Z]+'
    >>> text = '"Hm...err -- are you sure?" he said, sounding insecure.'
    >>> re.findall(pat, text)
    ['Hm', 'err', 'are', 'you', 'sure', 'he', 'said', 'sounding', 'insecure']
re.sub replaces the leftmost, non-overlapping substrings that match the pattern with the given replacement:
    >>> import re
    >>> pat = '{name}'
    >>> text = 'Dear {name}...'
    >>> re.sub(pat, 'Mr. Gumby', text)
    'Dear Mr. Gumby...'
The re.escape function escapes all characters in a string that could otherwise be interpreted as regular expression operators.
This is handy when the string is long, contains many special characters, and you don't want to type a lot of backslashes:
    >>> re.escape('www.python.org')
    'www\\.python\\.org'
    >>> re.escape('but where is the ambiguity?')
    'but\\ where\\ is\\ the\\ ambiguity\\?'
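A typical use of re.escape is turning a literal piece of text into a pattern without escaping it by hand. This Python 3 sketch (re.fullmatch, 3.4+) revisits the "python.org" example from the escaping section.

```python
import re

pattern = re.escape('python.org')   # the period is escaped automatically
print(pattern)                      # python\.org

exact = bool(re.fullmatch(pattern, 'python.org'))   # True
loose = bool(re.fullmatch(pattern, 'pythonzorg'))   # False: '.' is no longer a wildcard
print(exact, loose)
```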
Match Objects and Groups
In short, a group is a subpattern enclosed in parentheses. Groups are numbered by counting their opening (left) parentheses from left to right. Group 0 is the entire pattern, so the following pattern:
'There (was a (wee) (cooper)) who (lived in Fyfe)'
contains these groups:
0 There was a wee cooper who lived in Fyfe
1 was a wee cooper
2 wee
3 cooper
4 lived in Fyfe
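The group numbering above can be reproduced by matching the pattern against its own sentence. This is a Python 3 sketch; the loop simply prints each group number next to the text it captured.

```python
import re

pattern = r'There (was a (wee) (cooper)) who (lived in Fyfe)'
m = re.match(pattern, 'There was a wee cooper who lived in Fyfe')

groups = [m.group(i) for i in range(5)]
for number, text in enumerate(groups):
    print(number, text)
# 0 There was a wee cooper who lived in Fyfe
# 1 was a wee cooper
# 2 wee
# 3 cooper
# 4 lived in Fyfe
```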
Important methods of re match objects
See the example below:
    >>> import re
    >>> m = re.match(r'www\.(.*)\..{3}', 'www.python.org')
    >>> m.group()
    'www.python.org'
    >>> m.group(0)
    'www.python.org'
    >>> m.group(1)
    'python'
    >>> m.start(1)
    4
    >>> m.end(1)
    10
    >>> m.span(1)
    (4, 10)
The group method returns the substring matched by the given group of the pattern. If no group number is given, it defaults to 0, so m.group() is the same as m.group(0). If a single group number is given, a single string is returned; if several are given, a tuple of strings is returned.
The start method returns the index where the match of the given group starts.
The end method returns the index where the match of the given group ends, plus 1 (that is, the index just past the last matched character).
The span method returns the start and end indices of the group as a tuple (start, end).
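Because end points one past the last matched character, start and end can be used directly as slice bounds. This Python 3 sketch reuses the pattern from the interactive example above and also shows group with several numbers returning a tuple.

```python
import re

text = 'www.python.org'
m = re.match(r'www\.(.*)\..{3}', text)

start, end = m.span(1)          # (4, 10)
sliced = text[start:end]        # the slice equals the text of group 1
several = m.group(0, 1)         # several group numbers give a tuple

print(sliced)    # python
print(several)   # ('www.python.org', 'python')
```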
----------------------------
Regular expressions are not easy to understand. Although I haven't fully mastered them, I now have a general picture of how they work. What comes next should be very interesting: reading files, writing GUI programs, connecting to databases, web programming, and more.