This article summarizes the basics of regular expressions in Python. Regular expressions are provided by the re module (re is short for "regular expression"), and its functions cover different tasks. The sections below go through the most commonly used functions of the re module and what each one does, followed by the meaning of the special symbols, with examples:
1. re.sub and str.replace:
sub is short for substitute, which means replacement. The replace method of str also performs replacement, but the two are not used in the same way. Here is an example to illustrate their similarities and differences:
>>> import re
>>> str1 = 'Hello 111 is 222'
>>> str2 = str1.replace('111', '222')
>>> print(str2)
Hello 222 is 222
>>>
This is a simple example. If, as in the following case, every number has to be changed to 222, replace becomes troublesome because it only substitutes one fixed substring at a time, while the sub function of the re module handles it in a single call. (For more complex operations, replace may not be usable at all.)
>>> import re
>>> str1 = 'Hello 123 is 456'
>>> str2 = re.sub(r'\d+', '222', str1)
>>> print(str2)
Hello 222 is 222
>>>
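As an illustration of the "more complex operation" case, here is a minimal sketch of two things re.sub can do that str.replace cannot do with a pattern: limiting the number of substitutions with count, and computing each replacement with a function. The helper name double is an assumption made up for this example.

```python
import re

text = 'Hello 123 is 456'

# Replace only the first run of digits (count=1).
first_only = re.sub(r'\d+', '222', text, count=1)

# Hypothetical helper: receives each match object and returns the new text.
def double(match):
    return str(int(match.group()) * 2)

# Pass the function as the replacement to double every number found.
doubled = re.sub(r'\d+', double, text)

print(first_only)  # Hello 222 is 456
print(doubled)     # Hello 246 is 912
```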
2. re.search() and re.match():
match: tries to match the regular expression only at the beginning of the string. If the match succeeds, a match object is returned; otherwise None is returned.
search: scans the whole string for the first position where the regular expression matches. If no position matches, None is returned; otherwise a match object is returned.
The following example illustrates the similarities and differences between match and search. It also shows why search is used more often in practice:
import re
str1 = 'helloword, I am alex'
# 'word' is not at the beginning of the string, so match returns None
if not re.match('word', str1):
    print('cannot match')
print(re.match('hello', str1).group())
print(re.search('word', str1).group())
# output
cannot match
hello
word
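Because both functions return None on failure, calling .group() directly raises an AttributeError when nothing matches. The sketch below wraps that check in a small helper; the name first_match is hypothetical, and re.fullmatch is included only for comparison.

```python
import re

text = 'helloword, I am alex'

def first_match(func, pattern, string):
    """Return the matched text, or None when there is no match."""
    m = func(pattern, string)
    return m.group() if m else None

at_start = first_match(re.match, 'word', text)    # 'word' is not at the start
anywhere = first_match(re.search, 'word', text)   # found later in the string
whole = first_match(re.fullmatch, 'hello', text)  # pattern must cover the whole string

print(at_start, anywhere, whole)  # None word None
```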
3. re.split:
In Python, you can split a string by calling the split method of str. However, str.split works with one fixed separator only; it cannot split on several different separators at the same time.
Fortunately, the re module also provides a split function. It is more powerful and can split on several separators in a single call. Let's look at the difference between str.split and re.split:
str1 = 'helloword,I;am\nalex'
str2 = str1.split(',')
print(str2)
import re
# inside [] the characters are alternatives already, so no | is needed
str3 = re.split('[,;\n]', str1)
print(str3)
# output
['helloword', 'I;am\nalex']
['helloword', 'I', 'am', 'alex']
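The output confirms this: str.split only splits on the comma, while re.split splits on the comma, the semicolon, and the newline in one call.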
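A minimal sketch of two further options, assuming the input may contain spaces around the separators: the pattern swallows surrounding whitespace, and the maxsplit argument limits how many splits are made.

```python
import re

text = 'helloword, I; am\nalex'
delimiter = r'\s*[,;\n]\s*'

# Split on a comma, semicolon, or newline, dropping spaces around it.
parts = re.split(delimiter, text)

# maxsplit=1 splits once; the remainder stays together in the last item.
head, rest = re.split(delimiter, text, maxsplit=1)

print(parts)       # ['helloword', 'I', 'am', 'alex']
print(head)        # helloword
print(repr(rest))  # 'I; am\nalex'
```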
4. findall:
findall is usually used together with compile. Their usage is:
First, compile converts a regular expression string into a pattern object; then the findall method of that pattern object is called, which returns a list of all matching substrings. Before looking at examples that combine them, let's first look at the meanings of the special characters predefined in regular expressions:
\d matches any decimal digit; it is equivalent to the class [0-9].
\D matches any non-digit character; it is equivalent to the class [^0-9].
\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].
\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].
\w matches any alphanumeric character or underscore; it is equivalent to the class [a-zA-Z0-9_].
\W matches any character that is not alphanumeric or underscore; it is equivalent to the class [^a-zA-Z0-9_].
(These equivalences are exact for ASCII text or with the re.ASCII flag. With str patterns, Python 3 also matches the corresponding Unicode characters by default.)
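The classes above can be checked quickly with findall. This is a small sketch on a made-up sample string; the last two lines show the effect of the re.ASCII flag on \d, using the Arabic-Indic digit U+0663 as the non-ASCII example.

```python
import re

sample = 'a1 _b\t2!'

digits = re.findall(r'\d', sample)      # ['1', '2']
spaces = re.findall(r'\s', sample)      # [' ', '\t']
word_chars = re.findall(r'\w', sample)  # ['a', '1', '_', 'b', '2']
non_word = re.findall(r'\W', sample)    # [' ', '\t', '!']

# By default \d in a str pattern also matches non-ASCII decimal digits.
unicode_digits = re.findall(r'\d', '\u0663 3')
ascii_digits = re.findall(r'\d', '\u0663 3', flags=re.ASCII)  # ['3']

print(digits, spaces, word_chars, non_word)
print(len(unicode_digits), ascii_digits)  # 2 ['3']
```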
With the meanings of these special characters in mind, let's look at an example that combines compile and findall:
import re
str1 = 'asdf12dve4gb4'
pattern1 = re.compile(r'\d')
pattern2 = re.compile('[0-9]')
mch1 = pattern1.findall(str1)
mch2 = pattern2.findall(str1)
print('mch1:\t%s' % mch1)
print('mch2:\t%s' % mch2)
# output
mch1:   ['1', '2', '4', '4']
mch2:   ['1', '2', '4', '4']
The two patterns return the same list, which shows that for this input the special character \d behaves the same as [0-9]. The output also shows that every digit becomes a separate element of the list. If you want 12 to be returned as a whole instead, add a + after \d. The + means one or more consecutive digits are matched as a single item:
import re
str1 = 'asdf12dve4gb4'
pattern1 = re.compile(r'\d+')
pattern2 = re.compile('[0-9]')
mch1 = pattern1.findall(str1)
mch2 = pattern2.findall(str1)
print('mch1:\t%s' % mch1)
print('mch2:\t%s' % mch2)
# output
mch1:   ['12', '4', '4']
mch2:   ['1', '2', '4', '4']
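Two related points, sketched on the same input string: the module-level re.findall gives the same result as a compiled pattern, and when the pattern contains one capturing group, findall returns the group contents rather than the whole match. The variable names are only illustrative.

```python
import re

text = 'asdf12dve4gb4'

digits = re.compile(r'\d+')

# The compiled pattern and the module-level function give the same list.
compiled_result = digits.findall(text)
direct_result = re.findall(r'\d+', text)

# With one capturing group, findall returns only what the group captured.
letters_before_digits = re.findall(r'([a-z]+)\d+', text)

print(compiled_result)        # ['12', '4', '4']
print(letters_before_digits)  # ['asdf', 'dve', 'gb']
```

Compiling is mainly worthwhile when the same pattern is reused many times, since the pattern object can be kept in a variable and passed around.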
Let's look at another small example. It uses the special character \s together with re.sub to remove all whitespace from a string:
import re
str1 = 'asd \tf12d vdve4gb4'
# \s+ matches each run of whitespace (spaces, tabs, newlines)
new_str = re.sub(r'\s+', '', str1)
print(new_str)
# output
asdf12dvdve4gb4
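A common variation, shown here as a sketch on a made-up string: instead of deleting whitespace entirely, collapse each run into a single space and trim the ends.

```python
import re

text = '  asd \tf12d   vdve4gb4\n'

# Delete every run of whitespace.
removed = re.sub(r'\s+', '', text)

# Collapse each run to one space, then strip the leading and trailing space.
collapsed = re.sub(r'\s+', ' ', text).strip()

print(removed)    # asdf12dvdve4gb4
print(collapsed)  # asd f12d vdve4gb4
```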
5. Metacharacters:
The following characters are called metacharacters: . ^ $ * + ? { } [ ] | ( ) \
The first metacharacters we examine are "[" and "]". They are used to specify a character class, which is the set of characters you want to match. Characters can be listed individually, or a range can be given by two characters separated by "-". For example, [abc] matches any one of the characters "a", "b", or "c"; [a-c] expresses the same set and has the same effect. If you only want to match lower-case letters, the regular expression should be written as [a-z]. Most metacharacters lose their special meaning inside a class. For example, [akm$] matches any one of the characters "a", "k", "m", or "$"; "$" is normally a metacharacter, but inside a character class it is treated as an ordinary character.
[]: The metacharacters [] define a character class. Inside a character class, only the characters ^, -, ], and \ have special meaning. The character \ still acts as an escape, the character - defines a character range, and the character ^, when placed first, negates the class.
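The rules for character classes can be tried out with findall. The sketch below uses made-up input strings; the last pattern escapes ^, -, and ] with \ so that they are matched literally.

```python
import re

listed = re.findall(r'[abc]', 'cabbage')     # ['c', 'a', 'b', 'b', 'a']
ranged = re.findall(r'[a-c]', 'cabbage')     # same set written as a range
dollar = re.findall(r'[akm$]', 'a$k')        # '$' is an ordinary character here
negated = re.findall(r'[^0-9]', 'a1b2')      # ^ placed first negates the class
escaped = re.findall(r'[\^\-\]]', 'x^-]y')   # \ escapes the special characters

print(listed == ranged)  # True
print(dollar)            # ['a', '$', 'k']
print(negated)           # ['a', 'b']
print(escaped)           # ['^', '-', ']']
```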
+ matches the item before the + one or more times
? matches the item before the ? zero or one time
{m} matches the preceding item exactly m times
{m,n} matches the preceding item m to n times
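The four quantifiers can be compared side by side. This is a minimal sketch on a made-up string in which "a" is followed by zero to three "b" characters.

```python
import re

text = 'a ab abb abbb'

one_or_more = re.findall(r'ab+', text)     # ['ab', 'abb', 'abbb']
zero_or_one = re.findall(r'ab?', text)     # ['a', 'ab', 'ab', 'ab']
exactly_two = re.findall(r'ab{2}', text)   # ['abb', 'abb']
two_to_three = re.findall(r'ab{2,3}', text)  # ['abb', 'abbb']

print(one_or_more)
print(zero_or_one)
print(exactly_two)
print(two_to_three)
```

Note that a quantifier applies only to the item immediately before it, here the single character b, not to the whole of "ab".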
Here is a small example to illustrate the quantifiers above. Note two points: first, a ? is added after \d+, which makes the match non-greedy, so it takes as few digits as possible; second, the character r is placed before the pattern string to mark it as a raw string. In this example the result is the same with or without the r.
>>> import re
>>> print(re.findall(r"a(\d+?)", "a123b"))
['1']
>>> print(re.findall(r"a(\d+)", "a123b"))
['123']
>>>
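Both points can be seen more clearly on other inputs. The sketch below, using made-up strings, contrasts greedy and non-greedy matching on tag-like text, and then shows a case where the r prefix does change the result: without it, Python turns \b into a backspace character before the re module ever sees the pattern.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ runs to the last '>' in the string.
greedy = re.findall(r'<.+>', html)
# Non-greedy: .+? stops at the first '>' it can.
lazy = re.findall(r'<.+?>', html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Raw string: \b reaches the re module and means "word boundary".
raw = re.findall(r'\bcat', 'cat')
# Normal string: '\b' is a backspace character, so nothing matches.
not_raw = re.findall('\bcat', 'cat')

print(raw, not_raw)  # ['cat'] []
```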
The above is a summary of Python regular expression basics. I hope it helps you.