Python Regular Expression Learning Summary

Source: Internet
Author: User
Tags: character classes
In Python, regular expressions are implemented by the re module (re is short for regular expression). You call the module's various functions to perform different tasks. Below we go through which functions the re module provides and what they do, along with regular expression examples and the meanings of the various special symbols:

1, re.sub and str.replace:

sub is short for substitute, that is, replacement. The str method replace also performs replacement, but the two are used differently. The following examples explain their similarities and differences in detail:

>>> import re
>>> str1 = 'Hello 111 is 222'
>>> str2 = str1.replace('111', '222')
>>> print(str2)
Hello 222 is 222

This is a simple case. If instead you need to change every number to 222, doing it with replace is cumbersome, whereas the sub function of the re module handles it easily. (For more complex operations, replace may not be able to do the job at all.)

>>> import re
>>> str1 = 'Hello 123 is 456'
>>> str2 = re.sub(r'\d+', '222', str1)
>>> print(str2)
Hello 222 is 222
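The contrast between the two approaches can be sketched side by side. This is a minimal illustration, not part of the original article; the variable names are made up for the example, and the optional count argument of re.sub is shown only as an extra.

```python
import re

text = "Hello 123 is 456"

# str.replace substitutes only a fixed, literal substring.
fixed = text.replace("123", "222")

# re.sub substitutes every match of a pattern (here: any run of digits).
all_numbers = re.sub(r"\d+", "222", text)

# The optional count argument limits how many matches are replaced.
first_only = re.sub(r"\d+", "222", text, count=1)

print(fixed)        # Hello 222 is 456
print(all_numbers)  # Hello 222 is 222
print(first_only)   # Hello 222 is 456
```

With replace you would need one call per distinct number, while one pattern covers them all.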

2, re.search() and re.match():

match: tries to match the regular expression only at the beginning of the string. It returns a match object on success; otherwise it returns None.

search: scans the whole string for the first position where the regular expression matches. It returns None if no position matches; otherwise it returns a match object.

The following example illustrates the similarities and differences between match and search, and also shows that search applies in more situations:

import re
str1 = 'helloword,i am Alex'
if not re.match('word', str1):
    print('cannot match')
print(re.match('hello', str1).group())
print(re.search('word', str1).group())
# Output:
cannot match
hello
word
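A slightly fuller sketch of the same comparison follows. It is an illustration under the assumption that you want to inspect the returned objects directly; the variable names are invented for this example.

```python
import re

text = "helloword,i am Alex"

# match succeeds only when the pattern starts at position 0.
at_start = re.match("hello", text)
not_at_start = re.match("word", text)

# search scans forward and reports the first position that matches.
anywhere = re.search("word", text)

print(at_start.group())  # hello
print(not_at_start)      # None
print(anywhere.group())  # word
print(anywhere.span())   # (5, 9)
```

Because match and search both return None on failure, check the result before calling group().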


3, re.split:

In Python, to split a string you only need to call the split method of str. However, that method splits on a single separator only; it cannot split on several different separator characters at once.

Fortunately, the re module also provides a split function. It is more powerful and can split on several characters at the same time. The following example shows the difference between str.split and re.split:

import re
str1 = 'helloword,i;am\nalex'
str2 = str1.split(',')
print(str2)
# Inside [] the | character is literal, so list only the separators themselves
str3 = re.split(r'[,;\n]', str1)
print(str3)
# The two outputs differ:
['helloword', 'i;am\nalex']
['helloword', 'i', 'am', 'alex']
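As a hedged sketch of the same idea, the snippet below splits on a character class and also shows one common variation: adding + so that a run of consecutive separators counts as a single split point. The second sample string is made up for illustration.

```python
import re

text = "helloword,i;am\nalex"

# str.split handles exactly one separator string.
by_comma = text.split(",")

# re.split with a character class splits on any of the listed characters.
by_any = re.split(r"[,;\n]", text)

# Adding + treats a run of separators (including whitespace) as one split point.
collapsed = re.split(r"[,;\s]+", "a, b;;c\n d")

print(by_comma)   # ['helloword', 'i;am\nalex']
print(by_any)     # ['helloword', 'i', 'am', 'alex']
print(collapsed)  # ['a', 'b', 'c', 'd']
```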


The output confirms the claim above.

4, findall:

The findall method is usually used together with compile. The typical usage is:

First, compile turns the string form of a regular expression into a Pattern instance. Then the findall method of that Pattern instance is called, which returns a list of all non-overlapping matches. Before the example that combines the two, let's look at the meaning of the predefined special characters in regular expressions:

\d matches any decimal digit; for ASCII text it is equivalent to the class [0-9].

\D matches any non-digit character; it is equivalent to the class [^0-9].

\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].

\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].

\w matches any alphanumeric character or underscore; it is equivalent to the class [a-zA-Z0-9_].

\W matches any non-alphanumeric character; it is equivalent to the class [^a-zA-Z0-9_].
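The six shorthand classes can be checked on a small sample string. This is an illustrative sketch; the sample string and variable names are invented. The last two lines show a caveat: for str patterns in Python 3, \d also matches non-ASCII decimal digits unless the re.ASCII flag is given, so the [0-9] equivalence holds strictly only in ASCII mode.

```python
import re

sample = "ab_1 2\t!"

digits = re.findall(r"\d", sample)
non_digits = re.findall(r"\D", sample)
spaces = re.findall(r"\s", sample)
non_spaces = re.findall(r"\S", sample)
word_chars = re.findall(r"\w", sample)
non_word_chars = re.findall(r"\W", sample)

print(digits)          # ['1', '2']
print(non_digits)      # ['a', 'b', '_', ' ', '\t', '!']
print(spaces)          # [' ', '\t']
print(non_spaces)      # ['a', 'b', '_', '1', '2', '!']
print(word_chars)      # ['a', 'b', '_', '1', '2']
print(non_word_chars)  # [' ', '\t', '!']

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
unicode_digit = re.findall(r"\d", "\u0663")
ascii_only = re.findall(r"\d", "\u0663", flags=re.ASCII)
print(len(unicode_digit), len(ascii_only))  # 1 0
```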

Now that we know what these special characters mean, here is an example that illustrates the points above:

import re
str1 = 'asdf12dvdve4gb4'
pattern1 = re.compile(r'\d')
pattern2 = re.compile('[0-9]')
mch1 = pattern1.findall(str1)
mch2 = pattern2.findall(str1)
print('mch1:\t%s' % mch1)
print('mch2:\t%s' % mch2)
# Output:
mch1:   ['1', '2', '4', '4']
mch2:   ['1', '2', '4', '4']


The two patterns above return the same result, which shows that the special character \d behaves exactly like [0-9] here. But what if you do not want each digit split into a separate list element, and instead want 12 returned as a whole? Just add a + after \d; the + means that one or more consecutive decimal digits are matched as a single unit:

import re
str1 = 'asdf12dvdve4gb4'
pattern1 = re.compile(r'\d+')
pattern2 = re.compile('[0-9]')
mch1 = pattern1.findall(str1)
mch2 = pattern2.findall(str1)
print('mch1:\t%s' % mch1)
print('mch2:\t%s' % mch2)
# Output:
mch1:   ['12', '4', '4']
mch2:   ['1', '2', '4', '4']
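The compile-then-findall workflow can be sketched compactly as follows. The names single_digit and digit_run are illustrative. The last line shows that the module-level re.findall gives the same result without an explicit compile step.

```python
import re

text = "asdf12dvdve4gb4"

# Compile once, then reuse the Pattern objects as often as needed.
single_digit = re.compile(r"\d")
digit_run = re.compile(r"\d+")

each_digit = single_digit.findall(text)
whole_numbers = digit_run.findall(text)

# The module-level function compiles the pattern internally.
same_result = re.findall(r"\d+", text)

print(each_digit)     # ['1', '2', '4', '4']
print(whole_numbers)  # ['12', '4', '4']
print(same_result)    # ['12', '4', '4']
```

Note that findall returns a plain list of strings here, not match objects.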


Here is another small example, which combines a special character with the sub function of re to remove all whitespace from a string:

import re
str1 = 'asd \tf12d vdve4gb4'
# \s+ matches each run of whitespace; replacing it with '' deletes it
new_str = re.sub(r'\s+', '', str1)
print(new_str)
# Output:
asdf12dvdve4gb4


5, Metacharacters:

The characters usually called metacharacters are: . ^ $ * + ? { } [ ] | ( ) \

The first metacharacters we examine are "[" and "]". They are used to specify a character class, which is the set of characters you want to match. Characters can be listed individually, or a range can be given as two characters separated by "-". For example, [abc] matches any one of the characters "a", "b", or "c"; the range [a-c] expresses the same set and has the same effect. If you only want to match lowercase letters, write the class as [a-z]. Metacharacters are not active inside a class. For example, [akm$] matches any one of the characters "a", "k", "m", or "$"; "$" is normally a metacharacter, but inside a character class it loses that role and is treated as an ordinary character.

[]: the metacharacters [] denote a character class. Inside a character class only the characters ^, -, ] and \ have special meaning. The character \ still means escape, the character - defines a range of characters, and the character ^ placed at the front means negation. (This was already used in the descriptions of the special characters above.)
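The character class rules described above can be tried out with a few findall calls. This is a small sketch with made-up sample strings, showing a listed set, a range, a literal $ inside a class, and negation with a leading ^.

```python
import re

# Listing characters and giving a range describe the same set.
listed = re.findall(r"[abc]", "a-b-c-d")
ranged = re.findall(r"[a-c]", "a-b-c-d")

# Inside a class, $ is an ordinary character.
with_dollar = re.findall(r"[akm$]", "ak$m!")

# A leading ^ negates the class: everything except lowercase letters.
negated = re.findall(r"[^a-z]", "ab1c$")

print(listed)       # ['a', 'b', 'c']
print(ranged)       # ['a', 'b', 'c']
print(with_dollar)  # ['a', 'k', '$', 'm']
print(negated)      # ['1', '$']
```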

+ matches the preceding item 1 or more times
? matches the preceding item 0 or 1 times
{m} matches the preceding item exactly m times
{m,n} matches the preceding item m to n times
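Each quantifier in the list above can be tested against a whole string with re.fullmatch. In this sketch the helper function matches and the sample words are hypothetical and exist only for the illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# + : one or more of the preceding item
print(matches(r"ab+", "abbb"))        # True
print(matches(r"ab+", "a"))           # False

# ? : zero or one of the preceding item
print(matches(r"colou?r", "color"))   # True
print(matches(r"colou?r", "colour"))  # True

# {m} : exactly m repetitions
print(matches(r"\d{3}", "123"))       # True
print(matches(r"\d{3}", "12"))        # False

# {m,n} : between m and n repetitions
print(matches(r"\d{2,4}", "1234"))    # True
print(matches(r"\d{2,4}", "12345"))   # False
```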

Here is a small example illustrating the quantifiers above. Two points deserve attention: first, the ? after \d+ makes the match non-greedy, so it takes as few digits as possible; second, the pattern is prefixed with the character r, which makes it a raw string. In this particular example the result is the same with or without the r prefix.

>>> import re
>>> print(re.findall(r"a(\d+?)", "a123b"))
['1']
>>> print(re.findall(r"a(\d+)", "a123b"))
['123']
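The difference between greedy and non-greedy matching is easier to see when the pattern has something to match after the quantifier. The sketch below uses an invented tag-like string. The last lines illustrate what the r prefix does: in a raw string a backslash is kept as a literal character.

```python
import re

markup = "<b>one</b><b>two</b>"

# Greedy: .+ grabs as much as possible, then backtracks to the last </b>.
greedy = re.findall(r"<b>(.+)</b>", markup)

# Non-greedy: .+? stops at the first </b> that allows a match.
lazy = re.findall(r"<b>(.+?)</b>", markup)

print(greedy)  # ['one</b><b>two']
print(lazy)    # ['one', 'two']

# Raw strings keep backslashes: r"\n" is two characters, "\n" is one.
print(len(r"\n"), len("\n"))  # 2 1
print(r"\d" == "\\d")         # True
```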


That concludes this summary of Python regular expressions. I hope it is of some help; if you have any questions, please leave a message and I will reply promptly.
