Python Regular Expression Learning Summary

Source: Internet
Author: User

In Python, regular expressions are provided by the re module (re is short for regular expression). You call the functions of this module to perform different operations. This summary covers which functions the re module offers, what each of them does, some worked examples, and the meanings of the various special symbols:

1, re.sub and replace:

sub is short for substitute, meaning replacement. The string method replace also performs replacement, but the two are used differently. The examples below show where they overlap and where they differ:

>>> import re
>>> str1 = 'Hello 111 is 222'
>>> str2 = str1.replace('111', '222')
>>> print(str2)
Hello 222 is 222

This case is simple. If instead every number in the string has to be replaced with 222, doing it with replace becomes cumbersome, while the sub function of the re module handles it easily. (For more complex operations, replace may not be able to do the job at all.)

>>> import re
>>> str1 = 'Hello 123 is 456'
>>> str2 = re.sub(r'\d+', '222', str1)
>>> print(str2)
Hello 222 is 222
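
To make the contrast concrete, the sketch below performs the same replacement both ways. The helper name replace_numbers_naive is hypothetical, introduced only for illustration, and it assumes the numbers occurring in the text are known in advance, which is exactly the limitation of replace.

```python
import re

text = 'Hello 123 is 456'

def replace_numbers_naive(s, numbers, new):
    """Replace each known number with str.replace, one call per number."""
    for number in numbers:
        s = s.replace(number, new)
    return s

# str.replace needs every literal listed in advance.
via_replace = replace_numbers_naive(text, ['123', '456'], '222')

# re.sub describes all runs of digits with a single pattern.
via_sub = re.sub(r'\d+', '222', text)

print(via_replace)  # Hello 222 is 222
print(via_sub)      # Hello 222 is 222
```

With re.sub the call stays the same no matter which numbers appear in the input.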

2, re.search() and re.match():

match: tries to match the regular expression only at the beginning of the string. It returns a match object on success, otherwise None.

search: scans the whole string for the first position where the regular expression matches. It returns a match object if one is found, otherwise None.

The following example shows how match and search differ. In practice, search is used much more often:

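import re
str1 = 'helloword,i am Alex'
# match only looks at the start of the string, so 'word' is not found
if not re.match('word', str1):
    print('cannot match')
print(re.match('hello', str1).group())
# search scans the whole string, so 'word' is found
print(re.search('word', str1).group())
# Output
cannot match
hello
word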
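
Since both functions can return None, it is usually safer to check the result before calling group(). The following is a minimal sketch of that habit, reusing the example string; the pattern 'python' is added here only to show a failed search.

```python
import re

text = 'helloword,i am Alex'

m1 = re.match('word', text)    # 'word' is not at position 0
m2 = re.search('word', text)   # found further into the string
m3 = re.match('hello', text)   # 'hello' is at position 0

print(m1)          # None
print(m2.span())   # (5, 9)
print(m3.span())   # (0, 5)

# Guard against None before using the match object.
for pattern in ('word', 'python'):
    found = re.search(pattern, text)
    print(pattern, '->', found.group() if found else 'no match')
```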

3, re.split:

To split a string in Python, you can simply call the split method of str. However, that method splits on one fixed separator only; it cannot split on several different separators at once.

Fortunately, the re module also provides a split function. It is more powerful and can split on several separators at the same time. Let's look at how str.split and re.split differ:

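import re
str1 = 'helloword,i;am\nalex'
# str.split can use only one separator
str2 = str1.split(',')
print(str2)
# re.split can split on a comma, a semicolon, or a newline
str3 = re.split(r'[,;\n]', str1)
print(str3)
# Output
['helloword', 'i;am\nalex']
['helloword', 'i', 'am', 'alex']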

The output confirms the description above.
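
One detail is worth checking when writing a separator pattern: inside a character class, | is an ordinary character and does not mean "or". The sketch below uses a made-up input string to illustrate the difference.

```python
import re

text = 'a,b;c|d'

# Inside [], "|" is literal, so this also splits on the pipe character.
with_pipe = re.split(r'[,|;]', text)

# Listing only the intended separators leaves the pipe alone.
without_pipe = re.split(r'[,;]', text)

# Outside a character class, "|" does mean alternation.
alternation = re.split(r',|;', text)

print(with_pipe)     # ['a', 'b', 'c', 'd']
print(without_pipe)  # ['a', 'b', 'c|d']
print(alternation)   # ['a', 'b', 'c|d']
```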

4, findall:

The findall method is usually used together with compile. The usage is:

compile turns the string form of a regular expression into a pattern object, and calling findall on that pattern object returns a list of all matches. Before looking at examples, here are the meanings of the predefined special characters in regular expressions:

\d matches any decimal digit; it is equivalent to the class [0-9].

\D matches any non-digit character; it is equivalent to the class [^0-9].

\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].

\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].

\w matches any alphanumeric character or underscore; it is equivalent to the class [a-zA-Z0-9_].

\W matches any character that is not alphanumeric or an underscore; it is equivalent to the class [^a-zA-Z0-9_].

(For str patterns in Python 3 these shorthands also match Unicode digits, whitespace, and word characters; the classes listed above are exact when the re.ASCII flag is used.)

With these special characters in mind, here is an example:

import re
str1 = 'asdf12dvdve4gb4'
pattern1 = re.compile(r'\d')
pattern2 = re.compile('[0-9]')
mch1 = pattern1.findall(str1)
mch2 = pattern2.findall(str1)
print('mch1:\t%s' % mch1)
print('mch2:\t%s' % mch2)
# Output
mch1:   ['1', '2', '4', '4']
mch2:   ['1', '2', '4', '4']

Both patterns give the same result, which shows that for this input the special character \d behaves exactly like [0-9]. If you don't want each digit to become a separate list element, and want 12 to come out as a whole, add a + after \d. The + means that one or more consecutive decimal digits are matched as a single result:

import re
str1 = 'asdf12dvdve4gb4'
pattern1 = re.compile(r'\d+')
pattern2 = re.compile('[0-9]')
mch1 = pattern1.findall(str1)
mch2 = pattern2.findall(str1)
print('mch1:\t%s' % mch1)
print('mch2:\t%s' % mch2)
# Output
mch1:   ['12', '4', '4']
mch2:   ['1', '2', '4', '4']
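
The other shorthand classes from the list above can be tried the same way. The following sketch runs each of them against a sample string invented for illustration.

```python
import re

# The sample contains letters, a space, digits, a tab, an underscore and "!".
sample = 'ab 12\t_x!'

digits = re.findall(r'\d', sample)       # ['1', '2']
non_digits = re.findall(r'\D', sample)   # every character except 1 and 2
spaces = re.findall(r'\s', sample)       # [' ', '\t']
non_space = re.findall(r'\S+', sample)   # ['ab', '12', '_x!']
words = re.findall(r'\w+', sample)       # ['ab', '12', '_x']
non_word = re.findall(r'\W', sample)     # [' ', '\t', '!']

for name, result in [('\\d', digits), ('\\D', non_digits), ('\\s', spaces),
                     ('\\S+', non_space), ('\\w+', words), ('\\W', non_word)]:
    print(name, result)
```

Note that the underscore counts as a word character, so it is matched by \w and not by \W.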

Here is one more small example. It combines a special character with the sub function of re to remove all whitespace from a string:

import re
str1 = 'asd \tf12d vdve4gb4'
# Replace every run of whitespace with the empty string
new_str = re.sub(r'\s+', '', str1)
print(new_str)
# Output
asdf12dvdve4gb4

5, Metacharacters:

The metacharacters are: . ^ $ * + ? { } [ ] | ( ) \

The first metacharacters we examine are "[" and "]". They are used to specify a character class, that is, a set of characters you want to match. The characters can be listed individually, or a range can be given by two characters separated by a "-". For example, [abc] matches any one of the characters "a", "b", or "c"; the range [a-c] expresses the same set with the same effect. To match only lowercase letters, write the RE as [a-z]. Metacharacters are not active inside a class. For example, [akm$] matches any of the characters "a", "k", "m", or "$"; "$" is normally a metacharacter, but inside a character class it loses that special property and is treated as an ordinary character.

[]: The metacharacters [] denote a character class, within which only the characters ^, -, ] and \ have special meaning. \ still acts as the escape character, - defines a range of characters, and ^ placed at the very start negates the class. (This negation already appeared in the special character list above, for example in [^0-9].)
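
The rules for character classes can be checked with a short sketch. The input string below is made up for illustration; it contains lowercase and uppercase letters, digits, a "$", and a "-".

```python
import re

text = 'abc xyz ABC 42 $k-m'

listed = re.findall(r'[abc]', text)      # individual characters
ranged = re.findall(r'[a-c]', text)      # the same set written as a range
lower = re.findall(r'[a-z]+', text)      # runs of lowercase letters
dollar = re.findall(r'[akm$]', text)     # "$" is literal inside a class
negated = re.findall(r'[^a-z ]+', text)  # leading "^" negates the class
k_to_m = re.findall(r'[k-m]', text)      # "-" defines the range k..m
escaped = re.findall(r'[k\-m]', text)    # "\-" is a literal hyphen

print(listed)   # ['a', 'b', 'c']
print(ranged)   # ['a', 'b', 'c']
print(lower)    # ['abc', 'xyz', 'k', 'm']
print(dollar)   # ['a', '$', 'k', 'm']
print(negated)  # ['ABC', '42', '$', '-']
print(k_to_m)   # ['k', 'm']
print(escaped)  # ['k', '-', 'm']
```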

+ matches the preceding item 1 or more times
? matches the preceding item 0 or 1 times
{m} matches the preceding item exactly m times
{m,n} matches the preceding item between m and n times
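
A minimal sketch of these four quantifiers follows, applied to the letter b in a made-up test string. Note that findall reports every place where the pattern matches, including matches inside longer runs of b.

```python
import re

text = 'a ab abb abbb abbbb'

one_or_more = re.findall(r'ab+', text)      # "a" followed by at least one "b"
zero_or_one = re.findall(r'ab?', text)      # "a" followed by at most one "b"
exactly_two = re.findall(r'ab{2}', text)    # "a" followed by exactly two "b"
two_to_three = re.findall(r'ab{2,3}', text) # "a" followed by two or three "b"

print(one_or_more)   # ['ab', 'abb', 'abbb', 'abbbb']
print(zero_or_one)   # ['a', 'ab', 'ab', 'ab', 'ab']
print(exactly_two)   # ['abb', 'abb', 'abb']
print(two_to_three)  # ['abb', 'abbb', 'abbb']
```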

Here is a small example of the quantifiers above. Two points deserve attention: first, the ? after \d+, which makes the match non-greedy (it takes as few digits as possible); second, the r prefix on the pattern string, which marks it as a raw string. In this example the result is the same with or without the r.

>>> import re
>>> print(re.findall(r"a(\d+?)", "a123b"))
['1']
>>> print(re.findall(r"a(\d+)", "a123b"))
['123']
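
The r prefix does matter for other escapes. As a sketch with a made-up sentence: in a normal Python string "\b" is the backspace character, whereas in a raw string the two characters \b reach the regex engine, which treats them as a word boundary.

```python
import re

text = 'this is it'

# A normal string turns "\b" into one backspace character;
# a raw string keeps the backslash and the letter b.
print(len('\b'), len(r'\b'))   # 1 2

# Raw string: \b is a word boundary, so only the standalone "is" matches.
raw_result = re.findall(r'\bis\b', text)

# Normal string: the pattern searches for backspace characters, which are absent.
plain_result = re.findall('\bis\b', text)

print(raw_result)    # ['is']
print(plain_result)  # []
```

Writing every pattern as a raw string avoids having to remember which escapes Python itself interprets.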

That concludes this summary of Python regular expressions. I hope it is helpful. If you have any questions, please leave me a message and I will reply promptly. Thank you also for your support of the community website!
