Python basics: regular expressions

Source: Internet
Author: User

Regular Expression

 

Regular expressions provide the basis for advanced text pattern matching, extraction, and search-and-replace. In short, a regular expression is a string made up of ordinary characters and special characters. It describes repetition of, or alternatives among, characters, so a single pattern can match a whole family of strings that share similar features. A regular expression that can match only one string is of little use.

This article has two main parts:

1. Regular Expression

2. The re module in Python

All regular expression operations use the re module in the Python standard library.

Our first Regular Expression
    import re

    pattern = re.compile(r'python')  # pre-compile
    m = re.search(pattern, 'Hello python')

    print(m.group())
    # >>> python

    import re

    m = re.search(r'python', 'Hello python')

    print(m.group())
    # No pre-compilation, and the results are the same
    # >>> python

1. pattern = re.compile(r'python')

A regular expression can be pre-compiled, or used directly without compiling it first. Either way, the interpreter has to compile the pattern into a regular expression object before it can match anything, and a pattern is often applied many times while a program runs. The re module caches recently compiled patterns, but pre-compiling is still recommended for any pattern that you reuse.
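
As a minimal sketch of the reuse case, the snippet below compiles a pattern once and applies it to several strings. The sample strings and the names `lines` and `hits` are illustrative assumptions, not part of the original example.

```python
import re

# Compile once, then reuse the compiled object as often as needed.
pattern = re.compile(r'python')

# Hypothetical sample data, for illustration only.
lines = ['Hello python', 'no match here', 'python rocks']

# A compiled pattern exposes search() directly as a method.
hits = [line for line in lines if pattern.search(line)]
print(hits)  # ['Hello python', 'python rocks']
```

Calling `pattern.search(text)` and `re.search(pattern, text)` give the same result; the method form is simply shorter.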

2. r'python'

A simple regular expression that matches the text "python" in a string.

In Python, the prefix r before a string literal marks it as a raw string.

Like most programming languages, regular expressions use "\" as the escape character, which can lead to a pile-up of backslashes. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\": the language first turns each pair into one backslash, leaving two, and the regular expression engine then reads those two as one escaped backslash. Python's raw strings solve this problem neatly: the regular expression in this example can be written as r"\\". Similarly, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about missing a backslash, and the expressions you write are easier to read.
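
The backslash counting described above can be checked directly. This is a small sketch; the sample text `C:\temp` is an assumption chosen only because it contains one backslash.

```python
import re

text = 'C:\\temp'  # the actual text is C:\temp, with a single backslash

# Ordinary string literal: four backslashes are needed in the source code.
m1 = re.search('\\\\', text)
# Raw string literal: two backslashes are enough.
m2 = re.search(r'\\', text)

print(m1.group() == m2.group())  # True
print(r'\d' == '\\d')            # True
```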

Regular expression syntax has many more elements, which are explained later. For more details, refer to the relevant documentation.

3. re.search(r'python', 'Hello python')

This calls the search() function of Python's re module. The first argument is the regular expression, the second is the string to search, and the return value is a match object.

The re module has many other functions, which are introduced below.

4. m.group()

m is a match object, so print(m) prints the object itself rather than the matched text we want.

Match objects have two main methods, group() and groups(), which we use to get the expected results. Their usage is explained in detail later.
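
A short sketch of the difference between the match object and the matched text, reusing the example string from above. The failing pattern `ruby` is an assumption added to show the None case.

```python
import re

m = re.search(r'python', 'Hello python')

print(m)          # the match object itself, not the matched text
print(m.group())  # python
print(m.span())   # (6, 12)

# search() returns None when nothing matches, so check before calling group().
missing = re.search(r'ruby', 'Hello python')
print(missing is None)  # True
```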

Now let's begin the formal study of regular expressions.

Next we explain in detail the points introduced by the first regular expression above. If you get confused along the way, try substituting the relevant content into the code above to understand it.

1. Regular Expression

1. Common Character Description

.        Matches any single character except "\n". To match any character including "\n", use the re.DOTALL flag or a pattern such as '[\s\S]'.
x|y      Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    Character set. Matches any one of the characters it contains. For example, '[abc]' matches the 'a' in "plain".
[^xyz]   Negated character set. Matches any character it does not contain. For example, '[^abc]' matches 'p', 'l', 'i', and 'n' in "plain".
[a-z]    Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.
[^a-z]   Negated character range. Matches any character outside the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
\d       Matches a digit. For ASCII text it is equivalent to [0-9].
\D       Matches a non-digit character. For ASCII text it is equivalent to [^0-9].
\w       Matches any word character, including the underscore. For ASCII text it is equivalent to '[A-Za-z0-9_]'.
\W       Matches any non-word character. For ASCII text it is equivalent to '[^A-Za-z0-9_]'.

In Python 3, \d and \w also match non-ASCII digits and letters in str patterns unless the re.ASCII flag is set.
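
The entries in this table can be tried out with re.findall() and re.search(). The following is a minimal sketch; the sample string 'a_1!' is made up for illustration.

```python
import re

# Character sets and their negation, using the word from the table.
in_set = re.findall(r'[abc]', 'plain')
not_in_set = re.findall(r'[^abc]', 'plain')
print(in_set)      # ['a']
print(not_in_set)  # ['p', 'l', 'i', 'n']

# Shorthand classes on a small made-up sample string.
digits = re.findall(r'\d', 'a_1!')
word_chars = re.findall(r'\w', 'a_1!')
non_word = re.findall(r'\W', 'a_1!')
print(digits, word_chars, non_word)  # ['1'] ['a', '_', '1'] ['!']

# "." skips newlines unless re.DOTALL is given.
no_dotall = re.search(r'a.b', 'a\nb')
with_dotall = re.search(r'a.b', 'a\nb', re.DOTALL)
print(no_dotall)                # None
print(with_dotall is not None)  # True
```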

2. Special Character Description

$        Matches the end of the input string, or just before a newline at the end of the string. With the re.MULTILINE flag, $ also matches before every newline. To match the $ character itself, use \$.
()       Mark the start and end of a subexpression (group). The text matched by a group can be retrieved for later use. To match these characters, use \( and \).
*        Matches the preceding subexpression zero or more times. To match the * character, use \*.
+        Matches the preceding subexpression one or more times. To match the + character, use \+.
.        Matches any single character except the line feed \n. To match a literal ., use \.
[        Marks the start of a bracket expression. To match [, use \[.
?        Matches the preceding subexpression zero or one time, or, when placed after another quantifier, makes it non-greedy. To match the ? character, use \?.
\        Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a line break. The sequence '\\' matches "\", and '\(' matches "(".
^        Matches the start of the input string. As the first character inside a bracket expression, it negates the character set instead. To match the ^ character itself, use \^.
{        Marks the start of a quantifier expression. To match {, use \{.
|        Specifies a choice between two alternatives. To match |, use \|.
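
A brief sketch of escaping special characters and of the anchors ^ and $. The sample strings are assumptions. re.escape() is a standard library helper that escapes a literal string for you; it is not mentioned in the table above.

```python
import re

price = 'Total: $5.00 (approx.)'

# Escape "$" and "." by hand so they are matched literally.
amount = re.search(r'\$\d+\.\d+', price).group()
print(amount)  # $5.00

# re.escape() adds the backslashes for a literal string automatically.
literal_match = re.search(re.escape('(approx.)'), price).group()
print(literal_match)  # (approx.)

# "^" and "$" anchor to the whole string unless re.MULTILINE is given.
whole = re.findall(r'^\w+$', 'one\ntwo')
per_line = re.findall(r'^\w+$', 'one\ntwo', re.MULTILINE)
print(whole)     # []
print(per_line)  # ['one', 'two']
```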

3. Non-printable characters

\f       Matches a form feed. Equivalent to \x0c.
\n       Matches a line feed. Equivalent to \x0a.
\r       Matches a carriage return. Equivalent to \x0d.
\s       Matches any whitespace character, including spaces, tabs, and form feeds. For ASCII text it is equivalent to [ \f\n\r\t\v].
\S       Matches any non-whitespace character. For ASCII text it is equivalent to [^ \f\n\r\t\v].
\t       Matches a tab. Equivalent to \x09.
\v       Matches a vertical tab. Equivalent to \x0b.

The control-character notations \cL, \cJ, \cM, \cI, and \cK used by some other regular expression flavors are not supported by Python's re module.
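
A small sketch of the whitespace classes in use. The sample text, which mixes a tab, a newline, and two spaces, is made up for illustration.

```python
import re

text = 'a\tb\nc  d'  # tab, newline, and a double space as separators

# \s+ treats any run of whitespace as a single separator.
parts = re.split(r'\s+', text)
print(parts)  # ['a', 'b', 'c', 'd']

# \S+ collects the runs of non-whitespace characters instead.
words = re.findall(r'\S+', text)
print(words == parts)  # True

# \t matches only the tab character.
tab_pos = re.search(r'\t', text).start()
print(tab_pos)  # 1
```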

4. Quantifiers

*        Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+        Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?        Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}      n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}     n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    Both m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
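
The quantifier examples from this table can be verified as follows. This is only a sketch; the test word is built with string repetition so that the number of o's is unambiguous.

```python
import re

sample = 'z zo zoo'
star = re.findall(r'zo*', sample)  # zero or more "o"
plus = re.findall(r'zo+', sample)  # one or more "o"
print(star)  # ['z', 'zo', 'zoo']
print(plus)  # ['zo', 'zoo']

optional = re.search(r'do(es)?', 'does').group()
print(optional)  # does

# "f", six o's, "d"
word = 'f' + 'o' * 6 + 'd'
exactly_two = re.search(r'o{2}', 'Bob')  # "Bob" has only a single o
at_least_two = re.search(r'o{2,}', word).group()
one_to_three = re.search(r'o{1,3}', word).group()
print(exactly_two)   # None
print(at_least_two)  # oooooo
print(one_to_three)  # ooo

# Adding "?" after a quantifier makes it non-greedy.
lazy = re.search(r'o{1,3}?', word).group()
print(lazy)  # o
```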

After reading so much, let's take a look at some examples:

    m = re.search(r'(p|t)h', 'the')
    print(m.group())  # >>> th

    m = re.search(r'[abcde]', 'the')  # match any character in the character set
    print(m.group())  # >>> e

    m = re.search(r'..', 'the')  # . matches any character (excluding \n)
    print(m.group())  # >>> th

    m = re.search(r'^ind(e|a)', 'inda inde')  # ^ anchors the match at the start
    print(m.group())  # >>> inda

    m = re.search(r'ind(e|a)$', 'inda inde')  # $ anchors the match at the end
    print(m.group())  # >>> inde

    m = re.search(r'inde*', 'indeeeee')  # * matches zero or more e
    print(m.group())  # >>> indeeeee

    m = re.search(r'inde{3}', 'indeeeee')  # {3} matches exactly three e
    print(m.group())  # >>> indeee

    m = re.search(r'inde*?', 'indeeeee')  # non-greedy match
    print(m.group())  # >>> ind

    m = re.search(r'inde+?', 'indeeeee')  # non-greedy match
    print(m.group())  # >>> inde

    m = re.search(r'i(nd)e*', 'indeeeee')  # () is used for grouping
    print(m.group(0), m.group(1))  # >>> indeeeee nd

2. The re module in Python

1. match() method

Matches the pattern against the beginning of the string. If the match succeeds, a match object is returned. If it fails, None is returned.

2. search() method

Searches the string for the first occurrence of the pattern. If the search succeeds, a match object is returned. If it fails, None is returned.

3. findall() and finditer(): find every occurrence

findall() finds every place where the pattern occurs in the string. Unlike search() and match(), findall() always returns a list. If nothing matches, the list is empty. Otherwise it contains all the matched parts.

finditer() differs from findall() in that it returns an iterator rather than a list, which saves memory.
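
A sketch comparing the two functions on the string 'This and that.'. Note one detail of findall() that is easy to miss: when the pattern contains a capturing group, it returns the group text rather than the whole match.

```python
import re

text = 'This and that.'

found = re.findall(r'th\w+', text, re.I)
print(found)  # ['This', 'that']

# With one capturing group, findall() returns the group, not the whole match.
tails = re.findall(r'th(\w+)', text, re.I)
print(tails)  # ['is', 'at']

# finditer() yields match objects, so positions are available as well.
spans = [(m.group(), m.start()) for m in re.finditer(r'th\w+', text, re.I)]
print(spans)  # [('This', 0), ('that', 9)]
```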

4. sub() and subn(): search and replace

The two are almost the same. Both replace every part of a string that matches the regular expression with a given replacement.

The difference is the return value: sub() returns the resulting string, while subn() returns a two-element tuple containing the resulting string and the total number of replacements made.

5. split(): split a string

re.split() works like str.split(), except that the separator is a regular expression pattern. You can set the maxsplit parameter to limit the maximum number of splits.

    m = re.match(r'foo', 'food on the table')
    if m is not None:
        print(m.group())  # >>> foo

    m = re.search(r'foo', 'seafood')
    if m is not None:
        print(m.group())  # >>> foo

    m = re.findall(r'th\w+', 'This and that.')
    print(m)  # >>> ['that']

    m = re.findall(r'th\w+', 'This and that.', re.I)  # case-insensitive
    print(m)  # >>> ['This', 'that']

    m = re.finditer(r'th\w+', 'This and that.', re.I)
    for i in m:
        print(i.group())
    # >>> This
    # >>> that

    m = re.sub(r'[ae]', 'X', 'abcdef')
    print(m)  # >>> XbcdXf

    m = re.subn(r'[ae]', 'X', 'abcdef')
    print(m)  # >>> ('XbcdXf', 2)

    m = re.split(':', 'str1:str2:str3')
    print(m)  # >>> ['str1', 'str2', 'str3']
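
The optional limits on sub() and split() can be sketched as below. The count and maxsplit keyword arguments are part of the re module; the sample strings and variable names are assumptions for illustration.

```python
import re

# count limits how many replacements sub() makes.
first_only = re.sub(r'[ae]', 'X', 'abcdef', count=1)
print(first_only)  # Xbcdef

# The replacement may also be a function that receives each match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b20')
print(doubled)  # a2 b40

# Unlike str.split(), re.split() accepts a pattern as the separator.
fields = re.split(r'[:;,]\s*', 'str1: str2;str3, str4')
print(fields)  # ['str1', 'str2', 'str3', 'str4']

# maxsplit limits the number of splits.
head_tail = re.split(':', 'str1:str2:str3', maxsplit=1)
print(head_tail)  # ['str1', 'str2:str3']
```

Passing count and maxsplit by keyword keeps the calls readable and avoids confusing them with the flags argument.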
6. group() and groups() methods

The match object returned after a successful match has two main methods: group() and groups().

group() returns either the entire match or a specific subgroup, as requested. groups() returns a tuple containing all the subgroups. If the pattern has no subgroups, group() still returns the entire match, and groups() returns an empty tuple.

    m = re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123')
    print(m.group())   # >>> abc-123
    print(m.group(1))  # >>> abc
    print(m.group(2))  # >>> 123
    print(m.groups())  # >>> ('abc', '123')
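
Two further cases, sketched with the same input string: a pattern with no subgroups, and nested subgroups. The patterns here are variations written for illustration.

```python
import re

# No capturing groups: group() is the whole match and groups() is empty.
plain = re.match(r'\w\w\w-\d\d\d', 'abc-123')
print(plain.group())   # abc-123
print(plain.groups())  # ()

# Nested groups are numbered by the position of their opening parenthesis.
nested = re.match(r'((\w+)-(\d+))', 'abc-123')
print(nested.group(1), nested.group(2), nested.group(3))  # abc-123 abc 123
print(nested.groups())  # ('abc-123', 'abc', '123')
```
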
7. Common Regular Expressions
