[Learn Python step by step] 12. Introduction to Python Regular Expressions

Source: Internet
Author: User

A regular expression is a pattern that matches text fragments. In Python, the re module provides support for regular expressions.

1. Basic concepts of Regular Expressions

The period (.) matches any single character other than a line break. It is called a wildcard.
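As a small illustration of the wildcard, here is a minimal sketch. The pattern and sample strings are made up for this example:

```python
import re

# '.' matches any single character except a newline.
dot_matches_letter = re.match(r'.ython', 'python') is not None    # True
dot_matches_digit = re.match(r'.ython', '3ython') is not None     # True
dot_matches_newline = re.match(r'.ython', '\nython') is not None  # False
```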

 

Characters with special meanings must be escaped when they are used as ordinary characters. For example, to match python.org literally, the pattern must be written as 'python\\.org'.

Why are two backslashes used?

Two levels of escaping are involved: 1. the Python interpreter processes the string literal; 2. the re module processes the resulting pattern. The interpreter turns '\\' into a single backslash, which the re module then uses to escape the period. If you do not want to write two backslashes, consider using a raw string, such as r'python\.org'.
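The two spellings can be compared directly. This is a minimal sketch; the string 'pythonzorg' is an invented counterexample:

```python
import re

# Both spellings produce the same pattern string.
escaped = 'python\\.org'
raw = r'python\.org'
same_pattern = escaped == raw                                              # True

# An unescaped dot is a wildcard, so it also matches 'pythonzorg'.
unescaped_matches_other = re.match('python.org', 'pythonzorg') is not None  # True
escaped_matches_other = re.match(raw, 'pythonzorg') is not None             # False
escaped_matches_literal = re.match(raw, 'python.org') is not None           # True
```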

 

A character set is enclosed in square brackets ([]) and matches any one of the characters it contains. For example, '[pj]ython' matches both python and jython.

Ranges

Character sets can also contain ranges. For example, '[a-z]' matches any character from a to z, and '[a-zA-Z0-9]' matches any uppercase or lowercase letter or digit.
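A short sketch of character sets and ranges. The test strings are invented for illustration:

```python
import re

# A character set matches exactly one character from the set.
candidates = ('python', 'jython', 'cython')
set_results = [re.fullmatch(r'[pj]ython', s) is not None for s in candidates]
# [True, True, False]

# Several ranges can be combined inside one set.
alnum = re.findall(r'[a-zA-Z0-9]', 'a-B_9!')
# ['a', 'B', '9']
```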

 

Negated character sets

A character set can also be negated. For example, '[^abc]' matches any character except a, b, and c.
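For instance, with an invented sample string:

```python
import re

# '[^abc]' matches every character that is not a, b, or c.
not_abc = re.findall(r'[^abc]', 'aXbYcZ')
# ['X', 'Y', 'Z']
```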

 

Special characters in character sets

Special characters must normally be escaped with a backslash to appear as literal text in a pattern rather than act as regular expression operators. Inside a character set this is generally not required. Only three characters need care when used as ordinary text in a set:

  • ^ (caret): escape it if it appears at the start of the character set; otherwise it negates the set.
  • ] (right bracket): escape it, or place it at the start of the character set.
  • - (hyphen, the range operator): escape it, or place it at the start of the character set.
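The rules for these three characters can be checked with a small sketch. The sample text is invented for illustration:

```python
import re

text = 'a^b]c-d'

# A caret that is not first in the set is an ordinary character.
carets = re.findall(r'[b^]', text)        # ['^', 'b']
# A right bracket placed first in the set is an ordinary character.
brackets = re.findall(r'[]c]', text)      # [']', 'c']
# A hyphen placed first in the set is an ordinary character.
hyphens = re.findall(r'[-d]', text)       # ['-', 'd']
# Backslash escapes work in any position.
escaped = re.findall(r'[\^\]\-]', text)   # ['^', ']', '-']
```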

 

The pipe symbol (|) is a special character used to choose between alternatives. For example, 'python|ruby' matches either python or ruby.

A subpattern is formed by enclosing the alternatives in parentheses. For example, 'p(ython|erl)' matches python and perl.
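A minimal sketch of alternation with and without a subpattern. The list of words is an assumption for the example:

```python
import re

words = ('python', 'perl', 'ruby')

# Alternation over whole words.
whole = [w for w in words if re.fullmatch(r'python|ruby', w)]
# ['python', 'ruby']

# Parentheses restrict the alternation to part of the pattern.
partial = [w for w in words if re.fullmatch(r'p(ython|erl)', w)]
# ['python', 'perl']
```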

 

Adding a question mark after a subpattern makes it optional, for example:

r

 

The above pattern can only match the following string:

 

 

The question mark indicates that the subpattern can appear once or not at all. The following operators allow a subpattern to be repeated:

  • (pattern)*: the pattern is repeated zero or more times
  • (pattern)+: the pattern is repeated one or more times
  • (pattern){m,n}: the pattern is repeated from m to n times
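The operators above, including the optional question mark, can be compared side by side. This is a sketch with an invented subpattern (ab) and a hypothetical helper named full:

```python
import re

def full(pattern, s):
    """Hypothetical helper: True if the whole string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

samples = ('c', 'abc', 'ababc')

optional = [full(r'(ab)?c', s) for s in samples]   # [True, True, False]
star = [full(r'(ab)*c', s) for s in samples]       # [True, True, True]
plus = [full(r'(ab)+c', s) for s in samples]       # [False, True, True]

# {2,3} allows two or three repetitions, no fewer and no more.
bounded = [full(r'(ab){2,3}c', s)
           for s in ('abc', 'ababc', 'abababc', 'ababababc')]
# [False, True, True, False]
```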

 

Use the caret (^) to mark the start of a string and the dollar sign ($) to mark the end of a string. For example:
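A minimal sketch of both anchors. The patterns and sample strings are illustrative choices, not the original example:

```python
import re

# '^' anchors the match at the start of the string.
at_start = re.search(r'^ht+p', 'http://python.org') is not None   # True
not_at_start = re.search(r'^ht+p', 'www.http.org') is not None    # False

# '$' anchors the match at the end of the string.
at_end = re.search(r'org$', 'python.org') is not None             # True
not_at_end = re.search(r'org$', 'python.org/doc') is not None     # False
```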

 

 

2. re Module

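  • split(pattern, string[, maxsplit=0]): splits a string at each occurrence of the pattern
  • sub(pat, repl, string[, count=0]): replaces occurrences of pat in string with repl
  • escape(string): escapes all special regular expression characters in the string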

 

The following is a simple example of these functions:

    pattern = r'...'
    string = ...
    text = ...
    pattern = ...
    re.split(pattern, text)
    pattern = ...
    text = ...
    pattern = r'...'
    pattern = ...
    text = ...
    re.sub(pattern, ..., text)
    re.escape(...)
    re.escape(...)
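As a self-contained sketch of re.split, re.sub, and re.escape: all patterns and sample strings below are assumptions chosen for illustration, not the literals from the original example.

```python
import re

text = 'alpha, beta,,,, gamma   delta'

# split: break the text on every run of commas and spaces.
parts = re.split(r'[, ]+', text)
# ['alpha', 'beta', 'gamma', 'delta']

# maxsplit limits the number of splits that are made.
limited = re.split(r'[, ]+', text, maxsplit=1)
# ['alpha', 'beta,,,, gamma   delta']

# sub: replace every match of the pattern with the replacement string.
replaced = re.sub(r'\{name\}', 'Mr. Gumby', 'Dear {name}...')
# 'Dear Mr. Gumby...'

# escape: backslash-escape the characters that are special in a pattern.
escaped = re.escape('www.python.org')
literal_only = re.fullmatch(escaped, 'wwwxpython.org') is None   # True
```

re.escape is mainly useful when the text to search for comes from user input and should be matched literally.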

 

2.1 Match objects and groups

When a function in the re module that matches strings finds a match, it returns a MatchObject object.

 

Group concept

This object contains information about the substring that matched the pattern, broken down into groups. In short, a group is a subpattern enclosed in parentheses. A group's number is determined by counting the opening parentheses to its left, and group 0 is the entire pattern. In the following pattern:

 

 

it contains these groups:

0   There was a wee cooper who lived in Fyfe
1
2
3
4   lived in Fyfe
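Group numbering can be checked with a short sketch. The pattern below is an assumption: the original pattern did not survive, so the parentheses are placed to give groups 0 to 4 over the same sentence.

```python
import re

# Assumed pattern (a reconstruction, not the original): groups are
# numbered by the position of their opening parenthesis, left to right.
pattern = r'There (was a (wee) (cooper)) who (lived in Fyfe)'
m = re.match(pattern, 'There was a wee cooper who lived in Fyfe')

groups = [m.group(i) for i in range(5)]
# ['There was a wee cooper who lived in Fyfe',
#  'was a wee cooper', 'wee', 'cooper', 'lived in Fyfe']
```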

 

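The following are common methods of re match objects:

  • group([group1, ...]): returns the substrings matched by the given groups
  • start([group]): returns the start position of a group
  • end([group]): returns the end position of a group (as with slices, one past the last matched character)
  • span([group]): returns the start and end positions of a group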

 

Example:

    m = re.match(r'...', ...)
    m.group(1)
    m.start(1)
    m.end(1)
    m.span(1)
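A runnable sketch of these four methods. The pattern and the string 'www.python.org' are assumptions for illustration, since the original literals were lost:

```python
import re

# Assumed pattern: group 1 captures the text between 'www.' and the
# final dot followed by three characters.
m = re.match(r'www\.(.*)\..{3}', 'www.python.org')

name = m.group(1)     # 'python'
start = m.start(1)    # 4
end = m.end(1)        # 10
span = m.span(1)      # (4, 10)
```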

 

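Apart from the overall match (group 0), older versions of Python allow only 99 groups, numbered 1 to 99. Python 3.5 removed this limit on the number of capturing groups, although the \number backreference syntax inside a pattern still refers only to the first 99 groups.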

 

2.2 Using the replacement function of re

The re.sub function can be combined with group numbers in the replacement string to perform more complex substitutions, as shown below:

    emphasis_pattern = r'...'
    re.sub(emphasis_pattern, r'...', ...)
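A minimal sketch of a group reference in a replacement string. The name emphasis_pattern comes from the text; the pattern body, the <em> replacement, and the sample sentence are assumptions:

```python
import re

# Assumed pattern: text between two asterisks, captured as group 1.
emphasis_pattern = r'\*([^\*]+)\*'

# '\1' in the replacement is substituted with whatever group 1 matched.
result = re.sub(emphasis_pattern, r'<em>\1</em>', 'Hello, *world*!')
# 'Hello, <em>world</em>!'
```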

 

Greedy and non-greedy modes

Repetition operators are greedy by default: they match as much as possible. The following pattern is greedy:

    emphasis_pattern = r'...'
    text = ...
    re.sub(emphasis_pattern, r'...', text)

 

Non-greedy mode is the opposite of greedy mode: it matches as little as possible. To make a repetition operator non-greedy, add a question mark (?) after it:

    emphasis_pattern = r'...'
    text = ...
    re.sub(emphasis_pattern, r'...', text)
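The difference between the two modes shows up when the text contains more than one candidate match. This sketch uses assumed patterns and an assumed sample sentence:

```python
import re

text = '*This* is *it*!'

# Greedy: '.+' runs to the last asterisk, so one large match is replaced.
greedy = re.sub(r'\*(.+)\*', r'<em>\1</em>', text)
# '<em>This* is *it</em>!'

# Non-greedy: '.+?' stops at the nearest asterisk, giving two matches.
non_greedy = re.sub(r'\*(.+?)\*', r'<em>\1</em>', text)
# '<em>This</em> is <em>it</em>!'
```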

 

References & further reading

Python Doc -- re Module

Basic Python tutorial (version 2)
