Python Regular Expression Guide

1. Regular Expression basics

1.1. Brief Introduction

Regular expressions are not specific to Python. They are a powerful tool for processing strings, with their own syntax and an independent matching engine. They may be less efficient than the built-in str methods, but they are far more expressive. Because the engine is independent of the host language, the syntax of regular expressions is largely the same in every language that provides them. What differs is the subset of syntax each language supports, and the unsupported parts are usually the rarely used ones. If you have already used regular expressions in another language, a quick read through this guide is enough.

The following figure shows the matching process using a regular expression:

The general matching process of a regular expression is as follows: the engine compares the expression with the characters in the text one by one, in sequence. If every character matches, the match succeeds; as soon as one character fails to match, the match fails. If the expression contains quantifiers or boundary assertions, the process is slightly different, but it is still easy to understand. Reading the examples and trying them out a few times is the quickest way to get used to it.
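As a rough illustration of the character-by-character process described above, the sketch below uses a purely literal pattern. The sample strings are made up for the example.

```python
import re

# Every character of the pattern matches the text in sequence,
# so the match succeeds.
ok = re.match(r"abc", "abcdef")

# The third character differs ("x" in the text, "c" in the pattern),
# so the match fails and None is returned.
fail = re.match(r"abc", "abxdef")

print(ok.group())  # abc
print(fail)        # None
```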

The following lists the regular expression metacharacters and syntax that Python supports:
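A few of the most common metacharacters can be tried out with the short sketch below. It is only a small sample, not the complete list, and the patterns and texts are illustrative assumptions.

```python
import re

# Each entry: (pattern, text, expected result of re.findall)
samples = [
    (r"a.c", "abc a-c ac", ["abc", "a-c"]),             # .  any character except newline
    (r"\d+", "tel 555 1234", ["555", "1234"]),          # \d a digit, + one or more
    (r"[aeiou]", "regex", ["e", "e"]),                  # [...] a character set
    (r"colou?r", "color colour", ["color", "colour"]),  # ?  zero or one
    (r"^\w+", "hello world", ["hello"]),                # ^  start of the string
    (r"\w+$", "hello world", ["world"]),                # $  end of the string
    (r"cat|dog", "cat dog bird", ["cat", "dog"]),       # |  alternation
]

for pattern, text, expected in samples:
    found = re.findall(pattern, text)
    print(pattern, "->", found, found == expected)
```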

1.2. Greedy and non-Greedy modes of quantifiers

Regular expressions are usually used to search a text for matching strings. In Python, quantifiers are greedy by default (in a few languages they may be non-greedy by default): they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and always try to match as few characters as possible. For example, searching "abbbc" with the regular expression "ab*" finds "abbb", while the non-greedy form "ab*?" finds only "a".
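The example from the paragraph above can be checked directly. The HTML-like string in the second half is an additional, made-up illustration of the same effect.

```python
import re

text = "abbbc"

# Greedy: b* takes as many "b" characters as it can.
greedy = re.search(r"ab*", text).group()
# Non-greedy: b*? takes as few as it can, which is none at all.
lazy = re.search(r"ab*?", text).group()

print(greedy)  # abbb
print(lazy)    # a

# The difference matters most when a pattern can run past a delimiter.
tags = "<b>bold</b>"
print(re.findall(r"<.+>", tags))   # ['<b>bold</b>']
print(re.findall(r"<.+?>", tags))  # ['<b>', '</b>']
```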

1.3. The backslash problem

Like most programming languages, regular expressions use "\" as the escape character, which can lead to a pile-up of backslashes. To match a literal "\" in the text, the regular expression written in an ordinary string literal needs four backslashes, "\\\\": the programming language first turns each pair into a single backslash, leaving two backslashes, and the regular expression engine then interprets those two as one escaped backslash. Python's raw strings solve this problem neatly: the regular expression in this example can be written as r"\\". Similarly, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about a missing backslash, and the expressions you write are easier to read.
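The following sketch shows that the two spellings produce the same regular expression. The sample path is an assumption made for the example.

```python
import re

text = "C:\\temp"   # the actual text is C:\temp (7 characters)

plain = "\\\\"      # ordinary literal: 4 backslashes typed, 2 reach the regex engine
raw = r"\\"         # raw string: 2 backslashes typed, 2 reach the regex engine

print(plain == raw)                          # True
print(len(re.search(raw, text).group()))     # 1 (a single backslash was matched)

# The same applies to predefined classes such as \d.
print(re.findall("\\d", "a1b2"))   # ['1', '2']
print(re.findall(r"\d", "a1b2"))   # ['1', '2']
```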

1.4. Matching Mode

Regular expressions provide several matching modes, such as case-insensitive matching and multi-line matching. They are covered together with the factory function of the Pattern class, re.compile(pattern[, flags]).

2. re module

2.1. Start Using re

Python supports regular expressions through the re module. The usual way to use re is to first compile the string form of the regular expression into a Pattern instance, then use the Pattern instance to process the text and obtain the matching result (a Match instance), and finally use the Match instance to obtain information and perform other operations.

    import re

    # Compile the regular expression into a Pattern object
    pattern = re.compile(r'hello')

    # Use the Pattern to match the text and obtain the matching result.
    # If the matching fails, None is returned.
    match = pattern.match('hello world!')

    if match:
        # Use the Match object to obtain group information
        print(match.group())

    ### Output ###
    # hello

re.compile(strPattern[, flag]):

This function is the factory for the Pattern class. It compiles a regular expression given as a string into a Pattern object. The second parameter, flag, is the matching mode. Several modes can be combined with the bitwise OR operator '|', for example re.I | re.M. A mode can also be specified inside the pattern string itself: re.compile('pattern', re.I | re.M) is equivalent to re.compile('(?im)pattern').
Optional values:

  • re.I (re.IGNORECASE): Ignore case (the full name is given in parentheses; the same applies below)
  • re.M (re.MULTILINE): Multi-line mode. Changes the behavior of '^' and '$' so that they also match at the start and end of each line
  • re.S (re.DOTALL): Dot-matches-all mode. Changes the behavior of '.' so that it also matches a newline
  • re.L (re.LOCALE): Makes the predefined character classes \w \W \b \B \s \S depend on the current locale
  • re.U (re.UNICODE): Makes the predefined character classes \w \W \b \B \s \S \d \D depend on the character properties defined by Unicode (in Python 3 this is already the default for str patterns)
  • re.X (re.VERBOSE): Verbose mode. In this mode the regular expression can span multiple lines, whitespace is ignored, and comments can be added. The following two regular expressions are equivalent:

    import re

    a = re.compile(r"""\d +  # the integral part
                       \.    # the decimal point
                       \d *  # some fractional digits""", re.X)
    b = re.compile(r"\d+\.\d*")
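A short sketch of how the flags behave in practice, including the equivalence between passing flags to re.compile() and writing them inline. The sample text is an assumption for illustration.

```python
import re

text = "first line\nHELLO world"

# Flags passed as an argument and flags written inline are equivalent.
by_flags = re.compile(r"^hello", re.I | re.M)
inline = re.compile(r"(?im)^hello")

print(by_flags.search(text).group())  # HELLO
print(inline.search(text).group())    # HELLO

# Without re.I and re.M the same pattern finds nothing:
# the case differs and ^ only matches at the very start of the text.
print(re.search(r"^hello", text))     # None

# re.S lets '.' match a newline as well.
print(repr(re.match(r".+", "a\nb").group()))        # 'a'
print(repr(re.match(r".+", "a\nb", re.S).group()))  # 'a\nb'
```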

The re module also provides many module-level functions that cover the same functionality. Each of them can be replaced by the corresponding method of a Pattern instance. Their only advantage is that you write one less re.compile() call; the price is that you have no compiled Pattern object to reuse. These functions are introduced together with the instance methods of the Pattern class. The preceding example can be shortened to:

    import re

    m = re.match(r'hello', 'hello world!')
    print(m.group())

The re module also provides the function escape(string), which returns the string with an escape character added before each regular expression metacharacter, such as * + ?. This is useful when you need to match text that contains many metacharacters literally.
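A minimal sketch of re.escape() in use. The literal and the text are made up for the example; the exact escaped form can vary slightly between Python versions, so only the matching behavior is shown.

```python
import re

literal = "1+1=2?"          # contains the metacharacters + and ?
text = "Is 1+1=2? Yes."

# Used as a pattern directly, + and ? act as quantifiers,
# so the literal text is not found.
unescaped_hit = re.search(literal, text)

# After escaping, every metacharacter is matched literally.
escaped_hit = re.search(re.escape(literal), text)

print(unescaped_hit)        # None
print(escaped_hit.group())  # 1+1=2?
```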

2.2. Match

A Match object is the result of one match and contains a lot of information about it. You can use the read-only attributes and the methods provided by Match to obtain this information.

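Attributes:

  • string: The text that was matched against
  • re: The Pattern object used for the match
  • pos: The index in the text at which the search started
  • endpos: The index in the text at which the search ended
  • lastindex: The index of the last captured group, or None if no group was captured
  • lastgroup: The name of the last captured group, or None if that group has no name or no group was captured

Methods:

  • group([group1, ...]): Returns the string captured by one or more groups, given by index or by name; with several arguments a tuple is returned
  • groups(): Returns a tuple of the strings captured by all groups
  • groupdict(): Returns a dictionary that maps the name of each named group to the string it captured
  • start([group]), end([group]), span([group]): Return the start index, the end index, and the (start, end) pair of the substring captured by the group
  • expand(template): Substitutes the captured groups into the template and returns the result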

    import re

    m = re.match(r'(\w+) (\w+)(?P<sign>.*)', 'hello world!')

    # Attributes
    print("m.string:", m.string)
    print("m.re:", m.re)
    print("m.pos:", m.pos)
    print("m.endpos:", m.endpos)
    print("m.lastindex:", m.lastindex)
    print("m.lastgroup:", m.lastgroup)

    # Methods
    print("m.group(1,2):", m.group(1, 2))
    print("m.groups():", m.groups())
    print("m.groupdict():", m.groupdict())
    print("m.start(2):", m.start(2))
    print("m.end(2):", m.end(2))
    print("m.span(2):", m.span(2))
    print(r"m.expand(r'\2 \1\3'):", m.expand(r'\2 \1\3'))

    ### Output ###
    # m.string: hello world!
    # m.re: re.compile('(\\w+) (\\w+)(?P<sign>.*)')
    # m.pos: 0
    # m.endpos: 12
    # m.lastindex: 3
    # m.lastgroup: sign
    # m.group(1,2): ('hello', 'world')
    # m.groups(): ('hello', 'world', '!')
    # m.groupdict(): {'sign': '!'}
    # m.start(2): 6
    # m.end(2): 11
    # m.span(2): (6, 11)
    # m.expand(r'\2 \1\3'): world hello!

2.3. Pattern

A Pattern object is a compiled regular expression. You can use the methods provided by Pattern to match and search text.

A Pattern cannot be instantiated directly; it must be constructed with re.compile().

Pattern provides several read-only attributes for obtaining information about the expression:

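    import re

    p = re.compile(r'(\w+) (\w+)(?P<sign>.*)', re.DOTALL)

    print("p.pattern:", p.pattern)
    print("p.flags:", p.flags)
    print("p.groups:", p.groups)
    # groupindex is a read-only mapping in Python 3; dict() gives a plain printout
    print("p.groupindex:", dict(p.groupindex))

    ### Output ###
    # p.pattern: (\w+) (\w+)(?P<sign>.*)
    # p.flags: 48   (re.DOTALL is 16; re.UNICODE, the Python 3 default for str patterns, is 32)
    # p.groups: 3
    # p.groupindex: {'sign': 3}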

Instance methods [| re module functions]:
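A minimal sketch of the commonly used Pattern instance methods, each of which also exists as a module-level function in re that takes the pattern as its first argument. The pattern and sample text are assumptions chosen for illustration.

```python
import re

p = re.compile(r"\d+")
text = "one1two22three333"

at_start = p.match(text)      # match() only tries at the start of the text
first = p.search(text)        # search() scans forward for the first match
parts = p.split(text)         # split the text at every match
numbers = p.findall(text)     # all matches as a list of strings
spans = [m.span() for m in p.finditer(text)]  # finditer() yields Match objects
replaced = p.sub("#", text)   # replace every match
counted = p.subn("#", text)   # same, plus the number of replacements

print(at_start)       # None
print(first.group())  # 1
print(parts)          # ['one', 'two', 'three', '']
print(numbers)        # ['1', '22', '333']
print(spans)          # [(3, 4), (7, 9), (14, 17)]
print(replaced)       # one#two#three#
print(counted)        # ('one#two#three#', 3)

# Module-level equivalent: the pattern is passed as the first argument.
print(re.findall(r"\d+", text) == numbers)  # True
```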

That covers Python's support for regular expressions. Being familiar with regular expressions is a skill every programmer should have; these days hardly any program gets by without processing strings. I am still a beginner myself and am simply sharing what I have learned. ^_^

In addition, no examples are given for the special constructs in the figure, because regular expressions that use them are somewhat tricky. If you are interested, think about how to match a word that does not start with abc. ^_^
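One possible answer to the exercise above, offered as a sketch rather than the only solution, uses a negative lookahead (?!...) placed right after a word boundary. The sample text is made up for the example.

```python
import re

# \b       start at a word boundary
# (?!abc)  succeed only if the next three characters are not "abc"
# \w+\b    then take the whole word
not_abc = re.compile(r"\b(?!abc)\w+\b")

text = "abcdef xyzabc abc hello"
words = not_abc.findall(text)

print(words)  # ['xyzabc', 'hello']
```

Words that merely contain abc later on, such as xyzabc, are still matched, because the lookahead is only checked at the start of the word.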
