"Python Learning" re module--Regular expression

Source: Internet
Author: User



1. Regular expression Syntax:


. matches any single character except a newline.

[] defines a character class: a set of characters, any one of which may match at that position. For example, [abc] matches a, b, or c.

^ placed as the first character inside a character class negates the class: [^5] matches any character other than 5. Anywhere else inside the class, ^ stands for itself.
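As a minimal sketch of these three constructs (the sample strings are made up for illustration):

```python
import re

# "." matches any single character except a newline.
dot_hits = re.findall(r"a.c", "abc a-c a\nc")    # ['abc', 'a-c']

# "[abc]" matches exactly one character from the set.
set_hits = re.findall(r"[abc]", "a1b2c3d4")      # ['a', 'b', 'c']

# "[^5]" matches any one character other than "5".
negated_hits = re.findall(r"[^5]", "1525")       # ['1', '2']

# A "^" that is not the first character of the class is a literal caret.
caret_hits = re.findall(r"[5^]", "5^6")          # ['5', '^']
```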


Metacharacters for repetition:

* repeats the preceding item 0 or more times

+ repeats the preceding item 1 or more times

? repeats the preceding item 0 or 1 times

{m,n} repeats the preceding item at least m and at most n times; {0,} is the same as *, {1,} is the same as +, and {0,1} is the same as ?

{m} repeats the preceding item exactly m times
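The quantifiers above can be compared side by side. This is only a sketch: the helper name matching and the sample strings are assumptions chosen for illustration.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc"]

def matching(pattern):
    """Return the samples that the pattern matches from start to end."""
    compiled = re.compile(pattern + r"$")
    return [s for s in samples if compiled.match(s)]

star = matching(r"ab*c")        # zero or more "b": all four samples
plus = matching(r"ab+c")        # one or more "b": everything except "ac"
optional = matching(r"ab?c")    # zero or one "b": "ac" and "abc"
ranged = matching(r"ab{2,3}c")  # two to three "b": "abbc" and "abbbc"
exact = matching(r"ab{2}c")     # exactly two "b": "abbc"
```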


\d matches any decimal digit; it is equivalent to the class [0-9].

\D matches any non-digit character; it is equivalent to the class [^0-9].

\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].

\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].

\w matches any alphanumeric character; it is equivalent to the class [a-zA-Z0-9_].

\W matches any non-alphanumeric character; it is equivalent to the class [^a-zA-Z0-9_].
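A short sketch of the six shorthand classes applied to one ASCII sample string (the string itself is made up):

```python
import re

text = "id_42 = 3.14\n"

digits = re.findall(r"\d+", text)        # ['42', '3', '14']
non_digits = re.findall(r"\D+", text)    # ['id_', ' = ', '.', '\n']
spaces = re.findall(r"\s", text)         # [' ', ' ', '\n']
non_spaces = re.findall(r"\S+", text)    # ['id_42', '=', '3.14']
words = re.findall(r"\w+", text)         # ['id_42', '3', '14']
non_words = re.findall(r"\W+", text)     # [' = ', '.', '\n']
```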


2. The difference between match and search


Python provides two different primitive operations: match and search. match checks for a match only at the beginning of the string, while search (the default behavior in Perl) looks for a match anywhere in the string.


Note: when the regular expression starts with '^', search behaves the same as match (unless the MULTILINE flag is set). match succeeds only if the pattern matches at the very start of the string, or at the position given by the pos parameter of a compiled pattern's match() method. For example:


    >>> import re
    >>> re.match("c", "abcdef")
    >>> re.search("c", "abcdef")
    <_sre.SRE_Match object at 0x00a9a988>

    >>> re.match("c", "cabcdef")
    <_sre.SRE_Match object at 0x00a9ab80>

    >>> re.search("c", "cabcdef")
    <_sre.SRE_Match object at 0x00af1720>

    >>> pattern = re.compile("c")
    >>> pattern.match("abcdef")
    >>> pattern.match("abcdef", 1)
    >>> pattern.match("abcdef", 2)
    <_sre.SRE_Match object at 0x00a9ab80>
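The same comparison can be written as a script that checks results instead of relying on the printed object addresses. This is a sketch using the sample string from the session above:

```python
import re

pattern = re.compile("c")

at_start = pattern.match("abcdef")      # None: "c" is not at index 0
anywhere = pattern.search("abcdef")     # match object: "c" found at index 2
from_pos = pattern.match("abcdef", 2)   # match object: matching starts at pos=2

# With a leading "^", search is anchored to the start just like match.
anchored = re.search("^c", "abcdef")    # None

# A failed match is None, so test the result before calling its methods.
if anywhere is not None:
    print(anywhere.start())
```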


3. Module contents


re.compile(pattern, flags=0)


Compiles a regular expression and returns a RegexObject, whose match() and search() methods can then be called.


    prog = re.compile(pattern)
    result = prog.match(string)


is equivalent to

    result = re.match(pattern, string)


The first form lets the compiled regular expression be reused, which is more efficient when the same pattern is applied many times.
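A minimal sketch of reusing one compiled pattern across several strings; the sample lines and the name error_code are hypothetical:

```python
import re

# Hypothetical input lines, used only for illustration.
lines = ["error 404", "ok", "error 500"]

# Compile once, then reuse the same object for every line.
error_code = re.compile(r"error (\d+)")

codes = []
for line in lines:
    found = error_code.match(line)
    if found:                       # match() returns None on failure
        codes.append(int(found.group(1)))

print(codes)
```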


re.search(pattern, string, flags=0)

Scans the string for the first location where the regular expression matches. Returns a match object (_sre.SRE_Match), or None if there is no match.


re.match(pattern, string, flags=0)


Checks whether the beginning of the string matches the regular expression. Returns a match object (_sre.SRE_Match), or None if there is no match.


re.split(pattern, string, maxsplit=0)


Splits the string at each occurrence of the regular expression. If the pattern is enclosed in capturing parentheses, the text matched by the separator is also returned in the list. maxsplit is the maximum number of splits: with maxsplit=1 the string is split once; the default, 0, means no limit.


    >>> re.split(r'\W+', 'Words, words, words.')
    ['Words', 'words', 'words', '']
    >>> re.split(r'(\W+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
    >>> re.split(r'\W+', 'Words, words, words.', 1)
    ['Words', 'words, words.']
    >>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)


Note: I was using Python 2.6, and its source code shows that split() has no flags parameter there; it was only added in 2.7, where the last call above returns ['0', '3', '9']. I have run into this kind of mismatch between the official documentation and the source more than once. If you see unexpected behavior, check the source code for your version to find the reason.


If the pattern matches at the beginning or end of the string, the returned list starts or ends with an empty string.


    >>> re.split(r'(\W+)', '...words, words...')
    ['', '...', 'words', ', ', 'words', '...', '']


If the pattern does not match anywhere in the string, a list containing the entire string is returned.


    >>> re.split("a", "bbb")
    ['bbb']
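The three behaviors of re.split described in this section (plain split, maxsplit, and a capturing separator) can be sketched on one made-up input string:

```python
import re

csv_like = "a, b;c ,d"
separator = r"\s*[,;]\s*"   # a comma or semicolon with optional spaces around it

fields = re.split(separator, csv_like)                    # ['a', 'b', 'c', 'd']
first_only = re.split(separator, csv_like, maxsplit=1)    # ['a', 'b;c ,d']

# Wrapping the separator in parentheses keeps the matched separators.
with_seps = re.split("(" + separator + ")", csv_like)
# ['a', ', ', 'b', ';', 'c', ' ,', 'd']
```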


re.findall(pattern, string, flags=0)


Finds all the substrings that the RE matches and returns them as a list. The matches are returned in order, scanning the string from left to right. If there is no match, an empty list is returned.

    >>> re.findall("a", "bcdef")
    []
    >>> re.findall(r"\d+", "12a32bc43jf3")
    ['12', '32', '43', '3']

    # Collect every IP-like address found in a log file.
    ipaddress = []
    with open("/tmp/a.log", "r") as f:
        for line in f:
            ipaddress.extend(re.findall(r'([1-2]?\d?\d\.[1-2]?\d?\d\.[1-2]?\d?\d\.[1-2]?\d?\d)', line))
    print(ipaddress)
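To try the address extraction without a log file, the sketch below runs the same pattern over a few in-memory lines. The lines are hypothetical stand-ins for the contents of /tmp/a.log. Note that the pattern is loose: it also accepts out-of-range values such as 299.

```python
import re

# Hypothetical log lines standing in for /tmp/a.log.
log_lines = [
    "10.0.0.1 - GET /index.html",
    "no address here",
    "from 192.168.1.20 to 172.16.0.5",
]

ip_pattern = re.compile(r"[1-2]?\d?\d\.[1-2]?\d?\d\.[1-2]?\d?\d\.[1-2]?\d?\d")

addresses = []
for line in log_lines:
    # findall returns [] for lines without a match, so extend is always safe.
    addresses.extend(ip_pattern.findall(line))

print(addresses)
```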


re.finditer(pattern, string, flags=0)


Finds all the substrings that the RE matches and returns them as an iterator of match objects. The matches are produced in order, scanning the string from left to right. If there is no match, the iterator yields nothing.


    >>> it = re.finditer(r"\d+", "12a32bc43jf3")
    >>> for match in it:
    ...     print(match.group())
    ...
    12
    32
    43
    3

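Because finditer yields match objects rather than strings, positions are available as well. A small sketch on the same sample string:

```python
import re

text = "12a32bc43jf3"

# Each item is a match object, so group(), start() and end() can all be used.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
# [('12', 0, 2), ('32', 3, 5), ('43', 7, 9), ('3', 11, 12)]

# With no match the iterator is simply empty.
no_hits = list(re.finditer(r"\d+", "abc"))
```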

re.sub(pattern, repl, string, count=0, flags=0)


Finds all the substrings that the RE matches and replaces them with repl. The optional parameter count is the maximum number of matches to replace and must be a non-negative integer; the default value, 0, replaces all matches. If there is no match, the string is returned unchanged.


re.subn(pattern, repl, string, count=0, flags=0)


Does the same as re.sub(), but returns a 2-tuple containing the new string and the number of replacements performed.
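A brief sketch of re.sub and re.subn, including count and the no-match case (the sample text is made up):

```python
import re

text = "2024-01-15 and 2023-12-31"

masked = re.sub(r"\d", "#", text)              # every digit replaced
first_two = re.sub(r"\d", "#", text, count=2)  # only the first two digits replaced
result, n = re.subn(r"\d{4}", "YYYY", text)    # new string plus replacement count
unchanged = re.sub(r"xyz", "-", text)          # no match: the input comes back as is

print(masked)
print(first_two)
print(result, n)
```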


re.escape(string)

Returns the string with its non-alphanumeric characters escaped, so that it can be used as a literal inside a regular expression.
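As a sketch of why escaping matters, the made-up literal below contains "+", "(", ")" and ".", all of which are metacharacters. The exact escaped text differs between Python versions, so the example checks only the matching behavior.

```python
import re

literal = "1+1=2 (approx.)"
haystack = "we know 1+1=2 (approx.) holds"

# Escaped, the text is matched character for character.
found = re.search(re.escape(literal), haystack)

# Unescaped, "+" becomes a quantifier and "(...)" a group, so the search fails.
unescaped_hit = re.search(literal, haystack)

print(found.group())
print(unescaped_hit)
```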


re.purge()


Clears the cache of compiled regular expressions.


4. Regular expression objects


re.RegexObject


re.compile() returns a RegexObject.


re.MatchObject


group() returns the string matched by the RE


start() returns the position where the match starts


end() returns the position where the match ends


span() returns a tuple (start, end) containing the positions of the match
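The four match object methods can be sketched on a single search; the sample string is made up:

```python
import re

text = "order 66 shipped"
m = re.search(r"\d+", text)

matched_text = m.group()          # '66'
start, end = m.start(), m.end()   # 6 and 8
span = m.span()                   # (6, 8)

# start and end are slice indices into the original string.
same_text = text[start:end]
```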



5. Compilation flags


Compilation flags let you modify some aspects of how regular expressions work. Flags are available in the re module under two names: a long name such as IGNORECASE, and a short, one-letter form such as I. (If you are familiar with Perl's pattern modifiers, the one-letter forms use the same letters; the short form of re.VERBOSE is re.X, for example.) Multiple flags can be specified by bitwise OR-ing them; re.I | re.M, for example, sets both the I and M flags.
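A minimal sketch of combining two flags with bitwise OR, using a made-up three-line string:

```python
import re

text = "Spam\nspam eggs\nSPAM"

# re.I ignores case; re.M lets ^ and $ match at every line boundary.
combined = re.compile(r"^spam$", re.I | re.M)
lines_hit = combined.findall(text)        # ['Spam', 'SPAM']

# The short and long names refer to the same flag.
same_flag = (re.I == re.IGNORECASE)
```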


I

IGNORECASE


Performs case-insensitive matching; character classes and literal strings match letters regardless of case. For example, [A-Z] also matches lowercase letters, and Spam matches "Spam", "spam", or "spAM". This lowercasing does not take the current locale into account.


L

LOCALE


Makes \w, \W, \b, and \B dependent on the current locale.


Locales are a feature of the C library intended to help in writing programs that take account of language differences. For example, if you are processing French text, you would want to write \w+ to match words, but \w only matches the character class [A-Za-z]; it does not match "é" or "ç". If your system is configured properly and a French locale is selected, certain C functions tell the program that "é" should also be considered a letter. Setting the LOCALE flag when compiling a regular expression makes the resulting compiled object use these C functions for \w; this is slower, but also lets \w+ match French words as you would expect.


M

MULTILINE


(^ and $ are anchors: ^ matches at the start of the text and $ matches at the end.)


Usually ^ matches only at the beginning of the string, and $ matches only at the end of the string and immediately before the newline (if any) at the end of the string. When this flag is specified, ^ matches at the beginning of the string and at the beginning of each line within the string, immediately following each newline. Similarly, the $ metacharacter matches at the end of the string and at the end of each line (immediately preceding each newline).
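The difference is easiest to see by running the same anchored patterns with and without the flag. A sketch on a made-up two-line string:

```python
import re

text = "first line\nsecond line\n"

default_starts = re.findall(r"^\w+", text)          # ['first']
multiline_starts = re.findall(r"^\w+", text, re.M)  # ['first', 'second']

# Without the flag, $ matches only before the final newline.
default_ends = re.findall(r"\w+$", text)            # ['line']
multiline_ends = re.findall(r"\w+$", text, re.M)    # ['line', 'line']
```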


S

DOTALL


Makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
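A two-line sketch of the effect, using a made-up string that contains a newline:

```python
import re

text = "start\nend"

without_flag = re.search(r"start.end", text)       # None: "." stops at the newline
with_flag = re.search(r"start.end", text, re.S)    # matches across the newline

print(without_flag)
print(repr(with_flag.group()))
```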


X

VERBOSE


This flag lets you write regular expressions that are more readable by giving you more flexibility in how you format them. When this flag is specified, whitespace within the RE string is ignored, except when the whitespace is in a character class or preceded by an unescaped backslash; this lets you organize and indent the RE more clearly. It also lets you put comments within the RE that are ignored by the engine; a comment is marked by a "#" that is neither in a character class nor preceded by an unescaped backslash.
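As a sketch, the verbose pattern below and its compact one-line form are intended to be equivalent; the pattern itself is a made-up example.

```python
import re

# Whitespace and "#" comments are ignored, except inside a character class.
decimal = re.compile(r"""
    \d+        # integer part
    \.         # literal decimal point
    \d*        # optional fractional digits
    [ ]units   # a space inside a character class is significant
""", re.VERBOSE)

# The same pattern written without the flag.
compact = re.compile(r"\d+\.\d* units")

sample = "about 3.14 units"
verbose_hit = decimal.search(sample).group()
compact_hit = compact.search(sample).group()
```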



Finally: if a string method can do the job, prefer it over a regular expression, because string methods are simpler and faster.
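For fixed text, a few common tasks have direct string-method equivalents. The sketch below pairs each one with its re counterpart to show that the results agree; the sample string is made up.

```python
import re

text = "name=value"

# Substring test: the `in` operator versus re.search.
has_eq_str = "=" in text
has_eq_re = re.search("=", text) is not None

# Prefix test: str.startswith versus re.match.
starts_str = text.startswith("name")
starts_re = re.match("name", text) is not None

# Splitting on a fixed separator: str.split versus re.split.
parts_str = text.split("=")
parts_re = re.split("=", text)
```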

