python-Regular Expressions

Source: Internet
Author: User

Regular expressions (often shortened to "regex")

1. Regular expressions play an important role in almost every programming language. So what is a regular expression?

A regular expression is a way to find or filter strings that fit certain rules (my own summary).

2. Regular expressions have their own syntax. In Python, the regular expression module is re.

re.match(pattern, string) takes a regular expression and the string to be matched. The rules of the regular expression are, of course, written by the developer.
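A minimal sketch of re.match, using made-up sample strings. Note that re.match only succeeds when the pattern matches at the start of the string:

```python
import re

# re.match anchors the attempt at the beginning of the string.
m = re.match(r"py\w+", "python regex")
matched_text = m.group() if m else None          # "python"

# Same pattern, but the string does not start with "py", so there is no match.
no_match = re.match(r"py\w+", "learn python")    # None
```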

3. ^ and $

These are used a lot in the Flask and Django frameworks, mainly to match routes. ^ means "starts with XXX" and $ means "ends with XXX". When both are used, the pattern between ^ and $ has to match the entire string.
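As an illustration, here is a route-like pattern anchored on both sides. The "/user/<digits>/" route is a hypothetical example, not taken from Flask or Django themselves:

```python
import re

# Hypothetical route: "/user/", one or more digits, then a closing slash.
route = re.compile(r"^/user/\d+/$")

full = bool(route.search("/user/42/"))         # whole string fits the pattern
extra = bool(route.search("/user/42/edit"))    # text after the pattern: rejected
prefix = bool(route.search("/api/user/42/"))   # text before the pattern: rejected
```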

4. Inside square brackets, a leading ^ means negation (match any character that is not listed); outside the brackets, it means "starts with XXX".

5. (?P<name>...) is a named group: whatever is inside the parentheses forms one group, and the P must be uppercase.
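A short sketch of named groups. The group names "year" and "month" and the sample string are chosen only for illustration:

```python
import re

m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2018-06")

year = m.group("year")      # access a group by name
month = m.group("month")
fields = m.groupdict()      # all named groups as a dict
```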

6. Extract the area code and number: ret = re.match(r"([^-]*)-(\d+)", "010-12345678")
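The call above returns a match object. One way to read the two parts back out of it is:

```python
import re

ret = re.match(r"([^-]*)-(\d+)", "010-12345678")

whole = ret.group()        # the entire match
area_code = ret.group(1)   # first group: everything before the hyphen
number = ret.group(2)      # second group: the digits after the hyphen
both = ret.groups()        # all groups as a tuple
```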

7. When matching front-end (HTML) tags, the opening and closing tags have to be consistent. Backreferences to earlier groups can enforce this, as in the following example:

re.match(r"<(\w*)><(\w*)>.*</\2></\1>", label)
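To see the backreferences at work, the pattern can be tried on two sample strings (both invented for this sketch), one with consistent tags and one without:

```python
import re

pattern = r"<(\w*)><(\w*)>.*</\2></\1>"

ok = re.match(pattern, "<html><h1>hello</h1></html>")
bad = re.match(pattern, "<html><h1>hello</h2></html>")   # </h2> does not close <h1>

ok_tags = ok.groups() if ok else None
```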

8. Advanced usage of the re module:

group: calling group() on a match object returns the content matched by the regular expression.

search: finds the first match anywhere in the string, for example the number in re.search(r"\d+", "read 9999"). (re.match would return None here, because the string does not start with a digit.)

findall: an enhanced version of search that returns every match of the regular expression as a list:

re.findall(r"\d+", "python = 9999, c = 7890, c++ = 12345")

sub: replaces the parts of the string that match the regular expression:

re.sub(r"<[^>]*>|&nbsp;|\n", "", test_str)

split: cuts the string at every place the regular expression matches and returns the pieces as a list:

re.split(r":| ","info:xiaoZhang 33 shandong")
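The functions listed above can be tried together in one small script. The value of test_str is a hypothetical input, since the original does not show it:

```python
import re

text = "python = 9999, c = 7890, c++ = 12345"

# search finds the first match anywhere; match only looks at the start.
first = re.search(r"\d+", text).group()
at_start = re.match(r"\d+", text)

# findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", text)

# sub removes tags, "&nbsp;" entities and newlines from a sample string.
test_str = "<p>hello&nbsp;world</p>\n"   # hypothetical input
cleaned = re.sub(r"<[^>]*>|&nbsp;|\n", "", test_str)

# split cuts the string at every colon or space.
parts = re.split(r":| ", "info:xiaoZhang 33 shandong")
```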

9. Greedy and non-greedy

First of all, in Python quantifiers are greedy by default: they always try to match as many characters as possible.

So, in some specific cases we need non-greedy matching. To get it, add "?" after a quantifier such as "*", "?", "+", or "{m,n}"; the quantifier then matches as few characters as possible.
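A small sketch of the difference, using an invented HTML fragment:

```python
import re

html = "<b>one</b><b>two</b>"   # hypothetical sample text

# Greedy: ".*" runs to the last "</b>", so both elements end up in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: ".*?" stops at the first "</b>" it can, giving one match per element.
lazy = re.findall(r"<b>.*?</b>", html)
```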

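10. The role of r

In Python, a string prefixed with r is a raw string: backslashes in it are kept as they are instead of being interpreted as escape sequences by Python.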
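The effect is easiest to see with "\b", which Python itself reads as a backspace character unless the string is raw. A minimal sketch:

```python
import re

plain = "\bcat\b"    # Python turns each "\b" into a backspace character
raw = r"\bcat\b"     # backslash and "b" reach the regex engine unchanged

plain_len = len(plain)   # backspace, c, a, t, backspace
raw_len = len(raw)       # two characters for each "\b" plus "cat"

found_plain = re.findall(plain, "cat concat")   # looks for literal backspaces
found_raw = re.findall(raw, "cat concat")       # looks for the whole word "cat"
```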

11. Appendix

\

Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "\\n" matches the two characters "\n", while "\n" matches a line break. The sequence "\\" matches "\" and "\(" matches "(". This corresponds to the "escape character" concept found in many programming languages.

^

Matches the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r". (In Python's re module the corresponding flag is re.MULTILINE, and only "\n" counts as a line break.)

$

Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
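In Python's re module, multiline behavior is switched on with the re.MULTILINE flag. A brief sketch with an invented two-line string:

```python
import re

text = "first line\nsecond line"

# Without the flag, ^ only matches at the very start of the string.
default_starts = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
multi_starts = re.findall(r"^\w+", text, re.MULTILINE)
multi_ends = re.findall(r"\w+$", text, re.MULTILINE)
```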

*

Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" as well as "zo" and "zoo". * is equivalent to {0,}.

+

Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?

Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n}

n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".

{n,}

n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".

{n,m}

Both m and n are non-negative integers, where n<=m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood" as one match and the last three o's as another. "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers.
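The quantifier examples above can be checked with re.findall. In this sketch the long words are built with string multiplication so the number of o's is unambiguous:

```python
import re

five_o = "f" + "o" * 5 + "d"    # "foooood"
six_o = "f" + "o" * 6 + "d"     # "fooooood"

star = re.findall(r"zo*", "z zo zoo")                 # zero or more o's
plus = re.findall(r"zo+", "z zo zoo")                 # one or more o's
optional = re.findall(r"do(?:es)?", "do does")        # "es" is optional
exactly_two = re.findall(r"o{2}", "Bob food")         # exactly two o's
two_or_more = re.findall(r"o{2,}", "Bob " + five_o)   # at least two o's
one_to_three = re.findall(r"o{1,3}", six_o)           # between one and three o's
```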

?

When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much of it as possible. For example, for the string "oooo", "o+" matches as many "o" as possible and gives the result ["oooo"], while "o+?" matches as few "o" as possible and gives the result ['o', 'o', 'o', 'o'].

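. (dot)

Matches any single character except "\n" and "\r". To match any character, including "\n" and "\r", use a pattern like "[\s\S]". (In Python's re module, "." excludes only "\n" by default, and the re.DOTALL flag makes it match every character.)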
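A quick sketch of how the dot treats a line break in Python, and two ways around it:

```python
import re

text = "a\nb"

default_dot = re.findall(r"a.b", text)              # "." does not match "\n"
dotall = re.findall(r"a.b", text, re.DOTALL)        # the flag lets "." match "\n"
char_class = re.findall(r"a[\s\S]b", text)          # the class matches any character
```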

(pattern)

Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection: in VBScript through the SubMatches collection, in JScript through the $0...$9 properties. To match literal parenthesis characters, use "\(" or "\)".

(?:pattern)

A non-capturing match: it matches pattern but does not capture the result, so nothing is stored for later use. This is useful for combining parts of a pattern with the "or" character "(|)". For example, "industr(?:y|ies)" is a more concise expression than "industry|industries".
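The difference between a capturing and a non-capturing group shows up clearly in re.findall, which returns group contents whenever the pattern has a capturing group. A small sketch:

```python
import re

text = "industry and industries"

# With a capturing group, findall returns only what the group captured.
capturing = re.findall(r"industr(y|ies)", text)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"industr(?:y|ies)", text)
```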

(?=pattern)

Non-capturing match, positive lookahead. Matches at any position where the text that follows matches pattern; the lookahead itself is not captured for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". A lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.

(?!pattern)

Non-capturing match, negative lookahead. Matches at any position where the text that follows does not match pattern; the lookahead itself is not captured for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000".
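Both lookahead forms can be tried in Python with the Windows examples from above:

```python
import re

positive = r"Windows(?=95|98|NT|2000)"
negative = r"Windows(?!95|98|NT|2000)"

# The lookahead checks what follows but is not part of the match itself.
pos_text = re.search(positive, "Windows2000").group()
pos_miss = re.search(positive, "Windows3.1")

neg_text = re.search(negative, "Windows3.1").group()
neg_miss = re.search(negative, "Windows2000")
```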

(?<=pattern)

Non-capturing match, positive lookbehind. Similar to positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" is meant to match "Windows" in "2000Windows", but not "Windows" in "3.1Windows".

When tested with the re module in Python 3.6, however, "(?<=95|98|NT|2000)Windows" raises an error: the alternatives joined by "|" must all have the same length. Here "95", "98" and "NT" have length 2 while "2000" has length 4, so the pattern is rejected.

(?<!pattern)

Non-capturing match, negative lookbehind. Similar to negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" is meant to match "Windows" in "3.1Windows", but not "Windows" in "2000Windows".

As written, this example does not work, because the alternatives differ in length. "(?<!95|98|NT|20)Windows" is accepted, "(?<!95|980|NT|20)Windows" raises an error, and a single alternative such as "(?<!2000)Windows" has no such restriction and matches correctly.

As with positive lookbehind, the re module in Python 3.6 requires all alternatives to have the same length, but that length does not have to be 2. For example, "(?<!1995|1998|NTNT|2000)Windows" is also accepted.
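The fixed-width restriction on lookbehind can be reproduced with a short script. This is a sketch against Python's built-in re module; the sample strings are chosen for illustration:

```python
import re

# Alternatives of different widths inside a lookbehind are rejected at compile time.
try:
    re.compile(r"(?<=95|98|NT|2000)Windows")
    error_message = None
except re.error as exc:
    error_message = str(exc)

# Alternatives of equal width are accepted.
same_width = re.search(r"(?<=1995|1998|NTNT|2000)Windows", "2000Windows")

# Negative lookbehind with two-character alternatives.
allowed = re.search(r"(?<!95|98|NT|20)Windows", "3.1Windows")
blocked = re.search(r"(?<!95|98|NT|20)Windows", "98Windows")
```

Note that a two-character lookbehind only inspects the two characters directly before "Windows", so with this pattern "2000Windows" would still match (the preceding characters are "00").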

x|y

Matches x or y. For example, "z|food" matches "z" or "food" (be careful here: the alternation applies to the whole of "z" and "food", not just to the first letter). "[zf]ood" matches "zood" or "food".

[xyz]

Character set. Matches any one of the characters it contains. For example, "[abc]" matches the "a" in "plain".

[^xyz]

Negated character set. Matches any character that is not contained in it. For example, "[^abc]" matches "p", "l", "i" and "n" in "plain".

[a-z]

Character range. Matches any character within the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z".

Note: a hyphen denotes a range only when it is inside a character set and appears between two characters. If it appears at the beginning of the character set, it stands for the hyphen itself.

[^a-z]

Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".

\b

Matches a word boundary, that is, the position between a word and a space. ("Matching" in a regular expression covers two concepts: matching characters and matching positions. \b matches a position.) For example, "er\b" matches the "er" in "never", but not the "er" in "verb".

\B

Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
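The "never" and "verb" examples can be verified in Python by checking where each pattern matches in a combined sample string:

```python
import re

text = "never verb"

# "er" followed by a word boundary: only the "er" that ends "never".
at_boundary = re.findall(r"er\b", text)
pos_boundary = re.search(r"er\b", text).start()

# "er" not followed by a word boundary: only the "er" inside "verb".
inside_word = re.findall(r"er\B", text)
pos_inside = re.search(r"er\B", text).start()
```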

\cx

Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal "c" character.

\d

Matches a digit character. Equivalent to [0-9]. With grep, add the -P option to enable Perl-style regular expressions.

\D

Matches a non-digit character. Equivalent to [^0-9]. With grep, add the -P option to enable Perl-style regular expressions.

\f

Matches a form feed (page break). Equivalent to \x0c and \cL.

\n

Matches a line break. Equivalent to \x0a and \cJ.

\r

Matches a carriage return. Equivalent to \x0d and \cM.

\s

Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].

\S

Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t

Matches a tab character. Equivalent to \x09 and \cI.

\v

Matches a vertical tab. Equivalent to \x0b and \cK.

\w

Matches any word character, including the underscore. Similar but not equivalent to "[A-Za-z0-9_]", because "word" characters here are drawn from the Unicode character set.

\W

Matches any non-word character. For ASCII text, equivalent to "[^A-Za-z0-9_]".

\xn

Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.

\num

Matches num, where num is a positive integer. This is a reference back to a captured match. For example, "(.)\1" matches two consecutive identical characters.
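A short sketch of the "(.)\1" backreference in Python, using an invented sample string. re.finditer is used here because re.findall would return only the content of group 1:

```python
import re

# Each match is one character followed by the same character again.
doubled = [m.group() for m in re.finditer(r"(.)\1", "aabbcd ee")]
```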

\n

Identifies an octal escape value or a backreference, where n is a digit. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.

\nm

Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.

\nml

If n is an octal digit (0-7) and both m and l are octal digits (0-7), matches the octal escape value nml.

\un

Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

\p{P}

The lowercase p stands for "property" and indicates a Unicode property; it is the prefix of a Unicode property expression. The "P" inside the braces denotes one of the seven character properties of the Unicode character set: punctuation characters.

The other six properties:

L: Letters;

M: Mark symbols (usually not appearing alone);

Z: Separators (such as spaces, line breaks, etc.);

S: Symbols (such as mathematical symbols, currency symbols, etc.);

N: Numbers (such as Arabic numerals, Roman numerals, etc.);

C: Other characters.

* Note: This syntax is not supported everywhere. For example, Python's built-in re module does not support it, and JavaScript supports it only with the u flag (ES2018 and later).

\<

\>

Match the start (\<) and end (\>) of a word. For example, the regular expression \<the\> matches the "the" in the string "for the wise", but not the "the" in the string "otherwise".

Note: These metacharacters are not supported by all software. Python's re module does not support them; \b can be used there instead.

( )

The expression between ( and ) is defined as a "group", and the characters that match it are saved to a temporary area (up to 9 can be saved in one regular expression). They can be referenced with the symbols \1 through \9.

|

Performs a logical OR between two match conditions. For example, the regular expression (him|her) matches "it belongs to him" and "it belongs to her", but not "it belongs to them".

Note: This metacharacter is not supported by all software.
