Python re: regular expressions

Source: Internet
Author: User
Tags: uppercase letter, alphanumeric characters

Regular expressions (regex)


Motivation: Manipulating strings (text) is one of the main tasks of a computer.
A common need is to find strings of a certain type within a text or a larger string.
Regular expressions were created to solve this problem.

Definition: A regular expression is a string of ordinary characters and special symbols that describes a class of strings following certain rules.


Advantages and use:

1. It is an independent technology
2. It can be used in many programming languages; in Python it is provided by the re module


re.findall(regex, string)
Function: Match a string using a regular expression
Parameters: regex: the regular expression
string: the target string
Return value: a list of all matching items
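
As a minimal sketch of the call described above (the sample text and variable names are made up for illustration), findall returns every non-overlapping match from left to right, and an empty list when nothing matches:

```python
import re

text = "cat bat cat rat"

# Every non-overlapping occurrence of the pattern, in order of appearance.
matches = re.findall("cat", text)

# When nothing matches, the result is an empty list rather than None.
no_match = re.findall("dog", text)

print(matches)
print(no_match)
```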


Regular expression metacharacters

* Ordinary characters
Match rule: an ordinary character matches itself
e.g. ab matches ab

In [4]: re.findall('ab', 'abcdefabg')
Out[4]: ['ab', 'ab']


* Match any single character

Metacharacter: .
Match rule: matches any character except '\n'

e.g. f.o ---> foo fao fbo

In [6]: re.findall('f.o', 'faosfaafbo')
Out[6]: ['fao', 'fbo']

* Match the start of the string

Metacharacter: ^

Match rule: the position of ^ must be the start of the string; usually used together with other metacharacters

e.g. ^abc matches the abc at the start of abcdef

In [8]: re.findall('^abc', 'abcdefgh')
Out[8]: ['abc']

* Match the end of the string

Metacharacter: $
Match rule: matches the end position of the string

e.g. py$ matches the py at the end of a.py

In [11]: re.findall('py$', 'a.py')
Out[11]: ['py']

* Match repetition

Metacharacter: *
Match rule: matches the preceding expression 0 or more times

e.g. ab* ---> a ab abbbbb abbbbbb

In [17]: re.findall('ab*', 'aabbabbljlk')
Out[17]: ['a', 'abb', 'abb']

* Match repetition

Metacharacter: +
Match rule: matches the preceding expression 1 or more times

In [18]: re.findall('ab+', 'aabbabbljlk')
Out[18]: ['abb', 'abb']


* Match repetition

Metacharacter: ?
Match rule: matches the preceding expression 0 or 1 times

In [20]: re.findall('ab?', 'aabbabbljlk')
Out[20]: ['a', 'ab', 'ab']

* Match repetition

Metacharacter: {n}
Match rule: matches the preceding expression exactly n times

In [22]: re.findall('ab{2}', 'aabbabbljlk')
Out[22]: ['abb', 'abb']

* Match repetition

Metacharacter: {m,n}
Match rule: matches the preceding expression m to n times

In [26]: re.findall('ab{2,5}', 'aabbabbbbbbljlk')
Out[26]: ['abb', 'abbbbb']
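
To compare the repetition metacharacters side by side, here is a small sketch that runs each one against the same input. The sample string and the names text, patterns, and results are illustrative assumptions, not part of the original notes.

```python
import re

text = "a ab abb abbb"
patterns = ["ab*", "ab+", "ab?", "ab{2}", "ab{2,3}"]

# Run every quantifier against the same text to see how they differ.
results = {p: re.findall(p, text) for p in patterns}

for p in patterns:
    print(f"{p:8} {results[p]}")
```

Note that ab? and ab{2} stop early inside abbb: each match takes only as many b characters as the quantifier allows.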

* Match a character set

Metacharacter: [abcd]
Match rule: matches any one character in the set

* Match a character range

Metacharacters: [0-9] [A-Z] [a-z]
Match rule: matches any one character in the range; several ranges can be written together, and other characters can be added to the set.

e.g. [_3-9a-z]

In [30]: re.findall('[_0-9a-zA-Z]+', 'Hello world hello py_2')
Out[30]: ['Hello', 'world', 'hello', 'py_2']


* Negated character set

Metacharacter: [^...]
Match rule: matches any one character that is not in the set

e.g. [^0-9] matches A G & $

In [31]: re.findall('[^0-9]+', 'Hello world hello py_2')
Out[31]: ['Hello world hello py_']

Metacharacters: \d \D
Match rule: \d matches any one digit; \D matches any one non-digit character
In [32]: re.findall(r'\d{8}', '12345678')
Out[32]: ['12345678']


Metacharacters: \w \W
Match rule: \w matches any one letter, digit, or underscore; \W matches any one character that is not a letter, digit, or underscore
For ASCII text these are equivalent to [_0-9a-zA-Z] and [^_0-9a-zA-Z]


Metacharacters: \s \S
Match rule: \s matches any one whitespace character; \S matches any one non-whitespace character
For ASCII text \s is equivalent to [ \t\n\r\f\v]

In [41]: re.findall(r'\s\S+', 'Hello World nihao China')
Out[41]: [' World', ' nihao', ' China']
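
The three pairs of shorthand classes can be tried together. This is a small sketch on a made-up sample string; the variable names are illustrative only.

```python
import re

text = "py_3 costs $25"

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of anything that is not a digit
words = re.findall(r"\w+", text)       # runs of letters, digits, underscores
non_words = re.findall(r"\W+", text)   # runs of everything else
spaces = re.findall(r"\s", text)       # individual whitespace characters

print(digits, non_digits, words, non_words, spaces, sep="\n")
```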


Metacharacters: \A \Z
Match rule: \A matches the start of the string, like ^; \Z matches the end of the string, like $

In [52]: re.findall(r'\Afoo\Z', 'foo')
Out[52]: ['foo']


Metacharacters: \b \B
Match rule: \b matches a word boundary position; \B matches a position that is not a word boundary

Word boundary: the position where a letter, digit, or underscore meets any other character (or the start or end of the string) is considered a word boundary

In [59]: re.findall(r'foo\b', 'foo food foot')
Out[59]: ['foo']

In [60]: re.findall(r'foo\B', 'foo food foot')
Out[60]: ['foo', 'foo']


Metacharacter: |

Match rule: joins several regular expressions in an "or" relationship
In [64]: re.findall('abc|bcd', 'abcdefbcdef')
Out[64]: ['abc', 'bcd']

Escape characters
* $ ? + \d \s

1. Regular expressions treat many special characters as metacharacters. To match one of these characters literally, escape it with a backslash in the pattern.

e.g. . ---> \.    * ---> \*    \d ---> \\d

2. In a programming language, a regular expression is usually passed in as a string,
and the language applies its own string escaping to that string first.


Python str ---> raw str: a raw string (r'...') is taken as written, with no string-level escaping

As a pattern, the ordinary string '\\*' and the raw string r'\*' are the same
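
The two layers of escaping described above can be checked directly. The following is a small sketch with made-up sample strings; it only illustrates how Python string escaping and regex escaping interact.

```python
import re

# A raw string keeps backslashes exactly as typed.
same_pattern = "\\d" == r"\d"   # both are a backslash followed by d
newline_len = len("\n")         # one character: a newline
raw_len = len(r"\n")            # two characters: backslash and n

# Match literal asterisks by escaping the metacharacter.
stars = re.findall(r"\*", "2*3*4")

# Match the literal text \d: the regex needs \\d, written here as a raw string.
literal_d = re.findall(r"\\d", r"use \d for digits")

print(same_pattern, newline_len, raw_len, stars, literal_d)
```

Writing patterns as raw strings avoids having to double every backslash, which is why the examples with \d, \w, and \b use the r prefix.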

Match a single character: ordinary characters . \d \D \w \W \s \S [...] [^...]

Match a position: ^ $ \A \Z \b \B

Match repetition: * + ? {n} {m,n}

Other: |

Greedy and non-greedy matching

Greedy mode: when the number of repetitions is not fixed, the regular expression matches as many characters as possible

* + ? {m,n}


Non-greedy mode: add ? after the repetition metacharacter, so that it matches as few characters as possible

In [94]: re.findall('ab*?', 'abbbbbbba')
Out[94]: ['a', 'a']

In [95]: re.findall('ab+?', 'abbbbbbba')
Out[95]: ['ab']
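
The difference is easiest to see when a delimiter appears more than once. This sketch uses a made-up tag-like string; it is an illustration of greedy versus non-greedy matching, not a recommended way to parse markup.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last > it can reach.
greedy = re.findall(r"<.+>", text)

# Non-greedy: .+? stops at the first > that completes a match.
lazy = re.findall(r"<.+?>", text)

print(greedy)
print(lazy)
```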


Regular expression subgroups

In a regular expression, () marks part of the expression as a subgroup

e.g. (ab)*cdef

What a subgroup can do

* A subgroup is treated as a single unit, which changes what a repetition metacharacter applies to
* Many programming language functions can extract the content of each subgroup separately
* Subgroups make parts of a match more convenient to use and refer to

In principle a regular expression can contain many subgroups. Counting from the outside in and from left to right, they are called the first subgroup, the second subgroup, and so on. Subgroups may nest but must not overlap.
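
The points above can be sketched in a few lines. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# The group makes * repeat "ab" as a unit instead of just "b".
grouped = re.fullmatch(r"(ab)*cdef", "ababcdef")
ungrouped = re.fullmatch(r"ab*cdef", "ababcdef")

# With several groups, findall returns one tuple per match.
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")

# Numbering goes outside in, then left to right:
# group 1 is (a)(b) as a whole, group 2 is (a), group 3 is (b).
nested = re.search(r"((a)(b))c", "abc").groups()

print(grouped is not None, ungrouped is not None)
print(pairs)
print(nested)
```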


Capturing groups

Naming a subgroup: (?P<name>abcd)
Referring back to a named subgroup: (?P=name)

e.g. (?P<dog>ab)cdef(?P=dog)
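
Here is a short sketch of the example pattern in use. The test strings are made up; the point is that (?P=dog) must match the same text that the group named dog captured.

```python
import re

pattern = r"(?P<dog>ab)cdef(?P=dog)"

# The back-reference requires "ab" again after "cdef".
hit = re.search(pattern, "abcdefab")
miss = re.search(pattern, "abcdefxy")

whole = hit.group()        # the entire match
named = hit.group("dog")   # the text captured by the named group

print(whole, named, miss)
```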


Non-capturing group


Exercises:

Match a password 8 to 10 characters long that starts with a letter and contains only digits, letters, and underscores
In [2]: re.findall(r'^[a-zA-Z]\w{7,9}$', 'abc123_a')
Out[2]: ['abc123_a']

Match an ID number
In [6]: re.search(r'\d{17}(\d|x)', '123123123123123123').group()
Out[6]: '123123123123123123'


Match the words in a text that begin with an uppercase letter

In [14]: re.findall(r'\b[A-Z]\w*\b', data)
Out[14]: ['Python', 'Hello', 'World']


Python ---> regex

Module: re

compile(pattern, flags=0)
Function: Create a regular expression object
Parameters: pattern: the regular expression
flags: extension flags; the default 0 means no extensions
Return value: a regular expression object

obj = re.compile('abc')

The functions between the two rows of asterisks below can be called either on re directly or on a compiled object

**************************************************
re.findall(pattern, string, flags=0)
Function: Match the target string against a regular expression
Parameters: pattern: the regular expression
string: the target string
flags: regular expression extension flags
Return value: all matches, returned as a list
If the pattern contains groups, only the content matched by the subgroups is returned.

obj.findall(string=None, pos=0, endpos=99999)
Function: Match the target string against the compiled regular expression
Parameters: string: the target string
pos: position in the target string where matching starts
endpos: position in the target string where matching ends
Return value: all matches, returned as a list
If the pattern contains groups, only the content matched by the subgroups is returned.
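
A brief sketch of findall on a compiled object, including the pos and endpos arguments and the group behaviour noted above. The sample text and names are illustrative assumptions.

```python
import re

obj = re.compile(r"\d+")
text = "10 20 30 40"

everything = obj.findall(text)        # search the whole string
from_pos = obj.findall(text, 3)       # start searching at index 3
window = obj.findall(text, 3, 8)      # search only text[3:8], i.e. "20 30"

# With a group in the pattern, only the group content is returned.
group_only = re.findall(r"a(b+)", "ab abb")

print(everything, from_pos, window, group_only, sep="\n")
```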


finditer()
Function: same as findall
Parameters: same as findall
Return value: an iterator that yields a match object for each match
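
As a small sketch (sample text and names are made up), iterating over finditer gives access to each match's text and position, which findall does not provide:

```python
import re

found = []
for m in re.finditer(r"\d+", "a1 b22 c333"):
    # Each item is a match object, so both text and position are available.
    found.append((m.group(), m.span()))

print(found)
```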

* Match object: finditer, match, fullmatch, search
These functions return the result of a match as a match object, which makes it easy to work with the details of the match.

fullmatch()
Function: Match the entire string against a regular expression
Parameters: the target string
Return value: a match object if the whole string matches, otherwise None

match()
Function: Match at the beginning of a string
Parameters: the target string
Return value: a match object if the beginning of the string matches, otherwise None

search()
Function: Find the first place in the string that matches
Parameters: the target string
Return value: a match object if a match is found, otherwise None
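
The three functions differ only in where the match is allowed to sit. A minimal sketch, with a made-up pattern and strings:

```python
import re

pattern = r"\d+"

at_start = re.match(pattern, "abc123")        # digits are not at the start
anywhere = re.search(pattern, "abc123")       # finds the first run of digits
whole = re.fullmatch(pattern, "abc123")       # the whole string is not digits
whole_ok = re.fullmatch(pattern, "123")       # the whole string is digits
prefix = re.match(pattern, "123abc")          # digits at the start are enough

print(at_start, anywhere.group(), whole, whole_ok.group(), prefix.group())
```

Because all three return None on failure, check the result before calling group() on it.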

split()
Function: Split the string wherever the regular expression matches
Parameters: the target string
Return value: a list of the pieces
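
A short sketch of splitting on a pattern rather than a fixed separator. The sample strings are assumptions; maxsplit is the optional limit on the number of splits in re.split.

```python
import re

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", "a, b;c  d")

# maxsplit limits how many splits are made.
first_only = re.split(r"\s+", "one two three", maxsplit=1)

print(parts)
print(first_only)
```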

sub(repl, string, count)
Function: Replace the parts of the target string matched by the regular expression with the specified string
Parameters: repl: the replacement string
string: the target string
count: the maximum number of replacements
Return value: the string after replacement
When called as re.sub, the pattern comes first: re.sub(pattern, repl, string, count)

subn()
Function: Replace the parts of the target string matched by the regular expression with the specified string
Parameters: repl: the replacement string
string: the target string
count: the maximum number of replacements
Return value: a two-item tuple; the first item is the string after replacement, the second is the number of replacements actually made
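
The following sketch shows both call styles, via re and via a compiled object. The sample string and variable names are illustrative only.

```python
import re

text = "a1b22c333"

all_replaced = re.sub(r"\d+", "#", text)            # replace every match
two_replaced = re.sub(r"\d+", "#", text, count=2)   # replace at most two
with_count = re.subn(r"\d+", "#", text)             # (new string, replacements made)

# The same call on a compiled object omits the pattern argument.
obj = re.compile(r"\d+")
one_replaced = obj.sub("#", text, count=1)

print(all_replaced, two_replaced, with_count, one_replaced, sep="\n")
```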
****************************************************

Attributes of the object returned by compile

flags: the flag bits of the regular expression, as an integer
pattern: the regular expression string
groupindex: a dictionary whose keys are the names of the named groups and whose values are their group numbers
groups: the number of subgroups in the regular expression
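
These attributes can be inspected on any compiled object. A small sketch, using a made-up date-like pattern:

```python
import re

obj = re.compile(r"(?P<year>\d{4})-(\d{2})", re.I)

source = obj.pattern            # the pattern string as passed to compile
group_count = obj.groups        # two subgroups in total
names = dict(obj.groupindex)    # only the named group appears here
ignore_case = bool(obj.flags & re.I)  # test a single flag bit

print(source, group_count, names, ignore_case)
```

The flags value may include bits that Python sets by default, so testing a specific bit with & is safer than comparing the whole integer.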


match, search, fullmatch, finditer

Match object attributes and methods

Attributes:
pos: the position in the target string where matching starts
endpos: the position in the target string where matching ends
lastgroup: the name of the last matched subgroup, or None if it has no name
lastindex: the index number of the last matched subgroup
re: the regular expression object that produced the match
regs: the spans matched by the whole regular expression and by each subgroup
string: the target string that was matched against


Methods:

start()
Gets the start position of the matched content in the string

end()
Gets the end position of the matched content in the string (the index just past the last matched character)

span()
Gets the start and end positions of the matched content in the string

group(n)
Function: Gets the content matched by the match object
Parameter: n defaults to 0, meaning the content matched by the whole regular expression
When n is a positive integer, gets the content matched by the nth subgroup
Return value: the matched string

groups()
Function: Gets the content matched by all subgroups


groupdict()
Function: Returns a dictionary whose keys are the names of the named groups and whose values are the content they matched
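
The methods and attributes above can be tried on a single match object. This sketch uses a made-up pattern with one named and one unnamed group.

```python
import re

m = re.search(r"(?P<user>\w+)@(\w+)", "mail: bob@example now")

whole = m.group()          # same as m.group(0)
first = m.group(1)         # first subgroup
second = m.group(2)        # second subgroup
all_groups = m.groups()    # every subgroup as a tuple
named = m.groupdict()      # named subgroups only
position = (m.start(), m.end(), m.span())

# Group 2 matched last and has no name, so lastgroup is None.
last = (m.lastindex, m.lastgroup)

print(whole, first, second, all_groups, named, position, last, sep="\n")
```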


Flags for re.compile, re.findall, re.match, re.search, ...


A, ASCII

S, DOTALL: lets . match line breaks as well

I, IGNORECASE: ignores case

L, LOCALE

M, MULTILINE: lets ^ and $ match at the start and end of each line

T, TEMPLATE

U, UNICODE

X, VERBOSE: allows comments starting with # inside the regular expression


To use several flags at once, join them with a vertical bar:
re.I | re.S
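
A sketch of the four flags described above, alone and combined. The sample text, the phone-like pattern, and the variable names are assumptions made for illustration.

```python
import re

text = "Hello\nhello world"

any_case = re.findall(r"hello", text, re.I)              # ignore case
line_starts = re.findall(r"^hello", text, re.I | re.M)   # ^ matches at each line start
string_start = re.findall(r"^hello", text, re.I)         # ^ matches only at the very start

no_dotall = re.findall(r"o.h", text)        # . does not cross the line break
dotall = re.findall(r"o.h", text, re.S)     # . also matches "\n"

# VERBOSE ignores whitespace in the pattern and allows # comments.
phone = re.compile(r"""
    \d{3}   # first block of digits
    -       # separator
    \d{4}   # second block of digits
""", re.X)
numbers = phone.findall("call 555-1234")

print(any_case, line_starts, string_start, no_dotall, dotall, numbers, sep="\n")
```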
