Python Full Stack: Regular Expressions (Concepts, Syntax, Metacharacters, and the re Module)

Source: Internet
Author: User

There is a joke that ordinary people cannot read three things: a doctor's prescription, a Taoist priest's talisman, and a programmer's regular expression.

What is a regular expression?

    • Regular expressions (also called rule expressions) are often abbreviated in code as regex, regexp, or RE.
    • They are a concept from computer science.
    • Many programming languages support regular expressions for string manipulation.
    • Python, for example, has a powerful regular expression interface built in.
    • Regular expressions were first popularized by Unix tools such as sed and grep.
    • Common abbreviations are "regex" and "regexp" in the singular and "regexps", "regexes", or "regexen" in the plural.
    • A regular expression uses a finite notation to describe a potentially infinite set of strings.
What is it used for?
    • Regular expressions are often used to search for and replace text that matches a pattern (rule).
    • A regular expression is a string pattern made up of ordinary characters (for example, the letters a through z) and special characters (called metacharacters).
    • It is a logical formula for operating on strings. Predefined special characters, and combinations of them, form a "rule string" that expresses filtering logic for strings.
    • A regular expression is a text pattern that describes one or more strings to match when searching text.
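The snippet below is a minimal sketch of the two everyday uses listed above: retrieving text that follows a rule, and replacing it. It uses Python's built-in re module. The invoice text and the date pattern are made up for illustration.

```python
import re

text = "Invoice 2023-04-01, paid 2023-04-15."

# A "rule string": four digits, a dash, two digits, a dash, two digits.
date_pattern = r"\d{4}-\d{2}-\d{2}"

# Retrieve every substring that conforms to the rule.
dates = re.findall(date_pattern, text)

# Replace every match with a placeholder.
redacted = re.sub(date_pattern, "<date>", text)

print(dates)     # ['2023-04-01', '2023-04-15']
print(redacted)  # Invoice <date>, paid <date>.
```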
Regular expression syntax

Motivation:
    1. Text processing has become one of the most common tasks a computer performs.
    2. The logic for searching, locating, and extracting text is often complex.
    3. Regular expression technology was developed to solve these problems.

Definition: a regular expression is an advanced text-matching pattern that provides search, replace, and extraction functions. In essence it is a string made up of ordinary characters and special symbols.

Features:
    * Makes searching and modifying text easy
    * Supported by a wide range of languages
    * Flexible

Matching approach: symbols with special meanings describe characters, how they repeat, and where they occur. In this way, one pattern represents a whole class of strings that follow a specific rule.

Python ---> the re module handles regular expressions

re.findall(pattern, string)
    Parameters:
        pattern: a string containing the regular expression
        string: the target string to match
    Return value: a list of every part of the target string that the regular expression matches

Regular expression metacharacters

1. Ordinary characters
    Match rule: every character except the special characters described below is an ordinary character and matches itself
        >>> import re
        >>> re.findall("abc", "abcdefghabi")
        ['abc']
        >>> re.findall("Chengdu", "A walk on the streets of Chengdu")
        ['Chengdu']

2. Or
    Metacharacter: |
    Match rule: matches the regular expression on either side of the symbol
        >>> re.findall("ab|cd", "abcdefghabi")
        ['ab', 'cd', 'ab']

3. Match a single character
    Metacharacter: .
    Match rule: matches any single character except \n
    f.o ---> foo, fao, f@o
        >>> re.findall("w.o", "woo,wao not wbo")
        ['woo', 'wao', 'wbo']
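The first three metacharacters have two details that are worth checking in code. First, "|" has the lowest precedence, so ab|cd means "ab" or "cd" rather than "a, then b or c, then d". Second, "." skips "\n" unless the re.DOTALL flag is passed. The sample strings in this sketch are invented. The (?:...) form is Python's non-capturing group, used here only to limit the scope of "|".

```python
import re

# Ordinary characters match themselves literally.
literal = re.findall("fao", "foo fao f@o")           # ['fao']

# "|" splits the whole pattern: "ab" OR "cd".
whole = re.findall("ab|cd", "abd acd")               # ['ab', 'cd']
# A group limits the alternation: "a", then "b" or "c", then "d".
grouped = re.findall("a(?:b|c)d", "abd acd")         # ['abd', 'acd']

# "." matches any single character except a newline ...
dot = re.findall("f.o", "foo fao f@o")               # ['foo', 'fao', 'f@o']
no_newline = re.findall("a.b", "a\nb")               # []
# ... unless re.DOTALL is given.
with_newline = re.findall("a.b", "a\nb", re.DOTALL)  # ['a\nb']

print(literal, whole, grouped, dot, no_newline, with_newline)
```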
4. Match the start position
    Metacharacter: ^
    Match rule: matches the beginning of the target string
        >>> re.findall("^Jame", "Jame, how are you")
        ['Jame']

5. Match the end position
    Metacharacter: $
    Match rule: matches the end of the target string
        >>> re.findall("py$", "hello.py")
        ['py']

6. Match repetition
    Metacharacter: *
    Match rule: matches the preceding regular expression 0 or more times
    fo* ---> f, fo, foooooooooo
        >>> re.findall("ab*", "abcdefae&65abbbbbbbb")
        ['ab', 'a', 'abbbbbbbb']

7. Match repetition
    Metacharacter: +
    Match rule: matches the preceding regular expression 1 or more times
    ab+ ---> ab, abbbbb
        >>> re.findall(".+py$", "hello.py")
        ['hello.py']

8. Match repetition
    Metacharacter: ?
    Match rule: matches the preceding regular expression 0 or 1 times
    ab? ---> a, ab
        >>> re.findall("ab?", "abcea,adsfabbbbbb")
        ['ab', 'a', 'a', 'ab']

9. Match repetition
    Metacharacter: {n}
    Match rule: matches the preceding regular expression exactly n times
    ab{3} ---> abbb
        >>> re.findall("ab{3}", "abcea,adsfabbbbbb")
        ['abbb']

10. Match repetition
    Metacharacter: {m,n}
    Match rule: matches the preceding regular expression m to n times
    ab{3,5} ---> abbb, abbbb, abbbbb
        >>> re.findall("ab{3,5}", "ab abbb abbbbabbbbbb")
        ['abbb', 'abbbb', 'abbbbb']

11. Match a character set
    Metacharacter: [character set]
    Match rule: matches any one character in the set
    [abc123] ---> a, b, c, 1, 2, 3
        >>> re.findall("[aeiou]", "hello world")
        ['e', 'o', 'o']
    Ranges are allowed: [0-9] [a-z] [A-Z] [0-9a-z] [_abc0-9]
        >>> re.findall("^[A-Z][a-z]*", "Hello World")
        ['Hello']

12. Match a negated character set
    Metacharacter: [^...]
    Match rule: matches any one character that is not in the set
        >>> re.findall("[^0-9]+", "hello1")
        ['hello']
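Two points from items 4 to 12 are easy to verify. First, *, + and ? are shorthands for {0,}, {1,} and {0,1}. Second, by default ^ and $ anchor to the whole string rather than to each line, and Python's re.MULTILINE flag changes that. This is a small sketch; the log text is made up for the example.

```python
import re

text = "a ab abb abbb"

# "*", "+" and "?" behave like {0,}, {1,} and {0,1}.
star = re.findall(r"ab*", text)        # ['a', 'ab', 'abb', 'abbb']
plus = re.findall(r"ab+", text)        # ['ab', 'abb', 'abbb']
optional = re.findall(r"ab?", text)    # ['a', 'ab', 'ab', 'ab']
same = (
    star == re.findall(r"ab{0,}", text)
    and plus == re.findall(r"ab{1,}", text)
    and optional == re.findall(r"ab{0,1}", text)
)                                      # True

log = "error: disk\nok: fan\nerror: cpu"
# Without flags, "^" matches only at the very start of the string.
first_only = re.findall(r"^\w+", log)                  # ['error']
# With re.MULTILINE, "^" also matches right after each newline.
every_line = re.findall(r"^\w+", log, re.MULTILINE)    # ['error', 'ok', 'error']

# A negated set: runs of characters that are not digits.
non_digits = re.findall(r"[^0-9]+", "abc123def")       # ['abc', 'def']
```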
13. Match any (non-)digit character
    Metacharacters: \d \D
    Match rule: \d matches any digit [0-9] (for str patterns, other Unicode decimal digits as well); \D matches any non-digit [^0-9]
        >>> re.findall(r"1\d{10}", "13717776561")
        ['13717776561']
        >>> re.findall(r"\D+", "hello world")
        ['hello world']

14. Match any (non-)word character (a word character is a letter, digit, underscore, or other Unicode word character such as a Chinese character)
    Metacharacters: \w \W
    Match rule: \w matches a word character; \W matches a non-word character
        >>> re.findall(r"\w+", "hello$1")
        ['hello', '1']
        >>> re.findall(r"\W+", "hello$1")
        ['$']

15. Match (non-)whitespace characters (space, \r, \n, \t, \v, \f)
    Metacharacters: \s \S
    Match rule: \s matches any whitespace character; \S matches any non-whitespace character
        >>> re.findall(r"\w+\s+\w+", "hello world")
        ['hello world']
        >>> re.findall(r"\S+", "hello world")
        ['hello', 'world']

16. Match the start and end positions
    Metacharacters: \A \Z
    Match rule: \A matches the start of the string (like ^); \Z matches the end of the string (like $)
        >>> re.findall(r"\Ahi", "hi,Tom")
        ['hi']
        >>> re.findall(r"is\Z", "this")
        ['is']
    Exact match: the regular expression matches the entire target string
        >>> re.findall(r"\A\w{5}\d{3}\Z", "abcde123")
        ['abcde123']

17. Match a (non-)word boundary (the position between a word character and a non-word character counts as a word boundary)
    Metacharacters: \b \B
    Match rule: \b matches a word-boundary position; \B matches a non-word-boundary position
        >>> re.findall(r"\bis\b", "This is a test")
        ['is']
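Several behaviours of items 13 to 17 in Python's re module are worth seeing directly:
- For str patterns, \w is Unicode-aware unless re.ASCII is passed.
- $ also matches just before a trailing newline, while \Z does not.
- \b stops "is" from matching inside longer words.

The sample strings in this sketch are invented for illustration.

```python
import re

# \w covers Unicode letters by default; re.ASCII limits it to [a-zA-Z0-9_].
unicode_words = re.findall(r"\w+", "café 42")            # ['café', '42']
ascii_words = re.findall(r"\w+", "café 42", re.ASCII)    # ['caf', '42']

# "$" also matches before a final "\n"; "\Z" matches only at the very end.
dollar = re.search(r"py$", "hello.py\n") is not None     # True
big_z = re.search(r"py\Z", "hello.py\n") is not None     # False

# \b requires a word/non-word transition on each side.
plain = re.findall(r"is", "This island is")              # ['is', 'is', 'is']
bounded = re.findall(r"\bis\b", "This island is")        # ['is']
```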
Metacharacter summary:
    Match a single character:  a  .  [...]  [^...]  \d  \D  \w  \W  \s  \S
    Match repetition:  *  +  ?  {n}  {m,n}
    Match position:  ^  $  \A  \Z  \b  \B
    Other:  |  ()  \

Regular expression escaping:
    Special characters in regular expressions:  .  *  +  ?  ^  $  []  {}  ()  |  \
    To match one of these characters literally, escape it with a backslash.
        >>> re.findall(r"\[\d+\]", "abc[123]")
        ['[123]']

Raw strings: r"..."
    Feature: escape sequences in the string are not processed, so the contents keep their literal meaning
        r"\b"   ---> \b
        "\\b"   ---> \b
        >>> re.findall("\\w+@\\w+\\.cn", "user@example.cn")
        ['user@example.cn']
        >>> re.findall(r"\w+@\w+\.cn", "user@example.cn")
        ['user@example.cn']

Greedy and non-greedy:
    Greedy mode: when a regular expression repeats, it always matches as much as possible to the right.
        *  +  ?  {m,n}
    Greedy ---> non-greedy (lazy): match as little as possible
        *?  +?  ??  {m,n}?
        >>> re.findall(r"ab*?", "abbbbbbbbbb")
        ['a']
        >>> re.findall(r"ab+?", "abbbbbbbbbb")
        ['ab']

Regular expression grouping:
    Parentheses () create subgroups in a regular expression. A subgroup does not change what the whole expression matches; it can be treated as an inner unit.
    What subgroups do:
        1. Form an inner unit that changes the scope of some metacharacters
            >>> re.search(r"(ab)+", "abababab").group()
            'abababab'
            >>> re.search(r"\w+@\w+\.(com|cn)", "user@example.com").group()
            'user@example.com'
        2. The content matched by a subgroup can be retrieved separately
            >>> re.search(r"\w+@\w+\.(com|cn)", "user@example.com").group(1)
            'com'
    Notes on subgroups:
        * A regular expression can contain several subgroups, numbered first, second, and so on from left to right
        * Subgroups should not overlap and should be kept as simple as possible

Named and unnamed groups:
    A plain (...) is an unnamed group, (?P<name>...) gives the group a name, and (?:...) groups without capturing.
    Named group format: (?P<name>pattern)
        >>> re.search(r"(?P<dog>ab)+", "abababab").group()
        'abababab'
    Benefits:
        1. Each subgroup is easy to tell apart by its name
        2. A named group can be referenced again later in the same pattern
            Reference format: (?P=name)
            (?P<dog>ab)cd(?P=dog) ===> abcdab
            >>> re.search(r"(?P<dog>ab)cdef(?P=dog)", "abcdefab").group()
            'abcdefab'

Principles for writing a regular expression:
    1. Correctness: it matches the target string correctly
    2. Exclusivity: besides the target content, it matches as little unwanted content as possible
    3. Comprehensiveness: it accounts for every possible form of the target string, so nothing is missed

Using the re module:

    regex = re.compile(pattern, flags=0)
        Function: creates a regular expression object
        Parameters:
            pattern: the regular expression
            flags: function flags that extend how the regular expression matches
        Return value: a regular expression object

    re.findall(pattern, string, flags=0)
        Function: matches the target string against the regular expression
        Parameters:
            pattern: the regular expression
            string: the target string
        Return value: a list of the matched content
            If the regular expression has subgroups, only the subgroup contents are returned

    regex.findall(string, pos, endpos)
        Function: matches the target string against the regular expression
        Parameters:
            string: the target string
            pos, endpos: start and end positions of the part of the target string to match; the default is the whole string
        Return value: a list of the matched content
            If the regular expression has subgroups, only the subgroup contents are returned

    re.split(pattern, string, maxsplit=0, flags=0)
        Function: splits the target string wherever the regular expression matches
        Parameters:
            pattern: the regular expression
            string: the target string
            maxsplit: maximum number of splits (0, the default, means no limit)
        Return value: a list of the split pieces

    re.sub(pattern, repl, string, count=0, flags=0)
        Function: replaces the content matched by the regular expression
        Parameters:
            pattern: the regular expression
            repl: the replacement
            string: the target string
            count: maximum number of replacements (0, the default, replaces all)
        Return value: the string after replacement
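Several rules above can be tried together in one short script: escaping, raw strings, greedy vs. lazy repetition, and named groups. The script also exercises compile, findall, split and sub. This is a sketch with made-up sample strings. re.escape() and Pattern.groupdict-style access via Match.groupdict() are standard re features. The tag strings only show laziness; this is not a suggested way to parse HTML.

```python
import re

# --- Escaping and raw strings ---
# re.escape() backslash-escapes every special character in a literal.
found = re.findall(re.escape("(tax incl.)"), "$3.50 (tax incl.)")  # ['(tax incl.)']
# Without r, "\b" is a backspace character, so this pattern finds nothing.
no_raw = re.findall("\bis\b", "this is a test")    # []
raw = re.findall(r"\bis\b", "this is a test")      # ['is']

# --- Greedy vs. non-greedy ---
html = "<b>bold</b><i>italic</i>"
greedy = re.findall(r"<.+>", html)     # ['<b>bold</b><i>italic</i>']
lazy = re.findall(r"<.+?>", html)      # ['<b>', '</b>', '<i>', '</i>']

# --- Named groups ---
m = re.search(r"(?P<user>\w+)@(?P<domain>\w+)\.(?P<tld>com|cn)", "mail: alice@example.com")
info = m.groupdict()   # {'user': 'alice', 'domain': 'example', 'tld': 'com'}

# --- compile, findall with subgroups, pos/endpos, split, sub ---
regex = re.compile(r"(\w+)=(\d+)")
text = "a=1, B=22, c=333"
# With two subgroups, findall returns one tuple of group contents per match.
pairs = regex.findall(text)              # [('a', '1'), ('B', '22'), ('c', '333')]
# pos/endpos restrict matching to text[5:10] ("B=22,").
middle = regex.findall(text, 5, 10)      # [('B', '22')]
parts = re.split(r"\s*,\s*", text)                   # ['a=1', 'B=22', 'c=333']
first_cut = re.split(r"\s*,\s*", text, maxsplit=1)   # ['a=1', 'B=22, c=333']
# The replacement can refer to subgroups as \1, \2, ...
swapped = regex.sub(r"\2=\1", text)                  # '1=a, 22=B, 333=c'
once = regex.sub(r"\2=\1", text, count=1)            # '1=a, B=22, c=333'
```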
    re.subn(pattern, repl, string, count=0, flags=0)
        Function and parameters: the same as sub
        Return value: a tuple of the new string and the number of replacements actually made

    re.finditer(pattern, string, flags=0)
        Function: matches the target string against the regular expression
        Parameters:
            pattern: the regular expression
            string: the target string
        Return value: an iterator whose items are match objects

    re.fullmatch(pattern, string, flags=0)
        Function: matches only if the entire string matches
        Parameters:
            pattern: the regular expression
            string: the target string
        Return value: a match object for the matched content (None if there is no match)

    re.match(pattern, string, flags=0)
        Function: matches content at the start of the string
        Parameters:
            pattern: the regular expression
            string: the target string
        Return value: a match object for the matched content (None if there is no match)

    re.search(pattern, string, flags=0)
        Function: finds the first part of the string that matches
        Parameters:
            pattern: the regular expression
            string: the target string
        Return value: a match object for the matched content (None if there is no match)

    Attributes of a regex object:
        flags: the flag value
        pattern: the regular expression string
        groups: the number of subgroups
        groupindex: a dictionary of named groups, where each key is a group name and each value is the group number
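The functions listed above differ mainly in where they look and what they return. The short sketch below contrasts match, search and fullmatch, then shows finditer, subn, and the pattern-object attributes. The sample strings are invented; every call is part of Python's standard re module.

```python
import re

s = "id: 42"
# match() only tries position 0, search() scans, fullmatch() needs the whole string.
at_start = re.match(r"\d+", s)          # None: s does not start with a digit
first = re.search(r"\d+", s)            # Match object for '42'
whole = re.fullmatch(r"id: \d+", s)     # Match object for the entire string
partial = re.fullmatch(r"\d+", s)       # None: extra text before the digits

# finditer() yields Match objects, which carry positions as well as text.
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", "a1b22c333")]
# [('1', (1, 2)), ('22', (3, 5)), ('333', (6, 9))]

# subn() returns (new_string, number_of_replacements).
new_text, count = re.subn(r"\d", "#", "a1b22")   # ('a#b##', 3)

# Attributes of a compiled pattern object.
regex = re.compile(r"(?P<key>\w+)=(\d+)", re.IGNORECASE)
print(regex.pattern)                       # (?P<key>\w+)=(\d+)
print(regex.groups)                        # 2
print(dict(regex.groupindex))              # {'key': 1}
print(bool(regex.flags & re.IGNORECASE))   # True
```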
