Three things ordinary people cannot read: a doctor's prescription, a Taoist priest's talisman, and a programmer's regular expression.
What is a regular expression?
- A regular expression, also known as a rule expression (English: Regular Expression), is often abbreviated in code as regex, regexp, or RE.
- It is a concept from computer science.
- Many programming languages support the use of regular expressions for string manipulation.
- For example, Python has a powerful regular expression interface built in.
- The concept of regular expressions was first popularized by Unix tools such as sed and grep.
- The usual abbreviation is "regex"; the singular forms are regexp and regex, and the plural forms are regexps, regexes, and regexen.
- A regular expression is a finite notation that describes a potentially infinite set of strings.
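To illustrate the idea of a finite notation covering an unbounded set of strings, here is a minimal sketch using Python's built-in re module. The pattern `ab*` and the helper name `accepts` are illustrative choices, not something the article defines.

```python
import re

# One finite pattern: an "a" followed by zero or more "b" characters.
PATTERN = re.compile(r"ab*")


def accepts(text: str) -> bool:
    """Return True if the whole string fits the pattern."""
    return PATTERN.fullmatch(text) is not None


# The same short pattern accepts strings of any length...
samples = ["a" + "b" * n for n in range(5)]
accepted = [s for s in samples if accepts(s)]

# ...and rejects strings that break the rule.
rejected = [s for s in ["", "b", "ba", "abc"] if not accepts(s)]

print(accepted)  # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(rejected)  # ['', 'b', 'ba', 'abc']
```

`fullmatch` is used here so that the whole string has to fit the rule, not just a part of it.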
What's the effect?
- Regular expressions are often used to search for and replace text that conforms to a pattern (rule).
- A regular expression is a string made up of ordinary characters (for example, the letters a through z) and special characters (called metacharacters).
- It is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "rule string" that expresses filtering logic for strings.
- A regular expression is a text pattern that describes one or more strings to match when searching for text.
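A minimal sketch of the search-and-replace use described above, using Python's built-in re module. The sample text, the date-like pattern, and the `<date>` placeholder are made up for illustration.

```python
import re

text = "Released 2021-03-04, patched 2021-11-30."

# Ordinary characters ("-") match themselves; the metacharacters \d and {n}
# mean "a digit" and "repeated n times".
date_pattern = r"\d{4}-\d{2}-\d{2}"

# Retrieve every piece of text that conforms to the pattern.
found = re.findall(date_pattern, text)

# Replace every piece of text that conforms to the pattern.
masked = re.sub(date_pattern, "<date>", text)

print(found)   # ['2021-03-04', '2021-11-30']
print(masked)  # Released <date>, patched <date>.
```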
Regular expression syntax

Motivation:
1. Text processing has become a common task for computers.
2. The logic for searching, locating, and extracting text is often complex.
3. Regular expressions were developed to solve these problems.

Definition: A regular expression is an advanced text-matching pattern that provides search, replace, and extraction functions. In essence it is a string made up of ordinary characters and a series of special symbols.

Features:
* Convenient for searching and modifying text
* Supported by a wide range of languages
* Flexible

How regular expressions match: symbols with special meanings describe the repetition and position of characters, so that one pattern represents a whole class of strings that follow a specific rule.

Python ---> the re module handles regular expressions

re.findall(pattern, string)
    Parameters:
        pattern: the regular expression, passed in as a string
        string: the target string to match
    Return value: a list of the content in the target string that the regular expression matched

Regular expression metacharacters:

1. Ordinary characters
Match rule: every character other than the special characters described below is an ordinary character and matches itself.
    In [11]: re.findall("abc", "abcdefghabi")
    Out[11]: ['abc']
    In [12]: re.findall("Chengdu", "Chengdu Street Walk")
    Out[12]: ['Chengdu']

2. Or
Metacharacter: |
Match rule: matches the regular expression on either side of the symbol.
    In [14]: re.findall("ab|cd", "abcdefghabi")
    Out[14]: ['ab', 'cd', 'ab']

3. Match a single character
Metacharacter: .
Match rule: matches any character except \n.
    f.o ---> foo fao [email protected]
    In [18]: re.findall("w.o", "woo,wao not wbo")
    Out[18]: ['woo', 'wao', 'wbo']

4. Match the start position
Metacharacter: ^
Match rule: matches the start of the target string.
    In [20]: re.findall("^Jame", "Jame,how are you")
    Out[20]: ['Jame']

5. Match the end position
Metacharacter: $
Match rule: matches the end of the target string.
    In [23]: re.findall("py$", "hello.py")
    Out[23]: ['py']

6. Match repetition
Metacharacter: *
Match rule: matches the preceding regular expression 0 or more times.
    fo* ---> f fo fooooooooooooooooooo
    In [31]: re.findall("ab*", "abcdefae&65abbbbbbbb")
    Out[31]: ['ab', 'a', 'abbbbbbbb']

7. Match repetition
Metacharacter: +
Match rule: matches the preceding regular expression 1 or more times.
    ab+ ---> ab abbbbb
    In [33]: re.findall(".+py$", "hello.py")
    Out[33]: ['hello.py']

8. Match repetition
Metacharacter: ?
Match rule: matches the preceding regular expression 0 or 1 times.
    ab? ---> a ab
    In [36]: re.findall("ab?", "abcea,adsfabbbbbb")
    Out[36]: ['ab', 'a', 'a', 'ab']

9. Match repetition
Metacharacter: {n}
Match rule: matches the preceding regular expression exactly n times.
    ab{3} ---> abbb
    In [39]: re.findall("ab{3}", "abcea,adsfabbbbbb")
    Out[39]: ['abbb']

10. Match repetition
Metacharacter: {m,n}
Match rule: matches the preceding regular expression m to n times.
    ab{3,5} ---> abbb abbbb abbbbb
    In [45]: re.findall("ab{3,5}", "ab abbb abbbbabbbbbb")
    Out[45]: ['abbb', 'abbbb', 'abbbbb']

11. Match a character set
Metacharacter: [character set]
Match rule: matches any one character in the set.
    [abc123] ---> a b c 1 2 3
    In [46]: re.findall("[aeiou]", "hello world")
    Out[46]: ['e', 'o', 'o']
Ranges can be written inside the brackets: [0-9] [a-z] [A-Z] [0-9a-z] [_abc0-9]
    In [47]: re.findall("^[A-Z][a-z]*", "Hello world")
    Out[47]: ['Hello']

12. Match a negated character set
Metacharacter: [^...]
Match rule: matches any one character that is not in the set.
    In [50]: re.findall("[^0-9]+", "hello1")
    Out[50]: ['hello']

13. Match any (non-)digit character
Metacharacters: \d \D
Match rule: \d matches any digit, like [0-9]; \D matches any non-digit, like [^0-9].
The r prefix makes the pattern a raw string, so Python passes the backslash through unchanged (see "Raw strings" below).
    In [52]: re.findall(r"1\d{10}", "13717776561")
    Out[52]: ['13717776561']
    In [53]: re.findall(r"\D+", "hello world")
    Out[53]: ['hello world']

14. Match any (non-)word character (digits, letters, underscore, and ordinary UTF-8 characters)
Metacharacters: \w \W
Match rule: \w matches a word character; \W matches a non-word character.
    In [54]: re.findall(r"\w+", "hello$1")
    Out[54]: ['hello', '1']
    In [55]: re.findall(r"\W+", "hello$1")
    Out[55]: ['$']

15. Match any (non-)whitespace character (space, \r \n \t \v \f)
Metacharacters: \s \S
Match rule: \s matches any whitespace character; \S matches any non-whitespace character.
    In [59]: re.findall(r"\w+\s+\w+", "hello world")
    Out[59]: ['hello world']
    In [61]: re.findall(r"\S+", "hello world")
    Out[61]: ['hello', 'world']

16. Match the start and end of the string
Metacharacters: \A \Z
Match rule: \A matches the start of the string, like ^; \Z matches the end of the string, like $.
    In [63]: re.findall(r"\Ahi", "hi,tom")
    Out[63]: ['hi']
    In [2]: re.findall(r"is\Z", "this")
    Out[2]: ['is']
Exact match: use a regular expression that matches the entire content of the target string.
    In [6]: re.findall(r"\A\w{5}\d{3}\Z", "abcde123")
    Out[6]: ['abcde123']

17. Match a (non-)word boundary (the position where a word character meets a non-word character counts as a word boundary)
Metacharacters: \b \B
Match rule: \b matches a word boundary position; \B matches a position that is not a word boundary.
    In [17]: re.findall(r"\bis\b", "This is a test")
    Out[17]: ['is']

Metacharacter summary:
    Match a single character: a . [...] [^...] \d \D \w \W \s \S
    Match repetition: * + ? {n} {m,n}
    Match a position: ^ $ \A \Z \b \B
    Other: | () \

Regular expression escaping:
Special characters in regular expressions include: . * ? $ ^ [] {} () \
To match one of these characters literally, escape it with a backslash.
    In [23]: re.findall(r"\[\d+\]", "abc[123]")
    Out[23]: ['[123]']

Raw strings: r'...'
Feature: a raw string does not process escapes in its contents, so a backslash keeps its literal meaning.
    r"\b" ---> \b
    "\\b" ---> \b
    In [39]: re.findall("\\w+@\\w+\\.cn", "[email protected]")
    Out[39]: ['[email protected]']
    In [40]: re.findall(r"\w+@\w+\.cn", "[email protected]")
    Out[40]: ['[email protected]']

Greedy and non-greedy:
Greedy mode: when a regular expression matches repetition, it always extends the match as far as possible. This applies to * + ? {m,n}
Greedy ---> non-greedy (lazy): match as little as possible. Written as *? +? ?? {m,n}?
    In [46]: re.findall(r"ab*?", "abbbbbbbbbb")
    Out[46]: ['a']
    In [47]: re.findall(r"ab+?", "abbbbbbbbbb")
    Out[47]: ['ab']

Regular expression grouping:
Using () you can create subgroups inside a regular expression. A subgroup does not change what the expression as a whole matches, and can be treated as an internal unit.
Subgroup roles:
1. A subgroup forms an internal whole, which changes what some metacharacters act on.
    In [52]: re.search(r"(ab)+", "abababab").group()
    Out[52]: 'abababab'
    re.search(r"\w+@\w+\.(com|cn)", "[email protected]").group()
2. The content matched by a subgroup can be retrieved separately.
    In [59]: re.search(r"\w+@\w+\.(com|cn)", "[email protected]").group(1)
    Out[59]: 'com'
Subgroup notes:
* A regular expression can contain several subgroups, distinguished as the first subgroup, the second subgroup, and so on.
* Subgroups should not overlap; keep them as simple as possible.

Named and unnamed groups:
Named group format: (?P<name>pattern)
    re.search(r"(?P<dog>ab)+", "abababab").group()
Effects:
1. Each subgroup is easy to tell apart by its name.
2. A named group can be referenced again later in the pattern.
Reference format: (?P=name)
    (?P<dog>ab)cd(?P=dog) ===> abcdab
    In [69]: re.search(r"(?P<dog>ab)cdef(?P=dog)", "abcdefab").group()
    Out[69]: 'abcdefab'

Regular expression matching principles:
1. Correctness: match the target string correctly.
2. Uniqueness: besides the target content, match as little unwanted content as possible.
3. Comprehensiveness: consider every form the target string may take, so that nothing is missed.

Using the re module:

regex = re.compile(pattern, flags=0)
    Function: create a regular expression object
    Parameters:
        pattern: the regular expression
        flags: option flags that extend how the regular expression matches
    Return value: a regular expression object

re.findall(pattern, string, flags=0)
    Function: match the content of the target string against a regular expression
    Parameters:
        pattern: the regular expression
        string: the target string
    Return value: a list of the matched content. If the regular expression has subgroups, only the content of the subgroups is returned.

regex.findall(string, pos, endpos)
    Function: match the content of the target string against the compiled regular expression
    Parameters:
        string: the target string
        pos, endpos: start and end positions of the part of the target string to match; the default is the entire string
    Return value: a list of the matched content. If the regular expression has subgroups, only the content of the subgroups is returned.

re.split(pattern, string, flags=0)
    Function: split the target string wherever the regular expression matches
    Parameters:
        pattern: the regular expression
        string: the target string
    Return value: the pieces of the split string, as a list

re.sub(pattern, repl, string, count=0, flags=0)
    Function: replace the content that the regular expression matches
    Parameters:
        pattern: the regular expression
        repl: the replacement content
        string: the target string
        count: the maximum number of replacements
    Return value: the string after replacement

re.subn(pattern, repl, string, count=0, flags=0)
    Function and parameters: the same as sub
    Return value: the same as sub, plus the number of replacements actually made

re.finditer(pattern, string, flags=0)
    Function: match the target string against a regular expression
    Parameters:
        pattern: the regular expression
        string: the target string
    Return value: an iterator whose items are match objects

re.fullmatch(pattern, string, flags=0)
    Function: match an entire string exactly
    Parameters:
        pattern: the regular expression
        string: the target string
    Return value: a match object holding the matched content

re.match(pattern, string, flags=0)
    Function: match the content at the start of a string
    Parameters:
        pattern: the regular expression
        string: the target string
    Return value: a match object holding the matched content

re.search(pattern, string, flags=0)
    Function: match the first part of the string that fits the regular expression
    Parameters:
        pattern: the regular expression
        string: the target string
    Return value: a match object holding the matched content

Regex object properties:
    flags: the flag value
    pattern: the regular expression
    groups: the number of subgroups
    groupindex: a dictionary of the named groups; the key is the group name and the value is the group number
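The re module functions described above can be tried together in one short script. This is a sketch under stated assumptions: the sample addresses and all variable names are hypothetical, and only calls from Python's standard re module are used.

```python
import re

# Hypothetical sample data, not taken from the article.
text = "alice@example.com, bob@example.cn; carol@example.com"

# Compile once, reuse many times. The single subgroup captures the suffix.
regex = re.compile(r"\w+@\w+\.(com|cn)")

# With a subgroup, findall returns only the subgroup content.
suffixes = regex.findall(text)

# pos restricts matching to the part of the string from that index onward.
later_suffixes = regex.findall(text, 19)

# finditer yields match objects, so the whole match is still available.
addresses = [m.group() for m in regex.finditer(text)]

# split cuts the target string wherever the pattern matches.
parts = re.split(r"[,;]\s*", text)

# sub replaces matches; count limits the number of replacements.
# subn also reports how many replacements were actually made.
once = re.sub(r"example", "test", text, count=1)
replaced, n = re.subn(r"example", "test", text)

# match anchors at the start, search finds the first match anywhere,
# fullmatch requires the whole string to match. Each returns None on failure.
at_start = re.match(r"bob", text)
anywhere = re.search(r"bob", text)
whole = re.fullmatch(r"\w+@\w+\.(com|cn)", "bob@example.cn")

# Attributes of a compiled regex object.
named = re.compile(r"(?P<user>\w+)@(?P<host>\w+)")

print(suffixes)                 # ['com', 'cn', 'com']
print(later_suffixes)           # ['cn', 'com']
print(addresses)
print(parts)
print(once)
print(n)                        # 3
print(at_start)                 # None
print(anywhere.group())         # bob
print(whole.group(1))           # cn
print(named.groups)             # 2
print(dict(named.groupindex))   # {'user': 1, 'host': 2}
```

Because `findall` returns only subgroup content when the pattern has a subgroup, `finditer` with `m.group()` is used here to recover the full addresses.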
Python full stack: regular expressions (concepts, syntax, metacharacters, the re module)