Python module: re module, software development directory specification

Source: Internet
Author: User

re module (regular expressions)

A regular expression is a rule for matching strings.

Most programming languages support regular expressions; the corresponding module in Python is re.

Common pattern rules (these are the ones to remember):

"." # By default, matches any single character except \n; with the DOTALL flag, matches any character, including a newline

"^" # Matches the start of the string; with the MULTILINE flag, re.search("^a", "\nabc\neee", flags=re.MULTILINE) also matches (i.e. when flags includes re.MULTILINE, a match is attempted at the start of every line)

"$" # Matches the end of the string; with the MULTILINE flag, re.search('foo.$', 'foo1\nfoo2\n', re.MULTILINE).group() matches 'foo1' (i.e. when flags includes re.MULTILINE, a match is attempted at the end of every line)

"*" # Matches the character before the * 0 or more times; re.search('a*', 'aaaabac') gives 'aaaa'; ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.

"+" # Matches the character before the + 1 or more times; re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']; ab+ will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'.

"?" # Matches the character before the ? 1 or 0 times; re.search('b?', 'alex').group() matches b 0 times (an empty string); ab? will match either 'a' or 'ab'.

"{m}" # Matches the preceding character exactly m times; re.search('b{3}', 'alexbbbs').group() matches 'bbb'

"{n,m}" # Matches the preceding character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']

"[]" # Indicates a set of characters. The characters in [] can be listed individually (for example: [abc123]), or "-" can be used to denote a range (e.g. [0-9])

"|" # Matches the pattern to the left or to the right of the |; re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'

"(...)" # Group match; use .groups() to see what each group matched separately (as a tuple)
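The rules listed above can be checked with a short script. This is a minimal sketch: the sample strings are mostly the ones quoted in the list, while the variable names and the inputs for ".", "[]" and "(...)" are illustrative assumptions rather than part of the original text.

```python
import re

# "." does not cross a newline unless DOTALL is given
dot_default = re.findall(r"a.c", "abc a\nc")
dot_all = re.findall(r"a.c", "abc a\nc", flags=re.DOTALL)

# "^" is tried at the start of every line under MULTILINE
line_starts = re.findall(r"^a", "\nabc\naee", flags=re.MULTILINE)

# Quantifiers: *, +, ?, {m}, {n,m}
star = re.search(r"a*", "aaaabac").group()
plus = re.findall(r"ab+", "ab+cd+abb+bba")
optional = re.search(r"b?", "alex").group()  # zero-length match at index 0
exact = re.search(r"b{3}", "alexbbbs").group()
ranged = re.findall(r"ab{1,3}", "abb abc abbcbbb")

# Character set, alternation, and groups
char_set = re.findall(r"[0-9]", "a1b22")
either = re.search(r"abc|ABC", "ABCBabcCD").group()
groups = re.search(r"(\d+)-(\d+)", "tel 010-5555").groups()

print(dot_default, dot_all, line_starts)
print(star, plus, repr(optional), exact, ranged)
print(char_set, either, groups)
```

Note that `b?` succeeds on 'alex' with an empty string: a pattern that may match zero characters always matches somewhere, so check the matched text and not just whether a match object came back.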

Note: all of the rules above are commonly used.

"\A" # Matches only at the start of the string; re.search("\Aabc", "alexabc") does not match. Equivalent to re.match('abc', 'alexabc') or re.search('^abc', 'xxx')

"\Z" # Matches only at the end of the string; like $, except that $ also matches just before a trailing newline

"\d" # Matches a digit 0 to 9, equivalent to [0-9] (often used) # re.search('\d+', string) # greedy match

"\D" # Matches a non-digit (often used)

"\w" # Matches [A-Za-z0-9_] (i.e. non-special characters; note that the underscore is included) (often used)

"\W" # Matches anything not in [A-Za-z0-9_] (i.e. special characters) (often used)

"\s" # Matches whitespace characters such as a space, \n, \t, \r

"(?P<name>...)" # Named group match; for example:

import re
id_s = '130704200005251653'
res = re.search(r'(?P<province>\d{3})(?P<city>\d{3})(?P<born_year>\d{4})', id_s)
print(res.group())
print(res.groups())  # grouping is involved, so use groups()
# output in the form of a dictionary
print(res.groupdict())
# Printed results:
# 1307042000
# ('130', '704', '2000')
# {'province': '130', 'city': '704', 'born_year': '2000'}
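The escape sequences listed above can be compared side by side. This is a small sketch on a made-up sample string; `text` and the variable names are assumptions for illustration only.

```python
import re

text = "id_42 ok\n"  # hypothetical sample input

anchored_miss = re.search(r"\Aabc", "alexabc")   # \A only matches at index 0
anchored_hit = re.search(r"\Aid", text).group()

digits = re.findall(r"\d+", text)
non_digits = re.findall(r"\D+", text)
words = re.findall(r"\w+", text)        # the underscore counts as a word character
non_words = re.findall(r"\W", text)
spaces = re.findall(r"\s", "a b\tc\n")

# "$" also matches just before a trailing newline; "\Z" matches only at the very end
dollar_end = re.search(r"ok$", text)
strict_end = re.search(r"ok\Z", text)

print(anchored_miss, anchored_hit)
print(digits, non_digits, words, non_words, spaces)
print(dollar_end is not None, strict_end is not None)
```

The last two lines show the one practical difference between `$` and `\Z`: on input that ends with a newline, `$` still matches but `\Z` does not.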

The re module provides several matching functions:

  • re.match(pattern, string, flags=0) # Matches from the beginning: checks whether the start of the string matches the pattern, does not look any further into the string, and returns a match object or None # official explanation:

    If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

    Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

    If you want to locate a match anywhere in string, use search() instead.

  • re.search(pattern, string, flags=0) # Scans the whole string for the first place that matches the pattern, does not look for later matches, and returns a match object or None # official explanation: Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
  • re.findall(pattern, string, flags=0) # Puts every match into a list and returns the list # official explanation: Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Let's look at the effect of re.match() and re.search():

import re
s = '1ab2c3'
print(re.search('[0-9]', s))
print(re.match('[0-9]', s))
# Printed results (Python 3.7 and later show <re.Match object; ...> instead):
# <_sre.SRE_Match object; span=(0, 1), match='1'>
# <_sre.SRE_Match object; span=(0, 1), match='1'>
# search and match return a match object, not the matched value.
# To get the matched value, call .group(), but first check that a match exists: if nothing matched, the result is None and calling .group() raises an error
res_match = re.search('[0-9]', s)
if res_match:  # check first
    print(res_match.group())
# Printed result:
# 1
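In the example above both functions return the same match, because the string starts with a digit. The sketch below uses a different, made-up string where they differ, and also shows how findall behaves with and without groups; `first_digit` is a hypothetical helper, not part of the re module.

```python
import re

s = "ab1c23"  # hypothetical input that does not start with a digit

match_result = re.match(r"[0-9]", s)     # None: index 0 is "a"
search_result = re.search(r"[0-9]", s)   # first digit anywhere in the string
all_digits = re.findall(r"[0-9]+", s)    # every non-overlapping match
pairs = re.findall(r"([a-z])([0-9])", s) # two groups, so a list of tuples


def first_digit(text):
    """Return the first digit in text, or None when there is none."""
    found = re.search(r"[0-9]", text)
    return found.group() if found else None


print(match_result, search_result.span(), all_digits, pairs)
print(first_digit(s), first_digit("no digits"))
```

Wrapping the None check in a helper like this keeps the "check before calling .group()" rule in one place.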
    • re.split(pattern, string, maxsplit=0, flags=0) # Splits the string, using the matched text as the delimiter (a regular expression lets you define fuzzy splitting rules)
Case 1:
import re
s = 'neo22alex18#mike-oldboy'
print(re.split(r'\d+|#|-', s))  # pattern: split on \d+ or "#" or "-"
# Output:
# ['neo', 'alex', '', 'mike', 'oldboy']  # the 3rd element is empty because "#" immediately follows "18": splitting on "18" and then on "#" leaves an empty string between them

# The example above uses the pipe symbol "|" to define the pattern; the one below uses [] instead
import re
s = 'neo22alex18#mike-oldboy'
print(re.split(r'[\d+#-]', s))  # [] matches any one of the characters listed inside; the effect is similar to "|" but not identical (pay attention to which one your code needs). This example only emphasizes that [] means "any one of these characters"
# Output:
# ['neo', '', 'alex', '', '', 'mike', 'oldboy']  # \d+ is not treated as a unit here: inside [] the "+" is a literal character rather than a quantifier, and the set matches one character at a time, so each digit is a separate delimiter
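The difference between the two split patterns above comes from how a character set works: it matches exactly one character, and a "+" written inside the brackets is an ordinary plus sign. A minimal sketch, reusing the sample string from the examples (the variable names, the third pattern and the 'a+b' input are illustrative additions):

```python
import re

s = "neo22alex18#mike-oldboy"

by_alternation = re.split(r"\d+|#|-", s)      # "22" and "18" are each one delimiter
by_char_class = re.split(r"[\d+#-]", s)       # every single digit is a delimiter
by_repeated_class = re.split(r"[\d#-]+", s)   # quantifier placed outside the set
plus_is_literal = re.split(r"[\d+#-]", "a+b") # "+" inside [] splits on a real "+"

print(by_alternation)
print(by_char_class)
print(by_repeated_class)
print(plus_is_literal)
```

Putting the quantifier after the closing bracket treats any run of digits, "#" or "-" as one delimiter, which also removes the empty strings from the result.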


import re
s = 'neo22alex18|mike-oldboy'  # suppose "|" must be used as the delimiter
print(re.split(r'\|', s))
# "|" is also regex syntax; to use it as a literal character in the pattern rather than as syntax, put a backslash "\" in front of it

# Output:
# ['neo22alex18', 'mike-oldboy']

# To use a backslash "\" as a literal character rather than as syntax in a normal (non-raw) string, write 4 backslashes in total: Python's string parser turns '\\\\' into two backslashes, and the regex engine then reads those two as one literal backslash (in a raw string, r'\\' is enough)
import re
s = 'neo22alex18\\mike-oldboy'  # '\\' is a single backslash in the string
print(re.split('\\\\', s))
# Output:
# ['neo22alex18', 'mike-oldboy']
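The four-backslash rule follows from two layers of escaping: first the Python string literal, then the regex engine. A short sketch showing three equivalent ways to write the pattern; the variable names are illustrative assumptions.

```python
import re

s = "neo22alex18\\mike-oldboy"  # the string contains one real backslash

# Layer 1: the string literal "\\\\" becomes two backslash characters.
# Layer 2: the regex engine reads those two characters as one literal backslash.
normal_string = re.split("\\\\", s)

# A raw string skips layer 1, so two backslashes are enough.
raw_string = re.split(r"\\", s)

# re.escape builds the escaped pattern from the literal text.
escaped = re.split(re.escape("\\"), s)

print(len("\\\\"), len(r"\\"))
print(normal_string, raw_string, escaped)
```

Raw strings are the usual choice for patterns because they leave only the regex layer of escaping to think about.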

    • re.sub(pattern, repl, string, count=0, flags=0) # Matches text and replaces it
import re
s = 'neo22alex18\\mike-oldboy'
print(re.sub(r'\d+', '+', s))
# Output:
# neo+alex+\mike-oldboy
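The count parameter in the signature limits how many replacements are made, and repl may also be a function that receives each match object. A minimal sketch on a made-up string; the sample input and variable names are assumptions.

```python
import re

s = "neo22alex18mike"  # hypothetical input

all_replaced = re.sub(r"\d+", "+", s)          # count=0 replaces every match
first_only = re.sub(r"\d+", "+", s, count=1)   # stop after the first match
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), s)  # repl as a function

print(all_replaced)
print(first_only)
print(doubled)
```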
    • re.fullmatch(pattern, string, flags=0) # Full match: the entire string must match the pattern
re.fullmatch('\[email protected]\w+\.(com|cn|edu)', '[email protected]')  # (com|cn|edu): com, cn, or edu

# Output: # <_sre.SRE_Match object; span=(0, +), match='[email protected]'>
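A minimal sketch of how fullmatch differs from match. The pattern and the example.com addresses below are hypothetical and are not taken from the example above; they only illustrate the "entire string" requirement.

```python
import re

# Hypothetical, deliberately simple e-mail pattern for illustration only
EMAIL_PATTERN = r"\w+@\w+\.(com|cn|edu)"

whole = re.fullmatch(EMAIL_PATTERN, "alex@example.com")
trailing = re.fullmatch(EMAIL_PATTERN, "alex@example.com.cn")  # extra ".cn" left over
prefix = re.match(EMAIL_PATTERN, "alex@example.com.cn")        # match accepts a prefix
wrong_suffix = re.fullmatch(EMAIL_PATTERN, "alex@example.org")

print(whole.group(), whole.group(1))
print(trailing, prefix.group(), wrong_suffix)
```

fullmatch behaves as if the pattern were anchored at both ends, which makes it a natural fit for validating a whole input value.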

    • re.compile(pattern, flags=0) # Compiles a matching rule (pattern) in advance # If you use the same pattern many times, compile it once and then call the compiled object directly. With a call such as re.match(pattern, string), Python has to process the pattern (compile it or look it up in its cache) on every call; a compiled pattern is compiled once and then used directly. Official explanation:
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match(), search() and other methods, described below. The sequence prog = re.compile(pattern); result = prog.match(string) is equivalent to result = re.match(pattern, string), but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
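The equivalence described above can be sketched as follows. The name prog comes from the quoted explanation; the pattern and the input strings are illustrative assumptions.

```python
import re

prog = re.compile(r"\d+")  # compile once

# The compiled object has the same methods as the module-level functions,
# minus the pattern argument.
compiled_result = prog.match("42abc")
direct_result = re.match(r"\d+", "42abc")

# Reuse the same compiled pattern for several inputs.
numbers = [prog.findall(line) for line in ("a1b22", "none", "333")]
no_match = prog.search("abc")

print(compiled_result.group(), direct_result.group())
print(numbers, no_match)
```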

Flags:

    • re.I (re.IGNORECASE): ignore case (the full name is in parentheses, same below)
    • re.M (re.MULTILINE): multiline mode, changes the behavior of '^' and '$'
    • re.S (re.DOTALL): changes the behavior of '.': make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
    • re.X (re.VERBOSE): lets you write comments inside the expression to make it more readable; the following two patterns mean the same thing

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
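Each flag can be tried on a small input. This is a sketch with made-up strings; the variable names are assumptions. Flags are combined with the "|" operator.

```python
import re

ignore_case = re.findall(r"abc", "ABC abc", flags=re.I)
single_line = re.findall(r"^\w+", "one\ntwo")              # "^" only at index 0
multi_line = re.findall(r"^\w+", "one\ntwo", flags=re.M)   # "^" at every line start
dot_all = re.search(r"a.b", "a\nb", flags=re.S).group()
combined = re.findall(r"^abc", "ABC\nabc", flags=re.I | re.M)

# re.X ignores whitespace in the pattern and allows "#" comments
verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.X)
compact = re.compile(r"\d+\.\d*")

print(ignore_case, single_line, multi_line, repr(dot_all), combined)
print(verbose.fullmatch("3.14").group(), compact.fullmatch("3.14").group())
```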

Software development directory specification:

A standardized directory structure makes a program easier to manage and more readable.

The "project directory specification" also falls under "readability and maintainability". A directory structure with a clear hierarchy is designed to achieve the following two goals:

1. High readability: people unfamiliar with the project's code can understand the directory structure at a glance, know which file is the startup script, where the test directory is, where the configuration files are, and so on, and so get to know the project quickly.

2. High maintainability: once the organizing rules are defined, a maintainer knows exactly which directory any new file or piece of code belongs in. The advantage is that as the code and configuration grow over time, the project structure does not become cluttered and stays well organized.

A project usually has the following directories:

luffy # project root; all lowercase recommended
    log # log directory
    conf/config/settings # configuration file directory
    libs/modules # third-party library directory
    core/luffy # program code directory / core code directory
    docs # documentation
    README # description of the software
    setup.py # quick install
    bin # program startup scripts / program entry point
        luffy_server.py
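As a rough sketch, the layout above could be created with pathlib. Several things here are assumptions: the slash-separated names in the listing are read as alternative names and the first of each is used (conf, libs, core), create_skeleton is a hypothetical helper, and the skeleton is built in a temporary directory so that nothing is written to a real project.

```python
import tempfile
from pathlib import Path

# Assumed directory and file names, taken from the listing above
DIRECTORIES = ("log", "conf", "libs", "core", "docs", "bin")
FILES = ("README", "setup.py", "bin/luffy_server.py")


def create_skeleton(root):
    """Create empty directories and files under root and return the layout."""
    for name in DIRECTORIES:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    layout = create_skeleton(Path(tmp) / "luffy")

print(layout)
```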

What a README should cover:

    1. Software positioning: the basic functions of the software.
    2. How to run the code: installation environment, startup commands, and so on.
    3. Brief usage instructions.
    4. A description of the code directory structure; a more detailed README can also explain the basic principles of the software.
    5. Answers to frequently asked questions.

Details of the directory specification can be found at: https://www.luffycity.com/python-book/di-4-zhang-python-ji-chu-2014-chang-yong-mo-kuai/ruan-jian-kai-fa-mu-lu-gui-fan.html
