Python Full Stack development * 30 Knowledge Point Summary * 180713

Source: Internet
Author: User

  2 re module
I. Regular expression online test tool http://tool.chinaz.com/regex/

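(a). Usage of .*?
. matches any single character (except a newline, unless re.S is set)
* repeats the preceding item 0 or more times
? placed after a quantifier switches it to non-greedy (lazy) mode
Together, .*? matches as few characters as possible. It is rarely written on its own; it is mostly used in the form .*?x,
which matches characters of any kind up to the first occurrence of x.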
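The difference between greedy and lazy matching is easiest to see side by side. The following is a minimal sketch; the sample string is an illustrative assumption and does not come from the original notes.

```python
import re

text = "aaxbbxcc"  # hypothetical sample string

# Greedy: .* grabs as much as possible, so the match runs to the last "x".
greedy = re.search(r".*x", text).group()
# Lazy: .*? grabs as little as possible, so the match stops at the first "x".
lazy = re.search(r".*?x", text).group()

print(greedy)  # aaxbbx
print(lazy)    # aax
```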
(b). Four uses of the question mark "?"
1. Quantifier: repeat the preceding item 0 or 1 times
2. Non-greedy (lazy) matching marker, as in .*?
3. ?: at the start of a group: cancels the group's priority (the group becomes non-capturing)
4. ?P<name> at the start of a group: names the group; used in the HTML tag matching exercises
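The four uses can be tried one by one. This is only a sketch; all sample strings and the group names year and month are made up for illustration.

```python
import re

# 1. Quantifier: the "u" may appear 0 or 1 times.
optional = re.findall(r"colou?r", "color colour")
# 2. Lazy marker: placed after another quantifier, it makes that quantifier non-greedy.
lazy = re.findall(r"<.*?>", "<a><b>")
# 3. (?:...) groups without capturing, so findall returns the whole match.
non_capturing = re.findall(r"(?:ab)+", "ababab")
# 4. (?P<name>...) gives the group a name that group() accepts.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "1990-07")

print(optional)                            # ['color', 'colour']
print(lazy)                                # ['<a>', '<b>']
print(non_capturing)                       # ['ababab']
print(m.group("year"), m.group("month"))   # 1990 07
```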
II. Common methods of the re module
Basic lookup
1. findall: group priority
ret = re.findall(r"(\d+\.?\d+)", "123.546")
print(ret)  # ['123.546']
# ret.remove("") deletes one empty string in place and returns None; it raises ValueError when the list contains none, as here
# The group-priority issue with findall
ret = re.findall(r'www.(baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['oldboy'] because findall returns the contents of the group first; to get the whole match, cancel the group's priority
ret = re.findall(r'www.(?:baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com'] Note: starting the group with "?:" cancels its priority.

2. search (call group() to get the result)
search scans the string for the pattern and stops at the first match. It returns a match object that holds the match information;
call group() on it to get the matched string. If nothing in the string matches, it returns None.
ret = re.search(r"\d+", "4huhi67377")
print(ret.group())  # 4
ret = re.search(r"\d+", "4888huhi67377")
print(ret.group())  # 4888
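Because search returns None when nothing matches, calling group() directly on the result can raise AttributeError. One way to guard against that is sketched below; first_number is a hypothetical helper name, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r"\d+", text)
    return m.group() if m else None

print(first_number("4888huhi67377"))   # 4888
print(first_number("no digits here"))  # None
```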
3. match (call group() to get the result)
ret = re.match(r"\d", "4huhi67377")  # whatever the pattern is, match only matches at the start of the string, as if "^" were prepended
print(ret.group())  # 4
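The effect of that implicit "^" shows up when the digits are not at the start of the string. A small sketch with an assumed sample string:

```python
import re

text = "huhi67377"  # hypothetical sample: the digits are not at the start

at_start = re.match(r"\d+", text)       # match looks only at the start of the string
anywhere = re.search(r"\d+", text)      # search scans the whole string
anchored = re.search(r"^\d+", text)     # an explicit "^" makes search behave like match here

print(at_start)          # None
print(anywhere.group())  # 67377
print(anchored)          # None
```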
String processing
4. split: a grouped pattern "(regex)" keeps the separators, a plain "regex" drops them
ret = re.split(r"(\d+)", "ghgh689jhhkjkj888hjh9777")  # split on \d+; the group keeps the matched digits in the result
print(ret)  # ['ghgh', '689', 'jhhkjkj', '888', 'hjh', '9777', '']
ret = re.split(r"\d+", "ghgh689jhhkjkj888hjh9777")
print(ret)  # ['ghgh', 'jhhkjkj', 'hjh', '']
5. sub: replacement, re.sub("regex", "replacement", "string", count)
ret = re.sub(r"\d+", "Male God", "alex1000wusir666")
print(ret)  # alexMale GodwusirMale God
ret = re.sub(r"\d+", "Male God", "alex1000wusir666", count=1)  # replace only the first match
print(ret)  # alexMale Godwusir666

6. subn: returns a tuple (new string, number of replacements)
ret = re.subn(r"\d+", "Male God", "alex1000wusir666")
print(ret)  # ('alexMale GodwusirMale God', 2)
Code optimization
7. compile
obj = re.compile(r"\d{4}")  # compile once, reuse the pattern object many times
ret = obj.search("676767hghjj787878gjggu")
print(ret.group())  # 6767
ret = obj.findall("hghjj787878gjggu")
print(ret)  # ['7878'] (findall returns a list, which has no group() method)
ret = obj.match("676767hghjj787878gjggu")
print(ret.group())  # 6767
8. finditer: returns an iterator of match objects
ret = re.finditer(r"\d+", "ggjgu65565765hjhjk767")
for i in ret:
    print(i.group())  # 65565765, then 767
Second way: step through a fresh iterator with next()
ret = re.finditer(r"\d+", "ggjgu65565765hjhjk767")
print(ret)  # <callable_iterator object at 0x00000278077385c0>
print(next(ret).group())  # 65565765
print(next(ret).group())  # 767
III. Comprehensive exercises and extensions
1. Matching tags
(1). Plain version
# The sample input was lost from the source; "<h1>hello</h1>" is an assumed stand-in consistent with the h1 result shown below
ret = re.search(r"<\w+>\w+</\w+>", "<h1>hello</h1>")
print(ret.group())  # <h1>hello</h1>
(2). Named-group version
A group can be given a name with (?P<name>...); the matched text is then available directly through group('name').
ret = re.search(r"<(?P<tag_name>\w+)>\w+</(?P=tag_name)>", "<h1>hello</h1>")
# (?P<tag_name>...) names the group; (?P=tag_name) refers back to it by name
print(ret.group("tag_name"))  # h1
print(ret.group())  # <h1>hello</h1>
(3). Numbered-group version (group indexes start from 1)
If the group is not named, "\number" refers back to the corresponding group: the text at that position must be identical to what the group matched.
The matched value is then available directly through group(number).
ret = re.search(r"<(\w+)>\w+</\1>", "<h1>hello</h1>")
print(ret.group())  # <h1>hello</h1>
print(ret.group(1))  # h1
2. Matching integers and decimals
ret = re.findall(r"-?\d+\.\d+|-?\d+", "1-2*(60+(-40.35/5)-(-4*3))")
print(ret)  # ['1', '-2', '60', '-40.35', '5', '-4', '3'] both decimals and integers are taken
ret = re.findall(r"-?\d+\.\d+|(-?\d+)", "1-2*(60+(-40.35/5)-(-4*3))")
print(ret)  # ['1', '-2', '60', '', '5', '-4', '3'] only integers are taken; the decimal leaves an empty string because its branch has no group
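The empty string left behind by the decimal can be filtered out afterwards. A minimal sketch that reuses the expression from the example above:

```python
import re

expr = "1-2*(60+(-40.35/5)-(-4*3))"
ret = re.findall(r"-?\d+\.\d+|(-?\d+)", expr)

# Keep only the non-empty captures, i.e. the integers.
integers = [item for item in ret if item]
print(integers)  # ['1', '-2', '60', '5', '-4', '3']
```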
3. Pattern-matching exercises
(1). Match the address on each line of a text (the sample here is a URL)
http://blog.csdn.net/make164492212/article/details/51656638
Regular expression: [\w:\./]{1,}
Verification: ret = re.findall(r"[\w:\./]{1,}", "http://blog.csdn.net/make164492212/article/details/51656638")
print(ret)  # ['http://blog.csdn.net/make164492212/article/details/51656638']
(2). Match the date string on each line of a text, for example '1990-07-12': ^[1-9][0-9]{1,}\-[0-1][0-9]\-[0-3][0-9]
The 12 months of a year, taken separately: ^(0?[1-9]|1[0-2])$
The 31 days of a month: ^((0?[1-9])|((1|2)[0-9])|30|31)$
(3). Match a QQ number: [1-9][0-9]{4,}
(4). Floating-point number: ^(-?\d+)(\.\d+)?$
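These patterns can be checked against a few inputs. The sketch below uses fullmatch in place of the explicit ^ and $ anchors; is_valid is a hypothetical helper and the test values are made up.

```python
import re

QQ = re.compile(r"[1-9][0-9]{4,}")
FLOAT = re.compile(r"(-?\d+)(\.\d+)?")
MONTH = re.compile(r"0?[1-9]|1[0-2]")

def is_valid(pattern, text):
    """True when the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None

print(is_valid(QQ, "10001"))      # True
print(is_valid(QQ, "01234"))      # False: must not start with 0
print(is_valid(FLOAT, "-40.35"))  # True
print(is_valid(FLOAT, "3."))      # False: digits are required after the dot
print(is_valid(MONTH, "12"))      # True
print(is_valid(MONTH, "13"))      # False
```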
IV. flags has a number of optional values
re.I (IGNORECASE): ignore case; the full name is given in parentheses
re.M (MULTILINE): multiline mode; changes the behavior of ^ and $ so that they also match at line boundaries
re.S (DOTALL): the dot matches any character, including newlines
re.L (LOCALE): locale-aware matching; makes \w, \W, \b, \B, \s, \S depend on the current locale; not recommended
re.U (UNICODE): makes \w, \W, \s, \S, \d, \D follow the Unicode character properties; this is the default for str patterns in Python 3
re.X (VERBOSE): verbose mode; the pattern string may span several lines, whitespace is ignored, and comments can be added
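A short sketch of what the most common flags change in practice; the sample text and patterns are illustrative assumptions.

```python
import re

text = "Hello\nworld"  # hypothetical two-line sample

ignore_case = re.findall(r"hello", text, re.I)             # case is ignored
single_line = re.findall(r"^\w+", text)                    # ^ matches only at the start of the string
multi_line = re.findall(r"^\w+", text, re.M)               # ^ also matches after each newline
dot_default = re.search(r"Hello.world", text)              # . does not match "\n" by default
dot_all = re.search(r"Hello.world", text, re.S).group()    # with re.S it does

# re.X: whitespace in the pattern is ignored and "#" starts a comment.
date = re.compile(r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
""", re.X)
verbose = date.search("born 1990-07").group()

print(ignore_case)   # ['Hello']
print(single_line)   # ['Hello']
print(multi_line)    # ['Hello', 'world']
print(dot_default)   # None
print(repr(dot_all)) # 'Hello\nworld'
print(verbose)       # 1990-07
```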
Homework: implement a calculator program that can evaluate expressions such as
1 - 2 * ((60-30 + (-40/5) * (9-2*5/3 + 7/3*99/4*2998 + 10*568/14)) - (-4*3)/(16-3*2))
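One possible approach to the homework, offered only as a sketch and not as the intended solution: use a regular expression to split the expression into tokens, then evaluate the tokens with a small recursive-descent parser. All function names here (tokenize, parse_expr, parse_term, parse_factor, calculate) are hypothetical, and the sketch assumes well-formed input with no error handling.

```python
import re

# Unsigned numbers, the four operators, and parentheses.
TOKEN = re.compile(r"\d+\.\d+|\d+|[-+*/()]")

def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    return TOKEN.findall(text.replace(" ", ""))

def parse_expr(tokens, pos):
    """expr := term (('+' | '-') term)*"""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse_term(tokens, pos):
    """term := factor (('*' | '/') factor)*"""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        rhs, pos = parse_factor(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos

def parse_factor(tokens, pos):
    """factor := '-' factor | '(' expr ')' | number"""
    tok = tokens[pos]
    if tok == "-":                       # unary minus
        value, pos = parse_factor(tokens, pos + 1)
        return -value, pos
    if tok == "(":
        value, pos = parse_expr(tokens, pos + 1)
        return value, pos + 1            # skip the closing ")"
    return float(tok), pos + 1

def calculate(text):
    """Evaluate a well-formed arithmetic expression and return a float."""
    value, _ = parse_expr(tokenize(text), 0)
    return value

expr = "1 - 2 * ((60-30 + (-40/5) * (9-2*5/3 + 7/3*99/4*2998 + 10*568/14)) - (-4*3)/(16-3*2))"
print(calculate(expr))
print(calculate("1+2*3"))  # 7.0
```

The regular expression only does the tokenizing here; operator precedence comes from the split between parse_expr and parse_term, and parentheses are handled by recursion.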
Crawler exercise:
import json
import re

import requests


def getpage(url):
    # Download the page and return its HTML as text
    response = requests.get(url)
    return response.text


def parsepage(s):
    # One named group per field; re.S lets .*? run across line breaks.
    # "评价" ("ratings") is the literal Chinese text that follows the comment count on the page.
    com = re.compile(
        r'<div class="item">.*?<div class="pic">.*?<em .*?>(?P<id>\d+).*?<span class="title">(?P<title>.*?)</span>'
        r'.*?<span class="rating_num" .*?>(?P<rating_num>.*?)</span>.*?<span>(?P<comment_num>.*?)评价</span>', re.S)

    ret = com.finditer(s)
    for i in ret:
        yield {
            "id": i.group("id"),
            "title": i.group("title"),
            "rating_num": i.group("rating_num"),
            "comment_num": i.group("comment_num"),
        }


def main(num):
    url = 'https://movie.douban.com/top250?start=%s&filter=' % num
    response_html = getpage(url)
    ret = parsepage(response_html)
    print(ret)  # a generator object; nothing is parsed until it is iterated
    # Append one JSON object per line; the with block closes the file
    with open("move_info7", "a", encoding="utf8") as f:
        for obj in ret:
            print(obj)
            data = json.dumps(obj, ensure_ascii=False)
            f.write(data + "\n")


if __name__ == '__main__':
    count = 0
    for i in range(10):  # 10 pages of 25 entries each
        main(count)
        count += 25
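The parsing pattern can be tried offline before hitting the site. The sketch below runs it against a hand-written HTML fragment; the fragment is an assumption that only imitates the structure the pattern expects (including the literal text 评价 after the comment count) and is not a copy of the real page.

```python
import re

pattern = re.compile(
    r'<div class="item">.*?<div class="pic">.*?<em .*?>(?P<id>\d+)'
    r'.*?<span class="title">(?P<title>.*?)</span>'
    r'.*?<span class="rating_num" .*?>(?P<rating_num>.*?)</span>'
    r'.*?<span>(?P<comment_num>.*?)评价</span>',
    re.S,
)

# Hypothetical fragment shaped like one list entry.
sample = '''
<div class="item">
  <div class="pic">
    <em class="">1</em>
  </div>
  <span class="title">Example Movie</span>
  <span class="rating_num" property="v:average">9.7</span>
  <span>12345人评价</span>
</div>
'''

# groupdict() returns all named groups of a match as a dictionary.
rows = [m.groupdict() for m in pattern.finditer(sample)]
print(rows)
```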
