Objective:
- Environment: Windows 64-bit, Python 3.4
- Requests library basics:
1. Installation: pip install requests
2. Function: requests sends network requests; it can issue the same kinds of HTTP requests a browser does and retrieve data from a website.
3. Common operations:
import requests  # import the requests module

r = requests.get("https://api.github.com/events")  # get a page

# Set a timeout: stop waiting for a response after the given number of seconds
r2 = requests.get("https://api.github.com/events", timeout=0.001)

payload = {'key1': 'value1', 'key2': 'value2'}
r1 = requests.get("http://httpbin.org/get", params=payload)

print(r.url)          # print the final URL
print(r.text)         # read the content of the server's response
print(r.encoding)     # get the current encoding
print(r.content)      # the response body as bytes
print(r.status_code)  # get the response status code
print(r.status_code == requests.codes.ok)  # compare against the built-in status codes
print(r.headers)      # the server's response headers as a Python dictionary
print(r.headers['Content-Type'])  # header lookup is case-insensitive; any casing works
print(r.history)      # a list of Response objects (redirect history)
print(type(r))        # the type of the response object
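To see how the `params` dictionary above is encoded into the query string, a request can be prepared without actually being sent. This is a minimal sketch, assuming the requests library is installed:

```python
import requests

# Build (but do not send) a request to inspect how params are encoded into the URL
payload = {'key1': 'value1', 'key2': 'value2'}
req = requests.Request('GET', 'http://httpbin.org/get', params=payload)
prepared = req.prepare()
print(prepared.url)  # http://httpbin.org/get?key1=value1&key2=value2
```

Preparing the request this way is also useful for debugging, since it shows the exact URL that `requests.get` would hit.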
- BeautifulSoup4 library basics:
1. Installation: pip install beautifulsoup4
2. Function: Beautiful Soup is a Python library for extracting data from HTML and XML files.
3. Common operations:
import requests
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

ss = BeautifulSoup(html_doc, "html.parser")
print(ss.prettify())          # output the structure with standard indentation
print(ss.title)               # <title>The Dormouse's story</title>
print(ss.title.name)          # title
print(ss.title.string)        # The Dormouse's story
print(ss.title.parent.name)   # head
print(ss.p)                   # <p class="title"><b>The Dormouse's story</b></p>
print(ss.p['class'])          # ['title']
print(ss.a)                   # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
print(ss.find_all("a"))       # [...] a list of all <a> tags
print(ss.find(id="link3"))    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

for link in ss.find_all("a"):
    print(link.get("href"))   # the link of every <a> tag in the document

print(ss.get_text())          # get all the text in the document
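Besides `find` and `find_all`, the same document can be queried with CSS selectors through `select()`. A small sketch reusing a fragment of the html_doc snippet above:

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# find_all can filter by CSS class as well as by tag name
sisters = soup.find_all("a", class_="sister")
print([a["id"] for a in sisters])                          # ['link1', 'link2', 'link3']

# select() takes a CSS selector string
print([a.get_text() for a in soup.select("p.story > a")])  # ['Elsie', 'Lacie', 'Tillie']
```

`class_` has a trailing underscore because `class` is a reserved word in Python.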
import requests
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')       # declare a BeautifulSoup object
find = soup.find('p')                               # use find to locate the first <p> tag
print("find's return type is", type(find))          # the type of the return value
print("find's content is", find)                    # the value that find located
print("find's tag name is", find.name)              # the name of the tag
print("find's attribute (class) is", find['class']) # the tag's class attribute value

print(find.string)  # get the text content inside the tag

markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup1 = BeautifulSoup(markup, "html.parser")
comment = soup1.b.string
print(type(comment))  # the type of the content inside a comment
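To confirm that the string extracted above really is a comment rather than ordinary text, bs4 exposes a `Comment` class that can be checked with `isinstance`. A minimal sketch:

```python
from bs4 import BeautifulSoup, Comment

markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")
comment = soup.b.string

# Comment is a subclass of NavigableString, so the text itself is directly usable
print(isinstance(comment, Comment))  # True
print(comment)                       # Hey, buddy. Want to buy a used parser?
```

Note that printing the comment yields only its text; the surrounding `<!-- -->` markers are stripped.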
import requests
import io
import sys

# Change the default encoding of standard output
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='GB18030')

r = requests.get('https://unsplash.com')  # send a GET request to the target URL; returns a response object

print(r.text)  # r.text is the HTML of the HTTP response
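The two libraries are typically used together: requests fetches the page, BeautifulSoup parses it. This is a sketch only; the actual page structure of unsplash.com is not guaranteed, so the parsing step is factored into a function that works on any HTML, and the network call is guarded:

```python
import requests
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    # Network call: report failures rather than crashing, since the
    # site may be unreachable or block non-browser clients
    try:
        r = requests.get("https://unsplash.com", timeout=5)
        for href in extract_links(r.text):
            print(href)
    except requests.exceptions.RequestException as exc:
        print("request failed:", exc)
```

Keeping the parsing logic in its own function makes it testable on static HTML without touching the network.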
Reference links:
78537432
http://www.cnblogs.com/Albert-Lee/p/6276847.html
78748531
Requests + Selenium + BeautifulSoup for Python crawlers