One. Requests Library
import json
import requests
from io import BytesIO

# List the functions the library exposes, which mirror the HTTP API
# print(dir(requests))
url = 'http://www.baidu.com'
r = requests.get(url)
print(r.text)
print(r.status_code)
print(r.encoding)
results:
# Pass parameters, for example http://aaa.com?pageId=1&type=content
params = {'k1': 'v1', 'k2': 'v2'}
r = requests.get('http://httpbin.org/get', params=params)
print(r.url)
Result: the parameters are appended to the URL as a query string, giving http://httpbin.org/get?k1=v1&k2=v2
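To see how requests turns the params dictionary into a query string without contacting a server, the request can be prepared locally. This is only a minimal sketch: it reuses the httpbin URL and the k1/k2 values from the text, and requests.Request(...).prepare() is an addition that the original does not mention.

```python
import requests

# Build the request locally; prepare() encodes params into the URL
# but does not send anything over the network.
params = {'k1': 'v1', 'k2': 'v2'}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=params).prepare()

print(prepared.url)
```

Preparing a request this way is handy for checking what would be sent before any traffic is generated.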
# Binary data (Image comes from Pillow: from PIL import Image)
# r = requests.get('http://i-2.shouji56.com/2015/2/11/23dab5c5-336d-4686-9713-ec44d21958e3.jpg')
# image = Image.open(BytesIO(r.content))
# image.save('meinv.jpg')

# JSON processing
r = requests.get('https://github.com/timeline.json')
# Call json() to decode the body; r.json without parentheses is only the bound method
print(type(r.json()))
print(r.text)
result:
# Raw data processing
# Write streamed data to a file
r = requests.get('http://i-2.shouji56.com/2015/2/11/23dab5c5-336d-4686-9713-ec44d21958e3.jpg', stream=True)
with open('meinv2.jpg', 'wb+') as f:
    for chunk in r.iter_content(1024):
        f.write(chunk)

# Submit a form
form = {'username': 'user', 'password': 'pass'}
r = requests.post('http://httpbin.org/post', data=form)
print(r.text)
Result: the parameters are submitted as form data, so httpbin echoes them back in the form field.
r = requests.post('http://httpbin.org/post', data=json.dumps(form))
print(r.text)
Result: the parameters are not submitted as form data, so they are echoed back in the json field instead.
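The difference between the two POST calls comes down to what requests puts in the body and in the Content-Type header. The sketch below prepares three requests locally (nothing is sent) to make that visible. The form values are placeholders, and the third variant, the json keyword argument, is an addition not used in the original text.

```python
import json
import requests

form = {'username': 'user', 'password': 'pass'}  # placeholder values
url = 'http://httpbin.org/post'

# A dict passed as data is form-encoded and labelled as such.
as_form = requests.Request('POST', url, data=form).prepare()
# A string passed as data is sent verbatim, with no Content-Type added.
as_string = requests.Request('POST', url, data=json.dumps(form)).prepare()
# The json keyword serializes the dict and sets Content-Type: application/json.
as_json = requests.Request('POST', url, json=form).prepare()

for name, req in [('data=dict', as_form), ('data=str', as_string), ('json=dict', as_json)]:
    print(name, '|', req.headers.get('Content-Type'), '|', req.body)
```

When the server expects JSON, passing json=form is usually the simpler choice, because the header is set automatically.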
# Cookies
url = 'http://www.baidu.com'
r = requests.get(url)
cookies = r.cookies
# The cookie jar can be read like a dictionary
for k, v in cookies.get_dict().items():
    print(k, v)
Result: a cookie is actually a key-value pair.
cookies = {'c1': 'v1', 'c2': 'v2'}
r = requests.get('http://httpbin.org/cookies', cookies=cookies)
print(r.text)
Result: httpbin echoes back the cookies that were sent.
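The dictionary-like behaviour of cookies can be tried without any network access. This is a small sketch that builds a jar from a plain dict with requests.cookies.cookiejar_from_dict, a helper the original text does not use. The c1/c2 values are the same illustrative ones as above.

```python
from requests.cookies import cookiejar_from_dict

# Build a cookie jar from a plain dictionary (no request is sent).
jar = cookiejar_from_dict({'c1': 'v1', 'c2': 'v2'})

# Dictionary-style lookup by cookie name.
print(jar['c1'])

# Iterate over name/value pairs, as with r.cookies in the text.
for name, value in jar.get_dict().items():
    print(name, value)
```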
# Redirects and redirect history
r = requests.head('http://github.com', allow_redirects=True)
print(r.url)
print(r.status_code)
print(r.history)
Result: the request is redirected with a 301.
# Proxies
# proxies = {'http': '...', 'https': '...'}
# r = requests.get('...', proxies=proxies)
Two. BeautifulSoup Library
The example HTML is as follows:
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
The parsing code is as follows:
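from bs4 import BeautifulSoup

# Naming the parser avoids the warning bs4 prints when none is given
soup = BeautifulSoup(open('test.html'), 'html.parser')

# Make the HTML text more structured
# print(soup.prettify())

# Tag
print(type(soup.title))
Result: a class from bs4 (bs4.element.Tag)
print(soup.title.name)
print(soup.title)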
The results are as follows:
# String
print(type(soup.title.string))
print(soup.title.string)
The results are as follows: only the contents of the tag are displayed.
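The two object types involved here, the tag and the string inside it, can be checked in a few lines. The sketch below parses an inline snippet instead of test.html so that it runs on its own. The snippet and the 'html.parser' choice are assumptions made for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Inline stand-in for test.html so the example is self-contained.
soup = BeautifulSoup("<html><head><title>The Dormouse's story</title></head></html>",
                     'html.parser')

title = soup.title
print(type(title))         # the tag object
print(title.name)          # the tag's name
print(type(title.string))  # the text node inside the tag
print(title.string)        # the text itself, without the surrounding markup
```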
# Comment
print(type(soup.a.string))
print(soup.a.string)
Result: the content of the comment is displayed, so it is sometimes necessary to check whether the content obtained is a comment.
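One way to make that check is to test the string against bs4's Comment class. The following is a minimal sketch on a shortened, inline version of the example HTML. The snippet itself is an assumption made so the code runs without test.html.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Two links: the first contains only a comment, the second ordinary text.
html = '<a id="link1"><!-- Elsie --></a><a id="link2">Lacie</a>'
soup = BeautifulSoup(html, 'html.parser')

kinds = []
for a in soup.find_all('a'):
    is_comment = isinstance(a.string, Comment)
    kinds.append(is_comment)
    label = 'comment' if is_comment else 'text'
    print(a['id'], label, a.string.strip())
```

Filtering on Comment before using .string keeps commented-out markup from being treated as visible page text.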
# Iterate over the direct children of body
for item in soup.body.contents:
    print(item.name)
Result: body has three items below it.
# CSS query
print(soup.select('.sister'))
Result: the class selector returns every element with that class, as a list.
print(soup.select('#link1'))
Result: the ID selector selects the element whose id equals link1.
print(soup.select('head > title'))
Result: the child selector returns the title tag that sits directly under head.
a_s = soup.select('a')
for a in a_s:
    print(a)
Result: the tag selector selects all of the a tags.
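The four selector forms can be compared side by side on a small inline document. This is only a sketch: the HTML is a trimmed copy of the example above, with the comment in the first link replaced by plain text, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

by_class = soup.select('.sister')       # class selector
by_id = soup.select('#link1')           # ID selector
by_child = soup.select('head > title')  # direct-child selector
by_tag = soup.select('a')               # tag selector

print([a['id'] for a in by_class])
print(by_id[0]['href'])
print(by_child[0].string)
print(len(by_tag))
```

select() always returns a list, even for an ID selector, so a single match still has to be indexed with [0].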
More updates to come. You are welcome to follow my public account, lhworld.
Python crawler Knowledge Point two