Python crawlers: the requests module


I. Login examples

A. Scraping Autohome news: fetch each article's title, link, and picture, and write the pictures to local files

import requests
from bs4 import BeautifulSoup
import uuid

response = requests.get('http://www.autohome.com.cn/news/')
response.encoding = 'gbk'

soup = BeautifulSoup(response.text, 'html.parser')  # parse the HTML into an object
tag = soup.find(id='auto-channel-lazyload-article')
li_list = tag.find_all('li')
for i in li_list:
    a = i.find('a')
    if a:
        print(a.attrs.get('href'))
        txt = a.find('h3').text
        print(txt)
        img_url = a.find('img').attrs.get('src')
        print(img_url)
        img_response = requests.get(url=img_url)
        file_name = str(uuid.uuid4()) + '.jpg'
        with open(file_name, 'wb') as f:
            f.write(img_response.content)
Use the BeautifulSoup module to locate the tags you need

B. Upvoting on Chouti: both the first GET and the login response set a gpsd cookie; use the gpsd from the first GET (which the login request authorizes), not the one returned by login

import requests

# 1. Fetch the page first; the response sets a gpsd cookie
r1 = requests.get('http://dig.chouti.com/')
r1_cookies = r1.cookies.get_dict()

# 2. Log in, carrying the cookies from the first request
post_dict = {
    "phone": "8615131255089",
    "password": "woshiniba",
    "oneMonth": "1",
}
r2 = requests.post(
    url="http://dig.chouti.com/login",
    data=post_dict,
    cookies=r1_cookies,
)
r2_cookies = r2.cookies.get_dict()

# 3. Access other pages with the gpsd from the first request
r3 = requests.post(
    url="http://dig.chouti.com/link/vote?linksId=13921091",
    cookies={'gpsd': r1_cookies['gpsd']},
)
print(r3.text)
Chouti upvote (the gpsd cookie)

C. Logging in to GitHub, carrying cookies from one request to the next

import requests
from bs4 import BeautifulSoup

r1 = requests.get('https://github.com/login')
s1 = BeautifulSoup(r1.text, 'html.parser')

# Get the CSRF token from the login form
token = s1.find(name='input', attrs={'name': 'authenticity_token'}).get('value')
r1_cookie_dict = r1.cookies.get_dict()

# POST the username, password, and token to the server
r2 = requests.post(
    'https://github.com/session',
    data={
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': token,
        'login': '[email protected]',
        'password': 'alex3714',
    },
    cookies=r1_cookie_dict,
)

# Collect the cookies set after login
r2_cookie_dict = r2.cookies.get_dict()

# Merge the pre-login and post-login cookies
cookie_dict = {}
cookie_dict.update(r1_cookie_dict)
cookie_dict.update(r2_cookie_dict)

r3 = requests.get(
    url='https://github.com/settings/emails',
    cookies=cookie_dict,
)
print(r3.text)
II. Requests parameters
- method: the HTTP method to use
- url: the request address
- params: parameters passed in the URL (the GET query string)
- data: form data passed in the request body
- json: JSON data passed in the request body
- headers: request headers
- cookies: cookies to send
- files: files to upload
- auth: basic authentication (adds an encoded username and password to the request headers)
- timeout: how long to wait for the request and response before giving up
- allow_redirects: whether to follow redirects
- proxies: proxies to route the request through
- verify: whether to verify the server's SSL certificate (False ignores it)
- cert: client-side certificate file
- stream: download the response as a stream (useful for large files)
- session: a Session object preserves cookies and other client state across requests
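Most of these are keyword arguments accepted by any request call. A minimal sketch against httpbin.org (a public echo service) showing several of them together:

import requests

response = requests.get(
    'http://httpbin.org/get',
    params={'k1': 'v1'},                   # appended to the URL as ?k1=v1
    headers={'User-Agent': 'my-crawler'},  # sent as a request header
    cookies={'session_id': 'abc'},         # sent in the Cookie header
    timeout=5,                             # seconds to wait before giving up
    allow_redirects=True,                  # follow 3xx responses
)
print(response.url)     # http://httpbin.org/get?k1=v1
print(response.json())  # httpbin echoes the request back as JSON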

A. files: uploading files

import requests

requests.post(
    url='xxx',
    files={
        # a plain file object: uploaded under the field name 'name1'
        'name1': open('a.txt', 'rb'),
        # a (filename, file object) tuple: uploaded to the server as bbb.txt
        'name2': ('bbb.txt', open('b.txt', 'rb')),
    },
)

B. auth: basic authentication

When you configure a router by visiting 192.168.0.1, the browser shows a pop-up window asking for a username and password. Clicking login there is not an ordinary form submission; it is HTTP basic authentication: the browser encodes the username and password and sends them in the request headers.
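A minimal sketch of what that looks like with requests, assuming a router admin page at 192.168.0.1 protected by basic auth (the credentials are placeholders):

import requests
from requests.auth import HTTPBasicAuth

# requests encodes the username/password into the Authorization header
response = requests.get(
    'http://192.168.0.1/',
    auth=HTTPBasicAuth('admin', 'admin'),  # placeholder credentials
)
print(response.status_code)

Passing a plain ('user', 'pass') tuple to auth= is shorthand for the same thing.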

C. stream: streaming downloads

# If the file on the server is large, download it in a loop
import requests
from contextlib import closing

def param_stream():
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    print(ret.content)  # note: .content still reads the whole body into memory
    ret.close()

    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # process the response chunk by chunk
    #     for i in r.iter_content():
    #         print(i)
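The ret.content call above still buffers the whole body, so the commented iter_content loop is the part that actually streams. A minimal sketch that writes a large download to disk chunk by chunk (the URL and filename are placeholders):

import requests

# Stream a large file to disk without holding it all in memory
with requests.get('http://127.0.0.1:8000/test/', stream=True) as r:
    with open('download.bin', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)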

D. session (not the same as Django's session): simplifying the Chouti upvote

import requests

session = requests.Session()

# 1. Visit any page first to obtain the gpsd cookie
i1 = session.get(url="http://dig.chouti.com/help/service")

# 2. Log in; the session automatically carries the earlier cookies,
#    and the server authorizes the gpsd from step 1
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxxxxx",
        'oneMonth': "",
    },
)

# 3. Upvote; the session sends the authorized gpsd automatically
i3 = session.post(url="http://dig.chouti.com/link/vote?linksId=8589623")
print(i3.text)
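The same simplification applies to the GitHub login from section C: with a Session there is no manual cookie merging. A minimal sketch under the same assumptions as that example:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Cookies from the login page are stored on the session automatically
r1 = session.get('https://github.com/login')
token = BeautifulSoup(r1.text, 'html.parser').find(
    name='input', attrs={'name': 'authenticity_token'}).get('value')

# No cookies= argument and no merging: the session updates its cookie jar itself
session.post('https://github.com/session', data={
    'commit': 'Sign in',
    'authenticity_token': token,
    'login': '[email protected]',  # placeholder kept from the original example
    'password': 'alex3714',
})

r3 = session.get('https://github.com/settings/emails')
print(r3.text)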
