Web crawler: sending requests with the requests module

Source: Internet
Author: User

Sending a request with requests means using the Python requests module to simulate a browser request and get back the HTML source of the page.

There are two types of simulated browser requests: requests that do not require user login or authentication, and requests that do.

I. Requests that do not require user login or authentication

This case is relatively simple: use the requests module directly to send a request and get the HTML source.

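#!/usr/bin/env python
# -*- coding: utf8 -*-
import requests     # import the module that simulates browser requests

http = requests.get(url="http://www.iqiyi.com/")    # send the HTTP request
http.encoding = "utf-8"                              # encoding used to decode the response
neir = http.text                                     # get the response body as a string
print(neir)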

This returns the HTML source:

<!DOCTYPE html>
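The request above can be made a little more defensive. The following is a minimal sketch, not part of the original article: the function name `fetch_html` and the `http_get` parameter are assumptions introduced for illustration, while `timeout`, `raise_for_status()`, `encoding` and `text` are standard parts of requests.

```python
import requests


def fetch_html(url, encoding="utf-8", timeout=10, http_get=requests.get):
    """Fetch `url` and return its body decoded with `encoding`.

    `http_get` defaults to requests.get; it is a parameter (an assumption of
    this sketch) only so the function can be tried without network access.
    """
    response = http_get(url, timeout=timeout)  # a timeout avoids waiting forever
    response.raise_for_status()                # raises requests.HTTPError on 4xx/5xx
    response.encoding = encoding               # encoding used when decoding .text
    return response.text


# Example call (performs a real network request):
# html = fetch_html("http://www.iqiyi.com/")
# print(html)
```

Checking the status code before reading the body means an error page is not mistaken for the HTML that was requested.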

II. Requests that require user login or authentication

To fetch this kind of page, we first need to understand the whole login process. In general, when a user visits the site for the first time, a cookie is automatically generated in the browser. When the user submits the login information, the request carries that cookie; if the login information is correct, the server authorizes the cookie. After authorization, pages that require login can be accessed by sending the authorized cookie.

1. First visit the home page and check whether a cookie is generated automatically
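The login process described here can be modelled in a few lines. The sketch below is a toy, in-memory stand-in for the server side, written only to illustrate the flow; the class `ToySite`, the cookie name `session_id` and the account data are all hypothetical and are not taken from any real site.

```python
import secrets


class ToySite:
    """Toy model of a site that authorizes a cookie after a correct login."""

    def __init__(self, accounts):
        self.accounts = accounts    # user name -> password
        self.authorized = {}        # session id -> has this cookie been authorized?

    def visit_home(self, cookies):
        """First visit: issue a new, not yet authorized cookie if needed."""
        sid = cookies.get("session_id")
        if sid not in self.authorized:
            sid = secrets.token_hex(8)      # random string stored in the cookie
            self.authorized[sid] = False
        return {"session_id": sid}

    def login(self, cookies, user, password):
        """Authorize the cookie the client carries if the credentials match."""
        sid = cookies.get("session_id")
        if sid in self.authorized and self.accounts.get(user) == password:
            self.authorized[sid] = True
            return True
        return False

    def personal_page(self, cookies):
        """Return the page only when the carried cookie has been authorized."""
        if self.authorized.get(cookies.get("session_id")):
            return "<html>personal center</html>"
        return None
```

The point of the model is that the cookie value itself never changes: the server only records that the random string it issued earlier is now authorized, which is why the same cookie is reused in every later request.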

#!/usr/bin/env python
# -*- coding: utf8 -*-
import requests     # import the module that simulates browser requests

### 1. Before logging in, visit the home page to get a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                # encoding used to decode the response
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                     # print the cookie that was obtained
# returns: {'Jsessionid': 'aaatztkp-kaglbx-t6r0v', 'gpsd': 'C227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

You can see that a cookie was generated on the first visit. If the login information submitted later is correct, the server authorizes this cookie, and every later request to a page that requires login must carry the authorized cookie.

2. Let the program log in automatically so that the cookie is authorized

First, open the login page in a browser and submit an arbitrary account and password. This reveals the login URL and the fields that the login request requires.

Log in carrying the cookie so that it is authorized
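When the login page is a plain HTML form, the login URL and field names can also be read from the page source. The following is a rough sketch using only the standard library; the names `FormFieldParser` and `login_form_fields` are hypothetical, and it assumes the first `<form>` on the page is the login form. Sites that submit the login through JavaScript will not expose the fields this way, so inspecting the request in the browser remains the general approach.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action URL and named inputs of the first <form> in a page."""

    def __init__(self):
        super().__init__()
        self.action = None      # value of the form's action attribute
        self.fields = {}        # input name -> default value
        self._in_form = False
        self._done = False      # set once the first form has been closed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a name are not submitted, so they are skipped
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def login_form_fields(html):
    """Return (action, fields) for the first form found in `html`."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.action, parser.fields
```

The returned dict can be filled in with the account and password and then passed as the `data` argument of a POST request.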

#!/usr/bin/env python
# -*- coding: utf8 -*-
import requests     # import the module that simulates browser requests

### 1. Before logging in, visit the home page to get a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                # encoding used to decode the response
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                     # print the cookie that was obtained
# returns: {'Jsessionid': 'aaatztkp-kaglbx-t6r0v', 'gpsd': 'C227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### 2. Log in, carrying the cookie from step 1; the server authorizes the random string in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=i1_cookie                               # carry the cookie
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # view the server's response to the login
# returns: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)
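Rather than reading the printed response by eye, the program can check it. The sketch below assumes the response is JSON shaped like the one shown in the code above, and that the code "9999" marks success, as in that example; treating every other code as a failure is an assumption of this sketch, and the function name `login_succeeded` is hypothetical.

```python
import json

SUCCESS_CODE = "9999"   # code seen in the successful login response above


def login_succeeded(response_text):
    """Return True if the login response carries the success code."""
    try:
        payload = json.loads(response_text)
    except ValueError:              # body was not JSON, e.g. an HTML error page
        return False
    if not isinstance(payload, dict):
        return False
    result = payload.get("result")
    return isinstance(result, dict) and result.get("code") == SUCCESS_CODE
```

With the variables from the code above, the check would be `login_succeeded(i2.text)`, and the program can stop early instead of requesting protected pages with a cookie that was never authorized.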

3. After a successful login the server has authorized the cookie, so when we visit a page that requires login, such as the personal center, we carry this cookie

#!/usr/bin/env python
# -*- coding: utf8 -*-
import requests     # import the module that simulates browser requests

### 1. Before logging in, visit the home page to get a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                # encoding used to decode the response
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                     # print the cookie that was obtained
# returns: {'Jsessionid': 'aaatztkp-kaglbx-t6r0v', 'gpsd': 'C227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### 2. Log in, carrying the cookie from step 1; the server authorizes the random string in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=i1_cookie                               # carry the cookie
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # view the server's response to the login
# returns: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)

### 3. Visit a page that requires login, carrying the authorized cookie
shouquan_cookie = i1_cookie
i3 = requests.get(
    url="http://dig.chouti.com/user/link/saved/1",
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=shouquan_cookie                         # carry the authorized cookie
)
i3.encoding = "utf-8"
print(i3.text)                                      # view the page that requires login

This successfully returns the HTML source of the page that requires login.
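The three steps can also be written with `requests.Session`, which stores the cookies it receives and sends them on later requests, so the cookie dict does not have to be passed by hand. This is an alternative sketch rather than the article's own method; the function name `fetch_page_after_login` and its parameters are assumptions made for illustration.

```python
import requests


def fetch_page_after_login(home_url, login_url, page_url, form, session=None):
    """Visit the home page, log in, then fetch a page that requires login.

    `form` holds the login fields; `session` may be supplied by the caller,
    otherwise a new requests.Session is created.
    """
    if session is None:
        session = requests.Session()
    headers = {"Referer": home_url}

    # 1. First visit: the session keeps whatever cookie the server sets
    session.get(home_url, headers=headers, timeout=10)

    # 2. Log in: the session sends the stored cookie automatically
    login = session.post(login_url, data=form, headers=headers, timeout=10)
    login.raise_for_status()

    # 3. Fetch the protected page with the same, now authorized, cookie
    page = session.get(page_url, headers=headers, timeout=10)
    page.raise_for_status()
    page.encoding = "utf-8"
    return page.text


# Example call (performs real network requests; the credentials are placeholders):
# html = fetch_page_after_login(
#     "http://dig.chouti.com/",
#     "http://dig.chouti.com/login",
#     "http://dig.chouti.com/user/link/saved/1",
#     {"phone": "<phone>", "password": "<password>", "oneMonth": ""},
# )
```

Passing the cookie dict explicitly, as the article does, makes the mechanism visible; a session is mainly a convenience once the flow is understood.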

Summary of the methods and attributes used

get(): sends a GET request
encoding attribute: sets the encoding used to decode the response
cookies.get_dict(): returns the cookies as a dict
post(): sends a POST request
text attribute: returns the server's response body as a string

Note: If the login requires a verification code (CAPTCHA), you need to do image processing: download the verification code image, recognize the code in it, and write the recognized code into the corresponding login field.
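The note above can be sketched as follows. The recognition step itself is left to a caller-supplied function, because it depends on the site and on the image-processing tools chosen; `login_with_captcha`, the `recognize` callable and the field name `captcha` are all hypothetical, and `session` is expected to behave like a `requests.Session`.

```python
def login_with_captcha(session, captcha_url, login_url, form, recognize,
                       captcha_field="captcha"):
    """Download the verification code image, recognize it, and log in.

    `recognize` is any callable that takes the image bytes and returns the
    code as a string (for example an OCR routine or a manual prompt).
    """
    image = session.get(captcha_url, timeout=10)    # same session, same cookie
    image.raise_for_status()
    code = recognize(image.content)                 # raw image bytes -> text

    data = dict(form)               # copy so the caller's dict is not modified
    data[captcha_field] = code      # write the code into the login field
    return session.post(login_url, data=data, timeout=10)
```

The image is fetched with the same session as the login request because the server usually ties the expected code to the cookie of the client that requested the image.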
