Python crawler (requests)

Last Update:2016-08-27 Source: Internet

Author: User


I believe most people who start learning to write Python crawlers begin with urllib and urllib2, and only later come across the third-party library requests. requests covers just about every HTTP task and is really easy to use :D

Here is how its authors describe it:

"Requests is the only Non-GMO HTTP library for Python, safe for human consumption. Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor. There's no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests."

----- From the official documentation (http://cn.python-requests.org/zh_CN/latest)

Install it by running the command "pip install requests" (this assumes pip is already installed).

What are you waiting for? Import requests and dig in.

Let's take a look at several common methods and attributes:

1. requests.Session() creates a session, so that the login state is retained and cookies are kept between requests.

2. requests.get() fetches a web page. Use the params parameter to send data in the query string.

d = {'key1': 'value1', 'key2': 'value2'}
requests.get('URL', params=d)

You can also use the headers parameter to customize the request headers of a get.

h = {'key1': 'value1', 'key2': 'value2'}
requests.get('URL', headers=h)

3. requests.post() sends a POST request. As with get, you can send data (using the data parameter) and custom request headers (using the headers parameter).
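To see what params, headers, and data actually do, one option is to build the requests without sending them. The sketch below uses requests.Request(...).prepare(); the example.com URL and the User-Agent value are placeholders chosen for illustration, not something from the site discussed in this article.

```python
import requests

URL = 'http://example.com/search'      # placeholder URL, nothing is sent to it
d = {'key1': 'value1', 'key2': 'value2'}
h = {'User-Agent': 'my-crawler/0.1'}   # placeholder header value

# prepare() builds the request exactly as it would go on the wire,
# without opening a network connection.
get_req = requests.Request('GET', URL, params=d, headers=h).prepare()
post_req = requests.Request('POST', URL, data=d, headers=h).prepare()

print(get_req.url)                       # params end up in the query string
print(get_req.headers['User-Agent'])     # custom header is carried along
print(post_req.body)                     # data is form-encoded into the body
print(post_req.headers['Content-Type'])  # set automatically for form data
```

With get, the dictionary is appended to the URL; with post, the same dictionary travels in the request body instead.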

Some common attributes:

eg = requests.get('URL')
eg.text                # the response body as text, for example the fetched web page
eg.encoding = 'utf-8'  # if the text comes back garbled, set the encoding to match the page (utf-8, gb2312, and so on)
eg.content             # the response body as bytes, for example the verification code image at login or another non-text resource
eg.cookies             # the cookies received with the response
eg.status_code         # the HTTP status code (such as 200 OK or 404 Not Found)
eg.url                 # the URL of the current request
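Why changing the encoding fixes garbled text can be shown without any network access. The sketch below only uses the standard library: it encodes a sample string as GB2312, the way an older Chinese site might serve it, and decodes it twice. The sample string is an assumption made for illustration. When eg.encoding is set, reading eg.text performs the same kind of decode on eg.content.

```python
# Bytes as a GB2312-encoded page would deliver them.
# The sample string means "Academic Affairs Office".
raw = '教务处'.encode('gb2312')

# Decoding with the right encoding restores the text.
as_gb2312 = raw.decode('gb2312')

# Decoding with the wrong encoding produces replacement characters.
as_utf8 = raw.decode('utf-8', errors='replace')

print(as_gb2312)
print(as_utf8)
```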

For more details, see the official documentation (http://cn.python-requests.org/zh_CN/latest)

That is about all you need to know to start writing a crawler.

An interesting phenomenon: when students learn to write crawlers, they all seem to practice on a website called the "Academic Affairs Office", haha. The example here does the same and logs in to the academic affairs office of my school (Chengdu University of Information Technology).

First, open the Academic Affairs Office site in the browser, press F12 to open the developer tools, perform a normal login, and analyze the data sent during login.

1. The login page of the Academic Affairs Office is http://210.41.224.117/Login/xLogin/Login.asp.

2. Click the Network tab in the developer tools. The POST data sent on login goes to the same address, http://210.41.224.117/Login/xLogin/Login.asp.

3. The POST data contains the following fields:

Parameter List
Form name	Example	Description
WinW	1366	Screen Resolution-Width
WinH	728	Screen Resolution-height
TxtId	2013215042	Student ID
TxtMM	123456	Password
Verifycode	123a	Verification Code
CodeKey	597564	Dynamic login code, which is visible in the HTML source of the page
Login	Check	Login type (fixed)
IbtnEnter.x	10	Login button click location (x)
IbtnEnter.y	10	Login button click location (y)
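As a rough illustration of what the browser sends, the table above can be turned into a form-encoded body with the standard library. The field names and example values are copied from the table; the exact capitalization the server expects is an assumption. requests performs the same encoding when a dictionary is passed as the data parameter.

```python
from urllib.parse import parse_qs, urlencode

# Field names and example values taken from the parameter table.
form = {
    'WinW': '1366',
    'WinH': '728',
    'TxtId': '2013215042',
    'TxtMM': '123456',
    'Verifycode': '123a',
    'CodeKey': '597564',
    'Login': 'Check',
    'IbtnEnter.x': '10',
    'IbtnEnter.y': '10',
}

# Form-encode the fields into the body of the POST request.
body = urlencode(form)
print(body)

# Decoding the body gives the fields back, which is what the server sees.
decoded = parse_qs(body)
print(decoded['TxtId'])
```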

# coding=utf-8
import re
import time
import random
from io import BytesIO

import requests
from PIL import Image


def login(username, password):
    # Request headers, used when refreshing the verification code and when sending the post
    headers = {
        'Host': '210.41.224.117',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0',
        'Accept': '*/*',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',
        'Referer': 'http://210.41.224.117/Login/xLogin/Login.asp',
        'Connection': 'keep-alive',
    }
    session = requests.Session()

    # Connect to the student home page twice so that it jumps to the login page
    step1 = session.get('http://jxgl.cuit.edu.cn/JXGL/xs/MainMenu.asp')
    step1 = session.get('http://jxgl.cuit.edu.cn/Jxgl/Xs/MainMenu.asp')

    # Get the jump URL, which carries the OSid
    get_osid_url = re.compile(r'content="0;URL=(.*?)">')
    osid_url = get_osid_url.findall(step1.text)
    step2 = session.get(osid_url[0])  # follow the jump (key point 1)

    # Get codeKey (parameter k)
    get_codeKey = re.compile(r'var codeKey = \'(.*?)\';')
    codeKey = get_codeKey.findall(step2.text)

    # Generate parameter t (10-digit timestamp + a random three-digit number)
    timeKey = str(time.time())[:10] + str(random.randint(100, 999))
    payload = {'k': codeKey[0], 't': timeKey}
    yzm_url = 'http://210.41.224.117/Login/xLogin/yzmDvCode.asp'
    yzmdata = session.get(yzm_url, params=payload, headers=headers)  # refresh the verification code (key point 2)

    # Display the verification code image and type in what it shows
    tempIm = BytesIO(yzmdata.content)
    im = Image.open(tempIm)
    im.show()
    yzm = input('Please enter yzm: ')

    post_data = {
        'WinW': '1366',
        'WinH': '728',
        'TxtId': username,
        'TxtMM': password,
        'Verifycode': yzm,
        'CodeKey': codeKey[0],
        'Login': 'Check',
        'IbtnEnter.x': 10,
        'IbtnEnter.y': 10,
    }
    post_url = 'http://210.41.224.117/Login/xLogin/Login.asp'
    step3 = session.post(post_url, data=post_data, headers=headers)  # send the login form
    return session


cuitJWC = login('username', 'password')
con = cuitJWC.get('http://jxgl.cuit.edu.cn/JXGL/xs/MainMenu.asp')
con.encoding = 'gb2312'
print(con.text)
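The two extraction steps and the timestamp parameter can be tried offline. The sketch below runs the same kind of regular expressions against made-up page fragments; the sample HTML, the example.com URL, and the helper names first_match and make_time_key are all hypothetical, and the real pages may be formatted differently.

```python
import random
import re
import time

# Hypothetical page fragments standing in for the real responses.
SAMPLE_REDIRECT_PAGE = (
    '<meta http-equiv="refresh" '
    'content="0;URL=http://example.com/Login.asp?OSid=abc123">'
)
SAMPLE_LOGIN_PAGE = "<script>var codeKey = '597564';</script>"

META_REFRESH = re.compile(r'content="0;URL=(.*?)">')
CODE_KEY = re.compile(r"var codeKey = '(.*?)';")


def first_match(pattern, text):
    """Return the first captured group, or None when the pattern is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


def make_time_key(now=None):
    """Unix timestamp in whole seconds followed by a random three-digit number."""
    seconds = int(time.time() if now is None else now)
    return str(seconds) + str(random.randint(100, 999))


redirect_url = first_match(META_REFRESH, SAMPLE_REDIRECT_PAGE)
code_key = first_match(CODE_KEY, SAMPLE_LOGIN_PAGE)
time_key = make_time_key()

print(redirect_url)
print(code_key)
print(time_key)
```

Returning None when nothing matches makes a changed page layout easier to diagnose than an IndexError from indexing an empty findall() result.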

If you reprint this article, please credit the source: http://www.cnblogs.com/lucky-pin/p/5806394.html

