Python Crawler: Simulating Login to a Campus Network (Beginner)

Source: Internet
Author: User
Tags: urlencode

Recently, while studying crawlers with some classmates, I came across a post online about the campus network being unstable, and simulating the login with Python seemed like an interesting exercise, so I set off down a road of no return...

First, a look at the campus network login page.

Let's start by figuring out how a simulated login works:

1: The server checks the browser identification to decide whether the login comes from a real browser, so the request has to imitate one.

2: We need to POST the account, password, and school ID.

I used Python 2.7 and wrote the script in Notepad++; with Python bound to Notepad++ it can be run directly.

Because this is a simulated web login, we need to import the urllib, urllib2, and cookielib libraries. The first two provide the HTTP interface to the web, and cookielib is used to handle cookies.
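
For reference, the three imports look like this (Python 2.7, as used throughout this article; as a side note, in Python 3 the same pieces live in urllib.parse, urllib.request, and http.cookiejar):

import urllib      # urlencode() for form data
import urllib2     # Request and build_opener for HTTP
import cookielib   # CookieJar for cookie handling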

For an overview of these library functions, here is a good blog post:

http://www.cnblogs.com/mmix2009/p/3226775.html

OK, let's start by building an opener:

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

urllib2.HTTPCookieProcessor handles the cookies collected in the CookieJar, and build_opener wraps it into an opener we can use for requests.
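
Just to illustrate the mechanics (a small sketch; the URL below is only a placeholder): every request sent through this opener shares the same CookieJar, so cookies set by the server are stored and sent back automatically on later requests.

# any request made through the opener shares the same cookie jar
opener.open('http://example.com/login')   # placeholder URL, for illustration only
for c in cookie:
    print c.name, c.value                 # cookies received so far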

Then build the headers that need to be POSTed. The address is not the page where we type in the account and password, but the address the form data is actually submitted to; capture it with the browser while logging in:

The URL on the right of the capture is the one the login data is finally submitted to. Let's take a look at its headers.

You can copy almost all of the captured headers into the request, or keep only the essential ones: the host, the authentication-related fields, the User-Agent, and so on.
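
If you keep only the essentials, a trimmed-down header dict might look like the sketch below (the User-Agent string is just an example; the full capture is used in the final code further down):

headers = {
    'Host': '139.198.3.98',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',   # pretend to be a real browser
    'X-Requested-With': 'XMLHttpRequest',                        # present in the browser capture (AJAX submit)
}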

Data that needs to be submitted:

data = {
    "username": "xxxxxxxx",
    "password": "xxxxx",
}
post_data = urllib.urlencode(data)
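
urllib.urlencode() simply turns the dict into a form-encoded string, for example:

print urllib.urlencode({"username": "xxxxxxxx", "password": "xxxxx"})
# -> username=xxxxxxxx&password=xxxxx   (key order may vary)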

Then POST it, using urllib2.Request(url, post_data, headers):

req = urllib2.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)

Open the request with opener.open(req), store the response in content, and print it to see whether the login succeeded.
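
A rough way to check the result (a sketch; the success marker below is only a guess, since what this particular login API returns isn't documented here):

content = opener.open(req)
body = content.read().decode("utf-8")
print body
if "success" in body.lower():    # hypothetical marker in the response
    print "login looks OK"
else:
    print "login failed, re-check the posted fields and headers"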

And then..... then it failed, and the bug hunt began.....

Because the example I was studying online was a simple one with only a user name and password, while this login also requires choosing the university....

Well, I first went looking for it in the page source and found nothing; then I looked in the headers, and sure enough the cookie contained school_id=xxxx. Yes, that's it. So I added it to the data, but the submission still failed. Finally it turned out that the field names in the submitted data (userName, password, school_id) have to match the names in the captured request exactly, down to capitalization and underscores:

The final code (account, password, and the like replaced with xxxx):

import urllib
import urllib2
import cookielib

data = {
    "userName": "xxxxxxxx",
    "password": "xxxxx",
    "school_id": "xxxx"
}
post_data = urllib.urlencode(data)

# cookie handling
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

# headers captured from the browser during login
headers = {
    'Accept': 'text/html, application/xhtml+xml, image/jxr, */*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US, en; q=0.8, zh-Hans-CN; q=0.5, zh-Hans; q=0.3',
    'Connection': 'keep-alive',
    'Host': '139.198.3.98',
    'Referer': 'http://139.198.3.98/sdjd/protalAction!loginInit.action?wlanuserip=10.177.31.212&basip=124.128.40.39',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393',
    'X-Requested-With': 'XMLHttpRequest'
}

# submit the login request and print the response
req = urllib2.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)
print content.read().decode("utf-8")

Run it:

Preliminary success~ I'll dig into it more deeply later.

And I'd ask any expert to tell me why the # comments seem to have no effect when running from Notepad++...
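
For anyone on Python 3, a rough equivalent of the final script would look like this (an untested sketch; the URL and field names are copied from the Python 2 version above, and the headers are trimmed to the essentials):

import urllib.parse
import urllib.request
import http.cookiejar

data = {"userName": "xxxxxxxx", "password": "xxxxx", "school_id": "xxxx"}
post_data = urllib.parse.urlencode(data).encode("utf-8")   # POST body must be bytes in Python 3

cookie = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'X-Requested-With': 'XMLHttpRequest',
}

req = urllib.request.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)
print(content.read().decode("utf-8"))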
