Python Crawler: Simulating Login to a Campus Network (Beginner)

Source: Internet
Author: User
Tags: urlencode

Recently, while studying crawlers with some classmates, I came across a post online about the campus network being unstable, and simulating the login with Python seemed like an interesting exercise, so I set off down a road of no return...

First, a look at the campus network login page.

Let's start by figuring out how a simulated login works:

1: The server checks the browser identification to decide whether the login comes from a real browser, so the request has to imitate one.

2: We need to POST the account, password, and school ID.

I used Python 2.7 and wrote the script in Notepad++; with Python bound to Notepad++ it can be run directly.

Because this is a simulated web login, we need to import the urllib, urllib2, and cookielib libraries. The first two provide the HTTP interface to the web, and cookielib is used to handle cookies.
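
For reference, the three imports look like this (Python 2.7, as used throughout this article; as a side note, in Python 3 the same pieces live in urllib.parse, urllib.request, and http.cookiejar):

import urllib      # urlencode() for form data
import urllib2     # Request and build_opener for HTTP
import cookielib   # CookieJar for cookie handling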

For an overview of these library functions, here is a good blog post:

http://www.cnblogs.com/mmix2009/p/3226775.html

OK, let's start by building an opener:

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

urllib2.HTTPCookieProcessor handles the cookies collected in the CookieJar, and build_opener wraps it into an opener we can use for requests.
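
Just to illustrate the mechanics (a small sketch; the URL below is only a placeholder): every request sent through this opener shares the same CookieJar, so cookies set by the server are stored and sent back automatically on later requests.

# any request made through the opener shares the same cookie jar
opener.open('http://example.com/login')   # placeholder URL, for illustration only
for c in cookie:
    print c.name, c.value                 # cookies received so far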

Then build the headers that need to be POSTed. The address is not the page where we type in the account and password, but the address the form data is actually submitted to; capture it with the browser while logging in:

The URL on the right of the capture is the one the login data is finally submitted to. Let's take a look at its headers.

You can copy almost all of the captured headers into the request, or keep only the essential ones: the host, the authentication-related fields, the User-Agent, and so on.
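
If you keep only the essentials, a trimmed-down header dict might look like the sketch below (the User-Agent string is just an example; the full capture is used in the final code further down):

headers = {
    'Host': '139.198.3.98',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',   # pretend to be a real browser
    'X-Requested-With': 'XMLHttpRequest',                        # present in the browser capture (AJAX submit)
}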

Data that needs to be submitted:

data = {
    "username": "xxxxxxxx",
    "password": "xxxxx",
}
post_data = urllib.urlencode(data)
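
urllib.urlencode() simply turns the dict into a form-encoded string, for example:

print urllib.urlencode({"username": "xxxxxxxx", "password": "xxxxx"})
# -> username=xxxxxxxx&password=xxxxx   (key order may vary)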

Then POST it, using urllib2.Request(url, post_data, headers):

req = urllib2.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)

Open the request with opener.open(req), store the response in content, and print it to see whether the login succeeded.
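
A rough way to check the result (a sketch; the success marker below is only a guess, since what this particular login API returns isn't documented here):

content = opener.open(req)
body = content.read().decode("utf-8")
print body
if "success" in body.lower():    # hypothetical marker in the response
    print "login looks OK"
else:
    print "login failed, re-check the posted fields and headers"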

And then..... then it failed, and the bug hunt began.....

Because the example I was studying online was a simple one with only a user name and password, while this login also requires choosing the university....

Well, I first went looking for it in the page source and found nothing; then I looked in the headers, and sure enough the cookie contained school_id=xxxx. Yes, that's it. So I added it to the data, but the submission still failed. Finally it turned out that the field names in the submitted data (userName, password, school_id) have to match the names in the captured request exactly, down to capitalization and underscores:

The final code (account, password, and the like replaced with xxxx):

import urllib
import urllib2
import cookielib

data = {
    "userName": "xxxxxxxx",
    "password": "xxxxx",
    "school_id": "xxxx"
}
post_data = urllib.urlencode(data)

# cookie handling
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

# headers captured from the browser during login
headers = {
    'Accept': 'text/html, application/xhtml+xml, image/jxr, */*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US, en; q=0.8, zh-Hans-CN; q=0.5, zh-Hans; q=0.3',
    'Connection': 'keep-alive',
    'Host': '139.198.3.98',
    'Referer': 'http://139.198.3.98/sdjd/protalAction!loginInit.action?wlanuserip=10.177.31.212&basip=124.128.40.39',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393',
    'X-Requested-With': 'XMLHttpRequest'
}

# submit the login request and print the response
req = urllib2.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)
print content.read().decode("utf-8")

Run it:

Preliminary success~ I'll dig into it more deeply later.

And I'd ask any expert to tell me why the # comments seem to have no effect when running from Notepad++...
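
For anyone on Python 3, a rough equivalent of the final script would look like this (an untested sketch; the URL and field names are copied from the Python 2 version above, and the headers are trimmed to the essentials):

import urllib.parse
import urllib.request
import http.cookiejar

data = {"userName": "xxxxxxxx", "password": "xxxxx", "school_id": "xxxx"}
post_data = urllib.parse.urlencode(data).encode("utf-8")   # POST body must be bytes in Python 3

cookie = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'X-Requested-With': 'XMLHttpRequest',
}

req = urllib.request.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)
print(content.read().decode("utf-8"))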
