While studying web crawlers with some classmates recently, I came across a post online saying the campus network was unstable, and simulating the login with Python sounded interesting, so I started down this road of no return...
First, a look at the campus network login page:
Let's start by figuring out the principle of a simulated login:
1: The server identifies the client from the browser's request headers, so we need to make our requests look like a browser's.
2: We need to POST the account, the password, and the school ID.
Python, here we go. I used version 2.7, writing in Notepad++, which can run Python scripts directly once the interpreter is bound to it.
Because this is a simulated web login, we need to import the urllib, urllib2, and cookielib libraries. The first two provide the HTTP interface; cookielib is used to handle cookies.
A good blog post covering these library functions:
http://www.cnblogs.com/mmix2009/p/3226775.html
OK, start by building an opener:

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

urllib2.HTTPCookieProcessor handles the cookies collected in the CookieJar, and build_opener wires that handler into the opener, so later requests carry the session cookies automatically.
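For readers on Python 3, here is a minimal sketch of the same cookie-handling setup: urllib2 and cookielib became urllib.request and http.cookiejar, but the opener pattern is identical.

```python
# Python 3 equivalent of the Python 2.7 setup in the post.
import http.cookiejar
import urllib.request

# CookieJar stores any cookies the server sets; HTTPCookieProcessor
# sends them back on subsequent requests made through this opener.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)
```

Any request made via `opener.open(...)` will now share the same cookie jar, which is exactly what a login session needs.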
Then build the headers that need to be POSTed. Note that the target address is not the page where we type in the account and password, but the address the form submits its data to; capture it with the browser's developer tools while logging in:

That URL on the right is the one the form actually submits to. Looking at its headers: you can copy nearly all of them, or include only the essentials such as the server, authentication fields, and the User-Agent.
Data that needs to be submitted:

data = {"username": "xxxxxxxx", "password": "xxxxx"}
post_data = urllib.urlencode(data)
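As a quick illustration of what urlencode produces (in Python 3 it lives at urllib.parse.urlencode; the credentials here are the post's placeholders):

```python
from urllib.parse import urlencode

# Turn the form dict into an application/x-www-form-urlencoded body.
data = {"username": "xxxxxxxx", "password": "xxxxx"}
post_data = urlencode(data)
print(post_data)  # username=xxxxxxxx&password=xxxxx
```

This string (byte-encoded in Python 3) is what gets sent as the POST body.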
Then it's time to POST, using Request(url, post_data, headers):

req = urllib2.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)

opener.open(req) returns the response; print content.read() to check whether the login succeeded.
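On Python 3 the same request construction looks like this. This is only a sketch: the header set is a hypothetical minimum, and nothing is sent over the network (passing a `data` body is what makes the request a POST).

```python
import urllib.request
from urllib.parse import urlencode

# Placeholder credentials, as in the post; body must be bytes in Python 3.
post_data = urlencode({"username": "xxxxxxxx", "password": "xxxxx"}).encode("utf-8")
headers = {"User-Agent": "Mozilla/5.0"}  # minimal header set for illustration

req = urllib.request.Request(
    "http://139.198.3.98/sdjd/userAction!login.action",
    data=post_data,
    headers=headers,
)
print(req.get_method())  # POST
```

Only a later `opener.open(req)` would actually contact the server.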
And then..... it failed, so it was time to hunt for the bug.....
The simple example I had studied online only submitted a username and password, but this login also requires choosing a university....
Well, first I searched the page source and found nothing, so I looked through the headers instead; sure enough, the cookie contained school_id=xxxx. That's it! So I added it to the data, but the submission still failed. I finally discovered that the keys of the submitted data (userName, password, school_id) must match the parameter names in the request exactly, including capitalization and underscores:
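A small sketch of pulling such a value out of a captured Cookie header with the stdlib parser (the header string and the school_id value 1234 here are made up for illustration):

```python
from http.cookies import SimpleCookie

# Hypothetical Cookie header captured from the browser's dev tools.
raw = "JSESSIONID=abc123; school_id=1234"

jar = SimpleCookie()
jar.load(raw)                        # parse the cookie string
school_id = jar["school_id"].value   # extract just the value
print(school_id)  # 1234
```

The extracted value can then be added to the POST data under the exact key name the server expects.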
The final code (account, password, and so on replaced with xxxx):

import urllib
import urllib2
import cookielib

data = {"userName": "xxxxxxxx",
        "password": "xxxxx",
        "school_id": "xxxx"}
post_data = urllib.urlencode(data)

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

headers = {'Accept': 'text/html, application/xhtml+xml, image/jxr, */*',
           'Accept-Encoding': 'gzip, deflate',
           'Accept-Language': 'en-US, en; q=0.8, zh-Hans-CN; q=0.5, zh-Hans; q=0.3',
           'Connection': 'keep-alive',
           'Host': '139.198.3.98',
           'Referer': 'http://139.198.3.98/sdjd/protalAction!loginInit.action?wlanuserip=10.177.31.212&basip=124.128.40.39',
           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393',
           'X-Requested-With': 'XMLHttpRequest'}

req = urllib2.Request('http://139.198.3.98/sdjd/userAction!login.action', post_data, headers)
content = opener.open(req)
print content.read().decode("utf-8")
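For completeness, here is a hedged Python 3 translation of that final script. It is untested against the actual server; the URL and field names are taken from the post, the credentials are placeholders, and the network call is left commented out.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

# Field names must match the server's exactly (note the camelCase userName).
data = {
    "userName": "xxxxxxxx",
    "password": "xxxxx",
    "school_id": "xxxx",
}
post_data = urlencode(data).encode("utf-8")  # POST body must be bytes

# Cookie-aware opener, as in the Python 2 version.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)

headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393"),
    "X-Requested-With": "XMLHttpRequest",
}

req = urllib.request.Request(
    "http://139.198.3.98/sdjd/userAction!login.action",
    data=post_data,
    headers=headers,
)
# content = opener.open(req)               # uncomment to actually log in
# print(content.read().decode("utf-8"))
```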
Run it:
Initial success~ I'll study it more deeply later.
Also, could some expert tell me why # comments seem to have no effect when running from Notepad++...
Python crawler: simulated login to a campus network (beginner)