Python simulated login: fetching and processing the POST request and header data, and sending the POST request

Source: Internet
Author: User
Tags: python script


Python Simulated Login II: Getting and processing the POST request and header data

Author: Big Step | Published: 2014-08-12 16:55 | Category: Python, Programming


The article "Python Analog Landing One: Authentication code and cookies in sync with the idea of", I verified the process of automatic login to cookies and verify the code how to sync the problem.
This article discusses how to capture and process the POST request data and headers.
Tools: the Firefox browser and the Firebug plugin. (Alternatives such as HttpFox, Live HTTP Headers, Fiddler, or HttpWatch work just as well.)
1. Inspect the login page's HTML and check for an IFRAME
When we write an automatic login script, we first need to analyze the POST data, the request headers, and the POST URL. Open Firebug to start monitoring, then open the site's login page: http://zhuzhou2013.feixuelixm.teacher.com.cn/IndexPage/Index.aspx. Viewing the page's HTML with Firebug shows that the login window containing the account and password inputs is an IFRAME. In fact, experienced readers of the first article will have noticed that the POST URL and the login page URL are quite different. Because I was too lazy to inspect the source first, I wrote the script straight away; when I ran it I had in fact already logged in, but the IFRAME's content does not appear in the HTML returned to the Python script, so my success check kept failing and I assumed my code was broken. So it is worth analyzing whether the page contains an IFRAME before writing anything; it saves steps and code. In what follows I proceed as if I had not analyzed the real login address, and still capture the POST data at http://zhuzhou2013.feixuelixm.teacher.com.cn/IndexPage/Index.aspx.
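If you are unsure whether a page uses an IFRAME, the fetched HTML itself can be checked from Python rather than in the browser. A minimal sketch (not from the original article), using the same login page URL and assuming double-quoted src attributes:

import re
import urllib2

# Fetch the outer login page and list every iframe src found in the raw HTML.
index_url = "http://zhuzhou2013.feixuelixm.teacher.com.cn/IndexPage/Index.aspx"
html = urllib2.urlopen(index_url).read()

# If an iframe shows up, the real login form (and its POST target) lives on that
# page, not on the outer page shown in the browser address bar.
iframe_srcs = re.findall(r'<iframe[^>]*src="([^"]*)"', html, re.I)
print "iframes found:", iframe_srcs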
2. Log in with your password and inspect the POST data and headers
After entering the credentials and logging in, use Firebug's "Network" panel to find the POST address and the request and response data. In the header section shown below we can see the POST request headers and the response headers; see Figure 1.
POST address: http://zhuzhou2013.feixuelixm.teacher.com.cn/GuoPeiAdmin/Login/Login.aspx
Figure 1
In the POST tab we can see the data that needs to be posted, as shown in Figure 2 (I have removed the account and password).
As you can see, the cookie values have to be included in the POST request headers. Simply opening the site's login page, before logging in, already hands us quite a few cookies; they are listed in Firebug's "Cookies" tab, as shown in Figure 3:
Figure 3
With so many cookies, which ones are actually needed?
Finding out is simple, because Firebug can both view and delete cookies: right-click a cookie, select Delete, and log in again. If the login still succeeds, that cookie is not necessary. Repeat this until you know which cookies are required. In addition, if a cookie's domain is not the site we want to log in to, we can be sure it is unnecessary; looyu_id and looyu_23945 in the figure above are examples.
Similarly, part of the POST request header is unnecessary, for example X-FireLogger, X-kn-appid and X-fsn. These are probably either an attempt by the site to confuse us or interference from other page elements, and they too can be eliminated by testing with code. In general, start by including as much as possible.

In fact, we can also block some irrelevant page elements up front: use Adblock Plus (ABP) to block the doyoo.net and looyu.net domains, which blocks the cookies from those two sites at the same time. That way their cookies cannot mislead us, and a single login test eliminates several cookies at once, saving time and steps. The same elimination can also be scripted instead of clicked through in Firebug, as in the sketch below.
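A rough sketch of scripted elimination (the try_login callback is hypothetical and stands in for the POST-and-verify code later in this article): drop one cookie at a time, retry the login, and keep only the cookies whose removal breaks it.

# Hypothetical helper: find out which cookies the site really requires.
# cookie_pairs is a dict of {name: value}; try_login(cookie_header) should run the
# login with the given Cookie header and return True on success.
def find_required_cookies(cookie_pairs, try_login):
    required = []
    for name in cookie_pairs:
        trimmed = dict(cookie_pairs)
        del trimmed[name]
        cookie_header = ";".join("%s=%s" % (k, v) for k, v in trimmed.items())
        if not try_login(cookie_header):
            # Login failed without this cookie, so it is required.
            required.append(name)
    return required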

There is one more troublesome part of the POST request header: it contains the cookie values, as shown in Figure 1. This means we have to process the value held in the CookieJar, because the CookieJar's own format does not meet the requirement. The CookieJar looks like this:
<_mozillacookiejar.mozillacookiejar[<cookie. aspxanonymous=vlho9jxszwekaaaaodc4n
TCWYTQTYWFINY00MZG5LTHKNZETNJYXYJCXNMM3NZDK-H-RHMNHR0FVC7UTHULZ8QUFBOU1 for Zhuz
Hou2013.feixuelixm.teacher.com.cn/&gt, <cookie Asp.net_sessionid=reg0ik55hffagzjab
Wo3xu45 for Zhuzhou2013.feixuelixm.teacher.com.cn/>, <cookie Feixuelixmweb=r-fei
xuelixm_7.86 for Zhuzhou2013.feixuelixm.teacher.com.cn/>]>

The value of the "Cookie" header in the POST request, however, is in the following format:

Cookie: .ASPXANONYMOUS=vlho9jxszwekaaaaodc4ntcwytqtywfiny00mzg5lthknzetnjyxyjcxnmm3nzdk-h-rhmnhr0fvc7uthulz8qufbou1; ASP.NET_SessionId=reg0ik55hffagzjabwo3xu45; feixuelixmweb=r-feixuelixm_7.86

So we use the following code to turn one into the other:
# Print the unprocessed cookies
print cookiejar

cookies = ''
# Print each cookie's name and value, and assemble them into the header format
for index, cookie in enumerate(cookiejar):
    print '[', index, ']'
    print cookie.name
    print cookie.value
    print "###########################"
    cookies = cookies + cookie.name + "=" + cookie.value + ";"
print "###########################"
# Drop the trailing ";"
cookies = cookies[:-1]
# Print the cleaned-up cookie string
print "Cookies:", cookies
So the request headers are:

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-cn,en-us;q=0.8,zh;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Host': 'zhuzhou2013.feixuelixm.teacher.com.cn',
    'Cookie': cookies,
    'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:29.0) Gecko/20100101 Firefox/29.0',
    'Referer': 'http://zhuzhou2013.feixuelixm.teacher.com.cn/GuoPeiAdmin/Login/Login.aspx',
    # 'Content-Type': 'application/x-www-form-urlencoded',
    # 'Content-Length': 474,
    'Connection': 'keep-alive'
}

Now we construct the POST data. The data includes a CAPTCHA field; its text is obtained with the method described in the first article.
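The CAPTCHA handling itself is not repeated here. As a placeholder only (an assumption, not necessarily the first article's method, and the image URL is made up), one simple approach is to download the CAPTCHA image through the same session and type it in by hand:

# Placeholder for the first article's CAPTCHA step: save the CAPTCHA image that belongs
# to the current session and ask the user to read it. The URL below is an assumption;
# use whatever address Firebug shows for the CAPTCHA image request.
captcha_url = "http://zhuzhou2013.feixuelixm.teacher.com.cn/GuoPeiAdmin/Login/ValidateCode.aspx"
open("captcha.jpg", "wb").write(urllib2.urlopen(captcha_url).read())
text = raw_input("Enter the CAPTCHA shown in captcha.jpg: ")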
Note that some values in the POST data do not change between logins, for example __EVENTTARGET, __EVENTARGUMENT and __VIEWSTATE. You have to observe and compare the captured requests yourself to work this out.
# Username and password
username = "Xxxxxxxxxxxxx"
password = "Yyyyfasdfas"
postdata = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': '/wepdwukltcymzeymty2nw8wah4ltg9naw5lzfbhz2ufeexvz2luzwrqywdllmfzchgwamypzbyczg8pzbyghgv0axrszqug55so5oi35zcnl+wtpus5ooeggs/ouqvku73or4hlj7ceb29uzm9jdxmfegnozwnrsw5wdxqodghpcykebm9uymx1cguncmvzdg9yzsh0aglzkwqyaquex19db250cm9sc1jlcxvpcmvqb3n0qmfja0tlev9ffgefc0ltz2j0bkxvz2luckjjpnhruswhtput33uj1dbukvw=',
    'txtUserName': username,
    'txtPassword': password,
    'txtCode': text,  # CAPTCHA text obtained with the first article's method
    'ImgBtnLogin.x': 44,
    'ImgBtnLogin.y': 14,
    'ClientScreenWidth': 1180
}
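On sites where __VIEWSTATE does change between visits, it can be read from the login form's HTML instead of hard-coded. A small sketch, assuming the standard ASP.NET hidden-field markup:

# Read __VIEWSTATE (or any other hidden field) from the login form itself instead of
# pasting a value captured in Firebug.
def hidden_field(name, html):
    # ASP.NET renders hidden inputs as: <input type="hidden" name="..." id="..." value="..." />
    m = re.search(r'name="%s"[^>]*value="([^"]*)"' % re.escape(name), html)
    return m.group(1) if m else ''

login_html = urllib2.urlopen("http://zhuzhou2013.feixuelixm.teacher.com.cn/GuoPeiAdmin/Login/Login.aspx").read()
postdata['__VIEWSTATE'] = hidden_field('__VIEWSTATE', login_html)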
3. Send the POST request
Next, combine the constructed POST data and headers into a single request and send it:
# Encode the POST data
data = urllib.urlencode(postdata)
print "data:###############"
print data

# Construct the request (posturl is the POST address found in step 2)
posturl = "http://zhuzhou2013.feixuelixm.teacher.com.cn/GuoPeiAdmin/Login/Login.aspx"
request = urllib2.Request(posturl, data, headers)
try:
    # Open the page
    response = urllib2.urlopen(request)
    # cur_url = response.geturl()
    # print "cur_url:", cur_url
    status = response.getcode()
    print status
except urllib2.HTTPError, e:
    print e.code

# Write the returned page to a file so that errors are easier to troubleshoot;
# the page must be decoded first
f = response.read().decode("utf8")
outfile = open("rel_ip.txt", "w")
print >> outfile, "%s" % (f.encode("utf8"))

# Print the response info (headers)
info = response.info()
print info
4. Verify that the login is successful
As mentioned above, because the returned page (http://zhuzhou2013.feixuelixm.teacher.com.cn/IndexPage/Index.aspx) contains an IFRAME, we cannot find the string we are looking for in it (in the returned HTML saved to the rel_ip.txt file, the contents of the IFRAME do not appear).
Instead, we can go straight to a page that has no IFRAME and that contains a string proving the login succeeded, namely the IFRAME's source page:
# Test whether the login succeeded: testurl can only be reached after logging in.
# Another reason to use it: the page returned by the POST contains an iframe, and
# the string we search for lives inside that iframe, so we verify against the
# iframe's original address instead.
testurl = "http://zhuzhou2013.feixuelixm.teacher.com.cn/GuoPeiAdmin/Login/LoginedPage.aspx"
try:
    response = urllib2.urlopen(testurl)
except urllib2.HTTPError, e:
    print e.code

# We search the returned page for characters to verify the login, so make sure the
# characters you search for use the same encoding as the page, otherwise you may draw
# the wrong conclusion. Searching for English strings (e.g. CSS ids or names) is safer.
f = response.read().decode("utf8").encode("utf8")
outfile = open("out_ip.txt", "w")
print >> outfile, "%s" % (f)

# Look for the greeting "Hello" in the returned page; it only appears after a
# successful login, so finding it means we are logged in. English strings are recommended.
tag = 'Hello'.encode("utf8")
if re.search(tag, f):
    # Login succeeded
    print 'Logged in successfully!'
else:
    # Login failed
    print 'Login failed, check the out_ip.txt file for details'
response.close()

# This code is written loosely but is easy to follow; adapt it to your needs, for example
# by wrapping it in functions. When logging in and checking many times, urlopen()/read()
# may hang because of network congestion, so set a timeout on urlopen() or resend the
# request several times.
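As that closing comment suggests, a timeout plus a simple retry loop around urlopen() makes the script much more robust on a congested network. A minimal sketch:

import socket

def open_with_retry(request, retries=3, timeout=10):
    # Try the request a few times; with a timeout, urlopen() raises URLError or
    # socket.timeout when the network hangs, instead of blocking forever.
    for attempt in range(retries):
        try:
            return urllib2.urlopen(request, timeout=timeout)
        except (urllib2.URLError, socket.timeout), e:
            print "attempt", attempt + 1, "failed:", e
    return None

For example, response = open_with_retry(request) can replace the bare urllib2.urlopen(request) calls above.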
