A simple Python web crawler: an ASPX form that uses __VIEWSTATE, __EVENTVALIDATION, and cookies to validate the submission

Source: Internet
Author: User

I. Requirements

I have been learning Python recently, and a real need happened to come up: retrieving a local password from a local password management system. The previous process was: log in to the web page, enter the computer name, administrator account, and password, submit, wait for the page to return the password, copy it, and email it to the requesting user. Having to log in to the page every time was tedious, so I automated it. Writing down the whole process also helps me learn.


II. Page analysis

Let's look at the whole process first: fill in the fields, click Query, and the page returns the result.


Looking at the page source, there are two hidden parameters whose values change on every submission. This is a pitfall, which is discussed below.


I'm using Chrome: press F12, open the Network tab, and click Query. The page returns the password.


What is sent to the back end? Look at the POST data of the request.



III. Implementation approach

Now that the general process is clear, the task is to use Python to simulate what the browser does: simulate the login, construct the POST parameters, and extract three values from the response: computer name, password, and expiry time.


IV. The pitfall and how to solve it

What are __VIEWSTATE and __EVENTVALIDATION? I did not really understand them either, so I looked them up on Baidu.


__VIEWSTATE
ViewState is the mechanism ASP.NET uses to preserve the state values of web controls across postbacks. When a web form is marked runat="server", a hidden field named __VIEWSTATE is added to the form. __VIEWSTATE stores the state values of all controls that use ViewState.
ViewState is a property of the Control class, and all other controls gain ViewState functionality by inheriting from Control. Its type is System.Web.UI.StateBag, a collection of name/value pairs.
When a page is requested, ASP.NET serializes the state of all controls into a single string, which is then sent to the client as a hidden field of the form. When the client posts the page back, ASP.NET parses the returned form field and assigns the corresponding values to the controls.
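Because the state comes back as one opaque string, the crawler has to return it exactly as received. As far as I know the value is normally base64 text, which can contain `+`, `/`, and `=`, so it needs URL-encoding before it goes into a POST body. Here is a minimal sketch of that point, assuming Python 3 standard library names (the article's own code uses Python 2); `sample_state` is a made-up value, not one taken from the real page.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Made-up stand-in for a real __VIEWSTATE value (assumption: base64 text).
# These bytes are chosen so the encoded text contains '+', '/' and '='.
sample_state = base64.b64encode(b"\xfb\xef\xbe\xff\xff\xff\x00").decode("ascii")

# urlencode percent-encodes the characters that are special in a form body.
body = urlencode({"__VIEWSTATE": sample_state})

# Decoding the body the way a server would gives back the identical string.
round_trip = parse_qs(body)["__VIEWSTATE"][0]

print(sample_state)
print(body)
print(round_trip == sample_state)
```

Pasting the raw value into a hand-built body without encoding would turn each `+` into a space on the server side, and the submission would then most likely be rejected.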

__EVENTVALIDATION
__EVENTVALIDATION is only used to verify that an event was sent from a legitimate page. It is essentially a digital signature, so it is usually fairly short.
The hidden field whose "id" attribute is "__EVENTVALIDATION" is a security measure introduced in ASP.NET 2.0. It blocks unauthorized requests sent from the browser side by potentially malicious users.
To ensure that each postback and callback event comes from the expected user interface element, the ASP.NET runtime adds an additional layer of validation to events. The server checks the form content submitted with the request against the information in the "__EVENTVALIDATION" hidden field, to verify that no extra input fields were added on the browser side (possibly by a malicious user) and that selected values come from a list already known to the server. The ASP.NET runtime creates the event validation field during rendering, which is the last moment at which that information is available. Like view state, the event validation field contains a hash value to prevent browser-side tampering.
Note: the "__EVENTVALIDATION" hidden field is usually placed at the end of the form, so if the user submits data before the browser has finished parsing the form, validation may fail.

So the key point is to obtain these two values. After repeated attempts I found a way that works: before the real crawl, open the URL once and extract the two values with regular expressions, then pass them in the POST parameters of the second request.

#!/usr/bin/python
#coding:utf-8
import urllib
import urllib2
import re

def get_hiddenvalue(url):
    request = urllib2.Request(url)
    reponse = urllib2.urlopen(request)
    resu = reponse.read()
    VIEWSTATE = re.findall(r'<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />', resu, re.I)
    EVENTVALIDATION = re.findall(r'<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />', resu, re.I)
    return VIEWSTATE[0], EVENTVALIDATION[0]

url = 'http://servicedesk.*.com/lapq/'  # your own URL
test = get_hiddenvalue(url)             # print this to check the returned values
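The regular expressions above depend on the exact attribute order and spacing the page happens to use. As an alternative, here is a minimal sketch that collects every hidden input with the standard library's HTML parser. It assumes Python 3 (`html.parser`), and `HiddenFieldParser`, `extract_hidden_fields`, and `SAMPLE_PAGE` are hypothetical names and sample markup of my own, not taken from the real site.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden field names to values found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical page fragment, shaped like typical ASP.NET output.
SAMPLE_PAGE = """
<form method="post" action="./" id="form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTIz" />
<input name="TextBox_Computer" type="text" id="TextBox_Computer" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAKj" />
</form>
"""

fields = extract_hidden_fields(SAMPLE_PAGE)
print(fields["__VIEWSTATE"], fields["__EVENTVALIDATION"])
```

Returning all hidden fields as a dict also covers pages that carry additional hidden inputs beyond these two, since the whole dict can be merged into the POST data.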

V. Complete code

Later this can be wrapped in a web page: enter the computer number (num) and the result is returned to the user directly.

#!/usr/bin/python
#coding:utf-8
import urllib
import urllib2
import cookielib
import re
import sys
import ConfigParser

# URL to crawl
url = "http://servicedesk.sf-express.com/lapq/"

def getfromconfig(agrs):
    # Read the basic settings from a config file instead of hard-coding them in the script. This reduces code maintenance.
    tagrs = agrs
    try:
        configfile = open(configetc, "r")  # open the config file
    except IOError:
        sys.exit()
    config = ConfigParser.ConfigParser()   # create a ConfigParser instance
    config.readfp(configfile)
    configfile.close()
    try:
        tagrs = config.get("BASEINFO", tagrs)  # value of the key tagrs in the BASEINFO section
    except ConfigParser.NoOptionError:
        sys.exit()
    return tagrs

# Get the dynamic hidden parameters in preparation for the POST
def get_hiddenvalue(url):
    request = urllib2.Request(url)
    reponse = urllib2.urlopen(request)
    resu = reponse.read()
    VIEWSTATE = re.findall(r'<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />', resu, re.I)
    EVENTVALIDATION = re.findall(r'<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />', resu, re.I)
    return VIEWSTATE[0], EVENTVALIDATION[0]

# Crawl the page and get the password
def get_localpw(url, num, username, password):
    # Cookies: store what comes back and send it along with the POST
    cookie = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    VIEWSTATE, EVENTVALIDATION = get_hiddenvalue(url)
    # Data to POST; the keys must match the name attributes in the page source
    postdata = urllib.urlencode({
        '__VIEWSTATE': VIEWSTATE,
        '__EVENTVALIDATION': EVENTVALIDATION,
        'TextBox_Computer': 'hq-pay-%s' % num,
        'TextBox_LAAccount': '%s' % username,
        'TextBox_Password': '%s' % password,
        'Button_Submit': 'Check enquire'  # the submit button's value, copied from the page
    })
    # Build a custom request
    req = urllib2.Request(
        url = 'http://servicedesk.sf-express.com/lapq/',
        data = postdata
    )
    # Open the link
    result = opener.open(req)
    # Read the returned content
    a = result.read()
    # Extract the required data with regular expressions. Here I only print it; later it becomes the return value and is written straight to a text file.
    computer = re.findall(r'<input name="TextBox_Computer" type="text" value="(.*?)"', a, re.I)
    password = re.findall(r'<input name="TextBox_LAAccount" type="text" value="(.*?)"', a, re.I)
    expired_time = re.findall(r'<input name="TextBox_Password" type="datetime" value="(.*?)"', a, re.I)
    print u'Computer name:', computer[0]
    print u'Password:', password[0]
    print u'Expiry time:', expired_time[0]

if __name__ == '__main__':
    configetc = "psw.conf"
    username = getfromconfig('username')
    password = getfromconfig('password')
    num = '******'
    get_localpw(url, num, username, password)
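One thing to watch in the code above: the first GET goes through `urllib2.urlopen`, so any cookie set by that response never reaches the cookie jar used for the POST. That worked for my site, but if a server ties the hidden values to a session cookie, both requests probably need to go through the same opener. Here is a minimal sketch of that variant, assuming Python 3 standard library names. `make_opener`, `build_postdata`, and `query` are hypothetical helpers, `extract_hidden` is any function that returns the hidden fields as a dict, and the field names and their casing follow the form in this article and are assumptions for any other page.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode


def make_opener():
    """Return an opener that stores cookies and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_postdata(hidden, num, username, password, submit_label):
    """Merge the hidden fields with the visible form fields and encode them."""
    form = dict(hidden)  # __VIEWSTATE, __EVENTVALIDATION, ...
    form.update({
        "TextBox_Computer": "hq-pay-%s" % num,
        "TextBox_LAAccount": username,
        "TextBox_Password": password,
        "Button_Submit": submit_label,  # must equal the button's value on the page
    })
    return urlencode(form).encode("utf-8")


def query(opener, url, extract_hidden, num, username, password, submit_label):
    """GET the form, then POST it back through the same opener."""
    with opener.open(url) as resp:  # first request: cookies and hidden fields
        page = resp.read().decode("utf-8", errors="replace")
    data = build_postdata(extract_hidden(page), num, username, password, submit_label)
    req = urllib.request.Request(url, data=data)  # passing data makes this a POST
    with opener.open(req) as resp:  # second request: same cookie jar
        return resp.read().decode("utf-8", errors="replace")
```

Keeping `build_postdata` separate from the network calls makes it easy to check the encoded body without touching the server.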


VI. Results

The crawled results match what the login page shows. Bulk requests can be handled quickly with a for loop.
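The for loop could look roughly like the sketch below. It is written for Python 3, and `lookup_many` and `fake_lookup` are hypothetical names: `fake_lookup` is a stand-in so the example runs without the server, and in practice it would be replaced by a function that performs the real query and returns the extracted values instead of printing them.

```python
def lookup_many(numbers, lookup):
    """Call lookup(num) for each number; collect results and failures separately."""
    results, failures = {}, {}
    for num in numbers:
        try:
            results[num] = lookup(num)
        except Exception as exc:  # keep going if one machine fails
            failures[num] = str(exc)
    return results, failures


# Stand-in for the real query so the sketch runs without the server.
def fake_lookup(num):
    if num == "0002":
        raise LookupError("no such computer")
    return {"computer": "hq-pay-%s" % num, "password": "<redacted>"}


results, failures = lookup_many(["0001", "0002", "0003"], fake_lookup)
print(sorted(results), failures)
```

Collecting failures instead of stopping at the first error means one mistyped computer number does not block the rest of the batch.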




VII. Summary

For pages with this hidden validation mechanism, open the page once with urllib2 to get the hidden values, then pass them in the POST data on the second request. That is enough to submit the form the way the page itself does. Don't agonize over obtaining the __EVENTVALIDATION and __VIEWSTATE fields during the second open.






