I. Requirements
While learning Python recently, I ran into a concrete need: retrieving a local administrator password from our local password management system. The previous process was: log on to the web page, enter the computer name, the administrator account, and the password, submit, wait for the page to return the local password, copy it, and mail it to the requesting user. Logging in to the page every time is tedious, so I automated it. Writing down the whole process also helps me learn.
II. Page analysis
Let's walk through the whole process first: enter the options, click Query, and the page returns the result.
First, look at the page source. There are two hidden parameters, and their values change on every submission. This is a pitfall that is discussed below.
I use Chrome: press F12, open the Network tab, and click Query. The page then shows the returned password.
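What does the browser send to the back end? A POST request whose form data contains the two hidden fields together with the computer name, account, and password.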
III. Implementation approach
Now that the general process is known, the task is to make Python act like the browser: simulate the login, construct the POST parameters, and extract three values from the result: computer name, password, and expiry time.
IV. The pitfall and how to get around it
What are __VIEWSTATE and __EVENTVALIDATION? I did not really understand them either, so I looked them up on Baidu.
__VIEWSTATE
ViewState is the mechanism ASP.NET uses to preserve the state of web controls across postbacks. When a web form is marked runat="server", a hidden field named __VIEWSTATE is added to the form. It stores the state values of all controls that use view state.
ViewState is a property of the Control class, and all other controls gain view-state functionality by inheriting from Control. Its type is System.Web.UI.StateBag, a collection of name/value pairs.
When a page is requested, ASP.NET serializes the state of all controls into a single string and sends it to the client as a hidden field of the form. When the client posts the page back, ASP.NET parses the returned field and assigns the corresponding values to the controls.
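To make the "single string" concrete: the hidden field value is normally Base64 text. The sketch below only illustrates that encoding round trip on made-up bytes. The helper name decode_hidden_field and the sample payload are assumptions for illustration; a real __VIEWSTATE payload uses ASP.NET's own binary serialization and may be signed or encrypted, so a crawler should treat it as opaque and send it back unchanged.

```python
import base64


def decode_hidden_field(value):
    """Base64-decode a hidden-field value, tolerating missing '=' padding."""
    padded = value + "=" * (-len(value) % 4)
    return base64.b64decode(padded)


# Made-up bytes standing in for serialized control state (not real ViewState).
fake_state = b"TextBox_Computer=hq-pay-000001"

# What a server would place in the value attribute of the hidden field.
hidden_value = base64.b64encode(fake_state).decode("ascii")

# Decoding the field value gives back the original bytes.
restored = decode_hidden_field(hidden_value)
print(len(hidden_value), len(restored))
```

For the crawler itself, decoding is not needed: the value is simply copied from the first response into the POST data.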
__EVENTVALIDATION
__EVENTVALIDATION is only used to verify that an event was sent from a legitimate page. It is essentially a digital signature, so it is generally short.
The hidden field whose id attribute is "__EVENTVALIDATION" is a security measure introduced in ASP.NET 2.0. It blocks unauthorized requests sent from the browser by potentially malicious users.
To ensure that each postback and callback event comes from the expected user interface element, the ASP.NET runtime adds an extra layer of validation to events. The server examines the submitted form content and matches it against the information in the hidden field whose id attribute is "__EVENTVALIDATION". Based on the match, it verifies that no extra input fields were added on the browser side (a malicious user may have added them) and that any selected value is one of the list values known to the server. The ASP.NET runtime creates the event validation field during rendering, which is the last possible moment at which that information is available. Like view state, the event validation field contains a hash value to prevent browser-side tampering.
Note: the hidden field whose id attribute is "__EVENTVALIDATION" is usually placed at the bottom of the form. If the user submits before the browser has finished parsing the form, the submitted data may fail validation.
So the key point is to obtain these two values. After repeated attempts I found that it can be done with regular expressions: open the URL once before the real crawl to get the values, then pass them in as POST parameters on the second request.
#!/usr/bin/python
# coding: utf-8
import re
import urllib2


def get_hiddenvalue(url):
    # Fetch the page once and pull out the two hidden field values.
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    resu = response.read()
    viewstate = re.findall(r'<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />', resu, re.I)
    eventvalidation = re.findall(r'<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />', resu, re.I)
    return viewstate[0], eventvalidation[0]


url = "http://servicedesk.*.com/lapq/"  # your own URL
test = get_hiddenvalue(url)  # print this yourself to check the returned values
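The regular expressions above depend on the exact attribute order and spacing of the input tags. As an alternative, here is a minimal sketch that collects hidden fields with the standard library's HTML parser instead. It is written for Python 3 (not the Python 2 used in this article), and the names HiddenFieldParser and extract_hidden_fields as well as the sample page and its values are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden field names to values found in html."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up page fragment; the values are placeholders, not real tokens.
sample_page = """
<form method="post" action="./">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKsample==" />
  <input name="TextBox_Computer" type="text" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgsample" />
</form>
"""

fields = extract_hidden_fields(sample_page)
print(fields["__VIEWSTATE"])
print(fields["__EVENTVALIDATION"])
```

Because the parser works on attributes rather than on the raw text, it keeps working if the server changes the order of name, id, and value.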
V. Complete code
A later step is to expose this on the web: enter num and return the resulting message directly to the user.
#!/usr/bin/python
# coding: utf-8
import urllib
import urllib2
import cookielib
import re
import sys
import ConfigParser

# URL to crawl
url = "http://servicedesk.sf-express.com/lapq/"


def getfromconfig(agrs):
    # Read the basic settings from a config file instead of writing them
    # into the script. This reduces code maintenance.
    tagrs = agrs
    try:
        configfile = open(configetc, "r")  # open the config file
    except IOError:
        sys.exit()
    config = ConfigParser.ConfigParser()  # create a ConfigParser instance
    config.readfp(configfile)
    configfile.close()
    try:
        tagrs = config.get("BASEINFO", tagrs)  # value of key tagrs in section BASEINFO
    except ConfigParser.NoOptionError:
        sys.exit()
    return tagrs


# Get the dynamically generated hidden parameters, in preparation for the POST
def get_hiddenvalue(url):
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    resu = response.read()
    viewstate = re.findall(r'<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />', resu, re.I)
    eventvalidation = re.findall(r'<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />', resu, re.I)
    return viewstate[0], eventvalidation[0]


# Crawl the page and get the password
def get_localpw(url, num, username, password):
    # Cookie jar: cookies that come back are sent out together with the POST
    cookie = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    viewstate, eventvalidation = get_hiddenvalue(url)
    # Data to POST (the form field names must match those in your page)
    postdata = urllib.urlencode({
        '__VIEWSTATE': viewstate,
        '__EVENTVALIDATION': eventvalidation,
        'TextBox_Computer': 'hq-pay-%s' % num,
        'textbox_laaccount': '%s' % username,
        'Textbox_password': '%s' % password,
        'button_submit': 'Check enquire',
    })
    # Build the request
    req = urllib2.Request(url='http://servicedesk.sf-express.com/lapq/', data=postdata)
    # Open the link
    result = opener.open(req)
    # Read the returned content
    a = result.read()
    # Extract the required data with regular expressions. Here it is only
    # printed; later it can be used as the return value or written to a file.
    computer = re.findall(r'<input name="Textbox_computer" type="text" value="(.*?)"', a, re.I)
    password = re.findall(r'<input name="Textbox_laaccount" type="text" value="(.*?)"', a, re.I)
    expired_time = re.findall(r'<input name="Textbox_password" type="DateTime" value="(.*?)"', a, re.I)
    print u'computer name:', computer[0]
    print u'password:', password[0]
    print u'expiry time:', expired_time[0]


if __name__ == '__main__':
    configetc = "psw.conf"
    username = getfromconfig('username')
    password = getfromconfig('password')
    num = '******'
    get_localpw(url, num, username, password)
VI. Results
The crawled results are consistent with what the login page shows. Bulk requests can be handled quickly with a for loop.
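The for loop for bulk requests could look roughly like the sketch below. It assumes a lookup function that returns the result for one number instead of printing it. The names query_many and fake_lookup are hypothetical, and fake_lookup is only a stand-in so the sketch runs without touching the real site.

```python
def query_many(numbers, lookup):
    """Call lookup(num) for each number; collect results and failures separately."""
    results, failures = {}, {}
    for num in numbers:
        try:
            results[num] = lookup(num)
        except Exception as exc:
            # Keep going when one lookup fails, and remember why it failed.
            failures[num] = str(exc)
    return results, failures


def fake_lookup(num):
    """Stand-in for the real query; it performs no network access."""
    if not num.isdigit():
        raise ValueError("bad asset number: %s" % num)
    return {"computer": "hq-pay-%s" % num}


results, failures = query_many(["000001", "000002", "oops"], fake_lookup)
print(sorted(results))
print(failures)
```

Collecting failures separately means one bad computer number does not abort the rest of the batch.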
VII. Summary
When a form is protected by hidden validation fields, open the page once with urllib2 to read their values, then pass those values in the POST data on the second request. That is enough to submit the page form and log in. After that, the __EVENTVALIDATION and __VIEWSTATE fields need no further handling.
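The second step, building the POST body, can be sketched without any network access. This sketch uses the Python 3 standard library rather than the Python 2 modules used in this article. The helper names build_opener_with_cookies and build_post_body and all sample values are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_opener_with_cookies():
    """Return an opener that stores cookies and sends them back, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_post_body(viewstate, eventvalidation, form_fields):
    """URL-encode the two hidden values together with the visible form fields."""
    data = {"__VIEWSTATE": viewstate, "__EVENTVALIDATION": eventvalidation}
    data.update(form_fields)
    # urlencode escapes characters such as '+', '/' and '=' that are common
    # in these values, so the server receives them unchanged.
    return urllib.parse.urlencode(data).encode("ascii")


# Placeholder values; real ones come from the first GET of the page.
body = build_post_body("/wEPDw+sample=", "/wEWAg==", {"TextBox_Computer": "hq-pay-000001"})
opener, jar = build_opener_with_cookies()
print(body)
```

With a real page, the same opener would be used for both the first GET and the POST (for example opener.open(url) and then opener.open(url, data=body)), so that any cookie set by the first response is returned with the second request.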
A simple Python web crawler: submitting a form on an ASPX site that uses __VIEWSTATE, __EVENTVALIDATION, and cookies to validate the submission