I am a beginner at writing web crawlers. References:
http://cuiqingcai.com/968.html and http://blog.csdn.net/pleasecallmewhy/article/details/8923067
Logging in to my own school's site runs into a verification code (CAPTCHA). A relatively simple way around it is to log in manually first and then reuse the cookie to stay signed in.
But how do I get hold of this login cookie? Firefox packs its cookies into a single file, and in IE I could not find the cookie for the Dean's office website ...
I could not find the file, but the value of the cookie can be read with either Fiddler or HttpFox. I then tried what http://tieba.baidu.com/p/3272054397 suggests:
for ck in cj:
    ck.name = 'new name'
    ck.value = 'new value'
    cj.set_cookie(ck)
This did not add the cookie correctly either: printing the cookie still gave a blank result. The official documentation is also very brief, so I used the approach from http://cuiqingcai.com/968.html:
cookie = cookielib.MozillaCookieJar(filename)
Note that this is MozillaCookieJar, which is not the same as the plain CookieJar.
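One possible reason the loop above left the jar blank is that a freshly created jar holds no cookies, so the loop body never runs. That is my guess, not something the linked thread confirms. For readers on Python 3, where cookielib is called http.cookiejar, here is a minimal sketch of building a cookie by hand and adding it with set_cookie. The helper name make_session_cookie and the placeholder value are made up for illustration.

```python
from http.cookiejar import Cookie, CookieJar


def make_session_cookie(name, value, domain, path="/"):
    """Build a version-0 session cookie by hand (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cj = CookieJar()
# A new jar is empty, so "for ck in cj" would have nothing to iterate over.
print(len(cj))

# Add one cookie explicitly instead of editing cookies inside a loop.
cj.set_cookie(make_session_cookie("JSESSIONID", "placeholder-value", "202.118.31.197"))
for ck in cj:
    print(ck.name, ck.value)
```

The long argument list is what the Cookie constructor requires; a plain CookieJar like this one only lives in memory, which is why the post switches to MozillaCookieJar for saving to a file.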
Create the cookie jar, then use its save and load methods to write the cookie to a file and read it back.
Because the cookie I actually want is the one from the manual login (where I typed in the verification code myself), this save step only serves to produce a template file. (Printing response.read() here should show the message "Login expired, please log in again".)
import cookielib
import urllib2

# Set the file that holds the cookie: cookie.txt in the same directory
filename = 'cookie.txt'
# Declare a MozillaCookieJar instance to hold the cookie; it is written to the file later
cookie = cookielib.MozillaCookieJar(filename)
# Use the HTTPCookieProcessor object of the urllib2 library to create a cookie handler
handler = urllib2.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib2.build_opener(handler)
# Open the URL; opener.open works like urllib2's urlopen
response = opener.open("post-login redirected URL")
# Save the cookies to the file
cookie.save(ignore_discard=True, ignore_expires=True)
Opening the saved cookie file, mine looks like this:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.

202.118.31.197	FALSE	/	FALSE		JSESSIONID	xfca9xukvdiuyrz1xkspvsk1kl0ghjckhqkw6rs3mmyywqv3roja!1269920556
Of course this value is wrong; you cannot log in with it ...
The right cookie, as seen in Fiddler, looks like this (a made-up value, just for illustration):
Cookie: JSESSIONID=xfca9xukvdiuyrz1xkasdasd0ghjckhqkw6rs3mmyywqv3roja!1269920556
So now we know which part of the file to change ...
Fill in the correct value, and then:
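Editing cookie.txt in a text editor works, but the fields are tab-separated and easy to break. As an alternative, here is a rough Python 3 sketch (http.cookiejar is the Python 3 name for cookielib) that loads the file, swaps in the value copied from Fiddler, and saves it again. The function name replace_cookie_value, the demo file, and both values are made up for illustration.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar


def replace_cookie_value(path, name, new_value):
    """Overwrite the value of every cookie called `name` in a
    Netscape-format cookie file and return how many were changed."""
    jar = MozillaCookieJar(path)
    # Session cookies have an empty expiry field, so ignore_discard is needed.
    jar.load(ignore_discard=True, ignore_expires=True)
    changed = 0
    for ck in jar:
        if ck.name == name:
            ck.value = new_value
            changed += 1
    jar.save(ignore_discard=True, ignore_expires=True)
    return changed


# Demo on a throwaway file laid out like the template above:
# domain, flag, path, secure, expiry (empty), name, value, separated by tabs.
TEMPLATE = (
    "# Netscape HTTP Cookie File\n"
    "202.118.31.197\tFALSE\t/\tFALSE\t\tJSESSIONID\tvalue-from-template\n"
)
demo_path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
with open(demo_path, "w") as f:
    f.write(TEMPLATE)

n = replace_cookie_value(demo_path, "JSESSIONID", "value-copied-from-fiddler")
print(n, "cookie(s) updated")
```

Going through the jar keeps the header line and the tab layout intact, so the later load call does not trip over a malformed line.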
import cookielib
import urllib2

# Create a MozillaCookieJar instance
cookie = cookielib.MozillaCookieJar()
# Read the cookie content from the file into the variable
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
# Create the request
req = urllib2.Request("http://www.baidu.com")
# Use urllib2's build_opener method to create an opener
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open(req)
print response.read()
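Before hitting the real site, it can help to check offline which Cookie header the loaded jar would attach to a request. The following is only a Python 3 sketch (http.cookiejar and urllib.request stand in for cookielib and urllib2); the helper name cookie_header_for, the demo file, and its value are made up for illustration. No network request is made.

```python
import os
import tempfile
import urllib.request
from http.cookiejar import MozillaCookieJar


def cookie_header_for(cookie_file, url):
    """Return the Cookie header the jar would send to `url`, or None."""
    jar = MozillaCookieJar()
    jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    req = urllib.request.Request(url)
    # Ask the jar to attach matching cookies without sending the request.
    jar.add_cookie_header(req)
    return req.get_header("Cookie")


# Throwaway cookie file with one session cookie for the school's host.
demo_file = os.path.join(tempfile.mkdtemp(), "cookie.txt")
with open(demo_file, "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write("202.118.31.197\tFALSE\t/\tFALSE\t\tJSESSIONID\tdemo-value\n")

same_host = cookie_header_for(demo_file, "http://202.118.31.197/")
other_host = cookie_header_for(demo_file, "http://www.baidu.com/")
print(same_host)
print(other_host)
```

The jar only attaches a cookie when the request host matches the cookie's domain, so the URL opened after load should be on the same host that issued the cookie.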
O(︶^︶)O Login finally succeeded! Hooray. Next up is using regular expressions to process the HTML page.
Use Python's cookielib to load a saved cookie to maintain login status