As we all know, HTTP is stateless, so how does a website remember that a user has logged in? The usual practice is this: when a user first sends an HTTP request, the HTTP server generates a session ID that corresponds to the state of that session (for example, whether the user is logged in), and the session ID is saved in the browser's cookies. That is why, after we log in to a web page, we can open the same page in another window without logging in again: both windows send the same cookie.
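A toy model of this mechanism may help. The sketch below is a minimal in-memory simulation under stated assumptions: the cookie name `sessionid`, the `SESSIONS` dict and the `handle_request` function are all hypothetical, and a real server would also deal with expiry, persistence and security. It shows the server issuing a session ID through `Set-Cookie`, and a second "window" that sends the same cookie landing in the same logged-in session.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store on the server: session ID -> session state.
SESSIONS = {}


def handle_request(cookie_header=""):
    """Toy server: return (Set-Cookie value or None, state of the caller's session)."""
    incoming = SimpleCookie()
    incoming.load(cookie_header)
    sid = incoming["sessionid"].value if "sessionid" in incoming else None
    if sid not in SESSIONS:
        # First visit (or unknown ID): create a session and hand its ID to the browser.
        sid = secrets.token_hex(16)
        SESSIONS[sid] = {"logged_in": False}
        return f"sessionid={sid}; Path=/; HttpOnly", SESSIONS[sid]
    return None, SESSIONS[sid]


# Window 1: the first request returns a Set-Cookie header, which the browser stores.
set_cookie, state = handle_request()
browser_jar = SimpleCookie()
browser_jar.load(set_cookie)
cookie_header = "; ".join(f"{name}={m.value}" for name, m in browser_jar.items())

state["logged_in"] = True  # stand-in for a successful login in window 1

# Window 2: the browser sends the same Cookie header, so the server finds the same session.
_, state2 = handle_request(cookie_header)
print(state2["logged_in"])  # True
```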
When writing a Python crawler, you sometimes need to access pages that are only available after logging in. One way to do this is to reuse a cookie file from a browser session that is already logged in. The experiment below uses Xunlei (xunlei.com) as an example and was run on Linux.
1. First, log in to Xunlei in Firefox and export the cookies with the Firebug add-on.
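Firebug is no longer maintained, so as an alternative you can copy the `Cookie` request header from the browser's developer tools and convert it with Python 3's `http.cookiejar`. This is a hypothetical sketch: `cookie_header_to_file` is not part of any library, and it assumes every cookie shares one domain, path `/` and expiry, which may not match how the site actually set them.

```python
import http.cookiejar
import os
import tempfile


def cookie_header_to_file(header, domain, filename, expires):
    """Save a 'name1=value1; name2=value2' Cookie header as a Netscape cookie file.

    Simplification: every cookie gets the same domain, path "/" and expiry,
    which may not match how the site originally set them.
    """
    jar = http.cookiejar.MozillaCookieJar()
    dotted = domain.startswith(".")  # the TRUE subdomain flag goes with a leading dot
    for pair in header.split(";"):
        name, _, value = pair.strip().partition("=")
        if not name:
            continue  # skip empty fragments such as a trailing ";"
        jar.set_cookie(http.cookiejar.Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=domain, domain_specified=dotted, domain_initial_dot=dotted,
            path="/", path_specified=True,
            secure=False, expires=expires, discard=False,
            comment=None, comment_url=None, rest={}))
    # The sample timestamps are in the past, so keep expired cookies when saving.
    jar.save(filename, ignore_discard=True, ignore_expires=True)


# Usage with values from the sample file; a temporary file stands in for xunlei.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
cookie_header_to_file("__utmc=112570076; __utmt=1", ".i.xunlei.com", path, 1498494348)
with open(path) as f:
    lines = f.read().splitlines()
os.remove(path)
print(lines[0])  # the mandatory header line
```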
2. Convert the cookies into the required format. Assuming the file is named xunlei.txt, it should look like this:
```
# Netscape HTTP Cookie File
# Generated by Wget on …
# Edit at your own risk.

.dynamic.i.xunlei.com	TRUE	/	FALSE	1498494348	__utma	74633479.1276576155.1435422349.1435422349.1435422349.1
.i.xunlei.com	TRUE	/	FALSE	1498494325	__utma	112570076.1792933177.1435422325.1435422325.1435422325.1
.dynamic.i.xunlei.com	TRUE	/	FALSE	1435424148	__utmb	74633479.1.10.1435422349
.i.xunlei.com	TRUE	/	FALSE	1435424125	__utmb	112570076.1.10.1435422325
.dynamic.i.xunlei.com	TRUE	/	FALSE	1498494348	__utmc	74633479
.i.xunlei.com	TRUE	/	FALSE	1498494348	__utmc	112570076
.i.xunlei.com	TRUE	/	FALSE	1435422925	__utmt	1
.dynamic.i.xunlei.com	TRUE	/	FALSE	1451190348	__utmz	74633479.1435422349.1.1.utmcsr=i.xunlei.com|utmccn=(referral)|utmcmd=referral|utmcct=/login.html
.i.xunlei.com	TRUE	/	FALSE	1451190325	__utmz	112570076.1435422325.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
dynamic.i.xunlei.com	FALSE	/	FALSE	1498494348	__xltjbr	1435422347556
dynamic.i.xunlei.com	FALSE	/	FALSE	1435424148	_s19	1435770994546b1435422324953b2bhttp%3a//dynamic.i.xunlei.com/user
```
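The file contains more cookies than shown here; the rest are omitted. Note the following:
1. The first line, "# Netscape HTTP Cookie File", is mandatory and must match exactly, character for character.
2. Every cookie line must follow this format strictly, with the fields separated by tab characters:
domain <TAB> TRUE|FALSE <TAB> path <TAB> TRUE|FALSE <TAB> expiration timestamp <TAB> name <TAB> value
The first TRUE/FALSE flag says whether subdomains also match (TRUE when the domain starts with a dot, FALSE otherwise); the second says whether the cookie is sent only over secure connections.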
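These rules can be checked against Python 3's `http.cookiejar.MozillaCookieJar`, the successor of the `cookielib.MozillaCookieJar` used in the next step. In this sketch the helper `try_load` is hypothetical and the cookie line is modeled on the `__utmc` entry of the sample: loading succeeds when the header line is present and raises `LoadError` when it is missing.

```python
import os
import tempfile
from http.cookiejar import LoadError, MozillaCookieJar

HEADER = "# Netscape HTTP Cookie File\n"
# One cookie line modeled on the __utmc entry in the sample; fields joined by real tabs.
LINE = "\t".join([".i.xunlei.com", "TRUE", "/", "FALSE", "1498494348", "__utmc", "112570076"]) + "\n"


def try_load(text):
    """Write text to a temporary file, load it, and return cookie names or an error string."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    try:
        jar = MozillaCookieJar()
        # The sample expiry (2017) is in the past, so keep expired cookies here.
        jar.load(path, ignore_expires=True)
        return [cookie.name for cookie in jar]
    except LoadError as exc:
        return f"LoadError: {exc}"
    finally:
        os.remove(path)


with_header = try_load(HEADER + LINE)
without_header = try_load(LINE)
print(with_header)     # ['__utmc']
print(without_header)  # a LoadError message, because the first line is not the header
```

`ignore_expires=True` is only needed because the sample timestamps are in the past; with the default, `load` silently skips expired cookies.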
3. Use Python to read xunlei.txt and then request a page that is only accessible after login, for example http://dynamic.i.xunlei.com/user.
The following is the source code:
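```python
# Python 2 code: cookielib and urllib2 were renamed in Python 3
import cookielib, urllib2

# Load the exported cookies from the Netscape-format file
cookie = cookielib.MozillaCookieJar()
cookie.load("xunlei.txt")

# Build an opener that attaches these cookies to every request, and make it the default
handle = urllib2.HTTPCookieProcessor(cookie)
opener = urllib2.build_opener(handle)
urllib2.install_opener(opener)

url = "http://dynamic.i.xunlei.com/user"
req = urllib2.Request(url)
response = urllib2.urlopen(req)
print response.read()
```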
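The script above is Python 2. In Python 3, `cookielib` became `http.cookiejar` and `urllib2` became `urllib.request`, so a port might look like the sketch below. Because the Xunlei page cannot be assumed to be reachable, the sketch checks the mechanism against a local server instead: `EchoCookieHandler` and the `sessionid=abc123` cookie for `127.0.0.1` are made up for the demo, and the server simply echoes back the `Cookie` header it receives. Calling `opener.open` directly skips the global `install_opener` step but sends the same request.

```python
import http.cookiejar
import http.server
import os
import tempfile
import threading
import time
import urllib.request


def fetch_with_cookies(cookie_file, url):
    """Python 3 port of the script above: load a cookie file, then GET url with it."""
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(cookie_file)  # raises LoadError if the file is not in Netscape format
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url) as response:
        return response.read().decode("utf-8", errors="replace")


class EchoCookieHandler(http.server.BaseHTTPRequestHandler):
    """Local stand-in for the real site: replies with the Cookie header it received."""

    def do_GET(self):
        body = (self.headers.get("Cookie") or "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


server = http.server.HTTPServer(("127.0.0.1", 0), EchoCookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
os.environ["no_proxy"] = "127.0.0.1"  # keep the local request away from any proxy

# Hypothetical cookie for 127.0.0.1 that expires one hour from now.
fd, cookie_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write("\t".join(["127.0.0.1", "FALSE", "/", "FALSE",
                       str(int(time.time()) + 3600), "sessionid", "abc123"]) + "\n")

echoed = fetch_with_cookies(cookie_path, f"http://127.0.0.1:{server.server_port}/user")
print(echoed)  # sessionid=abc123

server.shutdown()
server.server_close()
os.remove(cookie_path)
```

For the real site, the call would be `fetch_with_cookies("xunlei.txt", "http://dynamic.i.xunlei.com/user")`, provided the cookies in the file have not expired.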
4. The printed HTML is the same page I see at http://dynamic.i.xunlei.com/user after logging in through the browser.
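The principle above is similar to that of a CSRF (cross-site request forgery) attack, in which an attacker exploits the user's cookies to act as the user. Strictly speaking, a CSRF attacker does not need to read the cookie at all: the victim's browser attaches it automatically to a request that another site forges. To defend against this kind of attack, a website can generate a token and have the HTTP server validate it on every state-changing request, as Django's CsrfViewMiddleware does.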
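The token defense can be sketched as follows. This is a simplified illustration of the idea, not Django's actual `CsrfViewMiddleware` implementation: `CSRF_TOKENS`, `issue_token` and `is_valid_post` are hypothetical names, and the token is kept in a server-side dict keyed by session ID. A forged cross-site request still carries the session cookie, but it cannot supply the token, so it is rejected.

```python
import hmac
import secrets

# Hypothetical server-side store: session ID -> CSRF token for that session.
CSRF_TOKENS = {}


def issue_token(session_id):
    """Create a random token for this session; the site embeds it in its own forms."""
    token = secrets.token_urlsafe(32)
    CSRF_TOKENS[session_id] = token
    return token


def is_valid_post(session_id, submitted_token):
    """Accept a state-changing request only if it carries this session's token."""
    expected = CSRF_TOKENS.get(session_id)
    # compare_digest is designed to avoid leaking the token through timing differences
    return expected is not None and hmac.compare_digest(expected, submitted_token)


sid = "abc123"                 # session cookie the browser attaches automatically
form_token = issue_token(sid)  # hidden field in the site's genuine form

print(is_valid_post(sid, form_token))  # True: the request came from the site's own form
print(is_valid_post(sid, ""))          # False: a forged form has the cookie but not the token
```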
However, the token itself is still stored in a cookie, so a CSRF attack remains possible; it just becomes more complicated.
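When the token lives in a cookie, the server typically compares the cookie value with the value submitted in the form or a request header (often called the double-submit pattern). The sketch below uses a hypothetical `double_submit_ok` check and is not Django's code. It shows why a plain forged request fails, since a page on another origin cannot read the cookie, and how an attacker who can write the cookie, for example from a compromised subdomain, could still get past this simple check.

```python
import hmac
import secrets


def double_submit_ok(cookie_token, form_token):
    """Accept a POST only if the token in the cookie matches the one in the form body."""
    return bool(cookie_token) and hmac.compare_digest(cookie_token, form_token)


token = secrets.token_urlsafe(32)

# Genuine page: the site set the cookie and embedded the same token in its form.
genuine = double_submit_ok(token, token)

# Forged cross-site form: the browser still sends the cookie, but the attacker
# cannot read it from another origin, so the form carries a guess.
forged = double_submit_ok(token, "guess")

# Hypothetical harder case: an attacker who can *write* the cookie picks a value
# and submits the same value in the form.
planted = "attacker-chosen"
bypassed = double_submit_ok(planted, planted)

print(genuine, forged, bypassed)  # True False True
```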