Python simulated login exercise -- imooc.com login


As this article is being written, it is the blogger's third day of learning Python (or perhaps the fourth). Python is the second interpreted language the blogger has worked with (the first was JavaScript).

I have wanted to blog about my learning process for a long time, but, like keeping a diary, I kept starting and then giving up. -.-

So today I decided to give myself a good start ~

The blogger's way of learning is to go straight for the goal and, on hitting a problem, search Baidu for blogs and websites, falling back to Google when Baidu finds nothing. Learning this way is really fast, but the foundation obviously ends up weak.

So, after learning the basic syntax of Python, it was straight on to crawlers!

-----------------------------------The above is one long preface---------------------------------------------

Today the blogger mainly wants to share the process and experience of three days of learning to write spiders, hoping to point the way for some beginners and also to leave a mark on the blogger's own Python career.

In the blogger's humble opinion, there are just two ways to simulate logging in to a website:

    1. Copy the cookies by hand.

      Log in to the website in the browser, then open the developer tools, visit any page, pick a suitable request, and copy its cookie.

    2. Let Python collect the cookies.

      This is the approach explained in detail in this article. See below.
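The first approach can be sketched roughly like this. It is a minimal Python 3 illustration (the article's own code is Python 2), and the cookie string is a placeholder rather than a real value: the copied cookie is simply attached to the request as a Cookie header.

```python
from urllib.request import Request

# Placeholder: paste the Cookie header copied from the browser's developer tools.
COPIED_COOKIE = "loginstate=1; apsid=PLACEHOLDER"


def build_request(url, cookie_string):
    """Return a Request that carries a manually copied cookie string."""
    req = Request(url)
    req.add_header("Cookie", cookie_string)
    req.add_header("User-Agent", "Mozilla/5.0")
    return req


req = build_request("http://www.imooc.com/user/setbindsns", COPIED_COOKIE)
# Passing req to urllib.request.urlopen() would send it with that cookie attached.
```

The drawback is that the copied cookie expires sooner or later and has to be copied again by hand, which is why the rest of the article lets Python collect the cookies itself.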

Now let the blogger take imooc.com as an example to explain simulated website login.

Begin

The blogger is used to writing crawlers with urllib2 + cookielib, so the code begins like this:

    # coding=utf8
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')

    import urllib2
    import urllib
    import cookielib
    # The above is boilerplate.
    # Below, create a CookieJar to manage cookies, then create an opener and install it into urllib2
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

opener.addheaders takes a list of (name, value) tuples, so adding headers is very convenient.
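For readers on Python 3, where urllib2 and cookielib were renamed, the same setup might look like the sketch below. It mirrors the article's code and only swaps in the modules urllib.request and http.cookiejar.

```python
import http.cookiejar
import urllib.request

# CookieJar keeps cookies in memory; HTTPCookieProcessor wires it into the opener.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)  # urlopen() now goes through this opener too

# addheaders is a list of (name, value) tuples sent with every request.
opener.addheaders = [("User-agent", "Mozilla/5.0")]
```

Every request made through opener.open() will now store the cookies it receives in cj and send matching ones back automatically.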

And then

A cookie is a small piece of data that the server asks the browser to store in order to record user information. It can sometimes intrude on people's privacy, but it makes it convenient to keep the user's login information for automatic login.

Its workflow is this:

    1. On the first visit, the server returns a response to the browser that contains a few Set-Cookie headers, and the browser silently records them as cookies for you
    2. When you enter the user name, password and other required information and click login, the browser POSTs your information to the server together with some of the cookies above
    3. When the login succeeds, the browser receives a secret message from the server: a few important cookies, which it saves
    4. If you do not close the browser, it sends those cookies to the server whenever you visit other pages of the site, and you find that you are already logged in
    5. If you tick a checkbox such as "Automatic login" or "Log in automatically for 7 days", the browser also receives some long-lived cookies (10 days to half a month), so that you are still logged in tomorrow, the day after ....
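The round trip in steps 1 and 4 can be imitated without any network access. The sketch below is Python 3 and uses a hand-made FakeResponse class (a hypothetical helper, not part of any library) together with example.com and made-up cookie values, just to show a CookieJar recording Set-Cookie headers and attaching them to a later request.

```python
import email.message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Stand-in for a server response that carries Set-Cookie headers."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            # Assigning the same header name again appends another header.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


jar = CookieJar()

# Step 1: first visit; the "server" answers with Set-Cookie headers.
first_request = Request("http://www.example.com/")
first_response = FakeResponse(["PHPSESSID=abc123; Path=/", "visited=1; Path=/"])
jar.extract_cookies(first_response, first_request)

# Step 4: a later request to the same site gets the stored cookies attached.
next_request = Request("http://www.example.com/user")
jar.add_cookie_header(next_request)
sent = next_request.get_header("Cookie")
print(sent)
```

An opener built with HTTPCookieProcessor performs these two calls for you on every request, which is why the article never calls them directly.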

Now that we understand how cookies work, let's visit the home page and collect the cookies.

This is what the blogger wrote:
    # Write down a few URLs first
    url_login = 'http://www.imooc.com/passport/user/login'
    url_index = 'http://www.imooc.com'
    url_test = 'http://www.imooc.com/user/setbindsns'
    data = {
        'username': '*********',
        'password': '*******',
        'verify': '',
        'remember': '1',
        'pwencode': '0',
        'referer': 'http://www.imooc.com'
    }
    data_encoded = urllib.urlencode(data)
    # GET the home page to collect cookies
    req_index = urllib2.Request(url_index)
    res_index = opener.open(req_index)

We can print the cookies to take a look:

    print cj._cookies

    {'www.imooc.com': {'/': {
        'PHPSESSID': Cookie(version=0, name='PHPSESSID', value='3Q1C66HDS4H054F19CIQB4RTG2', port=None, port_specified=False, domain='www.imooc.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)}},
     '.imooc.com': {'/': {
        'imooc_isnew_ct': Cookie(version=0, name='imooc_isnew_ct', value='1486280759', port=None, port_specified=False, domain='.imooc.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1517816759, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False),
        'cvde': Cookie(version=0, name='cvde', value='5896d8376631d-1', port=None, port_specified=False, domain='.imooc.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False),
        'imooc_isnew': Cookie(version=0, name='imooc_isnew', value='1', port=None, port_specified=False, domain='.imooc.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1517816759, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False),
        'imooc_uuid': Cookie(version=0, name='imooc_uuid', value='D6a73549-4d53-47b6-90bc-28888d3438b8', port=None, port_specified=False, domain='.imooc.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1517816759, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)}}}

What all of this is, the blogger doesn't really know. Never mind.
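For what it's worth, the printed structure is a nested dict: domain, then path, then cookie name, each leading to a Cookie object. A friendlier way to look inside is to iterate over the jar instead of printing the private _cookies attribute. The sketch below is Python 3; make_cookie is a hypothetical helper and the jar is filled by hand so that it runs offline.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a minimal session cookie for illustration only."""
    dotted = domain.startswith(".")
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=dotted, domain_initial_dot=dotted,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_cookie("PHPSESSID", "abc123", "www.imooc.com"))
jar.set_cookie(make_cookie("imooc_isnew_ct", "1486280759", ".imooc.com"))

# Iterating the jar yields Cookie objects with .domain, .path, .name and .value.
summary = {(c.domain, c.path, c.name): c.value for c in jar}
for (domain, path, name), value in sorted(summary.items()):
    print(domain, path, name, value)
```

With the real jar from the article, the same loop over cj would list the five cookies shown above, one per line.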

And then

Let's log in with the cookies! Don't know which ones to bring? Bring them all!

    req_login = urllib2.Request(url_login, data_encoded)
    res_login = opener.open(req_login)

We tried writing the result to an HTML file:

    imooc = open('e:/imooc.html', 'w')
    imooc.write(res_login.read())
    imooc.close()

When we open the file:

This darn thing doesn't look like HTML at all. Normally the server would return an HTML page, but this string of symbols stumped a beginner with only 3 days of Python.

Notice one field: "msg": "\u6210\u529f". This is obviously a Unicode-escaped string, and after a simple conversion it means: "Success"
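That "string of symbols" is JSON, so the conversion can be left to the json module. The sketch below is Python 3 and uses a made-up response body: only the msg field, the presence of a uid and the data/url list come from the article, while the URLs and other values are placeholders.

```python
import json

# Hypothetical response body shaped like the one described in the article.
raw = (
    r'{"msg": "\u6210\u529f", "data": {"uid": "PLACEHOLDER", "url": ['
    r'"http:\/\/a.example.com\/sso?token=T1", '
    r'"http:\/\/b.example.com\/sso?token=T2"]}}'
)

result = json.loads(raw)
msg = result["msg"]            # the \uXXXX escapes are decoded to real characters
urls = result["data"]["url"]   # the escaped "\/" sequences become plain "/"

print(msg)      # two Chinese characters meaning "success"
print(urls[0])
```

Parsing with json.loads also avoids calling eval on text that came from a server.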

The blogger was secretly delighted. Since the login succeeded, the useful information must be somewhere in this string of symbols.

Given that, the right idea is to take the 2 URLs and the uid in the response and use the developer tools to keep searching for related information.

...

But the blogger took a small detour.

The reverse-analysis method

The blogger decided to copy the cookies from after a successful login and test which of them each logged-in request really needs.

It's very simple: delete them one by one and see at which point the login stops working ...
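The elimination idea can be written down as a small loop. This is only a sketch in Python 3: still_logged_in stands for whatever check you use in practice (for example requesting a members-only page with the trial cookies), and fake_check below is a stand-in so the example runs offline.

```python
def find_required_cookies(cookies, still_logged_in):
    """Return the names of cookies whose removal breaks the login.

    cookies: dict of cookie name -> value copied from a logged-in browser.
    still_logged_in: callable that takes a cookie dict and returns True if
        a request sent with only those cookies is still treated as logged in.
    """
    required = []
    for name in cookies:
        trial = {k: v for k, v in cookies.items() if k != name}
        if not still_logged_in(trial):
            required.append(name)
    return required


# Stand-in check for demonstration: pretend the site needs exactly these two.
def fake_check(trial):
    return {"loginstate", "apsid"}.issubset(trial)


copied = {"PHPSESSID": "x", "loginstate": "1", "imooc_uuid": "y", "apsid": "z"}
needed = find_required_cookies(copied, fake_check)
print(needed)
```

Each cookie is removed on its own from the full set, so the loop finds the cookies that are individually indispensable.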

...

After screening, the blogger found the 2 cookies we need: loginstate and apsid.

So the blogger went looking for apsid among nearly a hundred cookies.

...

Got it!

And the URL to request is one of the 2 URLs we got earlier! It just carries a few extra parameters.

Testing shows that a GET request to either of these 2 URLs gives us the cookies we need.

Rob XI

We need 3 parameters: token (already included in the URL), callback, and _ (an underscore -.-)

Testing shows that the callback parameter can be a fixed value.

OK, let's search for the value of the underscore parameter.

Checking the cookies, we find that it is the value of imooc_isnew_ct
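Putting the three parameters together might look like the sketch below (Python 3). The cookie is rebuilt by hand from the values printed earlier so that the example runs offline, and the SSO URL is a placeholder for one of the 2 URLs in the login response. The article's final code slices str(cookie)[23:33] to get the value; reading the cookie's value attribute gives the same result without depending on the length of the cookie name.

```python
from http.cookiejar import Cookie
from urllib.parse import urlencode

# Rebuilt from the printed jar shown earlier; in the real script this object
# would come from the CookieJar after visiting the home page.
ct_cookie = Cookie(
    version=0, name="imooc_isnew_ct", value="1486280759",
    port=None, port_specified=False,
    domain=".imooc.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=False, expires=1517816759, discard=False,
    comment=None, comment_url=None, rest={},
)

underscore = ct_cookie.value  # same text as str(ct_cookie)[23:33]

params = {
    "callback": "jquery19106404770042720387_1486274878204",
    "_": underscore,
}

# Placeholder: the real URL comes from the login response and already has a token.
url_ssologin = "http://example.com/sso?token=PLACEHOLDER"
url_ssologin = url_ssologin + "&" + urlencode(params)
print(url_ssologin)
```

urlencode takes care of escaping, so the parameters can be appended to the URL with a simple "&".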

That's pretty much it.

All code:

    # coding=utf8
    # final version
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')

    import json
    import urllib2
    import urllib
    import cookielib

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    url_login = 'http://www.imooc.com/passport/user/login'
    url_index = 'http://www.imooc.com'
    url_test = 'http://www.imooc.com/user/setbindsns'
    data = {
        'username': '*********',
        'password': '*******',
        'verify': '',
        'remember': '1',
        'pwencode': '0',
        'referer': 'http://www.imooc.com'
    }
    data_encoded = urllib.urlencode(data)

    # GET the home page to collect cookies
    req_index = urllib2.Request(url_index)
    res_index = opener.open(req_index)
    print cj._cookies
    print

    # POST to the login page
    req_login = urllib2.Request(url_login, data_encoded)
    res_login = opener.open(req_login)
    # Read the body only once; a second read() would return an empty string
    body = res_login.read()
    print body
    # The body is JSON; json.loads also turns the escaped "\/" in the URLs back into "/"
    res_dict = json.loads(body)
    url_ssologin = res_dict['data']['url'][0]
    print url_ssologin

    params = {
        'callback': 'jquery19106404770042720387_1486274878204',
        # str(cookie) looks like "<Cookie imooc_isnew_ct=1486280759 for .imooc.com/>";
        # characters 23 to 32 are the 10-digit value
        '_': str(cj._cookies['.imooc.com']['/']['imooc_isnew_ct'])[23:33]
    }
    url_ssologin = url_ssologin + '&' + urllib.urlencode(params)

    # GET the SSO login page
    req_sso = urllib2.Request(url_ssologin)
    res_sso = opener.open(req_sso)
    # print res_sso.read()
    # print cj._cookies['.imooc.com']['/']['loginstate']

    # Request a page that requires login and save it to check the result
    req_test = urllib2.Request(url_test)
    res_test = opener.open(req_test)
    imooc = open('c:/users/asus/desktop/imooc.html', 'w')
    imooc.write(res_test.read())
    imooc.close()

The blogger actually ran into a lot of problems and took a lot of detours along the way; thanks to an expert (a "dalao") for the selfless help :)

This is the blogger's first blog post; technical discussion and corrections are welcome ~
