[Python Network Programming] Simulating a Sina Weibo login with the RSA cryptographic module

Source: Internet
Author: User
Tags: base64

First, background knowledge

http://blog.csdn.net/pi9nc/article/details/9734437

Second, simulating the login

Last semester I took part in a big-data competition that required crawling data, so I decided to write a crawler for Sina Weibo.

The crawl is not aimless, of course: I need to fetch the Weibo posts related to specific keywords.

Weibo happens to have an advanced search feature, but you must be logged in to see more posts, so the crawler has to simulate a login.

The code below simulates the login with the help of the RSA cryptographic module. Note that Sina has anti-crawler measures, so the login requests must disguise themselves as coming from a browser.
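As a minimal sketch of the browser disguise mentioned above, the snippet below attaches a browser-like User-Agent header to a request. It is written for Python 3 (urllib.request), whereas the listing in this article targets Python 2; the helper name build_request is hypothetical, and the request is only constructed, never sent.

```python
import urllib.request

# User-Agent string of a desktop Firefox; any realistic browser value would do.
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) "
              "Gecko/20100101 Firefox/37.0")


def build_request(url, form_data=None):
    """Return a Request that carries a browser-like User-Agent header.

    build_request is a hypothetical helper for illustration only.
    """
    return urllib.request.Request(url, data=form_data,
                                  headers={"User-Agent": BROWSER_UA})


# Only builds the request object; nothing is sent over the network here.
req = build_request("http://login.sina.com.cn/sso/login.php")
print(req.get_header("User-agent"))
```

Passing form data (as bytes) to the same helper turns the request into a POST, which is how the login form would be submitted.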

I did not write this code myself, so this article is labeled as a repost. My own version would be nearly identical, so I did not rewrite it, and I will not walk through the code or analyze its problems again, because simulating the login is not my focus. In the following articles I will cover crawling after login and web page parsing. As for the login itself, the link at the beginning of this article has a detailed tutorial for anyone who is interested.

    #!/usr/bin/env python
    # coding=utf8
    # Python 2 code: it relies on urllib2, cookielib and print statements.
    import urllib
    import urllib2
    import cookielib
    import base64
    import re
    import json
    import hashlib
    import rsa
    import binascii

    # Install a global opener that stores cookies across requests.
    cj = cookielib.LWPCookieJar()
    cookie_support = urllib2.HTTPCookieProcessor(cj)
    opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
    urllib2.install_opener(opener)

    # Template of the login form; the empty fields are filled in by login().
    postdata = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': '',
        'service': 'miniblog',
        'servertime': '',
        'nonce': '',
        'pwencode': 'rsa2',  # encryption algorithm
        'sp': '',
        'encoding': 'UTF-8',
        'prelt': '401',
        'rsakv': '',
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META'
    }


    class WeiboLogin:
        def __init__(self, username, password):
            self.username = username
            self.password = password

        def __get_spwd(self):
            rsa_publickey = int(self.pubkey, 16)  # the public key arrives as a hex string
            key = rsa.PublicKey(rsa_publickey, 65537)  # create the public key
            # Plaintext layout, as found in Sina's JS encryption file
            message = self.servertime + '\t' + self.nonce + '\n' + self.password
            passwd = rsa.encrypt(message, key)  # encrypt
            passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to hexadecimal
            return passwd

        def __get_suser(self):
            username_ = urllib.quote(self.username)
            username = base64.encodestring(username_)[:-1]  # drop the trailing newline
            return username

        def __prelogin(self):
            prelogin_url = 'http://login.sina.com.cn/sso/prelogin.php?entry=sso&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&client=ssologin.js(v1.4.4)' % self.username
            response = urllib2.urlopen(prelogin_url)
            # The body has the form callback({...}); capture what is inside the parentheses
            p = re.compile(r'\((.*?)\)')
            strurl = p.search(response.read()).group(1)
            dic = json.loads(strurl)  # the response is JSON, so parse it instead of eval()
            self.pubkey = str(dic.get('pubkey'))
            self.servertime = str(dic.get('servertime'))
            self.nonce = str(dic.get('nonce'))
            self.rsakv = str(dic.get('rsakv'))

        def login(self):
            url = 'http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.18)'
            try:
                self.__prelogin()  # pre-login
            except Exception:
                print 'Prelogin error'
                return
            # Fill in a copy so the module-level template is not overwritten
            form = dict(postdata)
            form['servertime'] = self.servertime
            form['nonce'] = self.nonce
            form['su'] = self.__get_suser()
            form['sp'] = self.__get_spwd()
            form['rsakv'] = self.rsakv
            data = urllib.urlencode(form)
            # Disguise the request as coming from a browser
            headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) Gecko/20100101 Firefox/37.0'}
            req = urllib2.Request(url=url, data=data, headers=headers)
            result = urllib2.urlopen(req)
            text = result.read()
            # The response redirects through location.replace('...'); accept either quote style
            p = re.compile(r'location\.replace\([\'"](.*?)[\'"]\)')
            try:
                login_url = p.search(text).group(1)
                urllib2.urlopen(login_url)
                print 'Login succeeded!'
            except Exception:
                print 'Login error!'
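The three offline steps of the listing (parsing the prelogin response, building the su field, and building the sp field) can be sketched in Python 3 with only the standard library. This is an illustration under stated assumptions, not the code the article reposts: the function names are hypothetical, the sample response and the demonstration key are made up, and the hand-written PKCS#1 v1.5 padding stands in for rsa.encrypt from the rsa package, which applies that same padding scheme.

```python
import base64
import binascii
import json
import os
import re
import urllib.parse


def parse_prelogin(body):
    """Return the JSON object wrapped inside a callback(...) response body."""
    match = re.search(r"\((.*)\)", body, re.S)
    return json.loads(match.group(1))


def encode_username(username):
    """Build the su field: URL-quote the username, then Base64-encode it."""
    quoted = urllib.parse.quote(username)
    return base64.b64encode(quoted.encode("ascii")).decode("ascii")


def pkcs1_v15_pad(message, key_bytes):
    """Pad as 00 02 <nonzero random bytes> 00 <message> (PKCS#1 v1.5 encryption)."""
    pad_len = key_bytes - len(message) - 3
    if pad_len < 8:
        raise ValueError("message too long for this key size")
    padding = b""
    while len(padding) < pad_len:
        # Draw random bytes and drop zeros, since zero marks the end of the padding.
        chunk = os.urandom(pad_len - len(padding))
        padding += chunk.replace(b"\x00", b"")
    return b"\x00\x02" + padding + b"\x00" + message


def encrypt_password(password, servertime, nonce, pubkey_hex, exponent=65537):
    """Build the sp field: RSA-encrypt servertime, tab, nonce, newline, password."""
    modulus = int(pubkey_hex, 16)  # the public key is a hex-encoded modulus
    key_bytes = (modulus.bit_length() + 7) // 8
    plaintext = "{}\t{}\n{}".format(servertime, nonce, password).encode("utf-8")
    padded = pkcs1_v15_pad(plaintext, key_bytes)
    cipher_int = pow(int.from_bytes(padded, "big"), exponent, modulus)
    cipher = cipher_int.to_bytes(key_bytes, "big")
    return binascii.hexlify(cipher).decode("ascii")


# Made-up demonstration key: the product of two Mersenne primes. Not secure.
DEMO_P = 2**521 - 1
DEMO_Q = 2**607 - 1
DEMO_MODULUS_HEX = format(DEMO_P * DEMO_Q, "x")

# Made-up response in the callback(...) shape that the listing parses.
sample_body = ('sinaSSOController.preloginCallBack({"retcode": 0, '
               '"servertime": 1400000000, "nonce": "ABC123", '
               '"pubkey": "%s", "rsakv": "0000000000"})' % DEMO_MODULUS_HEX)

info = parse_prelogin(sample_body)
su = encode_username("user@example.com")
sp = encrypt_password("secret", info["servertime"], info["nonce"], info["pubkey"])
print(su)
print(len(sp))  # two hex characters per byte of the key
```

Because the padding is random, sp differs on every run even for the same password, while su is deterministic. In real use the rsa package is the safer choice; the manual padding here only makes the step visible without a third-party dependency.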

