References (among others):
http://www.csuldw.com/2016/11/10/2016-11-10-simulate-sina-login/
http://blog.csdn.net/fly_leopard/article/details/51148904
http://www.tuicool.com/articles/uIJzYff
http://blog.csdn.net/u010029983/article/details/46364113
Simulating a Sina Weibo login is the foundation for crawling Sina data. Most of the references online are written in Python. The one PHP implementation I found is built into PHPCMS, and it does not analyze the process in much depth.
PS: The sources online are rather chaotic. I do not know whether the PHPCMS simulated Weibo login was originally written by t0mcl0nes on CSDN, but his article is the one that introduces the core of the PHP simulated login. Below, I refer to that article as the PHPCMS scheme.
Logging in to Sina Weibo with PHP differs somewhat from doing it with Python, and it has some problems. Here I briefly analyze the PHP login process and the problems that remain.
Project address:
https://github.com/daweilang/GetWeiBoCookie
The title says "based on Laravel framework" because the whole system for obtaining Sina Weibo data is built with Laravel, using Laravel queues, commands, and other tools. In fact, the simulated Sina Weibo login on its own can be implemented entirely with a simple PHP page. I hope the following analysis helps interested readers implement their own login program.
PS: The app\Http\Controllers\Admin\AuthorizeController controller is the main program discussed in this article, and the code mentioned below is in the GitHub project.
This article covers the simulated Sina Weibo login, specifically login through Sina Passport. Sina Passport is Sina's unified login system. Sina (sina.com.cn) and Weibo (weibo.com) are two different top-level domains, and Weibo achieves cross-domain login through Sina Passport. I do not know much about cross-domain login, but the approach Sina uses is technically sophisticated.
For a packet-capture analysis of the specific login parameters, please refer to the articles above and others on the web. The fixed parameters are in the curl array of "config/weibo.php" in my code.
Here I mainly explain the pre-login request and the parameters it returns, in the context of the PHP program.
When the user enters a user name and focus leaves the input box, the login page sends a GET request to "http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.18)&_=%s". The su value is the user name encoded with base64_encode, the last parameter is a millisecond timestamp, and the other parameters can be fixed.
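As a rough illustration of how this URL can be assembled, here is a minimal Python sketch. The function name build_prelogin_url is hypothetical, and URL-escaping the Base64 output is my own assumption (Base64 can contain "+", "/" and "="); the rest follows the description above.

```python
import base64
import time
from urllib.parse import quote

# Template taken from the pre-login request described above.
PRELOGIN_URL = (
    "http://login.sina.com.cn/sso/prelogin.php?entry=weibo"
    "&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod"
    "&checkpin=1&client=ssologin.js(v1.4.18)&_=%s"
)

def build_prelogin_url(username, now_ms=None):
    """Hypothetical helper: fill in su (Base64 user name) and the timestamp."""
    su = base64.b64encode(username.encode("utf-8")).decode("ascii")
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # millisecond timestamp
    # Assumption: escape "+", "/" and "=" so the value is safe in a query string.
    return PRELOGIN_URL % (quote(su, safe=""), now_ms)

url = build_prelogin_url("user@example.com", now_ms=1480000000000)
print(url)
```

Passing now_ms explicitly is only there to make the output reproducible; in normal use it can be omitted.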
The request returns several parameters. Online articles mostly cover "servertime", "nonce", and "rsakv", which are used later, but "showpin" and "pcid" are also very important: a showpin value of 1 means a verification code must be entered, and generating the verification code image requires the "pcid" parameter.
In the PHPCMS scheme these parameters are stored in a cookie, while my code stores them in a file. The code that fetches and stores the parameters is in the Getpreurl method of GetWeiBoCookie.
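The pre-login response can be unpacked along these lines. This is a sketch that assumes the response is JSON wrapped in the callback named in the request; the function name parse_prelogin and the sample values are made up for illustration and are not real Sina data.

```python
import json
import re

def parse_prelogin(body):
    """Hypothetical helper: pull the JSON object out of callback(...)."""
    match = re.search(r"\((\{.*\})\)", body, re.S)
    if match is None:
        raise ValueError("no JSON object found in the pre-login response")
    data = json.loads(match.group(1))
    return {
        "servertime": data["servertime"],
        "nonce": data["nonce"],
        "rsakv": data["rsakv"],
        "pubkey": data["pubkey"],
        "pcid": data.get("pcid", ""),
        # showpin == 1 means a verification code has to be entered
        "needs_captcha": data.get("showpin") == 1,
    }

# Made-up sample response; every value here is a placeholder.
sample = (
    'sinaSSOController.preloginCallBack({"servertime":1480000000,'
    '"pcid":"gz-0123abcd","nonce":"ABC123","pubkey":"EB2A38",'
    '"rsakv":"1330428213","showpin":1})'
)
params = parse_prelogin(sample)
print(params["needs_captcha"], params["pcid"])
```

The returned dictionary is the kind of data that would then be written to a cookie or a file, as described above.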
Password encryption is the core of simulating a Sina Weibo login with PHP, and it is also where the PHP version differs most from the Python one.
Online articles have described how the Weibo password is encrypted. It uses the RSA2 algorithm: "First create an RSA public key. Its two parameters are fixed values: the first is the pubkey returned by prelogin.php during login, and the second is the '10001' specified in the encryption JS file. (Both values must be converted from hexadecimal to decimal; '10001' is 65537 in decimal.) Finally, servertime and nonce are added and the result is encrypted."
In the Sina Passport flow, after the user fills in the user name and password and submits the form, the page requests "https://login.sina.com.cn/js/sso/ssologin.js". This JS file contains the encryption algorithm described above, and the password is encrypted in JavaScript.
The Python code that simulates this process is:
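import rsa
# rsapubkey is the pubkey from pre-login as an integer, e.g. int(pubkey, 16)
rsakey = rsa.PublicKey(rsapubkey, 65537)  # create the public key
# build the plaintext the same way the JS concatenates it
codestr = str(servertime) + '\t' + str(nonce) + '\n' + str(password)
pwd = rsa.encrypt(codestr.encode('utf-8'), rsakey)  # encrypt with RSA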
Just three lines of code, and you only need to install the rsa package...
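To make the steps concrete, the same procedure can be sketched with the Python standard library alone. This is only a sketch under stated assumptions: it applies PKCS#1 v1.5 padding (the scheme that rsa.encrypt in the rsa package uses), it returns the ciphertext as a hexadecimal string, and the function names are hypothetical. The key in the usage lines is a toy key built from two known Mersenne primes, not Sina's public key.

```python
import os

def pkcs1_v15_pad(message, key_bytes):
    """Return 0x00 0x02 <non-zero random bytes> 0x00 <message>."""
    pad_len = key_bytes - len(message) - 3
    if pad_len < 8:
        raise ValueError("message too long for this key size")
    padding = b""
    while len(padding) < pad_len:
        # Drop zero bytes: the first 0x00 after the padding marks the message start.
        chunk = os.urandom(pad_len - len(padding))
        padding += bytes(b for b in chunk if b != 0)
    return b"\x00\x02" + padding + b"\x00" + message

def encrypt_password(pubkey_hex, exponent_hex, servertime, nonce, password):
    """Hypothetical helper mirroring the description above."""
    n = int(pubkey_hex, 16)    # modulus: hexadecimal -> integer
    e = int(exponent_hex, 16)  # "10001" in hexadecimal is 65537
    key_bytes = (n.bit_length() + 7) // 8
    # Plaintext layout: servertime TAB nonce NEWLINE password
    plaintext = (str(servertime) + "\t" + str(nonce) + "\n" + password).encode("utf-8")
    m = int.from_bytes(pkcs1_v15_pad(plaintext, key_bytes), "big")
    # Assumption: the ciphertext is sent as a zero-padded hexadecimal string.
    return format(pow(m, e, n), "x").zfill(key_bytes * 2)

# Toy key for demonstration only (product of two Mersenne primes).
p, q = 2**521 - 1, 2**607 - 1
toy_pubkey_hex = format(p * q, "x")
sp = encrypt_password(toy_pubkey_hex, "10001", 1480000000, "ABC123", "secret")
print(len(sp))
```

Because the padding is random, the ciphertext differs on every run even for the same password; only its length is stable.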
I searched Baidu for a long time and did not find a PHP method for generating the RSA public key.
PHPCMS implements one solution: follow the Sina Passport flow and encrypt with the ssologin JS method. The PHPCMS scheme wraps the SSO encryption algorithm, and the JS code is posted on the CSDN blog. My JS skills are limited, so I borrowed that code wholesale and, on that basis, extracted the encryption algorithm into a JS file.
<script type="text/javascript" src="/js/prelogin.js"></script>
<script type="text/javascript">
// '\\t' and '\\n' are escaped as in the original template;
// they must reach the JavaScript engine as '\t' and '\n'.
function getpass(pwd, servicetime, nonce, rsaPubkey) {
    var RSAKey = new sinaSSOEncoder.RSAKey();
    RSAKey.setPublic(rsaPubkey, '10001');
    var password = RSAKey.encrypt([servicetime, nonce].join('\\t') + '\\n' + pwd);
    return password;
}
</script>
This approach has a big drawback: it requires a separate page to generate the encrypted password and pass it to the final submission page, which means the browser has to jump between pages several times.
var encrpt = getpass('{$preParam['sp']}', '{$preParam['servertime']}', '{$preParam['nonce']}', '{$preParam['pubkey']}');
// document.write(encrpt);
window.location.href = '/admin/authorize/browserlogin/?sp=' + encrpt;
Finally, the encrypted password is passed to the final submission page; see the Getrsapwd() method.
On the final submission page, Browserlogin, I take the parameters saved in the file earlier, combine them with the encrypted password, and POST them to the Sina Passport login page.
One more point, mentioned earlier, is the case where a verification code is required. If the showpin parameter returned by pre-login is 1, you need to fetch the verification code image and enter the code to log in. The address of the image is
http://login.sina.com.cn/cgi/pin.php?r={$randInt}&s=0&p={$preParam['pcid']}
r is a random 8-digit number and p is the pcid returned by pre-login. When a verification code is required, there is one extra step of entering it by hand.
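The image address can be built as in the following minimal sketch. The function names are hypothetical; the URL template, the 8-digit r value, and the showpin check come from the description above.

```python
import random

# Template taken from the verification code address described above.
PIN_URL = "http://login.sina.com.cn/cgi/pin.php?r=%d&s=0&p=%s"

def build_pin_url(pcid):
    """Hypothetical helper: r is a random 8-digit number, p is the pcid."""
    rand_int = random.randint(10000000, 99999999)
    return PIN_URL % (rand_int, pcid)

def captcha_url_if_needed(showpin, pcid):
    """Return the image URL only when pre-login reported showpin == 1."""
    return build_pin_url(pcid) if showpin == 1 else None

print(captcha_url_if_needed(1, "gz-0123abcd"))
```

When the function returns None, the login can proceed without the manual step.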
For the specific code, see the Browserlogin() method.
Compared with Python, the biggest disadvantage of this method is that it cannot log in automatically, that is, in the background. Python needs no human trigger: given a user name and password, the program can perform the whole login. The PHP implementation has to be triggered manually.
In addition, when a verification code is required, Python has tools for recognizing it, so the code can be entered automatically and the whole login automated. This is harder to achieve with this PHP implementation.
The curl login that follows needs no special explanation. Presumably because the parameters set the entry to weibo, the returned cookie can be used directly on Weibo. In theory, this method could simulate login across all Sina sites, but I have not tried other subsites. Sina is strict only about password verification and places few other restrictions on simulated login.
In summary, although PHP can simulate a Sina Weibo login, it is still less convenient than Python, which after all has many crawler tools. However, for collecting Weibo data on top of the simulated login, Laravel already provides many scripting facilities that greatly improve the efficiency of fetching data, which is why I used Laravel to develop this small project.
https://github.com/daweilang/GetWeiBoCookie
The simulated Sina Weibo login is complete, and this project is no longer updated.
For the follow-up work on capturing and analyzing Sina Weibo data, please follow https://github.com/daweilang/GetWB
That project is still being adjusted and has many shortcomings to fix. Once its features mature, I will introduce its implementation, organized around the project's design goals.
This article is from the "Remember Something" blog. Please be sure to keep this source: http://daweilang.blog.51cto.com/9806748/1911520
Obtaining Weibo data with PHP, based on Laravel framework, part 1: simulating Sina Weibo login