1. Main function (weibomain.py):
The code is as follows:
import urllib2
import cookielib
import weiboencode
import weibosearch

if __name__ == '__main__':
    weiboLogin = WeiboLogin('xxx@gmail.com', 'xxxx')  # email (account), password
    if weiboLogin.Login() == True:
        print "Login successful!"
The first two imports load Python's network programming modules, and the next two load the other two files, weiboencode.py and weibosearch.py (described later). The main block creates a new login object and then logs in.
2. WeiboLogin class (weibomain.py):
The code is as follows:
class WeiboLogin:
    def __init__(self, user, pwd, enableProxy=False):
        "Initialize WeiboLogin; enableProxy indicates whether to use a proxy server (off by default)"
        print "Initializing WeiboLogin..."
        self.username = user
        self.password = pwd
        self.enableproxy = enableProxy
        self.serverurl = "http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=&rsakt=mod&client=ssologin.js(v1.4.11)&_=1379834957683"
        self.loginurl = "http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.11)"
        self.postheader = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0'}
The initialization function defines two key URL members. self.serverurl is used for the first step of the login (getting servertime, nonce, and so on); this first step covers steps 1 and 2 of the earlier analysis of the Sina Weibo login process. self.loginurl is used for the second step: after the user name and password are encrypted, they are POSTed to this URL, with self.postheader as the header of the POST. This step corresponds to step 3 of the login process analysis. The class also has three member functions:
The code is as follows:
    def Login(self):
        "Login procedure"
        self.EnableCookie(self.enableproxy)  # configure cookies or the proxy server
        serverTime, nonce, pubkey, rsakv = self.GetServerTime()  # first step of the login
        postData = weiboencode.PostEncode(self.username, self.password, serverTime, nonce, pubkey, rsakv)  # encrypt the user name and password
        print "Post data length:\n", len(postData)
        req = urllib2.Request(self.loginurl, postData, self.postheader)
        print "Posting request..."
        result = urllib2.urlopen(req)  # second step of the login (step 3 of the login process analysis)
        text = result.read()
        try:
            loginUrl = weibosearch.sRedirectData(text)  # parse the redirect result
            urllib2.urlopen(loginUrl)
        except:
            print 'Login error!'
            return False
        print 'Login success!'
        return True
self.EnableCookie sets up cookies and, optionally, a proxy server. Many free proxy servers are available online, and one can be used to keep Sina from blocking your IP. The first login step then accesses the Sina server to get servertime and other information, which is used to encrypt the user name and password and build the POST request. The second step sends the user name and password to self.loginurl. The response contains redirect information, which is parsed to get the final URL to jump to. When that URL is opened, the server automatically writes the user's login information to the cookie, and the login succeeds.
The code is as follows:
    def EnableCookie(self, enableProxy):
        "Enable cookies & proxy (if needed)."
        cookiejar = cookielib.LWPCookieJar()  # create the cookie jar
        cookie_support = urllib2.HTTPCookieProcessor(cookiejar)
        if enableProxy:
            proxy_support = urllib2.ProxyHandler({'http': 'http://xxxxx.pac'})  # use a proxy
            opener = urllib2.build_opener(proxy_support, cookie_support, urllib2.HTTPHandler)
            print "Proxy enabled"
        else:
            opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
        urllib2.install_opener(opener)  # install the opener that carries the cookie jar
The EnableCookie function is relatively simple.
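For readers on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the same setup might look roughly like the sketch below. The function name build_opener_with_cookies and its proxy_url parameter are made up for illustration and are not part of the original script. Unlike EnableCookie, the sketch returns the opener instead of installing it globally.

```python
import http.cookiejar
import urllib.request


def build_opener_with_cookies(proxy_url=None):
    """Return (opener, cookie_jar); proxy_url is an optional, hypothetical proxy address."""
    cookie_jar = http.cookiejar.LWPCookieJar()  # stores cookies set by the server
    handlers = [urllib.request.HTTPCookieProcessor(cookie_jar)]
    if proxy_url:
        # Route plain-HTTP requests through the given proxy.
        handlers.insert(0, urllib.request.ProxyHandler({'http': proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    return opener, cookie_jar


opener, cookie_jar = build_opener_with_cookies()
# Calling urllib.request.install_opener(opener) here would make urlopen()
# use this opener everywhere, which is what the original EnableCookie does.
```

Returning the opener keeps the cookie jar local to the caller, which can be handy when several accounts are logged in from one process.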
The code is as follows:
    def GetServerTime(self):
        "Get server time and nonce, which are used to encode the password"
        print "Getting server time and nonce..."
        serverData = urllib2.urlopen(self.serverurl).read()  # get the page content
        print serverData
        try:
            serverTime, nonce, pubkey, rsakv = weibosearch.sServerData(serverData)  # parse out serverTime, nonce, etc.
            return serverTime, nonce, pubkey, rsakv
        except:
            print 'Get server time & nonce error!'
            return None
The functions in weibosearch.py are mainly used to parse the data returned by the server and are relatively simple.
3. sServerData function (weibosearch.py):
The code is as follows:
import re
import json

def sServerData(serverData):
    "Search the server time & nonce from server data"
    p = re.compile('\((.*)\)')
    jsonData = p.search(serverData).group(1)
    data = json.loads(jsonData)
    serverTime = str(data['servertime'])
    nonce = data['nonce']
    pubkey = data['pubkey']
    rsakv = data['rsakv']
    print "Server time is:", serverTime
    print "Nonce is:", nonce
    return serverTime, nonce, pubkey, rsakv
The parsing mainly relies on regular expressions and JSON, so it is fairly easy to follow. The function that Login uses to parse the redirect result is also in this file:
The code is as follows:
def sRedirectData(text):
    p = re.compile('location\.replace\([\'"](.*?)[\'"]\)')
    loginUrl = p.search(text).group(1)
    print 'loginUrl:', loginUrl
    return loginUrl
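To see what the two regular expressions actually pick out, here is a small self-contained sketch in Python 3. The function names parse_prelogin and parse_redirect and both sample strings are invented for illustration; the samples only imitate the shape of the replies described above and are not real server output.

```python
import json
import re


def parse_prelogin(server_data):
    """Extract servertime, nonce, pubkey and rsakv from a JSONP-style reply."""
    match = re.search(r'\((.*)\)', server_data)
    if match is None:
        raise ValueError('no JSON payload found in prelogin reply')
    data = json.loads(match.group(1))
    return str(data['servertime']), data['nonce'], data['pubkey'], data['rsakv']


def parse_redirect(text):
    """Extract the URL passed to location.replace(...) in the login reply."""
    match = re.search(r'location\.replace\([\'"](.*?)[\'"]\)', text)
    if match is None:
        raise ValueError('no redirect URL found in login reply')
    return match.group(1)


# Made-up sample inputs (assumptions, not captured from Sina's servers).
sample_prelogin = ('sinaSSOController.preloginCallBack({"retcode":0,'
                   '"servertime":1379834957,"nonce":"ABC123",'
                   '"pubkey":"EB2A38","rsakv":"demo"})')
sample_login = '<script>location.replace("http://example.com/next?ticket=demo");</script>'

print(parse_prelogin(sample_prelogin))
print(parse_redirect(sample_login))
```

Raising ValueError when nothing matches gives the caller a clearer failure than the AttributeError that calling .group(1) on a failed search would produce.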
4. Encrypting and encoding the user name and password between the first and second steps (weiboencode.py):
The code is as follows:
import urllib
import base64
import rsa
import binascii

def PostEncode(userName, passWord, serverTime, nonce, pubkey, rsakv):
    "Used to generate POST data"
    encodedUserName = GetUserName(userName)  # the user name is base64-encoded
    encodedPassWord = get_pwd(passWord, serverTime, nonce, pubkey)  # the password currently uses RSA encryption
    postPara = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': encodedUserName,
        'service': 'miniblog',
        'servertime': serverTime,
        'nonce': nonce,
        'pwencode': 'rsa2',
        'sp': encodedPassWord,
        'encoding': 'UTF-8',
        'prelt': '115',
        'rsakv': rsakv,
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META'
    }
    postData = urllib.urlencode(postPara)  # URL-encode the parameters
    return postData
The PostEncode function builds the message body of the POST, and its content must match what a real login sends. The difficult part is how the user name and password are encrypted:
The code is as follows:
def GetUserName(userName):
    "Used to encode user name"
    userNameTemp = urllib.quote(userName)
    userNameEncoded = base64.encodestring(userNameTemp)[:-1]
    return userNameEncoded

def get_pwd(password, servertime, nonce, pubkey):
    rsaPublickey = int(pubkey, 16)
    key = rsa.PublicKey(rsaPublickey, 65537)  # create the public key
    message = str(servertime) + '\t' + str(nonce) + '\n' + str(password)  # plaintext layout, taken from the JS encryption file
    passwd = rsa.encrypt(message, key)  # encrypt
    passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to hexadecimal
    return passwd
In the Sina login process, the password encryption method was originally SHA1 and is now RSA, and it may change again later. Python has implementations of the common encryption algorithms, though, so once the encryption method in use has been identified, the program is fairly easy to write.
At this point, the Python simulated login to Sina Weibo succeeds. Running it prints:
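The encoding steps that come before the RSA call can be tried on their own. The sketch below is a Python 3 take on them using only the standard library; the names encode_username and build_password_plaintext, the sample account, and the sample values are all invented for illustration. It deliberately stops short of the RSA encryption itself, which the original code delegates to the third-party rsa package. Note that base64.b64encode adds no trailing newline, so the [:-1] slice from the Python 2 version is not needed.

```python
import base64
import urllib.parse


def encode_username(user_name):
    """URL-quote the user name, then base64-encode it (the 'su' field)."""
    quoted = urllib.parse.quote(user_name)
    return base64.b64encode(quoted.encode('ascii')).decode('ascii')


def build_password_plaintext(password, servertime, nonce):
    """Join servertime, nonce and password in the layout that get_pwd encrypts."""
    return (str(servertime) + '\t' + str(nonce) + '\n' + str(password)).encode('utf-8')


# Made-up sample values (assumptions for demonstration only).
su = encode_username('user@example.com')
plaintext = build_password_plaintext('secret', 1379834957, 'ABC123')

# In Python 3 the POST body must be bytes, so the urlencoded string is encoded.
form_body = urllib.parse.urlencode(
    {'su': su, 'servertime': '1379834957', 'nonce': 'ABC123'}
).encode('ascii')

print(su)
print(plaintext)
print(form_body)
```

Only three of the form fields are shown here; a real request would carry the full parameter set built in PostEncode.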
The output is as follows:
loginUrl: http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&ssosavestate=1390390056&ticket=st-mzq4nzq5ntyyma==-1387798056-xd-284624bfc19fe242bbae2c39fb3a8ca8&retcode=0
Login success!
If you need to crawl information from Weibo, just add crawling and parsing modules after the main function, for example to read the content of a Weibo page:
The code is as follows:
htmlContent = urllib2.urlopen(myurl).read()  # get the full content (HTML) of the myurl page
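A crawler module will usually want that read wrapped in a small helper. The following Python 3 sketch shows one possible shape; fetch_html and its parameters are hypothetical names, and a temporary local file stands in for a Weibo page so the example runs without a network connection or a login.

```python
import pathlib
import tempfile
import urllib.request


def fetch_html(url, opener=None, encoding='utf-8'):
    """Open url with the given opener (or the installed default) and return decoded text."""
    open_fn = opener.open if opener is not None else urllib.request.urlopen
    with open_fn(url) as response:
        return response.read().decode(encoding, errors='replace')


# Offline demonstration: a local file plays the role of the page at myurl.
with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / 'page.html'
    page.write_text('<html><body>hello</body></html>', encoding='utf-8')
    html = fetch_html(page.as_uri())

print(html)
```

After a successful login, passing the cookie-carrying opener (or relying on the installed one) is what lets such a request see pages that require a session.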
Different crawler modules can be designed for different requirements; the simulated login code is provided here.