Simulating Sina Weibo login with Python (Sina Weibo crawler)

Source: Internet
Author: User

1. Main function (weibomain.py):

The code is as follows:

import urllib2
import cookielib

import weiboencode
import weibosearch

# WeiboLogin is the class defined in section 2; it lives in this same file
# and must appear above this block.
if __name__ == '__main__':
    weiboLogin = WeiboLogin('xxx@gmail.com', 'xxxx')  # email (account), password
    if weiboLogin.Login():
        print "Login succeeded!"

The first two imports load Python's network programming modules (urllib2 and cookielib, both part of the Python 2 standard library); the next two load the other two files, weiboencode.py and weibosearch.py (described later). The main function creates a login object and then logs in.

2. WeiboLogin class (weibomain.py):

The code is as follows:

class WeiboLogin:
    def __init__(self, user, pwd, enableProxy=False):
        "Initialize WeiboLogin; enableProxy indicates whether a proxy server is used (off by default)"

        print "Initializing WeiboLogin..."
        self.userName = user
        self.passWord = pwd
        self.enableProxy = enableProxy

        self.serverUrl = "http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=&rsakt=mod&client=ssologin.js(v1.4.11)&_=1379834957683"
        self.loginUrl = "http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.11)"
        self.postHeader = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101'}

The initialization function defines two key URL members. self.serverUrl is used for the first step of the login (fetching servertime, nonce, and so on), which in effect covers steps 1 and 2 of the analysis of the Sina Weibo login process. self.loginUrl is used for the second step (once the username and password are encrypted, they are POSTed to this URL, with self.postHeader as the POST headers), which corresponds to step 3 of that analysis. The class has three methods:

The code is as follows:

    def Login(self):
        "Login procedure"
        self.EnableCookie(self.enableProxy)  # configure cookies and the optional proxy server

        serverData = self.GetServerTime()  # step 1 of the login
        if serverData is None:  # GetServerTime returns None on failure
            return False
        serverTime, nonce, pubkey, rsakv = serverData
        postData = weiboencode.PostEncode(self.userName, self.passWord, serverTime, nonce, pubkey, rsakv)  # encrypt the username and password
        print "Post data length:\n", len(postData)

        req = urllib2.Request(self.loginUrl, postData, self.postHeader)
        print "Posting request..."
        result = urllib2.urlopen(req)  # step 2 of the login (step 3 in the analysis of the Sina Weibo login process)
        text = result.read()
        try:
            loginUrl = weibosearch.sRedirectData(text)  # parse the redirect result
            urllib2.urlopen(loginUrl)
        except Exception:
            print 'Login error!'
            return False

        print 'Login sucess!'
        return True

self.EnableCookie sets up cookies and the proxy server. Many free proxy servers are available online, and one can be used to keep Sina from blocking your IP address. Next comes the first step of the login: query the Sina server for servertime and related values, then use them to encrypt the username and password and build the POST request. The second step sends the encrypted username and password to self.loginUrl, receives the redirect information, parses it to obtain the final URL, and opens that URL. The server then writes the user's login information into the cookie automatically, and the login succeeds.
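As a rough Python 3 illustration of the "build the POST request" step (the article's own code targets Python 2 and urllib2), the sketch below constructs a request object without sending it. The function name build_login_request and the two placeholder form fields are assumptions made for this example; the real form is assembled by PostEncode, described later.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.11)"
POST_HEADER = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101"}


def build_login_request(fields):
    """Build (but do not send) a POST request carrying the given form fields.

    Hypothetical helper for illustration; it is not part of the article's code.
    """
    # urlencode() returns text; a request body must be bytes in Python 3.
    body = urlencode(fields).encode("ascii")
    # Supplying a body makes urllib issue a POST instead of a GET.
    return Request(LOGIN_URL, data=body, headers=POST_HEADER)


# Placeholder values only; real values come from the encoding step.
req = build_login_request({"entry": "weibo", "su": "placeholder"})
print(req.get_method())
```

Nothing touches the network here, so the sketch can be run safely to inspect the request before wiring it into a real login.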

The code is as follows:

    def EnableCookie(self, enableProxy):
        "Enable cookie & proxy (if needed)."

        cookiejar = cookielib.LWPCookieJar()  # create the cookie jar
        cookie_support = urllib2.HTTPCookieProcessor(cookiejar)

        if enableProxy:
            proxy_support = urllib2.ProxyHandler({'http': 'http://xxxxx.pac'})  # use a proxy
            opener = urllib2.build_opener(proxy_support, cookie_support, urllib2.HTTPHandler)
            print 'Proxy enabled'
        else:
            opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)

        urllib2.install_opener(opener)  # install the opener that carries the cookie jar

The EnableCookie method is relatively simple.
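For readers on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the same idea might look like the sketch below. The function name build_opener_with_cookies and its proxy_url parameter are assumptions made for this example, not names from the article.

```python
import http.cookiejar
import urllib.request


def build_opener_with_cookies(proxy_url=None):
    """Return (opener, cookie_jar); route HTTP through proxy_url if one is given.

    Hypothetical Python 3 counterpart of EnableCookie, for illustration only.
    """
    cookie_jar = http.cookiejar.LWPCookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(cookie_jar)]
    if proxy_url:
        handlers.insert(0, urllib.request.ProxyHandler({"http": proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    return opener, cookie_jar


opener, jar = build_opener_with_cookies()
# urllib.request.install_opener(opener) would make urlopen() use it globally,
# which is what the article's install_opener call does.
print(len(jar))
```

Returning the cookie jar alongside the opener makes it possible to inspect or save the cookies after login instead of relying only on the globally installed opener.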

The code is as follows:

    def GetServerTime(self):
        "Get server time and nonce, which are used to encode the password"

        print "Getting server time and nonce..."
        serverData = urllib2.urlopen(self.serverUrl).read()  # fetch the page content
        print serverData

        try:
            serverTime, nonce, pubkey, rsakv = weibosearch.sServerData(serverData)  # parse out serverTime, nonce, etc.
            return serverTime, nonce, pubkey, rsakv
        except Exception:
            print 'Get server time & nonce error!'
            return None

The functions in weibosearch.py are mainly used to parse the data returned by the server and are relatively simple.

3. sServerData function (weibosearch.py):

The code is as follows:

import re
import json

def sServerData(serverData):
    "Search the server time & nonce from server data"

    p = re.compile('\((.*)\)')  # capture the JSON inside the callback's parentheses
    jsonData = p.search(serverData).group(1)
    data = json.loads(jsonData)
    serverTime = str(data['servertime'])
    nonce = data['nonce']
    pubkey = data['pubkey']
    rsakv = data['rsakv']
    print "Server time is:", serverTime
    print "Nonce is:", nonce
    return serverTime, nonce, pubkey, rsakv
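
To make the extraction concrete, here is a small Python 3 sketch that applies the same regular expression and JSON decoding to a made-up reply. The sample string, its values, and the name parse_prelogin are all assumptions for illustration; a real reply from the server will differ.

```python
import json
import re

# Made-up sample shaped like a JSONP prelogin reply; none of these values are real.
SAMPLE = ('sinaSSOController.preloginCallBack({"retcode":0,'
          '"servertime":1379834957,"nonce":"ABC123",'
          '"pubkey":"ABCDEF0123456789","rsakv":"1330428213"})')


def parse_prelogin(server_data):
    """Strip the callback wrapper and return (servertime, nonce, pubkey, rsakv)."""
    match = re.search(r"\((.*)\)", server_data)
    if match is None:
        raise ValueError("no JSON payload found in server data")
    data = json.loads(match.group(1))
    return str(data["servertime"]), data["nonce"], data["pubkey"], data["rsakv"]


print(parse_prelogin(SAMPLE))
```

Unlike the original, this version raises a descriptive ValueError when the wrapper is missing rather than failing with an AttributeError on None.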

The parsing mainly relies on regular expressions and JSON, so it is fairly easy to follow. The function that parses the redirect result in Login is also in this file:

The code is as follows:

def sRedirectData(text):
    p = re.compile('location\.replace\([\'"](.*?)[\'"]\)')  # capture the URL passed to location.replace()
    loginUrl = p.search(text).group(1)
    print 'loginUrl:', loginUrl
    return loginUrl
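
The redirect pattern can be tried out on a made-up page fragment, as in the Python 3 sketch below. The sample markup, its placeholder URL, and the name parse_redirect are assumptions for illustration only.

```python
import re

# Same pattern as above: the quoted argument of location.replace(...).
REDIRECT_RE = re.compile(r"""location\.replace\(['"](.*?)['"]\)""")

# Made-up page fragment; the URL is a placeholder, not a real login ticket.
SAMPLE_PAGE = ('<script>location.replace('
               '"http://weibo.com/ajaxlogin.php?retcode=0");</script>')


def parse_redirect(text):
    """Return the URL passed to location.replace(), or None if there is none."""
    match = REDIRECT_RE.search(text)
    return match.group(1) if match else None


print(parse_redirect(SAMPLE_PAGE))
```

Returning None when nothing matches lets the caller decide how to report a failed login, whereas the original relies on the exception raised by calling group on None.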

4. Encrypting and encoding the username and password between the first and second steps (weiboencode.py):

The code is as follows:

import urllib
import base64
import rsa
import binascii

def PostEncode(userName, passWord, serverTime, nonce, pubkey, rsakv):
    "Used to generate POST data"

    encodedUserName = GetUserName(userName)  # the username is Base64-encoded
    encodedPassWord = get_pwd(passWord, serverTime, nonce, pubkey)  # the password is currently RSA-encrypted
    postPara = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': encodedUserName,
        'service': 'miniblog',
        'servertime': serverTime,
        'nonce': nonce,
        'pwencode': 'rsa2',
        'sp': encodedPassWord,
        'encoding': 'UTF-8',
        'prelt': '115',
        'rsakv': rsakv,
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META'
    }
    postData = urllib.urlencode(postPara)  # URL-encode the form fields
    return postData

The PostEncode function builds the body of the POST message; its content must match what a real login sends. The difficult part is encrypting the username and password:

The code is as follows:

def GetUserName(userName):
    "Used to encode user name"

    userNameTemp = urllib.quote(userName)
    userNameEncoded = base64.encodestring(userNameTemp)[:-1]  # drop the trailing newline
    return userNameEncoded


def get_pwd(password, servertime, nonce, pubkey):
    rsaPublickey = int(pubkey, 16)
    key = rsa.PublicKey(rsaPublickey, 65537)  # create the public key
    message = str(servertime) + '\t' + str(nonce) + '\n' + str(password)  # plaintext layout taken from the site's JS encryption file
    passwd = rsa.encrypt(message, key)  # encrypt
    passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to hexadecimal
    return passwd

In Sina's login process, password encryption originally used SHA1 and now uses RSA, and it may change again. Python has implementations of all the common cryptographic algorithms, however, so once the encryption method in use has been identified, the program is fairly easy to write.
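To show what the two encoding steps amount to, here is a Python 3 sketch that uses only the standard library. It spells out RSA with PKCS#1 v1.5 padding by hand, which, as far as I know, is the scheme that rsa.encrypt applies. All function names are assumptions for this example, and hand-rolled RSA like this is for study only; real code should keep using a maintained library.

```python
import base64
import binascii
import os
from urllib.parse import quote


def encode_username(user_name):
    """URL-quote the account name, then Base64-encode it (the 'su' field)."""
    return base64.b64encode(quote(user_name).encode("ascii")).decode("ascii")


def build_plaintext(password, servertime, nonce):
    """Join servertime, nonce and password in the layout used by get_pwd."""
    return "{}\t{}\n{}".format(servertime, nonce, password).encode("utf-8")


def pkcs1_v15_encrypt(message, n, e=65537):
    """RSA encryption with PKCS#1 v1.5 (type 2) padding; illustrative only."""
    k = (n.bit_length() + 7) // 8           # modulus length in bytes
    if len(message) > k - 11:
        raise ValueError("message too long for this modulus")
    pad_len = k - len(message) - 3
    padding = b""
    while len(padding) < pad_len:           # padding bytes must be non-zero
        chunk = os.urandom(pad_len - len(padding))
        padding += bytes(b for b in chunk if b != 0)
    block = b"\x00\x02" + padding + b"\x00" + message
    cipher_int = pow(int.from_bytes(block, "big"), e, n)
    return cipher_int.to_bytes(k, "big")


def encrypt_password(password, servertime, nonce, pubkey_hex):
    """Return the hex string that would be sent as the 'sp' field."""
    n = int(pubkey_hex, 16)                 # the server sends the modulus as hex
    cipher = pkcs1_v15_encrypt(build_plaintext(password, servertime, nonce), n)
    return binascii.hexlify(cipher).decode("ascii")


print(encode_username("xxx@gmail.com"))
```

Because the padding is random, encrypting the same password twice yields different hex strings; that is expected and does not indicate a bug.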

At this point the simulated login to Sina Weibo with Python succeeds. Running the program produces the following output:

loginurl: http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&ssosavestate=1390390056&ticket=st-mzq4nzq5ntyyma==-1387798056-xd-284624bfc19fe242bbae2c39fb3a8ca8&retcode=0
Login sucess!

If you need to crawl information from Weibo, just add crawling and parsing modules after the main function, for example to read the content of a Weibo page:

The code is as follows:

htmlContent = urllib2.urlopen(myurl).read()  # fetch the full content (HTML) of the page at myurl
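
As one possible shape for a parsing module, the Python 3 sketch below collects the links from a piece of HTML with the standard library's html.parser. The class name LinkCollector, the helper extract_links, and the sample markup are all assumptions for illustration; what a real crawler extracts depends on its own requirements.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html_content):
    """Return all link targets found in html_content, in document order."""
    parser = LinkCollector()
    parser.feed(html_content)
    return parser.links


# Made-up markup standing in for a page fetched after login.
SAMPLE_HTML = ('<div><a href="http://weibo.com/u/1">user 1</a>'
               '<a name="x">no link</a></div>')
print(extract_links(SAMPLE_HTML))
```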

You can design different crawler modules to suit different requirements, building on the simulated-login code shown here.
