Sample Code for Logging In to Sina Weibo by Machine Using Python

Source: Internet
Author: User

I started learning Python some time ago but could never think of a good small project to practice on. Eventually I settled on crawling Sina Weibo and running some simple statistics on the crawled data. At first I assumed that the Python regular expressions I had learned would be enough to handle it. Unexpectedly, I stumbled on the machine login: just getting from the pre-login request to actual data took four or five days. I had never written machine login code before, so the fact that this project works at all is entirely thanks to the experts who shared their code online. I simply pieced together parts of their code and added a few lines of comments.

    # Imports. Only the rsa module needs to be installed separately;
    # the others are built in.
    import re
    import urllib.parse
    import urllib.request
    import http.cookiejar
    import base64
    import binascii
    import rsa

    # These four lines make every subsequent GET and POST request carry the
    # cookies obtained so far, because login verification on larger websites
    # depends on cookies.
    cj = http.cookiejar.LWPCookieJar()
    cookie_support = urllib.request.HTTPCookieProcessor(cj)
    opener = urllib.request.build_opener(cookie_support, urllib.request.HTTPHandler)
    urllib.request.install_opener(opener)


    # Wrap GET in a function. Sina Weibo's GET responses are encoded in UTF-8,
    # so UTF-8 is hard-coded here. In a real project, determine the encoding
    # from the actual content.
    def getData(url):
        request = urllib.request.Request(url)
        response = urllib.request.urlopen(request)
        text = response.read().decode('utf-8')
        return text


    # Wrap POST in a function. The user name and password are both verified
    # by POST, so this demo uses postData only for that step.
    def postData(url, data):
        # We have to supply browser-like headers ourselves.
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)'}
        # urlencode joins the parameter dict into one string with '&',
        # which is then encoded as UTF-8.
        data = urllib.parse.urlencode(data).encode('utf-8')
        request = urllib.request.Request(url, data, headers)
        response = urllib.request.urlopen(request)
        text = response.read().decode('gbk')
        return text


    def login_weibo(nick, pwd):
        # ========== get servertime, pcid, pubkey, rsakv ==========
        # Send a pre-login request to obtain several parameters.
        prelogin_url = 'http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.15)&_=1400822309846' % nick
        preLogin = getData(prelogin_url)
        # Extract the following four values.
        servertime = re.findall('"servertime":(.*?),', preLogin)[0]
        pubkey = re.findall('"pubkey":"(.*?)",', preLogin)[0]
        rsakv = re.findall('"rsakv":"(.*?)",', preLogin)[0]
        nonce = re.findall('"nonce":"(.*?)",', preLogin)[0]

        # ========== encrypt the user name and password ==========
        # This is the hardest part of the Sina Weibo login. Without pointers
        # from others it would have been too difficult. There is not much to
        # explain: it is all encryption, and the results are the encrypted
        # su and sp.
        su = base64.b64encode(bytes(urllib.parse.quote(nick), encoding='utf-8'))
        rsaPublickey = int(pubkey, 16)
        key = rsa.PublicKey(rsaPublickey, 65537)
        # Some of the articles I found online do not convert the concatenated
        # string to bytes. This seems to be a Python 3 change: rsa.encrypt
        # requires a bytes argument, unlike before. The same goes for
        # base64.b64encode above.
        message = bytes(str(servertime) + '\t' + str(nonce) + '\n' + str(pwd), encoding='utf-8')
        sp = binascii.b2a_hex(rsa.encrypt(message, key))

        # ========== post the login request ==========
        # param holds the parameters for the login POST. It uses the data
        # obtained in the first step; there is nothing more to say about it.
        param = {
            'entry': 'weibo',
            'gateway': 1,
            'from': '',
            'savestate': 7,
            'useticket': 1,
            'pagerefer': 'http://login.sina.com.cn/sso/logout.php?entry=miniblog&r=http%3A%2F%2Fweibo.com%2Flogout.php%3Fbackurl%3D',
            'vsnf': 1,
            'su': su,
            'service': 'miniblog',
            'servertime': servertime,
            'nonce': nonce,
            'pwencode': 'rsa2',
            'rsakv': rsakv,
            'sp': sp,
            'sr': '2017*66661',
            'encoding': 'utf-8',
            'prelt': 961,
            'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        }
        # This is the only place where postData is used, and it is simple.
        # The beginning of this URL is missing from the source text; only the
        # query string survives.
        s = postData('...?client=ssologin.js(v1.4.15)', param)

        # By the time the code reaches this point, most of the work is done.
        # However, many fellow crawler writers stumble here, as I did: if you
        # skip the next step and go straight to fetching followers, what you
        # get back is still the page asking you to log in. It is really
        # frustrating; I was stuck here for a whole day.
        # Let's continue. urll is a follow-up login URL defined in a script
        # that Sina returns after the POST; it already carries the parameters
        # and verification obtained earlier. Requesting it is the real login,
        # so extract urll and issue a GET to it.
        urll = re.findall(r"location\.replace\('(.*?)'\);", s)[0]
        getData(urll)

        # ========== logged in ==========
        # If you did not skip the urll step, congratulations! You are now
        # logged in and can crawl Sina Weibo for whatever data you want.
        # If you fetch your own Weibo homepage, you will find that it is a
        # file of several hundred KB.
        text = getData('http://weibo.com/527891819/home?wvr=5&lf=reg')
        with open('yeah.txt', 'w', encoding='utf-8') as fp:
            fp.write(text)
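
The pre-login parsing and credential-encoding steps in the code above can be tried offline with only the standard library. The following is a minimal sketch, not the original author's method. It assumes the pre-login response is a JSONP-style body wrapping a JSON object (suggested by the callback parameter in the pre-login URL), parses it with json instead of regular expressions, and stops before the RSA step. The helper names parse_prelogin, encode_username, and build_message, and every value in SAMPLE, are hypothetical.

```python
import base64
import json
import re
import urllib.parse


def parse_prelogin(body):
    """Extract the JSON object wrapped in a JSONP-style callback body."""
    match = re.search(r'\((\{.*\})\)', body, re.S)
    if match is None:
        raise ValueError('no JSON payload found in pre-login response')
    return json.loads(match.group(1))


def encode_username(nick):
    """URL-quote the user name, then Base64-encode it (the 'su' field)."""
    quoted = urllib.parse.quote(nick)
    return base64.b64encode(quoted.encode('utf-8')).decode('ascii')


def build_message(servertime, nonce, pwd):
    """Join servertime, nonce and password into the bytes to be encrypted."""
    return '{}\t{}\n{}'.format(servertime, nonce, pwd).encode('utf-8')


# Hypothetical response body: every value here is made up for illustration
# and the real response format is an assumption.
SAMPLE = ('sinaSSOController.preloginCallBack({"retcode":0,'
          '"servertime":1400822309,"pcid":"xx-0000","nonce":"ABC123",'
          '"pubkey":"EB2A","rsakv":"1330428213","exectime":1})')

info = parse_prelogin(SAMPLE)
su = encode_username('user@example.com')
message = build_message(info['servertime'], info['nonce'], 'secret')
```

With a real response, message and a key built from int(info['pubkey'], 16) would then be passed to rsa.encrypt to produce sp, as in the code above.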
