Sample Code for a Scripted Login to Sina Weibo Using Python

Last Update: 2014-05-26 Source: Internet

Author: User

I started learning Python a while ago, but I could never think of a good small project to practice on. Eventually I got impatient and settled on crawling Sina Weibo and computing some simple statistics over the crawled data. At first I thought that knowing a little about Python regular expressions would be enough. Instead I tripped over the scripted login: getting from nothing to a working login took four or five days. I had never written login code before, so the fact that this first version works at all is owed entirely to several experts online. I only collected pieces of their code, stitched them together, and added a few lines of comments.

    # Imports. Only the rsa module has to be installed; the others are built in.
    import re
    import urllib.parse
    import urllib.request
    import http.cookiejar
    import base64
    import binascii
    import rsa

    # These four lines make every later GET and POST request carry the cookies
    # obtained so far, because login verification on larger websites depends
    # on cookies.
    cj = http.cookiejar.LWPCookieJar()
    cookie_support = urllib.request.HTTPCookieProcessor(cj)
    opener = urllib.request.build_opener(cookie_support, urllib.request.HTTPHandler)
    urllib.request.install_opener(opener)

    # Wrap GET in a function. Sina Weibo's GET responses are encoded as UTF-8,
    # so UTF-8 is hard-coded here. In a real project, choose the codec from the
    # actual content encoding.
    def getData(url):
        request = urllib.request.Request(url)
        response = urllib.request.urlopen(request)
        text = response.read().decode('utf-8')
        return text

    # Wrap POST in a function. The user name and password are both verified
    # through a POST, so in this demo postData is used only for that.
    def postData(url, data):
        # We have to fake the headers ourselves.
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)'}
        # urlencode joins the dict into key=value pairs separated by &,
        # and the result is then encoded as UTF-8.
        data = urllib.parse.urlencode(data).encode('utf-8')
        request = urllib.request.Request(url, data, headers)
        response = urllib.request.urlopen(request)
        text = response.read().decode('gbk')
        return text

    def login_weibo(nick, pwd):
        # ========== get servertime, pcid, pubkey, rsakv ==========
        # Send a pre-login request to obtain several parameters.
        prelogin_url = 'http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.15)&_=1400822309846' % nick
        preLogin = getData(prelogin_url)
        # Extract the four values used below.
        servertime = re.findall('"servertime":(.*?),', preLogin)[0]
        pubkey = re.findall('"pubkey":"(.*?)",', preLogin)[0]
        rsakv = re.findall('"rsakv":"(.*?)",', preLogin)[0]
        nonce = re.findall('"nonce":"(.*?)",', preLogin)[0]

        # ========== encrypt the user name and password ==========
        # This is the hardest part of the Sina Weibo login. Without pointers
        # from other people it would have been too difficult for me. I will not
        # go into detail: it is all encryption, and the result is the encrypted
        # su and sp.
        su = base64.b64encode(bytes(urllib.parse.quote(nick), encoding='utf-8'))
        rsaPublickey = int(pubkey, 16)
        key = rsa.PublicKey(rsaPublickey, 65537)
        # Some of the articles I found online do not convert the concatenated
        # string to bytes. This seems to be new in Python 3: rsa.encrypt
        # requires a bytes argument, unlike before. The same applies to the
        # base64.b64encode call above.
        message = bytes(str(servertime) + '\t' + str(nonce) + '\n' + str(pwd), encoding='utf-8')
        sp = binascii.b2a_hex(rsa.encrypt(message, key))

        # ========== POST the login form ==========
        # param holds the POST parameters for the login. It uses the data
        # obtained in the first step. There is not much more to say about it.
        param = {
            'entry': 'weibo',
            'gateway': 1,
            'from': '',
            'savestate': 7,
            'useticket': 1,
            'pagerefer': 'http://login.sina.com.cn/sso/logout.php?entry=miniblog&r=http%3A%2F%2Fweibo.com%2Flogout.php%3Fbackurl%3D',
            'vsnf': 1,
            'su': su,
            'service': 'miniblog',
            'servertime': servertime,
            'nonce': nonce,
            'pwencode': 'rsa2',
            'rsakv': rsakv,
            'sp': sp,
            'sr': '2017*66661',
            'encoding': 'UTF-8',
            'prelt': 961,
            'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        }
        # This is the only place where postData is used. It is also very simple.
        # Note: the start of this URL was lost in the source text; only the
        # query string survives.
        s = postData('[...]?client=ssologin.js(v1.4.15)', param)

        # By the time your code gets here, most of the work is done. However,
        # many fellow crawler writers trip at this point just as I did. If you
        # skip the next step and go straight to fetching, say, your followers,
        # you will find that what comes back is still the login page. It is
        # really frustrating; I was stuck here for a whole day.
        # Let's continue. urll is a further login URL defined in a script that
        # Sina returns after the POST. Its parameters and verification data
        # come from the earlier steps. This request is the real login, so
        # extract urll and fetch it with GET.
        urll = re.findall(r"location.replace\('(.*?)'\);", s)[0]
        getData(urll)

        # ========== fetch a page ==========
        # If you did not skip urll, congratulations: you have succeeded, and
        # you can now crawl Sina Weibo for whatever data you want.
        # If you fetch your own Weibo home page, you will find that it is a
        # file of several hundred KB.
        text = getData('http://weibo.com/527891819/home?wvr=5&lf=reg')
        with open('yeah.txt', 'w', encoding='utf-8') as fp:
            fp.write(text)
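The pre-login step pulls servertime, pubkey, rsakv and nonce out of the reply with four separate regular expressions. As a rough alternative sketch, and assuming the reply is JSONP of the form callback({...}) (which the callback parameter in the URL suggests but the article does not show), the payload could be parsed once with the json module. The helper name parse_prelogin and every value in the sample string are made up for illustration; a real reply carries a much longer hexadecimal pubkey.

```python
import json
import re


def parse_prelogin(text):
    """Return the JSON object wrapped in a JSONP reply such as callback({...}).

    Hypothetical helper, not part of the original code.
    """
    match = re.search(r'\((\{.*\})\)', text, re.S)
    if match is None:
        raise ValueError('no JSON payload found in the pre-login reply')
    return json.loads(match.group(1))


# Invented sample reply, shaped after the field names the article extracts.
sample = ('sinaSSOController.preloginCallBack({"retcode":0,'
          '"servertime":1400822310,"pcid":"xd-0000","nonce":"ABC123",'
          '"pubkey":"EB2A38568661887F","rsakv":"1330428213","exectime":1})')

fields = parse_prelogin(sample)
servertime = fields['servertime']        # already an int after json.loads
nonce = fields['nonce']
pubkey_int = int(fields['pubkey'], 16)   # the hex modulus as an integer
rsakv = fields['rsakv']
print(servertime, nonce, rsakv)
```

Compared with the regular expressions, this fails loudly with a ValueError or KeyError when the reply changes shape, instead of raising an IndexError on an empty findall result.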

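The code hard-codes 'utf-8' in getData and 'gbk' in postData, while its own comment recommends choosing the codec from the actual content encoding. A minimal sketch of that idea follows, reading the charset from a Content-Type header value with the standard library. The names pick_charset and decode_body are hypothetical, and falling back to UTF-8 when no charset is declared is an assumption, not something the article specifies.

```python
from email.message import Message


def pick_charset(content_type, fallback='utf-8'):
    """Return the charset named in a Content-Type value, or the fallback."""
    if not content_type:
        return fallback
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or fallback


def decode_body(raw, content_type, fallback='utf-8'):
    """Decode response bytes with the declared charset (hypothetical helper)."""
    return raw.decode(pick_charset(content_type, fallback), errors='replace')


body = '微博'.encode('gbk')
print(decode_body(body, 'text/html; charset=GBK'))
```

With urllib, the headers attribute of the response object is itself a Message subclass, so response.headers.get_content_charset() gives the declared charset directly.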
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
