Introduction to zhihu_oauth, a Python-based open-source Zhihu crawler

Last Update:2017-05-03 Source: Internet

Author: User

Tags oauth


Today I came across zhihu_oauth, a well-known open-source Zhihu crawler written in Python. It has a lot of stars on GitHub and its documentation looks quite detailed, so I did a little research and found it useful. This article introduces how to use it.

The project's home page is https://github.com/7sdream/zhihu-oauth. The author's Zhihu profile is https://www.zhihu.com/people/7sdream/.

The project documentation is at http://zhihu-oauth.readthedocs.io/zh_CN/latest/index.html. The original author has already explained how to use the library in great detail, so repeating all of it here would be gilding the lily. If you want to learn more about the library, go to the official documentation. I only want to cover a few important points that I think are worth adding.

First, installation. The author has uploaded the project to PyPI, so it can be installed directly with pip. According to the author, the project supports Python 3 best and is currently also compatible with Python 2, so Python 3 is the better choice. Install it with pip3 install -U zhihu_oauth.
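A quick way to confirm that the install worked is to ask Python whether the package can be found on the import path. The helper below is a minimal sketch that uses only the standard library; is_installed is a name made up for this illustration and is not part of zhihu_oauth.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("zhihu_oauth installed:", is_installed("zhihu_oauth"))
```

If the second line prints False, the package was probably installed for a different interpreter than the one running the script.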

The first step after installation is to log in. You can log in directly with the following code.

    from zhihu_oauth import ZhihuClient
    from zhihu_oauth.exception import NeedCaptchaException

    client = ZhihuClient()
    user = 'email_or_phone'
    pwd = 'password'

    try:
        client.login(user, pwd)
        print(u"login successful!")
    except NeedCaptchaException:  # handle the case where a captcha is required
        # Save the captcha image, prompt the user to type it in, then log in again
        with open('a.gif', 'wb') as f:
            f.write(client.get_captcha())
        captcha = input('Please input captcha: ')
        client.login(user, pwd, captcha)

    client.save_token('token.pkl')  # save the token
    # With a saved token, you can load the token file directly the next time you log in.
    # client.load_token('filename')

The code above logs in with the account name and password and then saves the token. Next time, we can log in with the saved token instead of entering the password again.
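The "token first, password as a fallback" flow can be wrapped in one small function. This is only a sketch: login_with_token_first is a hypothetical helper, it assumes the client exposes the load_token, login and save_token methods used in this article, and it leaves out captcha handling.

```python
import os


def login_with_token_first(client, token_path, user, pwd):
    """Log in via a saved token when one exists, otherwise via password.

    `client` is expected to be a ZhihuClient (or any object with the same
    load_token / login / save_token methods used in this article).
    Returns "token" or "password" to say which path was taken.
    """
    if os.path.isfile(token_path):
        client.load_token(token_path)
        return "token"
    # No saved token yet: fall back to password login and save the token.
    # Captcha handling (NeedCaptchaException) is left out of this sketch.
    client.login(user, pwd)
    client.save_token(token_path)
    return "password"
```

A call would look like login_with_token_first(ZhihuClient(), 'token.pkl', 'email_or_phone', 'password'). The first run takes the password path and writes token.pkl, and later runs take the token path.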

After logging in, you can do a lot of things. For example, the following code retrieves basic information about your own account.

    from __future__ import print_function  # use Python 3's print function
    from zhihu_oauth import ZhihuClient

    client = ZhihuClient()
    client.load_token('token.pkl')  # load the token file
    # get your own account
    me = client.me()

    # get the five most recent answers
    for _, answer in zip(range(5), me.answers):
        print(answer.question.title, answer.voteup_count)

    print('----------')

    # get the five answers with the most upvotes
    for _, answer in zip(range(5), me.answers.order_by('votenum')):
        print(answer.question.title, answer.voteup_count)

    print('----------')

    # get the five most recent questions
    for _, question in zip(range(5), me.questions):
        print(question.title, question.answer_count)

    print('----------')

    # get the five most recently published articles
    for _, article in zip(range(5), me.articles):
        print(article.title, article.voteup_count)
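The zip(range(5), ...) idiom stops after five items without walking through the whole collection, which is useful if collections such as me.answers are fetched lazily (an assumption based on how they are used here). The standard library's itertools.islice does the same job. In the sketch below, first_n and numbers are hypothetical names, and numbers is only a stand-in for a lazy collection.

```python
from itertools import islice


def first_n(iterable, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def numbers():
    """Stand-in for a lazy collection such as me.answers."""
    i = 0
    while True:  # endless on purpose: only a lazy consumer can finish
        yield i
        i += 1


print(first_n(numbers(), 5))                      # [0, 1, 2, 3, 4]
print([x for _, x in zip(range(5), numbers())])   # same result via the zip idiom
```

Note that range(5) must come first in the zip call. With the arguments swapped, zip would pull a sixth item from the collection before noticing that range is exhausted.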

Of course, we can do much more than that. For example, given the URL or the id of a question, we can get its number of answers, information about its author, and a range of other details. The developer has been thorough, and nearly all commonly needed information is covered. I will not post the specific code here; see the official documentation.
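When only the URL of a question is at hand, the numeric id can be pulled out of it with a regular expression. This is a minimal sketch: question_id_from_url is a hypothetical helper, and it assumes question URLs have the form https://www.zhihu.com/question/<id>, as in the example used later in this article.

```python
import re


def question_id_from_url(url):
    """Extract the numeric question id from a Zhihu question URL, or None."""
    match = re.search(r"/question/(\d+)", url)
    return int(match.group(1)) if match else None


print(question_id_from_url("https://www.zhihu.com/question/24400664"))  # 24400664

# With a logged-in client, the id can then be passed on:
# question = client.question(question_id_from_url(url))
# print(question.title, question.answer_count)
```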

A small tip: this library has many classes, such as the class for author information and the class for article information, and each class has many methods. In the official documentation I found that not every attribute of some classes is listed. How can we see all the attributes of a class? It is simple: use Python's built-in dir function. dir(object) lists all the attributes of an object (or class). For example, given an answer object, dir(answer) returns the list of all its attributes. Leaving aside the default attributes, we can find the ones we need, which is very convenient. (The list below shows all the attributes of a collection object.)

['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_build_data', '_build_params', '_build_url', '_cache', '_data', '_get_data', '_id', '_method', '_refresh_times', '_session', 'answer_count', 'answers', 'articles', 'comment_count', 'comments', 'contents', 'created_time', 'creator', 'description', 'follower_count', 'followers', 'id', 'is_public', 'pure_data', 'refresh', 'title', 'updated_time']
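Most of a dir() listing is made up of default and private names that start with an underscore. A one-line filter leaves only the public attributes. In this sketch, public_attrs and Demo are hypothetical names, and Demo is a tiny stand-in so the example runs without logging in.

```python
def public_attrs(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Demo(object):
    """Tiny stand-in class; with zhihu_oauth you would pass e.g. an answer object."""

    def __init__(self):
        self._cache = None
        self.title = "example"

    def refresh(self):
        pass


print(public_attrs(Demo()))  # ['refresh', 'title']
```

Applied to the collection object above, the same filter would keep only the names from 'answer_count' to 'updated_time'.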

Finally, I used this library to download all the images in the answers to a certain question (grabbing pictures of beauties, hahaha) in fewer than 30 lines of code, not counting comments. Here it is.

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # @Time:
    # @Author: Lyrichu
    # @Email: 919987476@qq.com
    # @File: save_images.py
    '''
    @Description: save the pictures from all answers to a question
    '''
    from __future__ import print_function  # use Python 3's print function
    from zhihu_oauth import ZhihuClient
    import re
    import os
    import urllib

    client = ZhihuClient()
    # log in
    client.load_token('token.pkl')  # load the token file
    question_id = 24400664  # https://www.zhihu.com/question/24400664 (what is it like to be good-looking)
    question = client.question(question_id)
    print(u"question:", question.title)
    print(u"number of answers:", question.answer_count)
    # create a folder for storing the images
    path = question.title + u"(image)"
    os.mkdir(path)
    index = 1  # image number
    for answer in question.answers:
        content = answer.content  # answer content (HTML)
        # The pattern is missing from this copy of the article; img[0] below
        # implies it captured the image URL as the first of several groups.
        re_compile = re.compile(r'  ')
        img_lists = re.findall(re_compile, content)
        if img_lists:
            for img in img_lists:
                img_url = img[0]  # image URL
                # Python 2 call; on Python 3 use urllib.request.urlretrieve
                urllib.urlretrieve(img_url, path + u"/%d.jpg" % index)
                print(u"saved image %d" % index)
                index += 1
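The core step of the script is pulling image URLs out of each answer's HTML. As an alternative to a regular expression, the sketch below uses the standard library's html.parser module (Python 3). It is a swapped-in technique, not the author's code; ImgSrcCollector, image_urls and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def image_urls(html):
    """Return all <img src=...> values found in an HTML fragment, in order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


sample = ('<p>hi</p><img src="https://example.com/a.jpg">'
          '<img alt="no src"><img src="https://example.com/b.png"/>')
print(image_urls(sample))
# ['https://example.com/a.jpg', 'https://example.com/b.png']
```

Each returned URL could then be downloaded in a loop, as in the script above, with image_urls(answer.content) taking the place of the re.findall call.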

If you wrote this yourself by fetching and parsing the web pages directly, you would not be able to get all the answers, so you would have to reverse-engineer the Zhihu API, which is troublesome. Using this ready-made wheel is much easier. From now on, admiring the beauties on Zhihu takes no effort at all.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
