How can I write a crawler to get all the upvoters and their usernames for an answer?

I asked someone for her Zhihu account so that I could follow her, but she wouldn't tell me. Now I find it a bit strange and I'm curious. Fortunately, I know a few answers she has liked, so I want to use social engineering to work out her Zhihu account. (She doesn't have an invitation, so she shouldn't come across this post anyway.)

Reply: every answer element on the page carries an attribute like data-aid="12345678".
Then, using the URL http://www.zhihu.com/answer/12345678/voters_profile?&offset=10
you can request a JSON document that describes the upvoters. Each request returns 10 upvoters, so you walk through the whole list by increasing the offset in steps of 10.
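Here is a minimal sketch of that paging loop, assuming a requests session that is already logged in (see the full script below); the answer id 12345678 is just a placeholder, and the 'payload' field and the 10-per-page step follow the description above.

import requests

s = requests.Session()  # in practice, log in first (see the script below)
base = 'http://www.zhihu.com/answer/12345678/voters_profile?&offset={0}'
offset = 0
while True:
    payload = s.get(base.format(offset)).json().get('payload', [])
    if not payload:           # an empty payload means there are no more upvoters
        break
    for fragment in payload:  # each entry is an HTML fragment describing one upvoter
        print(fragment)
    offset += 10              # the endpoint returns at most 10 upvoters per request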

After trying it out: you need to log in to obtain the complete data. For details on how to log in, see my blog post on simulated login with Python.


For example, data-aid="000000":

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
import time
import json
import os
import sys

url = 'http://www.zhihu.com'
loginURL = 'http://www.zhihu.com/login/email'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:41.0) Gecko/20100101 Firefox/41.0',
    'Referer': 'http://www.zhihu.com/',
    'Host': 'www.zhihu.com',
}
data = {
    'email': 'xxxxx@gmail.com',
    'password': 'xxxxxxx',
    'rememberme': 'true',
}

s = requests.session()

# If we have logged in successfully before, reuse the stored cookies.
if os.path.exists('cookiefile'):
    with open('cookiefile') as f:
        cookie = json.load(f)
    s.cookies.update(cookie)
    req1 = s.get(url, headers=headers)
    with open('zhihu.html', 'w') as f:
        f.write(req1.content)
# Otherwise log in; the captcha has to be typed in by hand the first time.
else:
    req = s.get(url, headers=headers)
    print req
    soup = BeautifulSoup(req.text, 'html.parser')
    xsrf = soup.find('input', {'name': '_xsrf', 'type': 'hidden'}).get('value')
    data['_xsrf'] = xsrf
    timestamp = int(time.time() * 1000)
    captchaURL = 'http://www.zhihu.com/captcha.gif?=' + str(timestamp)
    print captchaURL
    with open('zhihucaptcha.gif', 'wb') as f:
        captchaREQ = s.get(captchaURL)
        f.write(captchaREQ.content)
    loginCaptcha = raw_input('input captcha:\n').strip()
    data['captcha'] = loginCaptcha
    loginREQ = s.post(loginURL, headers=headers, data=data)
    if not loginREQ.json()['r']:  # r == 0 means the login succeeded
        with open('cookiefile', 'wb') as f:
            json.dump(s.cookies.get_dict(), f)
    else:
        print 'login failed, try again!'
        sys.exit(1)

# Take http://www.zhihu.com/question/27621722/answer/48820436 (399 upvotes) as an example.
zanBaseURL = 'http://www.zhihu.com/answer/22229844/voters_profile?&offset={0}'
page = 0
count = 0
while 1:
    zanURL = zanBaseURL.format(str(page))
    page += 10
    zanREQ = s.get(zanURL, headers=headers)
    zanData = zanREQ.json()['payload']
    if not zanData:
        break
    for item in zanData:
        # Each payload entry is an HTML fragment; parse it to get the upvoter.
        zanSoup = BeautifulSoup(item, 'html.parser')
        zanInfo = zanSoup.find('a', {'target': '_blank', 'class': 'zg-link'})
        if zanInfo:
            print 'nickname:', zanInfo.get('title'), '',
            print 'person_url:', zanInfo.get('href')
        else:
            # Anonymous upvoters only have an avatar image carrying a title.
            anonymous = zanSoup.find('img', {'title': True, 'class': 'zm-item-img-avatar'})
            print 'nickname:', anonymous.get('title')
        count += 1
print count
There is a Python 3 project, zhihu-py3: 7sDream/zhihu-py3 on GitHub.

It wraps up the common needs of a Zhihu crawler, such as getting user information, question information, answer information, and so on, and that of course includes the upvoters of an answer. Although it is single-threaded and synchronous, it works fine for normal use.

Here is its documentation: Welcome to zhihu-py3's documentation!

You are welcome to star, fork, or contribute code.

===

Getting the upvoters is easy:

from zhihu import ZhihuClient

client = ZhihuClient('cookies.json')
url = 'http://www.zhihu.com/question/36337920/answer/67029821'
answer = client.answer(url)
print('Question: {0}'.format(answer.question.title))
print('Answer author: {0}'.format(answer.author.name))
print('Total {0} upvotes:\n'.format(answer.upvote_num))
for upvoter in answer.upvoters:
    print(upvoter.name, upvoter.url)
Following up on the comments under the first answer, here are some ideas on how to find the key identifier (the aid):

Overview: analyze the HTTP requests and responses generated while operating the page by hand, and look for the key identifying features.
Tool: Firebug

1. On any answer page, click the "xx people agreed" link below an answer, and observe that data is requested from the URL
http://www.zhihu.com/answer/22229844/voters_profile
From the URL format it is easy to guess that this returns information about the voters who upvoted answer 22229844, and the Response returned by the server (a piece of JSON data) confirms it. So as long as we can send a GET request to this URL, we can find out who the voters are. What remains is working out how to build this URL, that is, how to find the number 22229844.
2. Since we know that clicking the "xx people agreed" link triggers an Ajax request to this URL, the number 22229844 must either be stored somewhere in the DOM or be computed by script. Search for the string 22229844 in the DOM and you will easily find the element that carries it: the data-aid attribute mentioned in the first answer.
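As a rough sketch of that step, the snippet below fetches a question page and pulls out the data-aid values directly, assuming the answer elements carry a data-aid attribute as described above; the question URL is only a placeholder, and a logged-in session may be needed for some pages.

import requests
from bs4 import BeautifulSoup

html = requests.get('http://www.zhihu.com/question/27621722',
                    headers={'User-Agent': 'Mozilla/5.0'}).text
soup = BeautifulSoup(html, 'html.parser')
# Every element that carries a data-aid attribute identifies one answer.
for node in soup.find_all(attrs={'data-aid': True}):
    aid = node['data-aid']
    print(aid, 'http://www.zhihu.com/answer/{0}/voters_profile'.format(aid))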

What do you mean by social engineering?
As far as I know, so-called social engineering is usually a hacker's con: you use the information you already have to obtain the information you want, but the core of it is deception.
Now that you know which answers she liked, what does that have to do with social engineering?
Do you mean that you know several answers she has upvoted, and you want to find her among the people who upvoted all of them at the same time?
If you are lucky, you may find her right away; if you are unlucky, I'm afraid a whole pile of candidates will be waiting for you. The key is how many of her liked answers you actually know.
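For what it's worth, here is a minimal sketch of that intersection idea: collect the upvoter list for each answer you know she liked and keep only the people who appear in every list. get_upvoter_urls is a hypothetical helper you would implement with the first answer's script or with zhihu-py3's answer.upvoters.

# Minimal sketch of the intersection idea; get_upvoter_urls is a hypothetical
# helper that returns, say, the set of profile URLs of an answer's upvoters.
def find_candidates(answer_ids, get_upvoter_urls):
    candidates = None
    for aid in answer_ids:
        upvoters = set(get_upvoter_urls(aid))
        candidates = upvoters if candidates is None else candidates & upvoters
    return candidates or set()

# Example (answer ids are placeholders):
# candidates = find_candidates(['22229844', '48820436'], get_upvoter_urls)
# The more liked answers you know, the smaller this candidate set becomes.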
On the technical side, I don't think you even need Python; writing a bit of JS and running it in the console would be enough. Or go look at "wheel brother": his crawler, which crawls users and automatically analyzes who is worth following, has its source code available.
