How can I write a crawler to get all the upvoters and their usernames for an answer?

I asked someone for her Zhihu account so that I could follow her, but she wouldn't tell me. Now I find it a bit strange and I'm curious. Fortunately, I know a few answers she has liked, so I want to use social engineering to work out her Zhihu account. (She doesn't have an invitation, so she shouldn't come across this post anyway.)

Reply: every answer element on the page carries an attribute like data-aid="12345678".
Then, using the URL http://www.zhihu.com/answer/12345678/voters_profile?&offset=10
you can request a JSON document that describes the upvoters. Each request returns 10 upvoters, so you walk through the whole list by increasing the offset in steps of 10.
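Here is a minimal sketch of that paging loop, assuming a requests session that is already logged in (see the full script below); the answer id 12345678 is just a placeholder, and the 'payload' field and the 10-per-page step follow the description above.

import requests

s = requests.Session()  # in practice, log in first (see the script below)
base = 'http://www.zhihu.com/answer/12345678/voters_profile?&offset={0}'
offset = 0
while True:
    payload = s.get(base.format(offset)).json().get('payload', [])
    if not payload:           # an empty payload means there are no more upvoters
        break
    for fragment in payload:  # each entry is an HTML fragment describing one upvoter
        print(fragment)
    offset += 10              # the endpoint returns at most 10 upvoters per request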

After trying it out: you need to log in to obtain the complete data. For details on how to log in, see my blog post on simulated login with Python.


For example, data-aid="000000":

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
import time
import json
import os
import sys

url = 'http://www.zhihu.com'
loginURL = 'http://www.zhihu.com/login/email'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:41.0) Gecko/20100101 Firefox/41.0',
    'Referer': 'http://www.zhihu.com/',
    'Host': 'www.zhihu.com',
}
data = {
    'email': 'xxxxx@gmail.com',
    'password': 'xxxxxxx',
    'rememberme': 'true',
}

s = requests.session()

# If we have logged in successfully before, reuse the stored cookies.
if os.path.exists('cookiefile'):
    with open('cookiefile') as f:
        cookie = json.load(f)
    s.cookies.update(cookie)
    req1 = s.get(url, headers=headers)
    with open('zhihu.html', 'w') as f:
        f.write(req1.content)
# Otherwise log in; the captcha has to be typed in by hand the first time.
else:
    req = s.get(url, headers=headers)
    print req
    soup = BeautifulSoup(req.text, 'html.parser')
    xsrf = soup.find('input', {'name': '_xsrf', 'type': 'hidden'}).get('value')
    data['_xsrf'] = xsrf
    timestamp = int(time.time() * 1000)
    captchaURL = 'http://www.zhihu.com/captcha.gif?=' + str(timestamp)
    print captchaURL
    with open('zhihucaptcha.gif', 'wb') as f:
        captchaREQ = s.get(captchaURL)
        f.write(captchaREQ.content)
    loginCaptcha = raw_input('input captcha:\n').strip()
    data['captcha'] = loginCaptcha
    loginREQ = s.post(loginURL, headers=headers, data=data)
    if not loginREQ.json()['r']:  # r == 0 means the login succeeded
        with open('cookiefile', 'wb') as f:
            json.dump(s.cookies.get_dict(), f)
    else:
        print 'login failed, try again!'
        sys.exit(1)

# Take http://www.zhihu.com/question/27621722/answer/48820436 (399 upvotes) as an example.
zanBaseURL = 'http://www.zhihu.com/answer/22229844/voters_profile?&offset={0}'
page = 0
count = 0
while 1:
    zanURL = zanBaseURL.format(str(page))
    page += 10
    zanREQ = s.get(zanURL, headers=headers)
    zanData = zanREQ.json()['payload']
    if not zanData:
        break
    for item in zanData:
        # Each payload entry is an HTML fragment; parse it to get the upvoter.
        zanSoup = BeautifulSoup(item, 'html.parser')
        zanInfo = zanSoup.find('a', {'target': '_blank', 'class': 'zg-link'})
        if zanInfo:
            print 'nickname:', zanInfo.get('title'), '',
            print 'person_url:', zanInfo.get('href')
        else:
            # Anonymous upvoters only have an avatar image carrying a title.
            anonymous = zanSoup.find('img', {'title': True, 'class': 'zm-item-img-avatar'})
            print 'nickname:', anonymous.get('title')
        count += 1
print count
There is a Python 3 project, zhihu-py3: 7sDream/zhihu-py3 on GitHub.

It wraps up the common needs of a Zhihu crawler, such as getting user information, question information, answer information, and so on, and that of course includes the upvoters of an answer. Although it is single-threaded and synchronous, it works fine for normal use.

Here is its documentation: Welcome to zhihu-py3's documentation!

You are welcome to star, fork, or contribute code.

===

Getting the upvoters is easy:

from zhihu import ZhihuClient

client = ZhihuClient('cookies.json')
url = 'http://www.zhihu.com/question/36337920/answer/67029821'
answer = client.answer(url)
print('Question: {0}'.format(answer.question.title))
print('Answer author: {0}'.format(answer.author.name))
print('Total {0} upvotes:\n'.format(answer.upvote_num))
for upvoter in answer.upvoters:
    print(upvoter.name, upvoter.url)
Following up on the comments under the first answer, here are some ideas on how to find the key identifier (the aid):

Overview: analyze the HTTP requests and responses generated while operating the page by hand, and look for the key identifying features.
Tool: Firebug

1. On any answer page, click the "xx people agreed" link below an answer, and observe that data is requested from the URL
http://www.zhihu.com/answer/22229844/voters_profile
From the URL format it is easy to guess that this returns information about the voters who upvoted answer 22229844, and the Response returned by the server (a piece of JSON data) confirms it. So as long as we can send a GET request to this URL, we can find out who the voters are. What remains is working out how to build this URL, that is, how to find the number 22229844.
2. Since we know that clicking the "xx people agreed" link triggers an Ajax request to this URL, the number 22229844 must either be stored somewhere in the DOM or be computed by script. Search for the string 22229844 in the DOM and you will easily find the element that carries it: the data-aid attribute mentioned in the first answer.
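As a rough sketch of that step, the snippet below fetches a question page and pulls out the data-aid values directly, assuming the answer elements carry a data-aid attribute as described above; the question URL is only a placeholder, and a logged-in session may be needed for some pages.

import requests
from bs4 import BeautifulSoup

html = requests.get('http://www.zhihu.com/question/27621722',
                    headers={'User-Agent': 'Mozilla/5.0'}).text
soup = BeautifulSoup(html, 'html.parser')
# Every element that carries a data-aid attribute identifies one answer.
for node in soup.find_all(attrs={'data-aid': True}):
    aid = node['data-aid']
    print(aid, 'http://www.zhihu.com/answer/{0}/voters_profile'.format(aid))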

What do you mean by social engineering?
As far as I know, so-called social engineering is usually a hacker's con: you use the information you already have to obtain the information you want, but the core of it is deception.
Now that you know which answers she liked, what does that have to do with social engineering?
Do you mean that you know several answers she has upvoted, and you want to find her among the people who upvoted all of them at the same time?
If you are lucky, you may find her right away; if you are unlucky, I'm afraid a whole pile of candidates will be waiting for you. The key is how many of her liked answers you actually know.
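For what it's worth, here is a minimal sketch of that intersection idea: collect the upvoter list for each answer you know she liked and keep only the people who appear in every list. get_upvoter_urls is a hypothetical helper you would implement with the first answer's script or with zhihu-py3's answer.upvoters.

# Minimal sketch of the intersection idea; get_upvoter_urls is a hypothetical
# helper that returns, say, the set of profile URLs of an answer's upvoters.
def find_candidates(answer_ids, get_upvoter_urls):
    candidates = None
    for aid in answer_ids:
        upvoters = set(get_upvoter_urls(aid))
        candidates = upvoters if candidates is None else candidates & upvoters
    return candidates or set()

# Example (answer ids are placeholders):
# candidates = find_candidates(['22229844', '48820436'], get_upvoter_urls)
# The more liked answers you know, the smaller this candidate set becomes.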
On the technical side, I don't think you even need Python; writing a bit of JS and running it in the console would be enough. Or go look at "wheel brother": his crawler, which crawls users and automatically analyzes who is worth following, has its source code available.
