Scrapy crawling Sina Weibo

Source: Internet
Author: User

1. Goal of this section

This section crawls the basic information of Sina Weibo users, such as nickname, avatar, following list, and follower list, as well as the posts they publish, and saves the captured data to MongoDB.
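As a rough illustration of the "save to MongoDB" step, the sketch below shows a Scrapy-style item pipeline that upserts each user record by ID, so crawling the same user twice does not create duplicates. The class name, the `id` field, and injecting the collection through the constructor are assumptions made for this example; the original article does not show its pipeline. With pymongo, the collection could be obtained with something like `MongoClient("mongodb://localhost:27017")["weibo"]["users"]`, where the database and collection names are hypothetical.

```python
class MongoUpsertPipeline:
    """Item pipeline sketch: write each item to a MongoDB collection, keyed by user ID."""

    def __init__(self, collection):
        # collection: a pymongo Collection (or any object with a compatible update_one)
        self.collection = collection

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert: update the existing document for this ID, or insert one if absent.
        # The "id" field name is an assumption for this sketch.
        self.collection.update_one({"id": doc["id"]}, {"$set": doc}, upsert=True)
        return item
```

A real project would usually build the collection in `open_spider` or `from_crawler` from settings rather than pass it in directly; the constructor argument just keeps the sketch short.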

2. Approach

Take a few Weibo "big V" accounts (verified, high-profile users) as starting points and crawl their follower and following lists. Then fetch the follower and following lists of every user found there, and so on, which gives a recursive crawl. As long as a user is connected to another user in the social network, the crawler will eventually reach that user, so in principle every connected user can be crawled. Along the way we obtain each user's unique ID, and then use that ID to fetch the posts the user has published.
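The recursive expansion described above amounts to a graph traversal with a "seen" set. The following is a minimal sketch in plain Python, not the article's spider: `FAKE_GRAPH` and `fetch_relations` are hypothetical stand-ins for the requests a real spider would send, and `max_users` is an added safety limit.

```python
from collections import deque

# Hypothetical follow graph standing in for real responses:
# user ID -> (IDs the user follows, IDs of the user's followers).
FAKE_GRAPH = {
    "v1": (["a", "b"], ["c"]),
    "a": (["v1"], ["d"]),
    "b": ([], []),
    "c": (["b"], []),
    "d": ([], ["a"]),
}


def fetch_relations(user_id):
    """Return (following, followers) for a user; a real spider would request these."""
    return FAKE_GRAPH.get(user_id, ([], []))


def crawl_user_ids(seed_ids, max_users=1000):
    """Breadth-first walk over follow relations, visiting each user ID once."""
    seen = set(seed_ids)
    queue = deque(seed_ids)
    order = []
    while queue and len(order) < max_users:
        user_id = queue.popleft()
        order.append(user_id)
        following, followers = fetch_relations(user_id)
        for neighbor in following + followers:
            if neighbor not in seen:  # skip users already queued or visited
                seen.add(neighbor)
                queue.append(neighbor)
    return order
```

In Scrapy itself the queue and the duplicate check are handled by the scheduler and its duplicate filter, so a spider only needs to yield new requests for each discovered user ID.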

3. Analysis

The site to crawl is https://m.weibo.cn, the mobile version of Weibo. Opening the home page redirects to the sign-in page because the home page requires login. However, a user's details page can be opened directly.

Sina Weibo's anti-crawling measures are strong: requesting the Weibo API directly without logging in easily results in a 403 status code. So here we implement a middleware that adds a randomly chosen set of cookies to each request.
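A middleware of this kind might look roughly like the sketch below. It assumes the cookies of several logged-in accounts are already available as a list of dicts; the class name and the in-memory pool are illustrative, and the article's own middleware may obtain its cookies differently. Scrapy downloader middlewares are plain classes, so no Scrapy import is needed for the class itself.

```python
import random


class RandomCookiesMiddleware:
    """Downloader middleware sketch: attach one randomly chosen cookie set per request."""

    def __init__(self, cookie_pool):
        # cookie_pool: list of dicts, each holding the cookies of one logged-in account
        self.cookie_pool = list(cookie_pool)

    def process_request(self, request, spider):
        if self.cookie_pool:
            request.cookies = random.choice(self.cookie_pool)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

To take effect, the class would be registered in the `DOWNLOADER_MIDDLEWARES` setting with an order value below that of Scrapy's built-in `CookiesMiddleware` (700 by default), so that the cookies are set before that middleware reads them.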

Weibo has another anti-crawling measure: when too many requests come from the same IP address, it returns a 414 status code. If this happens, switch to a different proxy.
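One possible way to switch proxies on a 414 is a downloader middleware that reschedules the request through a different proxy. The sketch below assumes a fixed list of proxy URLs and a small retry cap; the class name, the `proxy_retries` meta key, and the retry limit are all hypothetical. Setting `request.meta["proxy"]` is the standard way to route a Scrapy request through a proxy.

```python
import random


class ProxyOn414Middleware:
    """Downloader middleware sketch: retry through another proxy after a 414 response."""

    def __init__(self, proxies, max_retries=3):
        self.proxies = list(proxies)
        self.max_retries = max_retries

    def process_response(self, request, response, spider):
        retries = request.meta.get("proxy_retries", 0)
        if response.status != 414 or not self.proxies or retries >= self.max_retries:
            return response  # pass the response on unchanged
        # Prefer a proxy different from the one that was just rejected.
        current = request.meta.get("proxy")
        candidates = [p for p in self.proxies if p != current] or self.proxies
        request.meta["proxy"] = random.choice(candidates)
        request.meta["proxy_retries"] = retries + 1
        request.dont_filter = True  # keep the duplicate filter from dropping the retry
        return request  # returning a request asks Scrapy to schedule it again
```

For brevity the sketch modifies the request in place; Scrapy's built-in retry middleware works on a copy of the request instead, which is the safer pattern in a real project.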
