I have always wanted to use a crawler to collect Twitter data. I tried scrapy, requests, and other packages, but none of them worked out, probably because I was not familiar enough with them.
Today I discovered a new package, Tweepy, which is dedicated to handling the Twitter API in Python. I started with the first example in its tutorial and modified it a little.
The code is as follows:
import re
import tweepy

auth = tweepy.OAuthHandler("xxxxx", "xxxxx")
auth.set_access_token("xxxxx", "xxxxx")
api = tweepy.API(auth)

highpoints = re.compile(u'[\ud800-\udbff][\udc00-\udfff]')

public_tweets = api.home_timeline()
num = 0
for tweet in public_tweets:
    print num
    num += 1
    text_noem = highpoints.sub('--emoji--', tweet.text)
    text_noem = text_noem.encode('utf8')
    print text_noem
Code Explanation:
Lines 1-2: Import the re and tweepy modules. The reason even this simple script needs re is that the extracted tweets contain emoji, and emoji Unicode cannot be encoded into GBK, so a regular expression is used to replace all of them (see the small demo after this explanation).
Lines 4-5: Set the consumer key/secret and the access token/secret; you obtain these by registering a new application at apps.twitter.com.
Line 6: Build the API object from auth; it is used to make the actual requests and return the responses.
Line 8: Compile the regular expression that matches emoji, used to filter all of them out; this comes from the StackOverflow answer referenced below.
Line 10: Get the tweets on the user's home timeline.
Line 11: Set a variable as a counter.
Line 12: Iterate over all the tweets.
Inside the loop:
Lines 13-17: Print the sequence number, increment the counter, replace every emoji with '--emoji--', and encode the text as UTF-8 so it can be printed without a UnicodeEncodeError.
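To make the emoji filtering concrete, here is a minimal sketch (the sample string is made up; the surrogate-pair regex assumes a Python 2 narrow build, where characters outside the BMP such as emoji are stored as two surrogate code units):

# -*- coding: utf-8 -*-
import re

highpoints = re.compile(u'[\ud800-\udbff][\udc00-\udfff]')

# hypothetical sample: U+1F600 is stored as the surrogate pair \ud83d\ude00
sample = u'Hello \ud83d\ude00 world'
clean = highpoints.sub(u'--emoji--', sample)
print clean.encode('utf8')   # prints: Hello --emoji-- world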
The key point about crawling Twitter data is that Twitter requires all requests to be authorized via OAuth, and the Tweepy package makes this authentication easy.
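Before crawling anything, it is worth checking that the OAuth setup actually works. A minimal sketch, assuming the same placeholder keys as above (verify_credentials() is part of the Tweepy API class; in tweepy 3.x it returns False when the credentials are rejected):

import tweepy

auth = tweepy.OAuthHandler("xxxxx", "xxxxx")   # consumer key / secret
auth.set_access_token("xxxxx", "xxxxx")        # access token / secret
api = tweepy.API(auth)

me = api.verify_credentials()   # the authenticated User model, or False
if me:
    print 'authenticated as ' + me.screen_name
else:
    print 'authentication failed'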
Reference documents:
http://stackoverflow.com/questions/13729638/how-can-i-filter-emoji-characters-from-my-input-so-i-can-save-in-mysql-5-5
tweepy 3.5.0 Doc (1): Getting Started
Introduction
If this is your first time using Tweepy, start here. The goal of this tutorial is to give you the information you need to learn Tweepy, so that after finishing it you can use Tweepy proficiently. We mainly cover the important basics rather than going into too much detail.
Hello Tweepy
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print tweet.text
This example downloads the tweets from your Twitter home timeline and prints their text to the console. Twitter requires all requests to be authorized via the OAuth protocol. The Authentication Tutorial (link) has a detailed description of how to complete the authorization.
API
The API class provides access to the entire Twitter RESTful API. Each method accepts different parameters, but all of them return responses. See the API Reference (link) for more information.
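For instance, many methods accept optional parameters that map onto the REST API's query parameters. A minimal sketch, reusing the api object from above (the count value and query string here are arbitrary):

# fetch another account's 10 most recent tweets by screen name
for tweet in api.user_timeline(screen_name='twitter', count=10):
    print tweet.text

# search recent tweets matching a query
for tweet in api.search(q='tweepy'):
    print tweet.text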
Model
When we call an API method, most of the time we get back an instance of a Tweepy model class, containing the data returned from Twitter, which we can then use in our application. For example, the following line of code returns a User model:
# Get the User object for twitter...
user = api.get_user('twitter')
The model contains data and some useful methods:
print user.screen_name
print user.followers_count
for friend in user.friends():
    print friend.screen_name
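get_user() returns a User model, but other calls return other model classes; home_timeline(), for instance, returns Status models. A brief sketch of a few Status attributes (these names mirror the Twitter REST API's tweet fields; the count value is arbitrary):

for tweet in api.home_timeline(count=3):
    print tweet.id                 # the tweet's unique ID
    print tweet.created_at         # when it was posted (a datetime)
    print tweet.user.screen_name   # the author, itself a User model
    print tweet.text               # the tweet text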
For more information, see the Models Reference (link).
Original link:
http://tweepy.readthedocs.io/en/v3.5.0/getting_started.html