I have always wanted to use a crawler to collect Twitter data. I tried scrapy, requests, and other packages, but none of them worked out, probably because I was not familiar enough with them.
Today I discovered a new package, Tweepy, which is dedicated to handling the Twitter API in Python. I started with the first example in its tutorial and modified it a little.
The code is as follows:
import re
import tweepy

auth = tweepy.OAuthHandler("xxxxx", "xxxxx")
auth.set_access_token("xxxxx", "xxxxx")
api = tweepy.API(auth)

highpoints = re.compile(u'[\ud800-\udbff][\udc00-\udfff]')

public_tweets = api.home_timeline()
num = 0
for tweet in public_tweets:
    print num
    num += 1
    text_noem = highpoints.sub('--emoji--', tweet.text)
    text_noem = text_noem.encode('utf8')
    print text_noem
Code Explanation:
Lines 1-2: Import the re and tweepy modules. The reason even this simple script needs re is that the extracted tweets contain emoji, and emoji Unicode cannot be encoded into GBK, so a regular expression is used to replace all of them (see the small demo after this explanation).
Lines 4-5: Set the consumer key/secret and the access token/secret; you obtain these by registering a new application at apps.twitter.com.
Line 6: Build the API object from auth; it is used to make the actual requests and return the responses.
Line 8: Compile the regular expression that matches emoji, used to filter all of them out; this comes from the StackOverflow answer referenced below.
Line 10: Get the tweets on the user's home timeline.
Line 11: Set a variable as a counter.
Line 12: Iterate over all the tweets.
Inside the loop:
Lines 13-17: Print the sequence number, increment the counter, replace every emoji with '--emoji--', and encode the text as UTF-8 so it can be printed without a UnicodeEncodeError.
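To make the emoji filtering concrete, here is a minimal sketch (the sample string is made up; the surrogate-pair regex assumes a Python 2 narrow build, where characters outside the BMP such as emoji are stored as two surrogate code units):

# -*- coding: utf-8 -*-
import re

highpoints = re.compile(u'[\ud800-\udbff][\udc00-\udfff]')

# hypothetical sample: U+1F600 is stored as the surrogate pair \ud83d\ude00
sample = u'Hello \ud83d\ude00 world'
clean = highpoints.sub(u'--emoji--', sample)
print clean.encode('utf8')   # prints: Hello --emoji-- world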
The key point about crawling Twitter data is that Twitter requires all requests to be authorized via OAuth, and the Tweepy package makes this authentication easy.
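Before crawling anything, it is worth checking that the OAuth setup actually works. A minimal sketch, assuming the same placeholder keys as above (verify_credentials() is part of the Tweepy API class; in tweepy 3.x it returns False when the credentials are rejected):

import tweepy

auth = tweepy.OAuthHandler("xxxxx", "xxxxx")   # consumer key / secret
auth.set_access_token("xxxxx", "xxxxx")        # access token / secret
api = tweepy.API(auth)

me = api.verify_credentials()   # the authenticated User model, or False
if me:
    print 'authenticated as ' + me.screen_name
else:
    print 'authentication failed'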
Reference documents:
http://stackoverflow.com/questions/13729638/how-can-i-filter-emoji-characters-from-my-input-so-i-can-save-in-mysql-5-5
tweepy 3.5.0 Doc (1): Getting Started
Introduction
If this is your first time using Tweepy, start here. The goal of this tutorial is to give you the information you need to learn Tweepy, so that after finishing it you can use Tweepy proficiently. We mainly cover the important basics rather than going into too much detail.
Hello Tweepy
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print tweet.text
This example downloads the tweets from your Twitter home timeline and prints their text to the console. Twitter requires all requests to be authorized via the OAuth protocol. The Authentication Tutorial (link) has a detailed description of how to complete the authorization.
API
The API class provides access to the entire Twitter RESTful API. Each method accepts different parameters, but all of them return responses. See the API Reference (link) for more information.
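For instance, many methods accept optional parameters that map onto the REST API's query parameters. A minimal sketch, reusing the api object from above (the count value and query string here are arbitrary):

# fetch another account's 10 most recent tweets by screen name
for tweet in api.user_timeline(screen_name='twitter', count=10):
    print tweet.text

# search recent tweets matching a query
for tweet in api.search(q='tweepy'):
    print tweet.text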
Model
When we call an API method, most of the time we get back an instance of a Tweepy model class, containing the data returned from Twitter, which we can then use in our application. For example, the following line of code returns a User model:
# Get the User object for twitter...
user = api.get_user('twitter')
The model contains data and some useful methods:
print user.screen_name
print user.followers_count
for friend in user.friends():
    print friend.screen_name
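get_user() returns a User model, but other calls return other model classes; home_timeline(), for instance, returns Status models. A brief sketch of a few Status attributes (these names mirror the Twitter REST API's tweet fields; the count value is arbitrary):

for tweet in api.home_timeline(count=3):
    print tweet.id                 # the tweet's unique ID
    print tweet.created_at         # when it was posted (a datetime)
    print tweet.user.screen_name   # the author, itself a User model
    print tweet.text               # the tweet text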
For more information, see the Models Reference (link).
Original link:
http://tweepy.readthedocs.io/en/v3.5.0/getting_started.html