tweepy1--Crawl Twitter data

Source: Internet
Author: User
Tags: oauth

I have always wanted to use a crawler to log in to Twitter and crawl its data. I tried scrapy, requests, and other packages, but none of them worked, perhaps because I am not familiar enough with them.

Today I discovered a new package, Tweepy, dedicated to working with the Twitter API in Python. I started with the first example from its tutorial and made a few modifications of my own.

The code is as follows:

    # Tweepy crawl Twitter data, example 1 (Python 2)

    import re
    import tweepy

    auth = tweepy.OAuthHandler("xxxxx",   # consumer key
                               "xxxxx")   # consumer secret
    auth.set_access_token("xxxxx",        # access token
                          "xxxxx")        # access token secret

    api = tweepy.API(auth)

    # Match UTF-16 surrogate pairs, which is how emoji appear on narrow Python 2 builds
    highpoints = re.compile(u'[\ud800-\udbff][\udc00-\udfff]')
    public_tweets = api.home_timeline()
    num = 0
    for tweet in public_tweets:
        print num
        num += 1
        text_noem = highpoints.sub('--emoji--', tweet.text)
        text_noem = text_noem.encode('utf8')
        print text_noem  # output the cleaned tweet text

Code Explanation:

Lines 3-4: Import the tweepy and re modules. This simple code needs re because the tweets I extracted contained emoji, and emoji Unicode characters cannot be encoded as GBK, so a regular expression replaces all of them.

Lines 6-9: Set the API key and access token (each with its secret). You get these by registering at apps.twitter.com and creating a new application.

Line 11: Create the API object from auth. It is used to send requests and return responses.

Line 14: Define the regular expression for emoji, used to filter out all emoji. It comes from the StackOverflow question listed in the references below.

Line 15: Get the tweets on the user's home timeline.

Line 16: Set a counter variable.

Line 17: Iterate over all the tweets.

Inside the loop:

Lines 18-22: Print the sequence number and the tweet content. Every emoji is replaced with '--emoji--' and the text is encoded as UTF-8, which fixes the problem of the output failing.
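The listing above is written for Python 2, where emoji on narrow builds are stored as surrogate pairs. As a minimal sketch for Python 3, assuming strings hold full code points, the same substitution can target every character above U+FFFF instead. The names NON_BMP and replace_emoji are hypothetical, and like the original pattern this does not catch emoji that live inside the Basic Multilingual Plane.

```python
import re

# Any code point above U+FFFF. On Python 3 this plays the role of the
# surrogate-pair pattern used on narrow Python 2 builds.
NON_BMP = re.compile("[\U00010000-\U0010ffff]")


def replace_emoji(text, placeholder="--emoji--"):
    """Replace every non-BMP character (most emoji) with a placeholder."""
    return NON_BMP.sub(placeholder, text)


# Two emoji written as escapes so the source file stays ASCII.
sample = "Good morning \U0001F600\U0001F680"
cleaned = replace_emoji(sample)
print(cleaned)

# The cleaned text can now be encoded as GBK without raising an error.
encoded = cleaned.encode("gbk")
```

Each emoji is replaced separately, so two adjacent emoji produce two placeholders.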



The key point in crawling Twitter data is that Twitter requires all requests to be authenticated with OAuth, and the Tweepy package makes authentication easy.
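To give a sense of what the library takes care of, here is a rough sketch of the OAuth 1.0a HMAC-SHA1 signing step, using only the standard library. It is an illustration under stated assumptions, not Tweepy's own code: the helper names percent_encode and sign_request are hypothetical, and the keys, nonce, and timestamp are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """Percent-encode a value, leaving only RFC 3986 unreserved characters."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret):
    """Return the base64 HMAC-SHA1 signature for one request."""
    # 1. Encode every key and value, then sort and join them as k=v pairs.
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(k + "=" + v for k, v in pairs)
    # 2. Signature base string: METHOD&encoded-URL&encoded-parameters.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )
    # 3. Signing key: encoded consumer secret and token secret joined by "&".
    key = percent_encode(consumer_secret) + "&" + percent_encode(token_secret)
    digest = hmac.new(
        key.encode("ascii"), base_string.encode("ascii"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values only; a real request needs a fresh nonce and timestamp.
params = {
    "oauth_consumer_key": "xxxxx",
    "oauth_token": "xxxxx",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1318622958",
    "oauth_nonce": "abc123",
    "oauth_version": "1.0",
    "count": "5",
}
signature = sign_request(
    "get",
    "https://api.twitter.com/1.1/statuses/home_timeline.json",
    params,
    "consumer-secret-placeholder",
    "token-secret-placeholder",
)
print(signature)
```

Every request has to carry a signature like this in its Authorization header. With Tweepy, the two calls on the auth object replace all of these steps.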



Reference documents:

http://stackoverflow.com/questions/13729638/how-can-i-filter-emoji-characters-from-my-input-so-i-can-save-in-mysql-5-5

tweepy 3.5.0 Doc (1) Getting started

Introduction

If you are new to Tweepy, please start here. The goal of this tutorial is to give you the information you need to learn Tweepy, so that you can use it proficiently by the end. It covers the important basics without going into too much detail.


Hello, Tweepy

    import tweepy

    # The four credentials come from your registered Twitter application
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    api = tweepy.API(auth)

    public_tweets = api.home_timeline()
    for tweet in public_tweets:
        print tweet.text

This example downloads the tweets on your Twitter home timeline and prints the text of each one to the console. Twitter requires all requests to be authenticated with OAuth. The Authentication Tutorial (link) describes authentication in detail.
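The tutorial code leaves the four credential variables undefined. One possible way to supply them, sketched here for Python 3, is to read them from environment variables. The variable names and the helpers load_credentials, build_api, and print_home_timeline are hypothetical; only OAuthHandler, set_access_token, API, and home_timeline come from the example itself, and the sketch assumes a Tweepy release that still provides them.

```python
import os

# Hypothetical environment variable names; use whatever your setup defines.
CREDENTIAL_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four OAuth credentials, failing loudly if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_NAMES)


def build_api(credentials):
    """Create an authenticated tweepy.API object from the four credentials."""
    import tweepy  # imported here so load_credentials works without tweepy

    consumer_key, consumer_secret, access_token, access_token_secret = credentials
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)


def print_home_timeline(api):
    """Print the text of each tweet on the authenticated user's home timeline."""
    for tweet in api.home_timeline():
        print(tweet.text)
```

With the variables set, calling print_home_timeline(build_api(load_credentials())) should behave like the example, while keeping secrets out of the source file.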


API

The API class provides access to all of the Twitter RESTful API methods. Each method accepts different parameters and returns a response. See API Reference (link) for more information.
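As a small illustration of a method that takes parameters, the sketch below wraps user_timeline, which accepts screen_name and count. The helper recent_texts and the StubAPI class are hypothetical; the stub only stands in for a real tweepy.API object so the example runs without credentials or network access.

```python
from types import SimpleNamespace


def recent_texts(api, screen_name, count=5):
    """Return the text of a user's most recent tweets."""
    # screen_name and count are passed through as user_timeline parameters.
    statuses = api.user_timeline(screen_name=screen_name, count=count)
    return [status.text for status in statuses]


class StubAPI:
    """Hypothetical stand-in for tweepy.API so the sketch runs offline."""

    def user_timeline(self, screen_name, count):
        return [
            SimpleNamespace(text="{} tweet {}".format(screen_name, i))
            for i in range(1, count + 1)
        ]


texts = recent_texts(StubAPI(), "twitter", count=2)
print(texts)
```

Passing an authenticated tweepy.API object instead of the stub would send the same parameters to Twitter.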


Model

When we call an API method, most of the time we get back an instance of a Tweepy model class. It contains the data returned from Twitter, which we can then use in our application. For example, the following code returns a User model:

    # Get the User object for twitter...
    user = api.get_user('twitter')


The model contains data and some useful methods:

    print user.screen_name
    print user.followers_count
    for friend in user.friends():
        print friend.screen_name

For more information, see Models Reference (link).
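The snippets above use Python 2 print statements. As a rough Python 3 sketch, the same attributes can be collected into a plain dict. The helper summarize_user and the stub object are hypothetical; with a real client, the user would come from a call such as api.get_user(screen_name='twitter'), assuming a Tweepy version that accepts that keyword.

```python
from types import SimpleNamespace


def summarize_user(user):
    """Collect a few fields from a User-like model into a plain dict."""
    return {
        "screen_name": user.screen_name,
        "followers_count": user.followers_count,
        "friends": [friend.screen_name for friend in user.friends()],
    }


# Hypothetical stand-in shaped like a Tweepy User model: two data
# attributes plus a friends() method that returns more User-like objects.
stub_user = SimpleNamespace(
    screen_name="twitter",
    followers_count=3,
    friends=lambda: [
        SimpleNamespace(screen_name="alice"),
        SimpleNamespace(screen_name="bob"),
    ],
)
summary = summarize_user(stub_user)
print(summary)
```

The point is only that a model exposes both plain data attributes and methods, such as friends(), that return further models.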



Original link:

http://tweepy.readthedocs.io/en/v3.5.0/getting_started.html
