Your First Text Mining Project with Python in 3 Steps

Every day, we generate huge amounts of text online, creating vast quantities of data about what is happening in the world and what people think. All of this text data is an invaluable resource that can be mined to generate meaningful business insights for analysts and organizations. However, analyzing all of this content isn't easy, since converting text produced by people into structured information that a machine can analyze is a complex task. In recent years, though, Natural Language Processing and Text Mining have become a lot more accessible for data scientists, analysts, and developers alike.

There is a massive amount of resources, code libraries, services, and APIs out there that can help you embark on your first NLP project. For this how-to post, we thought we'd put together a three-step, end-to-end guide to an introductory NLP project. We'll start from scratch by showing you how to build a corpus of language data and how to analyze this text, and then we'll finish by visualizing the results.

We've split this post into three steps. Each of these steps does two things: it shows a core task that will get you familiar with NLP basics, and it introduces you to some common APIs and code libraries for that task. The tasks we've selected are:

Building a corpus: using tweepy to gather sample text data from Twitter's API.

Analyzing text: analyzing the sentiment of a piece of text with our own SDK.

Visualizing results: using pandas and matplotlib to see the results of your work.

Please note: this guide is aimed at developers who are new to NLP and at anyone with a basic knowledge of how to run a script in Python. If you don't want to write code, take a look at the blog posts we've put together on how to use our RapidMiner extension or our Google Sheets Add-on to analyze text.

Step 1. Build a Corpus

You can build your corpus from anywhere: maybe you have a large collection of emails you want to analyze, a collection of customer feedback in NPS surveys that you want to dive into, or maybe you want to focus on the voice of your customers online. There are lots of options open to you, but for the purpose of this post we're going to use Twitter as our focus for building a corpus. Twitter is a very useful source of textual content: it's easily accessible, it's public, and it offers insight into a huge volume of text that contains public opinion.

Accessing the Twitter Search API using Python is pretty easy. There are lots of libraries available, but our favourite option is tweepy. In this step, we're going to use tweepy to ask the Twitter API for the most recent Tweets that contain our search term, and then we'll write the Tweets to a text file, with each Tweet on its own line. This will make it easy for us to analyze each Tweet separately in the next step.

You can install tweepy using pip:

pip install tweepy

Once that's completed, open a Python shell to double-check that it's been installed correctly:

>>> import tweepy

You'll need permission from Twitter to gather Tweets from the Search API, so you need to sign up as a developer to get your consumer keys and access tokens, which should take you three or four minutes. Next, you need to build your search query by adding your search term to the q = '' field. You'll also need to add some further parameters like the language, the number of results you want returned, and the time period to search in. You can get very specific about what you want to search for on Twitter; to make a more complicated query, take a look at the list of operators you can use with the Search API in the Search API introduction.
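For instance, a more specific search might look something like the sketch below. The operators used here (-filter:retweets and since:) come from Twitter's standard search syntax, the api object is the one set up in the script that follows, and the search term itself is just an illustration.

## An illustrative query combining a quoted phrase with search operators
results = api.search(q='"climate change" -filter:retweets since:2017-01-01',
                     lang='en', result_type='recent', count=100)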

Fill your credentials and your query into this script:

## Import the libraries
import tweepy, codecs

## Fill in your Twitter credentials
consumer_key = 'your consumer key here'
consumer_secret = 'your consumer secret key here'
access_token = 'your access token here'
access_token_secret = 'your access token secret here'

## Let tweepy set up an instance of the REST API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

## Fill in your search query and store your results in a variable
results = api.search(q="your search term here", lang="en", result_type="recent", count=1000)

## Use the codecs library to write the text of the Tweets to a .txt file
file = codecs.open("your text file name here.txt", "w", "utf-8")
for result in results:
    file.write(result.text)
    file.write("\n")
file.close()

You can see in the script that we write result.text to the .txt file, and not simply the result, which is what the API actually returns to us. APIs that return language data from social media or online journalism sites usually return lots of metadata along with your results. To do this, they format their output in JSON, which is easy for machines to read.

For example, in the script above, every 'result' is its own JSON object, with 'text' being just one field: the one that contains the Tweet text. The other fields in the JSON contain metadata like the location or timestamp of the Tweet, which you can extract for a more detailed analysis.
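As a quick sketch, here are a few of the metadata fields you can read straight off each result object that tweepy returns; the attribute names below are tweepy's own, not something defined in our script:

## Illustrative: a few common metadata fields on each tweepy result
for result in results:
    print(result.created_at)          ## timestamp of the Tweet
    print(result.user.screen_name)    ## the author's handle
    print(result.user.location)      ## free-text location from the author's profile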

To access the rest of the metadata, we'd need to write to a JSON file, but for this project we're just going to analyze the text of people's Tweets. So in this case a .txt file is fine, and our script simply discards the rest of the metadata once it finishes. If you want to take a look at the full JSON results, print everything the API returns to you instead:
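## A minimal sketch: tweepy keeps the raw JSON of each result in its _json attribute
for result in results:
    print(result._json)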

This is also why we used the codecs module: it avoids formatting issues when the script reads the JSON results and writes UTF-8 text.
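To see why this matters, here's a minimal sketch (the file name is just an example): Tweets routinely contain emoji and accented characters, and writing through codecs keeps them intact as UTF-8.

import codecs

## Non-ASCII text like this is common in Tweets; utf-8 keeps it intact on disk
with codecs.open("example.txt", "w", "utf-8") as f:
    f.write(u"Caf\u00e9 \u2615\n")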

Step 2. Analyze Sentiment

Once we've collected the text of the Tweets that we want to analyze, we can use more advanced NLP tools to start extracting information from it. Sentiment analysis is a great example of this, since it tells us whether people were expressing positive, negative, or neutral sentiment in the text that we have.

For the sentiment analysis, we're going to use our own AYLIEN Text API. Just like with the Twitter Search API, you'll need to sign up for the free plan to grab your API key (don't worry, free means free permanently. There's no credit card required, and we don't harass you with promotional stuff!). This gives you 1,000 calls to the API per month free of charge.

Again, you can install it using pip:

pip install aylien-apiclient

Then make sure the SDK has installed correctly from your Python shell:

>>> from aylienapiclient import textapi

Once you've got your application ID and key, insert them into the code below to make your first call to the API from the Python shell (we also have extensive documentation in 7 popular languages). Our API lets you make your first call with just four lines of code:

>>> from aylienapiclient import textapi
>>> client = textapi.Client('your_app_id', 'your_application_key')
>>> sentiment = client.Sentiment({'text': 'enter some of your own text here'})
>>> print(sentiment)

This will return JSON results to you with metadata, just like the results we got from the Twitter API.
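The shape of that JSON is roughly the sketch below; the field names come from the Sentiment endpoint, but the values here are purely illustrative:

>>> print(sentiment)
{'polarity': 'positive', 'polarity_confidence': 0.97, 'subjectivity': 'subjective', 'subjectivity_confidence': 0.85, 'text': 'enter some of your own text here'}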

Now we need to analyze our corpus from step 1, and to do this we need to analyze every Tweet separately. The script below uses the io module to open up a new .csv file and write the column headers "Tweet" and "Sentiment", and then it opens and reads the .txt file containing our Tweets. Then, for each Tweet in the .txt file, it sends the text to the AYLIEN API, extracts the sentiment prediction from the JSON that the AYLIEN API returns, and writes this to the .csv file beside the Tweet itself.

This gives us a .csv file with two columns: the text of a Tweet and the sentiment of the Tweet, as predicted by the AYLIEN API. We can look through this file to verify the results, and also visualize them to get some metrics on how people felt about whatever our search query was.

from aylienapiclient import textapi
import csv, io

## Initialize a new client of AYLIEN Text API
client = textapi.Client("your_app_id", "your_app_key")

with io.open('Trump_tweets.csv', 'w', encoding='utf8', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(["Tweet", "Sentiment"])
    with io.open("Trump.txt", 'r', encoding='utf8') as f:
        for tweet in f.readlines():
            ## Remove extra spaces or newlines around the text
            tweet = tweet.strip()
            ## Reject tweets which are empty so you don't waste your API credits
            if len(tweet) == 0:
                print('skipped')
                continue
            print(tweet)
            ## Make the call to the AYLIEN Text API
            sentiment = client.Sentiment({'text': tweet})
            ## Write the sentiment result into the csv file
            csv_writer.writerow([sentiment['text'], sentiment['polarity']])

You might notice on the final line of the script that when it goes to write the Tweet text to the file, we're actually writing the Tweet as it is returned by the AYLIEN API, rather than the Tweet from the .txt file. They are identical pieces of text, but we've chosen to write the text from the API just to make sure we're reading the exact text that the API analyzed. This just makes it clearer if we've made an error somehow.
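Before moving on to visualization, it's worth a quick sanity check on the output. Here's a minimal sketch, assuming the Trump_tweets.csv file name used above:

import pandas as pd

## Peek at the first few rows and the rough sentiment distribution
df = pd.read_csv("Trump_tweets.csv")
print(df.head())
print(df["Sentiment"].value_counts())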

Step 3. Visualize your Results

So far we've used an API to gather text from Twitter, and used our text analysis API to analyze whether people were speaking positively or negatively in their Tweets. At this point, you have a couple of options for what to do with the results. You can feed this structured information about sentiment into whatever solution you're building, which could be anything from a simple social listening app to an automated report on the public reaction to a campaign. You could also use the data to build informative visualizations, which is what we'll do in this final step.

For this step, we're going to use matplotlib to visualize our data and pandas to read the .csv file, two Python libraries that are easy to get up and running. You'll be able to create a visualization from the command line or save it as a .png file.

Install both using pip:

pip install matplotlib
pip install pandas

The script below opens up our .csv file, and then uses pandas to read the column titled "Sentiment". It uses Counter to count how many times each sentiment appears, and then matplotlib plots Counter's results as a color-coded pie chart (you'll need to enter your search query in the "yourtext" variable for presentation purposes).

## Import the libraries
import matplotlib.pyplot as plt
import pandas as pd
from collections import Counter

## Open up your csv file with the sentiment results
with open('your_csv_file_from_step_2.csv', 'r', encoding='utf8') as csvfile:
    ## Use pandas to read the "Sentiment" column
    df = pd.read_csv(csvfile)
    sent = df["Sentiment"]

## Use Counter to count how many times each sentiment appears
## and save each as a variable
counter = Counter(sent)
positive = counter['positive']
negative = counter['negative']
neutral = counter['neutral']

## Declare the variables for the pie chart, using the Counter variables for "sizes"
labels = 'Positive', 'Negative', 'Neutral'
sizes = [positive, negative, neutral]
colors = ['green', 'red', 'grey']
yourtext = "your search query from step 1"

## Use matplotlib to plot the chart
plt.pie(sizes, labels=labels, colors=colors, shadow=True, startangle=90)
plt.title("Sentiment of Tweets about " + yourtext)
plt.show()

If you want to save your chart to a .png file instead of just showing it, replace plt.show() on the last line with plt.savefig('your chart name.png'). Below is the visualization we ended up with (we searched "Trump" in step 1).
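For example (the file name here is just an illustration):

## Save the chart to disk instead of displaying it
plt.savefig('sentiment_of_tweets.png')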
