Pandas Quick Start (3)

Last Update:2016-03-01 Source: Internet

Author: User


This section mainly introduces the pandas data structures. It is based on: https://www.dataquest.io/mission/146/pandas-internals-series

The data used in this article comes from: https://github.com/fivethirtyeight/data/tree/master/fandango

This dataset mainly contains the Rotten Tomatoes ratings of a number of movies.

Data Structure

Pandas has three important data structures:

Series (a set of values)
DataFrame (a set of Series)
Panel (a set of DataFrames; deprecated in pandas 0.20 and removed in 0.25)

A pandas Series is an enhanced version of the NumPy array. A NumPy array can only be indexed by integer position, whereas a Series can also be indexed by labels such as strings. A Series can also hold mixed data types and uses NaN to indicate missing values. A Series object can contain the following data types:

float -- floating-point value
int -- integer value
bool -- Boolean value
datetime64[ns] -- date and time (without time zone)
datetime64[ns, tz] -- date and time (with time zone)
timedelta[ns] -- a duration, i.e. the difference between two times (minutes, seconds, etc.)
category -- categorical value
object -- string value
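The points above (label-based indexing, NaN for missing values, and the dtype of a Series) can be tried with a minimal sketch. The film labels and scores here are made up for illustration and do not come from the Fandango data.

```python
import numpy as np
import pandas as pd

# Hypothetical scores labelled by made-up film names; NaN marks a missing value.
scores = pd.Series([54.0, np.nan, 99.0], index=['Film A', 'Film B', 'Film C'])

print(scores['Film C'])      # look up a value by its string label
print(scores.iloc[0])        # look up a value by integer position
print(scores.dtype)          # float64
print(scores.isna().sum())   # number of missing values

# Other dtypes from the list above.
flags = pd.Series([True, False, True])   # bool
mixed = pd.Series(['text', 1, 2.5])      # mixed Python objects are stored as object
print(flags.dtype, mixed.dtype)
```

A Series that mixes strings and numbers falls back to the object dtype, which is why object appears in the list as the type used for strings.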

A DataFrame uses a Series object to represent the data of each column. Therefore, when you select a single column from a DataFrame, pandas returns a Series object representing that column; by default, the rows of that Series are indexed from 0. Of course, you can also use slices to select multiple rows.

# Select the FILM and RottenTomatoes columns and print the first five rows of each
import pandas as pd
fandango = pd.read_csv('fandango_score_comparison.csv')  # file name assumed; point this at your copy of the data linked above
series_film = fandango['FILM']
print(series_film.head(5))
series_rt = fandango['RottenTomatoes']
print(series_rt[:5])
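For readers without the CSV file at hand, the same behavior can be tried on a small hand-made DataFrame. This is a minimal sketch: the film names and scores are invented for illustration, and only the column names FILM and RottenTomatoes follow the article.

```python
import pandas as pd

# Made-up rows standing in for the CSV file (not real ratings).
df = pd.DataFrame({
    'FILM': ['Film A (2015)', 'Film B (2014)', 'Film C (2015)'],
    'RottenTomatoes': [54, 99, 71],
})

col = df['RottenTomatoes']     # selecting one column
print(type(col).__name__)      # Series
print(list(col.index))         # default integer labels starting at 0
print(col[:2])                 # a slice selects the first two rows
```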


from pandas import Series
print(fandango[fandango['FILM'] == 'Minions (2015)']['RottenTomatoes'].values[0])
print(fandango[fandango['FILM'] == 'Leviathan (2014)']['RottenTomatoes'].values[0])
# Writing one statement per movie is tedious.
# A better approach is to combine series_film and series_rt into a new Series that uses the movie names as the index and the scores as the values, which makes it easier to look up several movies.
film_names = series_film.values
rt_scores = series_rt.values
# Pass the data and the index when creating the Series; if index is omitted, pandas uses the default integer index.
series_custom = Series(rt_scores, index=film_names)

# Now it is easy to look up several movies at once.
series_custom[['Minions (2015)', 'Leviathan (2014)']]
# For the Series created above, sort_index() sorts by movie name alphabetically; to sort by score instead, use sort_values().
sc2 = series_custom.sort_index()
sc3 = series_custom.sort_values()
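The lookup and the two sort methods can be checked on a tiny Series. This is a sketch under stated assumptions: the labels and scores are hypothetical, chosen so that sorting by name and sorting by score give visibly different orders.

```python
import pandas as pd

# Hypothetical scores indexed by made-up film names.
series_custom = pd.Series([54, 99, 71], index=['Film B', 'Film C', 'Film A'])

# Passing a list of labels returns a Series with just those entries.
picked = series_custom[['Film A', 'Film C']]

by_name = series_custom.sort_index()     # ordered by label: Film A, Film B, Film C
by_score = series_custom.sort_values()   # ordered by value: 54, 71, 99

print(picked)
print(by_name)
print(by_score)
```

Both methods return a new Series and leave series_custom unchanged.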

# A Series supports addition, subtraction, multiplication, and division.
series_custom / 10  # divides every value in the Series by 10; the index is left unchanged
# NumPy functions also work on a Series.
import numpy as np
np.max(series_custom)  # the highest movie score
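A short sketch of element-wise arithmetic on a Series, again with invented labels and scores. The call to idxmax() is an addition not present in the original; it returns the label of the largest value.

```python
import numpy as np
import pandas as pd

# Hypothetical scores indexed by made-up film names.
series_custom = pd.Series([54, 99, 71], index=['Film A', 'Film B', 'Film C'])

scaled = series_custom / 10          # applied to every value; labels are untouched
best = np.max(series_custom)         # NumPy reduction over the values
best_film = series_custom.idxmax()   # label of the highest score

print(scaled)
print(best, best_film)
```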

You can also compare values and filter the Series:

series_custom > 50  # returns a Series of boolean values: True where the score is greater than 50
# The boolean Series can be used to filter the data.
series_greater_than_50 = series_custom[series_custom > 50]
# Several conditions can be joined with & (and) and | (or); wrap each condition in parentheses.
series_greater_than_50_and_less_than_80 = series_custom[(series_custom > 50) & (series_custom < 80)]
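The filtering steps can be followed on a three-element Series. The data is hypothetical; the sketch only illustrates how the boolean mask selects rows.

```python
import pandas as pd

# Hypothetical scores indexed by made-up film names.
series_custom = pd.Series([42, 99, 71], index=['Film A', 'Film B', 'Film C'])

mask = series_custom > 50            # boolean Series with the same index
above_50 = series_custom[mask]       # keeps only the rows where mask is True

# Parentheses are required because & and | bind more tightly than comparisons.
between = series_custom[(series_custom > 50) & (series_custom < 80)]
extreme = series_custom[(series_custom < 50) | (series_custom > 90)]

print(mask)
print(above_50)
print(between)
print(extreme)
```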

Of course, you can also perform operations on two Series directly; the values are matched by index label.

rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])  # critic rating
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])  # user rating
rt_mean = (rt_critics + rt_users) / 2  # average score
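When two Series are combined, pandas pairs values by index label rather than by position. The sketch below uses made-up labels and scores, and deliberately gives the two Series different indexes to show what happens to labels that appear in only one of them.

```python
import pandas as pd

# Hypothetical critic and user scores; only 'Film B' appears in both.
critics = pd.Series([54, 99], index=['Film A', 'Film B'])
users = pd.Series([80, 60], index=['Film B', 'Film C'])

mean = (critics + users) / 2
print(mean)
# 'Film B' is averaged from both inputs; 'Film A' and 'Film C' become NaN
# because each is missing from one of the two Series.
```

In the article's example both Series share the same FILM index, so every movie has a match and no NaN values are introduced.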

