Pandas Quick Start (3)

Source: Internet
Author: User


This section introduces the pandas data structures, focusing on Series. It is based on: https://www.dataquest.io/mission/146/pandas-internals-series

The data used in this article comes from: https://github.com/fivethirtyeight/data/tree/master/fandango

This article mainly uses the dataset's Rotten Tomatoes ratings for a set of movies.

Data Structures

Pandas has three important data structures:

  • Series (a one-dimensional set of values)
  • DataFrame (a set of Series)
  • Panel (a set of DataFrames; deprecated in pandas 0.20 and removed in pandas 0.25)
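To illustrate how the first two structures relate, here is a minimal sketch that builds two Series and combines them into a DataFrame. The movie labels and scores are made-up illustrative values, not rows from the dataset, and the sketch covers only Series and DataFrame.

```python
import pandas as pd

# Two Series sharing the same index labels (illustrative values only).
critics = pd.Series([54, 100], index=["Movie A", "Movie B"])
users = pd.Series([52, 86], index=["Movie A", "Movie B"])

# A DataFrame is a collection of Series: each dict entry becomes one column.
df = pd.DataFrame({"critics": critics, "users": users})

# Selecting one column gives back a Series with the same index.
col = df["critics"]
print(type(col).__name__)  # Series
print(df.shape)            # (2, 2)
```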

 

A pandas Series is an enhanced version of the NumPy array. A NumPy array can only be indexed by integer positions, whereas a Series can also be indexed by labels such as strings. A Series can also hold mixed data types and uses NaN to mark missing values. A Series can have one of the following data types (dtypes):

  • float64 -- floating-point value
  • int64 -- integer value
  • bool -- Boolean value
  • datetime64[ns] -- date and time (without time zone)
  • datetime64[ns, tz] -- date and time (with time zone)
  • timedelta64[ns] -- time difference (a duration in days, minutes, seconds, etc.)
  • category -- categorical value (one of a limited set of categories)
  • object -- string value (or other Python objects)
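The sketch below shows label-based indexing, NaN as a missing value, and one small Series per dtype in the list above. Labels and values are made up for illustration; exact dtype names (for example the datetime resolution) can vary between pandas versions, so the checks only rely on the dtype kind.

```python
import numpy as np
import pandas as pd

# Unlike a NumPy array, a Series can be indexed by string labels.
s = pd.Series([7.5, np.nan, 6.1], index=["Movie A", "Movie B", "Movie C"])
print(s["Movie A"])     # label-based lookup -> 7.5
print(s.isna().sum())   # number of missing (NaN) values -> 1

# One small Series for each dtype in the list above.
examples = {
    "float": pd.Series([1.5, np.nan]),
    "int": pd.Series([1, 2]),
    "bool": pd.Series([True, False]),
    "datetime": pd.Series(pd.to_datetime(["2015-07-10", "2014-12-25"])),
    "datetime_tz": pd.Series(pd.to_datetime(["2015-07-10"]).tz_localize("UTC")),
    "timedelta": pd.Series(pd.to_timedelta(["1 days", "00:30:00"])),
    "category": pd.Series(["PG", "R", "PG"], dtype="category"),
    "object": pd.Series(["Movie A", 5]),  # mixed types fall back to object
}
for name, series in examples.items():
    print(name, series.dtype)
```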

 

A DataFrame stores the data of each column as a Series. When you select a single column from a DataFrame, pandas returns a Series representing that column, indexed by the DataFrame's row labels (0, 1, 2, ... by default). You can also use slicing to select several rows.
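Because the CSV file is not bundled here, this sketch uses a tiny hypothetical stand-in for the fandango table: the column names FILM and RottenTomatoes match the dataset, but the rows are made up. It checks that a selected column is a Series with the default integer index and that slicing picks several rows.

```python
import pandas as pd

# Hypothetical stand-in for the fandango table (made-up rows).
fandango = pd.DataFrame({
    "FILM": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "RottenTomatoes": [74, 85, 80, 18],
})

series_film = fandango["FILM"]      # selecting one column returns a Series
print(type(series_film).__name__)   # Series
print(series_film.index.tolist())   # default integer row index: [0, 1, 2, 3]
print(series_film[:2].tolist())     # slicing selects several rows: ['Movie A', 'Movie B']
```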

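    # Select the FILM and RottenTomatoes columns and print the first five rows of each
    import pandas as pd
    from pandas import Series

    # CSV assumed to be fandango_score_comparison.csv from the repository linked above
    fandango = pd.read_csv('fandango_score_comparison.csv')
    series_film = fandango['FILM']
    print(series_film.head(5))
    series_rt = fandango['RottenTomatoes']
    print(series_rt[:5])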


    # Look up the Rotten Tomatoes score of a single movie with a Boolean mask
    print(fandango[fandango['FILM'] == 'Minions (2015)']['RottenTomatoes'].values[0])
    print(fandango[fandango['FILM'] == 'Leviathan (2014)']['RottenTomatoes'].values[0])
    # Writing one statement per movie is tedious. A better approach is to combine
    # series_film and series_rt into a new Series that uses movie names as the index
    # and scores as the values, which makes querying several movies easy.
    from pandas import Series
    film_names = series_film.values
    rt_scores = series_rt.values
    series_custom = Series(rt_scores, index=film_names)  # pass the data and the index labels
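A self-contained sketch of the two lookup styles, using a hypothetical three-row table with made-up scores: the Boolean-mask lookup needs one statement per movie, while the label-indexed Series answers the same question directly and can fetch several movies at once.

```python
import pandas as pd

# Hypothetical mini table with the dataset's column names (made-up scores).
fandango = pd.DataFrame({
    "FILM": ["Movie A", "Movie B", "Movie C"],
    "RottenTomatoes": [54, 100, 66],
})

# Boolean-mask lookup: one statement per movie.
score_a = fandango[fandango["FILM"] == "Movie A"]["RottenTomatoes"].values[0]

# Label-indexed Series: movie names become the index, scores the values.
series_custom = pd.Series(fandango["RottenTomatoes"].values,
                          index=fandango["FILM"].values)
print(series_custom["Movie A"] == score_a)             # True
print(series_custom[["Movie A", "Movie C"]].tolist())  # [54, 66]
```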

    # Now several movies can be queried at once
    series_custom[['Minions (2015)', 'Leviathan (2014)']]
    # sort_index() sorts the Series by its index (movie names, alphabetically);
    # sort_values() sorts it by its values (the scores)
    sc2 = series_custom.sort_index()
    sc3 = series_custom.sort_values()
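The difference between the two sorting methods is easiest to see on a small example. This sketch uses made-up scores and also shows that both methods return a new Series, leaving the original order untouched.

```python
import pandas as pd

# Made-up scores, deliberately stored out of order.
series_custom = pd.Series([66, 100, 54], index=["Movie C", "Movie B", "Movie A"])

sc2 = series_custom.sort_index()   # order by index label (alphabetical movie names)
sc3 = series_custom.sort_values()  # order by value (ascending score)
print(sc2.index.tolist())  # ['Movie A', 'Movie B', 'Movie C']
print(sc3.tolist())        # [54, 66, 100]

# ascending=False reverses the order; the highest-scoring movie comes first.
print(series_custom.sort_values(ascending=False).index[0])  # Movie B
```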

    # Arithmetic operators (+, -, *, /) are applied to every value of a Series
    series_custom / 10  # divides each score by 10; the index (movie names) is not changed
    # NumPy functions also work on a Series
    import numpy as np
    np.max(series_custom)  # highest score among all movies
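A minimal sketch of element-wise arithmetic on a Series with made-up scores. It confirms that division touches only the values, and that `np.max` and the Series method `max()` give the same result.

```python
import numpy as np
import pandas as pd

series_custom = pd.Series([54, 100, 66], index=["Movie A", "Movie B", "Movie C"])

scaled = series_custom / 10                      # element-wise; labels are untouched
print(scaled.tolist())                           # [5.4, 10.0, 6.6]
print(scaled.index.equals(series_custom.index))  # True

# NumPy functions accept a Series directly.
print(np.max(series_custom))   # 100
print(series_custom.max())     # 100, the equivalent Series method
```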

You can also compare the values of a Series and use the result to filter it:

    series_custom > 50  # returns a Boolean Series: True where the score is greater than 50
    # The Boolean Series can be used to filter the data
    series_greater_than_50 = series_custom[series_custom > 50]
    # Several conditions can be joined with & (and) and | (or)
    series_greater_than_50_and_less_than_80 = \
        series_custom[(series_custom > 50) & (series_custom < 80)]
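The filtering steps can be sketched on made-up scores as follows. Each condition sits in its own parentheses because `&` and `|` bind more tightly than `>` and `<` in Python; without them the expression is evaluated in the wrong order.

```python
import pandas as pd

series_custom = pd.Series([54, 100, 66, 18],
                          index=["Movie A", "Movie B", "Movie C", "Movie D"])

mask = series_custom > 50                # Boolean Series aligned with series_custom
greater_than_50 = series_custom[mask]    # keep entries where mask is True

# & (and): both conditions must hold.
between_50_and_80 = series_custom[(series_custom > 50) & (series_custom < 80)]
# | (or): either condition may hold.
outside_20_to_90 = series_custom[(series_custom < 20) | (series_custom > 90)]
print(between_50_and_80.to_dict())  # {'Movie A': 54, 'Movie C': 66}
```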

You can also perform arithmetic on two Series directly; pandas matches their values by index label.

    rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])     # critic score
    rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])  # user score
    rt_mean = (rt_critics + rt_users) / 2  # average of critic and user scores
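Because values are matched by label, two Series do not need to list their movies in the same order. This sketch uses made-up critic and user scores to show the alignment, including the NaN that appears when a label exists in only one of the Series.

```python
import pandas as pd

# Made-up scores; the two Series list movies in different orders.
rt_critics = pd.Series([54, 100, 66], index=["Movie A", "Movie B", "Movie C"])
rt_users = pd.Series([86, 52], index=["Movie B", "Movie A"])

rt_mean = (rt_critics + rt_users) / 2
print(rt_mean)
# Values are matched by label, not position: Movie A -> (54 + 52) / 2 = 53.0,
# Movie B -> (100 + 86) / 2 = 93.0, and Movie C is NaN because rt_users has no entry for it.
```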
