Pandas Simple Introduction (iii)

Last Update:2016-03-01 Source: Internet

Author: User

This section introduces the data structures of pandas. It follows the Dataquest mission at: https://www.dataquest.io/mission/146/pandas-internals-series

The data used in this article comes from: https://github.com/fivethirtyeight/data/tree/master/fandango

This data mainly describes the Rotten Tomatoes scores of a number of films.

Data Structures

There are three major data structures in pandas:

Series (a collection of values)
DataFrame (a collection of Series)
Panel (a collection of DataFrames; deprecated and removed in pandas 0.25)

A pandas Series is an enhanced version of a NumPy array. A NumPy array can only be indexed with integers, whereas a Series can also be indexed with strings, can hold mixed data types, and uses NaN to represent missing values. A Series object can contain the following data types:

float -- represents a floating-point value
int -- represents an integer value
bool -- represents a Boolean value
datetime64[ns] -- represents a date and time (without time zone)
datetime64[ns, tz] -- represents a date and time (with time zone)
timedelta[ns] -- represents a duration, that is, a difference between two times (in days, minutes, seconds, etc.)
category -- represents a categorical value
object -- represents arbitrary Python objects, most commonly string values
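The points above (string labels, NaN for missing values, and the data types in the list) can be sketched as follows. This is a minimal illustration, not code from the original article: the film labels and numbers are invented, and the exact dtype names printed (for example the datetime resolution) can vary with the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the labels and numbers are invented for illustration.
scores = pd.Series([74.0, np.nan, 99.0], index=["Film A", "Film B", "Film C"])

by_label = scores["Film C"]     # lookup by string label
by_position = scores.iloc[0]    # lookup by integer position still works
missing = scores.isna()         # True where the value is NaN

# One small Series per data type from the list above.
examples = {
    "float": pd.Series([1.5, 2.5]),
    "int": pd.Series([1, 2]),
    "bool": pd.Series([True, False]),
    "datetime": pd.Series(pd.to_datetime(["2016-03-01", "2016-03-02"])),
    "category": pd.Series(["low", "high", "low"], dtype="category"),
    "object": pd.Series(["a", 1, None], dtype=object),
}
# Subtracting a timestamp from a datetime Series yields a timedelta Series.
examples["timedelta"] = examples["datetime"] - examples["datetime"].iloc[0]

for name, series in examples.items():
    print(name, series.dtype)
```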

A DataFrame uses a Series object to hold the data for each column, so when you select a column from a DataFrame, pandas returns the Series that represents that column. By default the rows of that Series are indexed by integers starting at 0, and you can use slices to select multiple rows.

    # Select the FILM and RottenTomatoes columns and output the first 5 rows of each
    import pandas as pd
    fandango = pd.read_csv('fandango_score_comparison.csv')
    series_film = fandango['FILM']
    print(series_film.head(5))
    series_rt = fandango['RottenTomatoes']
    print(series_rt[:5])
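For readers without the CSV at hand, the same column selection can be tried on a small in-memory DataFrame. This is only a sketch: the three rows are invented stand-ins, and only the column names FILM and RottenTomatoes are taken from the article.

```python
import pandas as pd

# Hypothetical stand-in for fandango_score_comparison.csv (values are invented).
df = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "RottenTomatoes": [74, 85, 99],
})

column = df["FILM"]          # selecting one column returns a Series
first_two = column[:2]       # a slice selects several rows by position
also_first_two = column.head(2)

print(type(column).__name__)
print(list(column.index))    # default integer index starting at 0
print(first_two.tolist())
```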

Custom Indexes

Of the two Series above, series_film holds the film names and series_rt holds the scores. Suppose I now want to know the scores of two films, Minions (2015) and Leviathan (2014). The simplest way to do that is:

    print(fandango[fandango['FILM'] == 'Minions (2015)']['RottenTomatoes'].values[0])
    print(fandango[fandango['FILM'] == 'Leviathan (2014)']['RottenTomatoes'].values[0])
    # Writing one statement for every movie is a lot of trouble.
    # A better way is to combine series_film and series_rt into a new Series, with the movie name as the index and the score as the value, which makes it convenient to query several movies.
    from pandas import Series
    film_names = series_film.values
    rt_scores = series_rt.values
    # To create a Series, specify the data and the index parameter
    series_custom = Series(rt_scores, index=film_names)

    # Querying several movies is now easier
    series_custom[['Minions (2015)', 'Leviathan (2014)']]
    # The Series can also be sorted by score
    series_custom.sort_values()
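A self-contained sketch of the same idea, using invented film names and scores in place of the CSV columns. `sort_index` is included as an assumption about what a reader may also want; it is not part of the original text.

```python
import pandas as pd

# Hypothetical names and scores standing in for series_film and series_rt.
film_names = ["Film B", "Film A", "Film C"]
rt_scores = [99, 74, 54]

# Data first, labels passed through the index parameter.
series_custom = pd.Series(rt_scores, index=film_names)

picked = series_custom[["Film A", "Film C"]]   # query several films at once
by_score = series_custom.sort_values()         # ascending by score
by_name = series_custom.sort_index()           # ascending by film name

print(picked)
print(by_score)
```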

Vectorized Operations

When you want to manipulate the data in one column of a dataset, a Series object can apply the operation to every value in that column at once (vectorization). pandas is built on NumPy, and NumPy loops over the values of an entire column in compiled C code, so the operation runs very fast. If you loop over a Series with an explicit Python for loop instead, it becomes much slower.
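A rough way to see the difference is to time both approaches on the same data. This is a sketch under assumptions: the Series of 100,000 floats is invented, and the measured times depend on the machine, so no particular speed-up is claimed here.

```python
import time

import numpy as np
import pandas as pd

values = pd.Series(np.arange(100_000, dtype="float64"))

start = time.perf_counter()
vectorized = values / 10                         # one vectorized operation
vectorized_seconds = time.perf_counter() - start

start = time.perf_counter()
looped = pd.Series([v / 10 for v in values])     # explicit Python-level loop
looped_seconds = time.perf_counter() - start

# Both approaches give the same values; only the running time differs.
same = vectorized.equals(looped)
print(same, vectorized_seconds, looped_seconds)
```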

Examples of vectorized operations:

    # Perform a division on a Series
    series_custom / 10
    # This statement divides every value in series_custom by 10; note that the index is not affected
    # You can also use NumPy functions to perform arithmetic
    # For example, to find the maximum movie score
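The NumPy calls themselves did not survive in the text above, so the following is a sketch of what such calls can look like rather than the article's own code. `np.add` and `np.max` are real NumPy functions that accept a Series; the data is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical scores indexed by film name.
series_custom = pd.Series([74, 99, 54], index=["Film A", "Film B", "Film C"])

scaled = series_custom / 10                       # values change, index stays
doubled = np.add(series_custom, series_custom)    # element-wise NumPy arithmetic
highest = np.max(series_custom)                   # maximum score

print(scaled)
print(doubled)
print(highest)
```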

You can also compare and filter

    # Returns a Series of Boolean values (True where the score is greater than 50), which can be used to filter the data
    series_custom > 50
    series_greater_than_50 = series_custom[series_custom > 50]
    # You can also use & (and) and | (or) to combine several conditions
    series_greater_than_50_and_less_than_80 = series_custom[(series_custom > 50) & (series_custom < 80)]
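A small runnable sketch of the comparison and filtering steps, with invented data. Each condition is wrapped in parentheses because `&` and `|` bind more tightly than `>` and `<` in Python.

```python
import pandas as pd

# Hypothetical scores indexed by film name.
series_custom = pd.Series(
    [74, 99, 54, 30],
    index=["Film A", "Film B", "Film C", "Film D"],
)

mask = series_custom > 50                  # Boolean Series, one flag per film
above_50 = series_custom[mask]             # keep rows where the flag is True
between = series_custom[(series_custom > 50) & (series_custom < 80)]
extremes = series_custom[(series_custom < 40) | (series_custom > 90)]

print(mask.tolist())
print(between)
```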

Of course, you can also operate directly on two Series:

    # Critics' ratings
    rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
    # User ratings
    rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
    # Average score
    rt_mean = (rt_critics + rt_users) / 2
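When two Series are combined, pandas matches values by index label rather than by position. The sketch below uses invented ratings to show that behavior, including what happens when a label is present in only one of the two Series.

```python
import pandas as pd

# Hypothetical ratings; note the different order and the extra film in critics.
critics = pd.Series([74, 99, 54], index=["Film A", "Film B", "Film C"])
users = pd.Series([90, 80], index=["Film B", "Film A"])

# Values are paired by label: Film A with Film A, Film B with Film B.
mean = (critics + users) / 2

print(mean)
print(mean.isna().tolist())   # Film C has no user rating, so its mean is NaN
```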
