Python for Data Analysis: pandas

Last Update:2014-08-12 Source: Internet

Author: User


First of all, the author of pandas is also the author of this book.
With NumPy, the object we work with is the matrix (the n-dimensional array).
pandas is built on top of NumPy. Its central object is a two-dimensional table (tabular, spreadsheet-like), and the difference from a matrix is that a two-dimensional table carries metadata.
Using this metadata as an index is more convenient, whereas NumPy only offers the positional index implied by the array's shape. The essence is the same, though, so most operations are common to both.

The two-dimensional table we encounter most often is the table in a relational database, which has column names and row numbers; these are the metadata.
Of course you can use plain matrices to compute statistics on such tables, but using pandas is more convenient.

Introduction to pandas Data Structures

Series

A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.
A simple way to think of it is as a dictionary, or a one-dimensional table. When no index is specified explicitly, the integers 0 through N-1 are added automatically as the index.

You can replace the default index with your own labels to generate a new Series.
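As a minimal sketch of the default and explicit index, assuming the usual `import pandas as pd` alias (the values and labels are illustrative, not taken from the article):

```python
import pandas as pd

# Default index: the integers 0 through N-1 are generated automatically.
obj = pd.Series([4, 7, -5, 3])
default_labels = obj.index.tolist()

# Explicit index: labels are supplied when the Series is built.
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
value_at_a = obj2["a"]

# The index of an existing Series can also be replaced by assignment,
# as long as the new index has the same length.
obj.index = ["w", "x", "y", "z"]
```

Label lookup (`obj2["a"]`) is what makes a Series feel like a dictionary with a fixed order.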

Consider that NumPy has no explicitly specified index, yet data can still be addressed by position, as determined by the array's shape. The index here plays essentially the same role as NumPy's integer index.
So the operations that work on NumPy arrays also apply to pandas objects.
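A short sketch of NumPy-style operations applied to a Series; the data is made up for illustration. The point is that the labels survive each operation:

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

positive = obj2[obj2 > 0]     # boolean filtering keeps the matching labels
doubled = obj2 * 2            # element-wise arithmetic
roots = np.sqrt(obj2.abs())   # NumPy ufuncs return a Series with the same index
second = obj2.iloc[1]         # positional access, like indexing a NumPy array
```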

As noted above, a Series behaves much like a dictionary, so you can also initialize one from a Python dictionary.
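A minimal sketch of building a Series from a dictionary. The state names and numbers are illustrative placeholders:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)   # dictionary keys become the index

# Passing an index selects and orders the keys; labels that are not
# in the dictionary get NaN, and keys not listed are dropped.
states = ["California", "Ohio", "Oregon", "Texas"]
obj4 = pd.Series(sdata, index=states)
```

Membership tests such as `"Ohio" in obj3` check the index, just as `in` checks the keys of a dictionary.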

DataFrame

A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).

If you have worked with R, the data frame should be familiar; pandas in fact imitates some of R's functionality.
So if you can do statistics in Python as easily as in R, there is little reason to switch back to R.

A Series, as described above, is a dictionary or a one-dimensional table.
A DataFrame is a two-dimensional table and can also be seen as a dictionary of Series.

When only column names are specified, the row labels are generated automatically.
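A sketch of that case, building a DataFrame from a dictionary of equal-length lists. The column names `state`, `year`, and `pop` and their values are assumptions chosen for illustration:

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# The columns argument fixes the column order; the row labels
# default to the integers 0 through N-1.
frame = pd.DataFrame(data, columns=["year", "state", "pop"])
```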

You can also specify the row labels. Here a debt column is added as well, but there is no data for it, so its values are NaN.

You can then assign values to the debt column.
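The following sketch shows one way this could look. Only the `debt` column name comes from the text; the other columns, the row labels, and the values are illustrative assumptions:

```python
import numpy as np
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# "debt" is not a key of data, so the new column starts out as all NaN.
frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["one", "two", "three", "four", "five"],
)
all_missing = bool(frame2["debt"].isna().all())

frame2["debt"] = 16.5             # a scalar is broadcast to every row
frame2["debt"] = np.arange(5.0)   # an array must match the number of rows

# A Series is aligned on the row labels; rows it does not mention get NaN.
val = pd.Series([-1.2, -1.5, -1.7], index=["two", "four", "five"])
frame2["debt"] = val
```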

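To take a row, use ix. In current pandas releases ix has been removed; use loc for label-based and iloc for position-based row selection.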
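A minimal sketch of row selection with `loc` and `iloc`, the replacements for `ix` in current pandas. The frame contents are illustrative:

```python
import pandas as pd

frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002], "state": ["Ohio", "Ohio", "Nevada"]},
    index=["one", "two", "three"],
)

row_by_label = frame2.loc["three"]   # select by row label
row_by_pos = frame2.iloc[2]          # select by integer position
```

Either way the row comes back as a Series whose index is the DataFrame's column names.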

You can also create a DataFrame from nested dictionaries. A DataFrame is really a dictionary of Series, and each Series is itself a dictionary, hence the nesting: the outer keys become the columns and the inner keys become the row labels.

Like a NumPy matrix, a DataFrame can be transposed.
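A sketch of both points, nested-dictionary construction and transposition, using made-up population figures:

```python
import pandas as pd

pop = {
    "Nevada": {2001: 2.4, 2002: 2.9},
    "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6},
}

# Outer keys become columns, inner keys become row labels.
# Nevada has no entry for 2000, so that cell is NaN.
frame3 = pd.DataFrame(pop)

# .T swaps rows and columns, as with a NumPy array.
transposed = frame3.T
```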

Essential functionality

Here is a look at the functions pandas provides for working conveniently with these data structures.

Reindexing

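A critical method on pandas objects is reindex, which means to create a new object with the data conformed to a new index.

In effect it changes the index: existing entries are rearranged to match the new labels.

For example, reindexing with an extra label e adds that entry. By default its value is NaN; passing fill_value=0 fills in 0 instead.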
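A minimal sketch of reindexing a Series with a label it does not yet have (the values are illustrative):

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# Existing values are rearranged to follow the new index;
# the new label "e" has no data and becomes NaN.
obj2 = obj.reindex(["a", "b", "c", "d", "e"])

# fill_value replaces the NaN for labels that were introduced.
obj3 = obj.reindex(["a", "b", "c", "d", "e"], fill_value=0)
```

Note that `reindex` returns a new object; `obj` itself is left unchanged.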

You can also choose how gaps are filled by using the method parameter.

The fill can go forward (ffill, carry the previous value forward) or backward (bfill, carry the next value backward).
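A sketch of forward and backward filling during a reindex. The colour strings are placeholder data; the `method` argument assumes the original index is sorted:

```python
import pandas as pd

obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# ffill: each new label takes the value of the nearest earlier label.
forward = obj3.reindex(range(6), method="ffill")

# bfill: each new label takes the value of the nearest later label;
# label 5 has nothing after it, so it stays NaN.
backward = obj3.reindex(range(6), method="bfill")
```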

For a two-dimensional table, you can reindex the rows (index) and the columns at the same time.
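A minimal sketch of reindexing both axes of a DataFrame in one call; labels and values are illustrative:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)

# Row "b" and column "Utah" are new, so their cells are NaN;
# column "Ohio" is not requested and is dropped.
frame2 = frame.reindex(
    index=["a", "b", "c", "d"],
    columns=["Texas", "Utah", "California"],
)
```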

The main parameters of reindex are index, method, fill_value, limit, level, and copy.

Dropping entries from an axis

The drop method removes entries by label. Specify the dimension with axis: for a two-dimensional table, rows are axis 0 and columns are axis 1.
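A short sketch of `drop` on each axis, with illustrative labels:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

without_rows = data.drop(["Colorado", "Ohio"])      # axis=0 (rows) is the default
without_cols = data.drop(["two", "four"], axis=1)   # axis=1 drops columns
```

`drop` returns a new object, so `data` keeps all four rows and columns.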

Indexing, selection, and filtering

This works almost the same as basic NumPy indexing, except that labels can be used in place of integer positions.
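A sketch of the common selection forms, on made-up data. One difference from NumPy worth noting is that slicing by label includes the end point:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=["a", "b", "c", "d"])

by_label = obj["b"]             # single label
by_list = obj[["b", "a", "d"]]  # list of labels, in the order given
by_mask = obj[obj < 2]          # boolean mask
label_slice = obj["b":"c"]      # label slice: the end point "c" is included
pos_slice = obj.iloc[1:2]       # positional slice: the end point is excluded

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

cols = data[["three", "one"]]    # select columns by name
rows = data[data["three"] > 5]   # filter rows with a boolean Series
```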

Arithmetic and data alignment

Automatic data alignment and filling is one of the areas where pandas is more convenient than NumPy.

In [136]: df1 = DataFrame(np.arange(12.).reshape((3, 4)), columns=list('abcd'))
In [137]: df2 = DataFrame(np.arange(20.).reshape((4, 5)), columns=list('abcde'))

By default, values are added only where both DataFrames have the label; everywhere else the result is NaN.
In most situations you probably want to keep a value that exists in only one of the two, that is, to treat the missing side as 0.
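A sketch of both behaviours with two frames of different shapes. The `add` method with `fill_value=0` is one way to treat the missing side as 0:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12.0).reshape((3, 4)), columns=list("abcd"))
df2 = pd.DataFrame(np.arange(20.0).reshape((4, 5)), columns=list("abcde"))

# Plain +: cells present in only one frame become NaN
# (all of column "e" and all of row 3 here).
plain = df1 + df2

# add with fill_value: a cell missing from one frame is treated as 0,
# so the value from the other frame passes through unchanged.
filled = df1.add(df2, fill_value=0)
```

A cell that is missing from both frames would still be NaN; that does not occur here because `df2` covers every position.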

In addition to add, the methods sub, mul, and div are supported in the same way.

Summarizing and Computing Descriptive Statistics

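pandas provides a number of statistical functions similar to those in R.

Conveniently, it offers describe, which resembles R's summary output.

For non-numeric data, describe produces a different kind of summary: count, unique, top, and freq.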
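A minimal sketch of the reductions and of `describe` on numeric and non-numeric data. All values are invented for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

col_sums = df.sum()             # per column; NaN values are skipped
row_means = df.mean(axis=1)     # per row; a row that is all NaN gives NaN
numeric_summary = df.describe() # count, mean, std, min, quartiles, max

# On non-numeric data describe reports count, unique, top, and freq.
obj = pd.Series(["a", "a", "b", "c"] * 4)
text_summary = obj.describe()
```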

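Commonly used summary methods include count, min, max, idxmin, idxmax, quantile, sum, mean, median, var, std, cumsum, and pct_change.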

Correlation and Covariance

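The correlation coefficient and the covariance between two columns, for example MSFT and IBM, can be computed with the corr and cov methods of a Series.

Calling corr and cov on the whole DataFrame returns the full correlation matrix and covariance matrix.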
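A sketch of the pairwise and matrix forms. The article's own data is not shown, so the return figures below are invented placeholders, and the `GOOG` column is an added assumption to make the matrix larger than 2x2:

```python
import pandas as pd

# Hypothetical daily returns; not real market data.
returns = pd.DataFrame(
    {
        "MSFT": [0.010, -0.020, 0.015, 0.005, -0.010],
        "IBM": [0.008, -0.015, 0.010, 0.002, -0.012],
        "GOOG": [0.020, -0.010, 0.005, -0.003, 0.001],
    }
)

# Pairwise: Series.corr and Series.cov take the other Series.
pair_corr = returns["MSFT"].corr(returns["IBM"])
pair_cov = returns["MSFT"].cov(returns["IBM"])

# Matrix form: every column against every other column.
corr_matrix = returns.corr()
cov_matrix = returns.cov()
```

The pairwise results match the corresponding cells of the matrices, for example `corr_matrix.loc["MSFT", "IBM"]`.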

Unique Values, Value Counts, and Membership

In [217]: obj = Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

In [218]: uniques = obj.unique()
In [219]: uniques
Out[219]: array([c, a, d, b], dtype=object)

In [220]: obj.value_counts()
Out[220]:
c    3
a    3
b    2
d    1
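The heading also mentions membership, which the snippet above does not cover. A minimal sketch, assuming the `isin` method is what is meant, using the same letters as the Series above:

```python
import pandas as pd

obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

# isin returns a boolean Series marking which values are in the given set.
mask = obj.isin(["b", "c"])

# The mask can then be used to filter the original Series.
selected = obj[mask]
```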

Handling Missing Data

pandas provides several utility functions for handling missing data: dropna, fillna, isnull, and notnull.

Of these, fillna has the most options.
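A sketch of the most common ways to drop or fill missing values, on invented data:

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

dropped = data.dropna()                 # remove missing entries
zero_filled = data.fillna(0)            # replace NaN with a constant
forward_filled = data.ffill()           # carry the last valid value forward
mean_filled = data.fillna(data.mean())  # replace NaN with the mean of the rest

df = pd.DataFrame(
    [[1.0, 6.5, 3.0], [1.0, np.nan, np.nan], [np.nan, np.nan, np.nan]]
)

# A dict gives a different fill value per column; column 0 is not listed
# and keeps its NaN.
per_column = df.fillna({1: 0.5, 2: -1})

# how="all" drops only the rows in which every value is missing.
all_nan_rows_removed = df.dropna(how="all")
```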

Hierarchical Indexing

Hierarchical indexing is an important feature of pandas enabling you to have multiple (two or more) index levels on an axis. Somewhat abstractly, it provides a way for you to work with higher dimensional data in a lower dimensional form.

You can use a multi-level index, which is essentially equivalent to adding a dimension; in other words, lower-dimensional objects are used to represent higher-dimensional data.

The unstack and stack methods convert between the two forms: unstack spreads an index level out into columns, and stack folds the columns back into an index level.
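A minimal sketch of a two-level index with `unstack` and `stack`, on illustrative data. The trailing `dropna()` is a defensive choice so that the result does not depend on how the installed pandas version's `stack` treats the NaN cells created by `unstack`:

```python
import numpy as np
import pandas as pd

# A Series with a two-level index: outer letters, inner numbers.
data = pd.Series(
    np.arange(6.0),
    index=[["a", "a", "b", "b", "c", "c"], [1, 2, 1, 3, 1, 2]],
)

outer_b = data["b"]        # partial indexing on the outer level
inner_2 = data.loc[:, 2]   # selection on the inner level

# unstack moves the inner level into columns, giving a 3x3 table
# with NaN where a combination does not exist.
wide = data.unstack()

# stack folds the columns back into the inner index level.
restored = wide.stack().dropna()
```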

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
