[Reading notes] Python Data Analysis (V): Getting Started with pandas

Last Update:2017-12-09 Source: Internet

Author: User


pandas: a data analysis library built on NumPy

pandas data structures: Series, DataFrame

Series: a one-dimensional array-like object with data labels (it can also be viewed as a dictionary)

Attributes: values, index

Missing data detection: pd.isnull(), pd.notnull(), also available as instance methods of Series objects

The Series object itself and its index both have a name attribute, which is closely tied to other key pandas functionality
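The Series notes above can be sketched as follows. This is a minimal illustration, not taken from the original notes; the state names and numbers are made-up sample data.

```python
import pandas as pd

# Sample data (hypothetical): dict keys become the index labels.
sdata = {"Ohio": 35000.0, "Texas": 71000.0, "Oregon": 16000.0}
states = ["California", "Ohio", "Oregon", "Texas"]

# "California" has no entry in sdata, so its value is NaN (missing).
by_state = pd.Series(sdata, index=states)

values = by_state.values      # the underlying NumPy array
labels = by_state.index       # the Index of data labels

# Missing data detection: top-level functions and instance methods.
missing = pd.isnull(by_state)     # same result as by_state.isnull()
present = by_state.notnull()

# Both the Series and its index carry a name attribute.
by_state.name = "population"
by_state.index.name = "state"

# Dictionary-like behaviour: membership tests look at the index.
has_ohio = "Ohio" in by_state
```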

DataFrame: a tabular data structure with both a row index and a column index

Get a DataFrame column: dictionary-style notation or attribute access (frame2['state'] / frame2.state)

Get a DataFrame row: the ix indexer (removed in pandas 1.0; use loc for labels or iloc for positions)

A column returned by indexing is a view on the underlying data, not a copy; use the Series copy method to copy the column explicitly

A DataFrame's index and columns also have a name attribute, which you can set yourself
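A short sketch of the DataFrame access patterns above. The contents of frame2 are invented sample data, and loc/iloc stand in for ix, which no longer exists in current pandas.

```python
import pandas as pd

# Hypothetical sample data for frame2.
frame2 = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Nevada"],
        "year": [2000, 2001, 2001],
        "pop": [1.5, 1.7, 2.4],
    },
    index=["one", "two", "three"],
)

# Column access: dictionary-style and attribute-style give the same Series.
col_by_key = frame2["state"]
col_by_attr = frame2.state

# Row access: by label with loc, by position with iloc.
row_by_label = frame2.loc["two"]
row_by_position = frame2.iloc[1]

# An explicit copy can be changed without touching the original frame.
pop_copy = frame2["pop"].copy()
pop_copy["one"] = 0.0

# The row index and the column index each have a settable name.
frame2.index.name = "row"
frame2.columns.name = "field"
```

Note that attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute; the "pop" column here must be read with frame2["pop"], because frame2.pop is a method.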

Index objects: pandas Index objects are responsible for managing axis labels and other metadata. When building a Series or DataFrame, any array or other sequence of labels used is converted to an Index. Index objects are immutable.

Index properties
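To make the immutability point concrete, here is a small sketch (the labels are arbitrary). Assigning to an element of an Index raises TypeError, which is what lets an Index be shared safely between objects.

```python
import pandas as pd

obj = pd.Series(range(3), index=["a", "b", "c"])
index = obj.index   # the list of labels was converted to an Index

# Trying to modify a label in place fails.
try:
    index[1] = "d"
    mutated = True
except TypeError:
    mutated = False

# An existing Index can be reused when building another object.
other = pd.Series([1.5, -2.5, 0.0], index=index)

# An Index also behaves like a fixed-size set of labels.
has_b = "b" in index
```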

Basic functionality

Reindexing: reindex() creates a new object conformed to the new index

Dropping entries from an axis: drop()

Index selection and filtering: the ix indexer (in current pandas, loc and iloc)
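The three operations above, sketched on a small Series with made-up values. Selection uses loc, since ix is not available in current pandas.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# reindex returns a new object arranged according to the new index;
# labels that were not present ("e") get NaN, or fill_value if given.
obj2 = obj.reindex(["a", "b", "c", "d", "e"])
obj3 = obj.reindex(["a", "b", "c", "d", "e"], fill_value=0)

# drop returns a new object without the listed labels.
dropped = obj.drop(["d", "c"])

# Label-based slicing with loc includes both endpoints.
sliced = obj.loc["b":"a"]

# Boolean filtering keeps the entries where the condition holds.
filtered = obj[obj > 4]
```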

Arithmetic operations and data alignment

pandas can perform arithmetic on objects with different indexes; the result index is the union of the two, and labels that do not overlap are filled with NA

Fill values in arithmetic methods: fill_value
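A minimal sketch of alignment and fill_value, using two Series with partly overlapping labels (sample values only).

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])
s2 = pd.Series([-2.1, 3.6, -1.5, 4.0, 3.1], index=["a", "c", "e", "f", "g"])

# The + operator aligns on labels; "d", "f" and "g" appear in only one
# operand, so their results are NaN.
total = s1 + s2

# The add method does the same alignment, but treats a value missing
# from one side as fill_value before adding.
filled = s1.add(s2, fill_value=0)
```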

Operations between DataFrame and Series: broadcasting

By default, arithmetic between a DataFrame and a Series matches the index of the Series against the DataFrame's columns and broadcasts down the rows. To match on the rows and broadcast across the columns instead, you must use one of the arithmetic methods and pass the axis to match on.
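Both broadcasting directions can be sketched as follows; the frame and its labels are illustrative sample data.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.0).reshape((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# Default: the Series index (b, d, e) is matched against the columns
# and the subtraction is repeated for every row.
first_row = frame.iloc[0]
minus_first_row = frame - first_row

# To match on the row labels instead, use the method form with axis=0;
# the subtraction is then repeated for every column.
col_d = frame["d"]
minus_col_d = frame.sub(col_d, axis=0)
```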

Function application and mapping

NumPy ufuncs (element-wise array methods) also work on pandas objects

The DataFrame apply() method applies a function to the one-dimensional arrays formed by each row or column
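A brief sketch of ufuncs, apply, and element-wise mapping. The helper name spread and the sample values are assumptions made for this example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    [[1.0, -2.0, 3.0], [-4.0, 5.0, -6.0]],
    columns=list("bde"),
    index=["Utah", "Ohio"],
)

# A NumPy ufunc is applied element-wise and keeps the labels.
absolute = np.abs(frame)

def spread(x):
    """Range of a one-dimensional slice (hypothetical helper)."""
    return x.max() - x.min()

# By default apply passes each column to the function ...
per_column = frame.apply(spread)
# ... and with axis=1 it passes each row.
per_row = frame.apply(spread, axis=1)

# Element-wise mapping on a single column uses Series.map.
formatted = frame["e"].map(lambda x: f"{x:.1f}")
```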

Sorting and ranking

Sorting:

sort_index() sorts the row or column index (in lexicographic order)

sort_index(by=...) sorts by the values in one or more columns (in current pandas, use sort_values(by=...))

A Series is sorted by value with the order method (in current pandas, sort_values())

Ranking:

rank()
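The sorting and ranking calls above, sketched with the current method names (sort_values in place of order and sort_index(by=...)). The data is made up.

```python
import pandas as pd

frame = pd.DataFrame(
    {"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]},
    index=["three", "one", "four", "two"],
)

# Sort by row labels, or by column labels with axis=1.
by_row_label = frame.sort_index()
by_col_label = frame.sort_index(axis=1)

# Sort by the values in one or more columns.
by_value = frame.sort_values(by=["a", "b"])

s = pd.Series([7, -5, 7, 4])
sorted_s = s.sort_values()

# rank assigns ranks starting at 1; ties share the average rank by
# default, or are ranked in order of appearance with method="first".
ranks = s.rank()
ranks_first = s.rank(method="first")
```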

Axis indexes with duplicate values

The is_unique property of the index tells you whether its values are unique
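A small sketch of how duplicate labels behave (arbitrary sample labels): is_unique is a property rather than a method call, and selecting a duplicated label returns a Series instead of a scalar.

```python
import pandas as pd

obj = pd.Series(range(5), index=["a", "a", "b", "b", "c"])

# is_unique is read as an attribute, without parentheses.
unique_labels = obj.index.is_unique

# A duplicated label selects every matching entry ...
dup = obj["a"]
# ... while a label that occurs once selects a scalar.
single = obj["c"]
```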

Summarizing and computing descriptive statistics

sum()

mean()

describe()

Descriptive and summary statistics methods
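A minimal sketch of these reductions on a frame with missing values (sample data). It also shows the skipna option, which is not mentioned in the notes above but controls how NaN is treated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"one": [1.4, 7.1, np.nan, 0.75], "two": [np.nan, -4.5, np.nan, -1.3]},
    index=list("abcd"),
)

# Reductions run down each column by default and skip NaN.
col_sums = df.sum()

# axis=1 reduces across each row; a row that is entirely NaN yields NaN.
row_means = df.mean(axis=1)

# With skipna=False any NaN in a row makes that row's result NaN.
strict_means = df.mean(axis=1, skipna=False)

# describe produces several summary statistics at once.
summary = df.describe()
```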

Correlation and covariance

The corr and cov methods of Series and DataFrame are computed from pairs of arguments.
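A short sketch of pairwise correlation and covariance. The columns x, y and z are constructed so the expected relationships are obvious: y is twice x, and z runs opposite to x.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
        "z": [4.0, 3.0, 2.0, 1.0],
    }
)

# Series methods take the other Series as the second member of the pair.
xy_corr = data["x"].corr(data["y"])
xz_cov = data["x"].cov(data["z"])

# DataFrame methods compute every column pair and return a matrix.
corr_matrix = data.corr()
cov_matrix = data.cov()
```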

Unique values, value counts, and membership

Unique values: the unique() method

Value counts: the value_counts() method computes how often each value appears in a Series

Membership: isin performs a vectorized set-membership check and can be used to select a subset of the data in a Series or DataFrame column
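The three methods can be sketched on one small Series of letters (sample data).

```python
import pandas as pd

obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

# unique returns the distinct values in order of first appearance.
uniques = obj.unique()

# value_counts returns a Series mapping each value to its frequency.
counts = obj.value_counts()

# isin returns a boolean mask that can be used to filter the data.
mask = obj.isin(["b", "c"])
subset = obj[mask]
```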

Handling missing data

Filtering out missing data: dropna

For DataFrame objects, dropna by default drops any row that contains a missing value; dropna(how='all') drops only the rows that are entirely NA.

To drop columns in the same way, pass axis=1.
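A sketch of the dropna variants on a small frame (made-up values). A second frame with an all-NaN column is built only to show the axis=1 case.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [
        [1.0, 6.5, 3.0],
        [1.0, np.nan, np.nan],
        [np.nan, np.nan, np.nan],
        [np.nan, 6.5, 3.0],
    ]
)

# Default: drop every row that has at least one missing value.
no_missing = data.dropna()

# how="all": drop only the rows in which every value is missing.
not_all_missing = data.dropna(how="all")

# Same idea for columns: add an empty column, then drop it with axis=1.
with_empty_col = data.copy()
with_empty_col[3] = np.nan
without_empty_col = with_empty_col.dropna(axis=1, how="all")
```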

Filling in missing data: fillna

Passing a constant: every NA is replaced with that constant

Passing a dictionary: different columns are filled with different values

fillna returns a new object by default, but can modify the existing object in place with inplace=True
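The fillna options above, sketched on a two-column frame. The column names and fill values are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# A constant replaces every missing value; df itself is unchanged.
filled_const = df.fillna(0)

# A dict gives each column its own fill value.
filled_per_col = df.fillna({"a": -1.0, "b": 0.5})

# inplace=True modifies the object it is called on instead.
in_place = df.copy()
in_place.fillna(0, inplace=True)
```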

Hierarchical indexing: the basis for data reshaping and group-based operations (pivot tables)

stack and unstack

For a DataFrame, each axis can have a hierarchical index.

Summary statistics by level: the descriptive and summary statistics methods of DataFrame and Series all take a level option (in pandas 2.0 and later, use groupby(level=...) instead).
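A minimal sketch of a two-level index, reshaping, and a per-level summary. The level names key and num are assumptions for this example, and the per-level sum uses groupby(level=...), since the level argument of sum is not available in recent pandas.

```python
import pandas as pd

data = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=[["a", "a", "b", "b", "c"], [1, 2, 1, 2, 1]],
)
data.index.names = ["key", "num"]

# Partial indexing on the outer level returns the inner-level Series.
inner = data["b"]

# unstack moves the inner level into the columns; the missing
# ("c", 2) combination becomes NaN.
wide = data.unstack()

# stack moves the columns back into the index. Depending on the pandas
# version the NaN entry is dropped or kept, so drop it explicitly.
long = wide.stack().dropna()

# Summary statistics per level of the index.
by_key = data.groupby(level="key").sum()
```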

Using a column as the row index: set_index(); the opposite, moving the row index back into DataFrame columns: reset_index()
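A round trip through set_index and reset_index, as a sketch on a small made-up frame.

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [0, 1, 2], "b": ["x", "y", "z"], "c": [1.5, 2.5, 3.5]}
)

# set_index turns column "b" into the row index and, by default,
# removes it from the columns.
indexed = frame.set_index("b")

# reset_index moves the index back into a column, placed first.
restored = indexed.reset_index()
```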


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
