Pandas Getting Started

Source: Internet
Author: User
[Original] pandas in 10 minutes

This article is a straightforward translation of "10 Minutes to pandas" from the official pandas website. It is a short introduction to pandas; for more detailed recipes, see the Cookbook. By convention, the required packages are imported as follows:
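The import lines did not survive in this copy of the article. A minimal sketch of the usual convention, assuming only NumPy and pandas are needed for the non-plotting examples:

```python
# Conventional aliases used throughout the pandas documentation.
import numpy as np
import pandas as pd

print(pd.__version__)
```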

I. Object creation

For details on the contents of this section, see the Data Structure Intro section.

1. Create a Series by passing a list of values; pandas creates a default integer index:

2. Create a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

3. Create a DataFrame by passing a dict of objects that can be converted to series-like structures:

4. View the data types (dtypes) of the different columns:
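The four creation steps above can be sketched as follows. The sample values and the column names are illustrative assumptions, not data from the original article:

```python
import numpy as np
import pandas as pd

# 1. A Series from a list; the index defaults to 0..n-1.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# 2. A DataFrame from a NumPy array, with a datetime index and column labels.
dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# 3. A DataFrame from a dict; each value is turned into one column,
#    and scalars are repeated to match the length of the longest value.
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("2013-01-02"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

# 4. Each column keeps its own dtype.
print(df2.dtypes)
```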

5. If you are using IPython, tab completion automatically covers the column names as well as the public attributes of the object.

II. Viewing data

For more information, see the Basics section.

1. View the top and bottom rows of the frame:

2. Display the index, the columns, and the underlying NumPy data:

3. The describe() method shows a quick statistical summary of the data:

4. Transpose the data:

5. Sort by an axis:

6. Sort by values:
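The viewing operations listed above might look like this. The frame is built from `np.arange` purely so that the results are predictable; that choice is an assumption made for illustration:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

print(df.head(2))     # first two rows (the default is five)
print(df.tail(3))     # last three rows
print(df.index)       # row labels
print(df.columns)     # column labels
print(df.to_numpy())  # underlying data as a NumPy array
print(df.describe())  # count, mean, std, min, quartiles, max per column
print(df.T)           # transpose

by_axis = df.sort_index(axis=1, ascending=False)    # columns in reverse label order
by_value = df.sort_values(by="B", ascending=False)  # rows ordered by column B
```

Older pandas versions exposed the underlying array as `df.values`; `to_numpy()` is the currently recommended spelling.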

III. Selection

While standard Python/NumPy expressions for selecting and setting are intuitive and handy for interactive work, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, and .iloc. (The .ix indexer listed in older versions of the guide was deprecated and later removed from pandas.) For details, see Indexing and Selecting Data and MultiIndex / Advanced Indexing.

• Getting

1. Select a single column, which returns a Series; this is equivalent to df.A:

2. Select with [], which slices the rows:
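A small sketch of both forms of `[]` selection, using an illustrative frame (the data is an assumption):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

col = df["A"]                             # one column as a Series; same as df.A
first_rows = df[0:3]                      # positional slice: rows 0, 1, 2
by_dates = df["2013-01-02":"2013-01-04"]  # label slice: both endpoints included
```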

• Selection by label

1. Use a label to get a cross-section:

2. Select on multiple axes by label:

3. Label slicing (both endpoints are included):

4. Reduction in the dimensions of the returned object:

5. Get a scalar value:

6. Get fast access to a scalar (equivalent to the previous method):
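The six label-based selections above, sketched with `.loc` and `.at` on an illustrative frame (the data is an assumption):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

row = df.loc[dates[0]]                                  # 1. cross-section: one row as a Series
sub = df.loc[:, ["A", "B"]]                             # 2. all rows, two columns
sliced = df.loc["2013-01-02":"2013-01-04", ["A", "B"]]  # 3. label slice, endpoints included
reduced = df.loc[dates[0], ["A", "B"]]                  # 4. result is a Series, not a DataFrame
scalar = df.loc[dates[0], "A"]                          # 5. a single value
fast = df.at[dates[0], "A"]                             # 6. same value via the scalar accessor
```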

• Selection by position

1. Select by position by passing an integer (this selects a row):

2. Slice by integer positions, similar to NumPy/Python:

3. Select by a list of integer positions, similar to NumPy/Python:

4. Slice rows explicitly:

5. Slice columns explicitly:

6. Get a specific value
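The position-based counterparts use `.iloc` and `.iat`. A sketch on an illustrative frame (the data is an assumption):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

row3 = df.iloc[3]                    # 1. fourth row as a Series
block = df.iloc[3:5, 0:2]            # 2. rows 3-4, columns 0-1 (end excluded)
picked = df.iloc[[1, 2, 4], [0, 2]]  # 3. explicit lists of positions
rows = df.iloc[1:3, :]               # 4. row slice, all columns
cols = df.iloc[:, 1:3]               # 5. column slice, all rows
value = df.iloc[1, 1]                # 6. a single value
fast = df.iat[1, 1]                  #    same value via the scalar accessor
```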

• Boolean indexing

1. Use a single column's values to select data:

2. Select the values of a DataFrame where a boolean condition is met:

3. Use the isin() method for filtering:
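A sketch of the three boolean selections above. The frame mixes negative and positive numbers so the filters have something to do; the data and the extra column "E" are illustrative assumptions:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(
    np.arange(24.0).reshape(6, 4) - 10, index=dates, columns=list("ABCD")
)

positive_a = df[df["A"] > 0]  # 1. rows where column A is positive
masked = df[df > 0]           # 2. same shape; non-matching cells become NaN

df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
filtered = df2[df2["E"].isin(["two", "four"])]  # 3. rows whose E is in the list
```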

• Setting

1. Set a new column:

2. Set a value by label:

3. Set a value by position:

4. Set a new set of values by assigning a NumPy array:

The results of the above operations are as follows:
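The four assignments above can be sketched like this, with the resulting frame printed at the end. The data and the column name "F" are illustrative assumptions:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

# 1. New column from a Series; values are aligned on the index,
#    so the first date (absent from s1) gets NaN.
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("2013-01-02", periods=6))
df["F"] = s1

# 2. Set a value by label.
df.at[dates[0], "A"] = -1.0

# 3. Set a value by position.
df.iat[0, 1] = -2.0

# 4. Replace a whole column with a NumPy array.
df.loc[:, "D"] = np.array([5.0] * len(df))

print(df)
```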

5. Set values with a where operation:
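A sketch of a where-style assignment: every positive cell is replaced by its negation, leaving the rest untouched. The data is an illustrative assumption:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(
    np.arange(24.0).reshape(6, 4) - 10, index=dates, columns=list("ABCD")
)

df2 = df.copy()        # work on a copy so df stays unchanged
df2[df2 > 0] = -df2    # only cells where the mask is True are overwritten
```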

IV. Missing data

pandas primarily uses np.nan to represent missing data. By default, missing values are not included in computations. See the Missing Data section for details.

1. The reindex() method lets you change, add, or delete the index on a specified axis. It returns a copy of the data:

2. Drop any rows that contain missing values:

3. Fill in the missing values:

4. Get a boolean mask of where values are missing:
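The four missing-data steps above, sketched on an illustrative frame. The data, the new column "E", and the fill value 5.0 are assumptions:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

# 1. reindex returns a new object; the new column "E" starts out as NaN.
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1.loc[dates[0]:dates[1], "E"] = 1.0

dropped = df1.dropna(how="any")  # 2. keep only rows without any NaN
filled = df1.fillna(value=5.0)   # 3. replace NaN with 5.0
mask = pd.isna(df1)              # 4. True where a value is missing
```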

V. Operations

For details, see the Basic section on Binary Ops.

• Statistics (operations in general exclude missing data)

1. Perform a descriptive statistic:

2. Perform the same operation on the other axis:

3. Operating on objects of different dimensionality requires alignment. pandas automatically broadcasts along the specified dimension:
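A sketch of the three statistics steps above. The data is an illustrative assumption; the shifted Series is there only to show how alignment introduces NaN:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

col_means = df.mean()        # 1. one value per column
row_means = df.mean(axis=1)  # 2. one value per row

# 3. Subtract a Series from every column, matching on the row index.
#    shift(2) moves the values down two rows, so the first two are NaN.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
diff = df.sub(s, axis="index")
```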

• Apply

1. Apply functions to the data:
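A sketch of `apply` with two column-wise functions, one that keeps the shape and one that reduces each column to a single number. The data is an illustrative assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24.0).reshape(6, 4), columns=list("ABCD"))

cumulative = df.apply(lambda col: col.cumsum())         # running total down each column
spread = df.apply(lambda col: col.max() - col.min())    # one number per column
```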

• Histogramming

For details, see Histogramming and Discretization.
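A minimal histogramming sketch using `value_counts`, which counts how often each distinct value occurs. The sample values are an assumption:

```python
import pandas as pd

s = pd.Series([4, 2, 4, 6, 2, 4, 1])
counts = s.value_counts()  # sorted by frequency, most common first
print(counts)
```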

• String methods

A Series is equipped with a set of string processing methods in its str attribute that make it easy to operate on each element of the array. For more information, see Vectorized String Methods.
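A short sketch of the `str` accessor; the sample strings are an assumption. Missing values pass through unchanged:

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
lowered = s.str.lower()  # applied element by element; NaN stays NaN
print(lowered)
```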

VI. Merging

pandas provides various facilities for easily combining Series, DataFrame, and Panel objects with different kinds of set logic. (Panel was removed in later pandas releases.) For details, see the Merging section.

• Concat
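A sketch of `pd.concat`: a frame is split into pieces and glued back together row-wise. The data is an illustrative assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40.0).reshape(10, 4))

pieces = [df[:3], df[3:7], df[7:]]  # three row-wise chunks
rebuilt = pd.concat(pieces)         # stacked back in order
```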

• Join: SQL-style merges; see Database-style joining.
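A sketch of an SQL-style join with `pd.merge`. The frames and the column names `key`, `lval`, and `rval` are illustrative assumptions:

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})

# Inner join on the shared column (inner is the default for merge).
merged = pd.merge(left, right, on="key")
print(merged)
```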

• Append: append rows to a DataFrame; see Appending.
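Older versions of the guide used `DataFrame.append` for this, but that method was deprecated in pandas 1.4 and removed in 2.0. The sketch below swaps in `pd.concat`, which gives the same result; the data is an illustrative assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32.0).reshape(8, 4), columns=list("ABCD"))

row = df.iloc[[3]]  # a one-row DataFrame (note the double brackets)
appended = pd.concat([df, row], ignore_index=True)  # renumber the index 0..8
```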

VII. Grouping

By "group by" we mean a process involving one or more of the following steps:

• Splitting: divide the data into groups based on some criteria;

• Applying: apply a function to each group independently;

• Combining: combine the results into a data structure.

For more information, see the Grouping section.

1. Group and then apply the sum() function to each resulting group:

2. Group by multiple columns, which forms a hierarchical index, and then apply the function:
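Both grouping steps above, sketched on a small frame. The data is an illustrative assumption; the numeric columns are selected explicitly so that only they are summed:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
        "D": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    }
)

by_a = df.groupby("A")[["C", "D"]].sum()          # 1. one row per value of A
by_ab = df.groupby(["A", "B"])[["C", "D"]].sum()  # 2. rows indexed by (A, B) pairs
```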

VIII. Reshaping

For details, see Hierarchical Indexing and Reshaping.

• Stack
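A sketch of `stack()` and its inverse `unstack()` on a frame with a two-level row index. The labels and data are illustrative assumptions:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
df = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index, columns=["A", "B"])

stacked = df.stack()          # columns become the innermost index level
restored = stacked.unstack()  # unstack() moves the last level back to columns
```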

• Pivot tables; for more information, see Pivot Tables.

You can easily produce a pivot table from this data:
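The original sample data did not survive in this copy, so the sketch below builds a stand-in frame (an assumption) and pivots it. Combinations that never occur in the data show up as NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["one", "one", "two", "three"] * 3,
        "B": ["A", "B", "C"] * 4,
        "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 2,
        "D": np.arange(12.0),
    }
)

# Rows are (A, B) pairs, columns are the values of C, cells are the mean of D.
table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
print(table)
```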

IX. Time series

pandas has simple, powerful, and efficient functionality for resampling during frequency conversion (for example, converting per-second data into data sampled every 5 minutes). This kind of operation is very common in finance. See the Time Series section.
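A sketch of the seconds-to-5-minutes conversion described above. The series of ones is an illustrative assumption, chosen so that each 5-minute bin sums to the number of seconds it contains:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=600, freq="s")  # 10 minutes, one point per second
ts = pd.Series(np.ones(len(rng)), index=rng)

per_5min = ts.resample("5min").sum()  # aggregate each 5-minute bin
print(per_5min)
```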

1. Time zone representation:

2. Time zone conversion:

3. Converting between time span representations:

4. Converting between period and timestamp makes some convenient arithmetic functions available.
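The four conversions above can be sketched as follows. The dates, the target zone "America/New_York", and the monthly sample series are illustrative assumptions:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-03-06", periods=5, freq="D")
ts = pd.Series(np.arange(5.0), index=rng)

ts_utc = ts.tz_localize("UTC")                 # 1. attach a time zone to naive timestamps
ts_ny = ts_utc.tz_convert("America/New_York")  # 2. express the same instants in another zone

monthly = pd.Series(
    np.arange(5.0), index=pd.date_range("2012-01-01", periods=5, freq="MS")
)
ps = monthly.to_period("M")  # 3. timestamps -> monthly periods
back = ps.to_timestamp()     # 4. periods -> timestamps (start of each period)
```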

X. Categoricals

Since version 0.15, pandas can include categorical data in a DataFrame. For details, see the Categorical introduction and the API documentation.

1. Convert the raw grades to a categorical data type:

2. Rename the categories to more meaningful names:

3. Reorder the categories and, at the same time, add the missing categories:

4. Sorting follows the order of the categories, not lexical order:

5. Grouping by a categorical column also shows empty categories:
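The five categorical steps above, sketched on a small frame. The grades and the category names are illustrative assumptions; `observed=False` is passed explicitly so that empty categories are kept regardless of the pandas version's default:

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6], "raw_grade": ["a", "b", "b", "a", "a", "e"]}
)

# 1. Convert the raw grades to a categorical dtype.
df["grade"] = df["raw_grade"].astype("category")

# 2. Rename the categories (a, b, e -> very good, good, very bad).
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# 3. Reorder the categories and add the ones that do not occur in the data.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"]
)

# 4. Sorting follows category order, not alphabetical order.
sorted_df = df.sort_values(by="grade")

# 5. Grouping keeps the empty categories, with a size of zero.
sizes = df.groupby("grade", observed=False).size()
print(sizes)
```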

XI. Plotting

For details, see the Plotting docs.

On a DataFrame, plot() is a convenient way to plot all of the columns with their labels:
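A sketch of `DataFrame.plot()`, assuming matplotlib is installed. The random-walk data is an illustrative assumption, and the "Agg" backend is selected only so the example also runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=100)
df = pd.DataFrame(np.random.randn(100, 4), index=dates, columns=list("ABCD"))
df = df.cumsum()  # turn the noise into four random walks

ax = df.plot()  # one line per column, labelled in the legend
```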

XII. Importing and saving data

• CSV; see Writing to a CSV file.

1. Write to CSV file:

2. Read from the CSV file:
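A round-trip sketch for CSV. The file name `foo.csv` and the data are illustrative assumptions; a temporary directory is used so nothing is left behind:

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12.0).reshape(3, 4), columns=list("ABCD"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.csv")
    df.to_csv(path, index=False)  # 1. write, without the row index
    loaded = pd.read_csv(path)    # 2. read it back

print(loaded)
```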

• HDF5; see HDFStores.

1. Write to an HDF5 store:

2. Read from an HDF5 store:

• Excel; see MS Excel.

1. Write to Excel file:

2. Read from the Excel file:
