Reposted
This lesson is reproduced from an expert's notes. Sample code will be added incrementally in the future.
Pandas
pandas is a NumPy-based tool created for data analysis tasks. It incorporates a number of libraries and standard data models, and provides the tools needed to manipulate large datasets efficiently, along with many functions and methods for processing data quickly and easily.
>>> from pandas import Series, DataFrame
>>> import pandas as pd
a. pandas
Function | Description
pd.isnull(series) | Determines whether each value is missing (NaN)
pd.notnull(series) | Determines whether each value is not missing (not NaN)
2.2.a.1 Pandas common functions
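As a minimal sketch of the two functions in the table above, the snippet below builds a small Series with one missing entry and tests it. The sample values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the entry labelled "b" is missing.
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

missing = pd.isnull(s)    # True where the value is NaN
present = pd.notnull(s)   # True where the value is not NaN

print(missing.tolist())   # [False, True, False]
print(present.tolist())   # [True, False, True]

# A Boolean mask can be used to keep only the non-missing values.
clean = s[present]
print(clean.tolist())     # [1.0, 3.0]
```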
b. Series
A Series combines the advantages of a dictionary and an ndarray, and supports almost all of the index operations and functions of either.
Property | Description
values | Gets the underlying array
index | Gets the index
name | The name of the values
index.name | The name of the index
2.2.b.1 Series Common Properties
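The properties above can be tried on a small Series. This is only a sketch: the names "population" and "state" and the numbers are invented for the example.

```python
import pandas as pd

# Built from a dictionary: the keys become the index.
s = pd.Series({"A": 1, "B": 2, "C": 3})
s.name = "population"       # name of the values (hypothetical)
s.index.name = "state"      # name of the index (hypothetical)

print(s.values)             # the underlying NumPy array
print(s.index)              # the Index of labels
print(s.name)               # population
print(s.index.name)         # state

# Dictionary-like and ndarray-like behaviour side by side.
by_label = s["B"]           # lookup by key, like a dict
doubled = s * 2             # vectorised arithmetic, like an ndarray
has_key = "C" in s          # membership test on the index
```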
Function | Description
Series([x, y, ...]) | Generates a Series from a list
Series({'a': x, 'b': y, ...}, index=param1) | Generates a Series from a dictionary
Series.copy() | Copies a Series
Series.reindex([x, y, ...], fill_value=NaN) | Returns a new object conformed to the new index, filling missing values with fill_value
Series.reindex([x, y, ...], method=NaN) | Returns a new object conformed to the new index, filling missing values according to method
Series.reindex(columns=[x, y, ...]) | Reindexes the columns (applies to a DataFrame)
Series.drop(index) | Discards the specified entries
Series.map(f) | Applies an element-wise function
Sort function | Description
Series.sort_index(ascending=True) | Returns a new object sorted by index
Series.order(ascending=True) | Returns a new object sorted by value, with NaN values at the end (renamed Series.sort_values in later pandas releases)
Series.rank(method='average', ascending=True, axis=0) | Ranks the values; by default, tied values are each assigned the average rank of their group
Series.argmax() | Returns the index position that contains the maximum value
Series.argmin() | Returns the index position that contains the minimum value
2.2.b.2 Series Common functions
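The following sketch exercises several of the Series functions listed above, including the method argument of reindex and of rank. It assumes a recent pandas release, so it uses sort_values in place of the older Series.order. All data values are invented.

```python
import pandas as pd

s = pd.Series([4.0, 7.0, 7.0, 1.0], index=["d", "b", "a", "c"])

# reindex: labels missing from the original get fill_value.
filled = s.reindex(["a", "b", "c", "d", "e"], fill_value=0.0)

# reindex with method="ffill": carry the last valid value forward.
# Fill methods require a monotonic index.
colors = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
forward = colors.reindex(range(6), method="ffill")

dropped = s.drop("c")                 # discard one entry
doubled = s.map(lambda x: x * 2)      # element-wise function
by_index = s.sort_index(ascending=True)
by_value = s.sort_values()            # successor of Series.order

# rank: the two 7.0 values tie for ranks 3 and 4.
rank_avg = s.rank(method="average")   # ties share the mean rank, 3.5
rank_min = s.rank(method="min")       # ties share the smallest rank, 3
rank_max = s.rank(method="max")       # ties share the largest rank, 4
rank_first = s.rank(method="first")   # ties ranked in order of appearance

max_label = s.idxmax()                # label of the first maximum
```

Comparing rank_avg, rank_min, rank_max and rank_first for the two tied entries is a quick way to see what each method option does.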
Method options for reindex:
ffill, bfill: forward fill / backward fill
pad, backfill: aliases for ffill and bfill respectively
Method options for rank:
'average': the default; assigns the average rank to each value in a group of ties
'max', 'min': uses the largest / smallest rank of the whole group
'first': ranks tied values in the order in which they appear in the original data
c. DataFrame
A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can hold a different value type (numeric, string, Boolean, and so on). A DataFrame has both a row index and a column index, and can be seen as a dictionary of Series that all share the same index.
A column can be retrieved as a Series either with dictionary-style notation (frame['column_name']) or as an attribute (frame.column_name). Rows can also be retrieved by position or by label.
Assigning a value to a column that does not exist creates a new column.
>>> del frame['xxx']  # delete a column
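A short sketch of the column and row access described above. The frame, its column names, and the "debt" column are hypothetical, and rows are fetched with loc and iloc as found in current pandas.

```python
import pandas as pd

frame = pd.DataFrame(
    {"state": ["Ohio", "Nevada"], "population": [1.5, 2.4]},
    index=["one", "two"],
)

col_by_key = frame["state"]        # dictionary-style access
col_by_attr = frame.state          # attribute-style access
row_by_label = frame.loc["two"]    # row by label
row_by_pos = frame.iloc[0]         # row by integer position

# Assigning to a column that does not exist creates it.
frame["debt"] = 0.0
created = "debt" in frame.columns

# del removes the column again.
del frame["debt"]
removed = "debt" not in frame.columns
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the dictionary style is the safer default.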
Property | Description
values | The values of the DataFrame
index | Row index
index.name | The name of the row index
columns | Column index
columns.name | The name of the column index
ix | Returns a row of the DataFrame (removed in pandas 1.0; use loc or iloc instead)
ix[[x, y, ...], [x, y, ...]] | Reindexes rows and columns at once
T | Transposes the rows and columns of the frame
2.2.c.1 DataFrame common properties
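The properties above can be inspected on a small frame. This sketch assumes a current pandas, where ix is no longer available, so label-based selection is shown with loc instead. Labels and names are invented.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["a", "b", "c"],
)
frame.index.name = "row"       # name of the row index (hypothetical)
frame.columns.name = "col"     # name of the column index (hypothetical)

data = frame.values            # 2D ndarray: [[0, 1, 2], [3, 4, 5]]
rows = frame.index             # row labels
cols = frame.columns           # column labels
flipped = frame.T              # transpose: 3 rows, 2 columns

# loc takes over the label-based role that ix used to play.
one_row = frame.loc["r1"]
subset = frame.loc[["r2", "r1"], ["c", "a"]]
```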
Function | Description
DataFrame(dict, columns=dict.index, index=[dict.columnnum]) | Builds a DataFrame
DataFrame(2D ndarray) | A matrix of data; row and column labels can also be passed
DataFrame(dict of arrays, lists, or tuples) | Each sequence becomes a column of the DataFrame; all sequences must be the same length
DataFrame(NumPy structured/record array) | Treated like a dict of arrays
DataFrame(dict of Series) | Each Series becomes a column; if no index is passed explicitly, the indexes of the Series are unioned to form the row index of the result
DataFrame(dict of dicts) | Each inner dict becomes a column; the keys are unioned to form the row index of the result
DataFrame(list of dicts or Series) | Each item becomes a row of the DataFrame; the union of the dict keys or Series indexes becomes the column labels
DataFrame(list of lists or tuples) | Treated like a 2D ndarray
DataFrame(DataFrame) | The indexes of the source DataFrame are used unless different ones are passed
DataFrame(NumPy MaskedArray) | Treated like a 2D ndarray, except that masked values become NA/missing
df.reindex([x, y, ...], fill_value=NaN, limit) | Returns a new object conformed to the new index; missing values are filled with fill_value, and limit is the maximum fill size
df.reindex([x, y, ...], method=NaN) | Returns a new object conformed to the new index, filled according to method
df.reindex([x, y, ...], columns=[x, y, ...], copy=True) | Reindexes rows and columns at once; the new object copies the data by default
df.drop(index, axis=0) | Discards the specified entries along the specified axis
Sort function | Description
df.sort_index(axis=0, ascending=True) | Sorts by the index of the given axis
df.sort_index(by=[a, b, ...]) | Sorts by the values in columns a, b, ... (df.sort_values(by=[a, b, ...]) in later pandas releases)
Summary statistics function | Description
df.count() | Number of non-NaN values
df.describe() | Generates several summary statistics at once
df.min() | Minimum value
df.max() | Maximum value
df.idxmax(axis=0, skipna=True) | Returns a Series of the index labels at which the maximum values occur
df.idxmin(axis=0, skipna=True) | Returns a Series of the index labels at which the minimum values occur
df.quantile(axis=0) | Computes the sample quantiles
df.sum(axis=0, skipna=True, level=NaN) | Returns a Series containing the sums
df.mean(axis=0, skipna=True, level=NaN) | Returns a Series containing the means
df.median(axis=0, skipna=True, level=NaN) | Returns a Series containing the medians
df.mad(axis=0, skipna=True, level=NaN) | Returns a Series containing the mean absolute deviation from the mean
df.var(axis=0, skipna=True, level=NaN) | Returns a Series containing the variances
df.std(axis=0, skipna=True, level=NaN) | Returns a Series containing the standard deviations
df.skew(axis=0, skipna=True, level=NaN) | Returns the skewness of the sample values (third moment)
df.kurt(axis=0, skipna=True, level=NaN) | Returns the kurtosis of the sample values (fourth moment)
df.cumsum(axis=0, skipna=True) | Returns the cumulative sum of the sample
df.cummin(axis=0, skipna=True) | Returns the cumulative minimum of the sample
df.cummax(axis=0, skipna=True) | Returns the cumulative maximum of the sample
df.cumprod(axis=0, skipna=True) | Returns the cumulative product of the sample
df.diff(axis=0) | Returns the first-order difference of the sample
df.pct_change(axis=0) | Returns the percent change of the sample
Calculation function | Description
df.add(df2, fill_value=NaN, axis=1) | Element-wise addition; where an element is missing from one operand, fill_value is used in its place
df.sub(df2, fill_value=NaN, axis=1) | Element-wise subtraction; where an element is missing from one operand, fill_value is used in its place
df.div(df2, fill_value=NaN, axis=1) | Element-wise division; where an element is missing from one operand, fill_value is used in its place
df.mul(df2, fill_value=NaN, axis=1) | Element-wise multiplication; where an element is missing from one operand, fill_value is used in its place
df.apply(f, axis=0) | Applies function f to the one-dimensional array formed by each column (axis=0) or each row (axis=1)
df.applymap(f) | Applies function f to each individual element
df.cumsum(axis=0, skipna=True) | Accumulates the values and returns the accumulated DataFrame
2.2.c.2 DataFrame common functions
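A sketch that touches one or two functions from each group in the table above. It assumes a recent pandas release, so sorting by column values uses sort_values rather than sort_index(by=...), and the level argument is left out. The data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": [3.0, 1.0, np.nan], "y": [10.0, 20.0, 30.0]},
    index=["c", "a", "b"],
)

# Construction from a dict of Series: the indexes are unioned.
from_series = pd.DataFrame({
    "p": pd.Series([1, 2], index=["a", "b"]),
    "q": pd.Series([3, 4], index=["b", "c"]),
})

# reindex: fill_value applies only to labels that are newly introduced.
reindexed = df.reindex(["a", "b", "c", "d"], fill_value=0.0)

without_y = df.drop("y", axis=1)             # drop a column
sorted_rows = df.sort_index(axis=0, ascending=True)
sorted_by_x = df.sort_values(by=["x"])       # NaN is placed last

# Summary statistics skip NaN by default.
counts = df.count()                          # x: 2, y: 3
totals = df.sum(axis=0, skipna=True)         # x: 4.0, y: 60.0
max_labels = df.idxmax(axis=0, skipna=True)  # x: "c", y: "b"
running = df.cumsum(axis=0, skipna=True)

# Arithmetic with fill_value: a value missing on one side is treated
# as 0.0 here; if it is missing on both sides the result stays NaN.
other = pd.DataFrame({"x": [1.0, 1.0]}, index=["a", "b"])
added = df.add(other, fill_value=0.0)

# apply: f receives each column as a Series when axis=0.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)
```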
Index mode | Description
df[val] | Selects a single column or a set of columns from the DataFrame
df.ix[val] | Selects a single row or a group of rows from the DataFrame
df.ix[:, val] | Selects a single column or a subset of columns
df.ix[val1, val2] | Selects rows and columns at the same time
reindex method | Conforms one or more axes to a new index
xs method | Selects a single row or column by label and returns a Series
icol, irow methods | Select a single column or row by integer position and return a Series
get_value, set_value | Get or set a single value by row label and column label
2.2.c.3 DataFrame common indexing methods
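The indexing modes in the table date from an older pandas. As a hedged sketch, the same selections can be written in a current release with loc (labels), iloc (integer positions) and at (single values), which stand in for ix, icol/irow and get_value/set_value. The frame is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

one_col = df["y"]                         # single column, a Series
some_cols = df[["x", "z"]]                # set of columns
row_a = df.loc["a"]                       # was df.ix[val]
col_z = df.loc[:, "z"]                    # was df.ix[:, val]
block = df.loc[["a", "c"], ["x", "y"]]    # was df.ix[val1, val2]
xs_row = df.xs("c")                       # row by label, a Series

row_0 = df.iloc[0]                        # was irow
col_0 = df.iloc[:, 0]                     # was icol

cell = df.at["b", "y"]                    # was get_value
updated = df.copy()
updated.at["b", "y"] = 40                 # was set_value
```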
Operation:
By default, arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and broadcasts down the rows. If a label is not found in either the DataFrame's columns or the Series' index, both objects are reindexed to form the union of the two.
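The broadcasting rule can be seen in a small sketch. The frame and the Series named partial are made up; the last line shows the arithmetic method form with axis=0, which matches on the row index instead of the columns.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(6.0).reshape(2, 3),
    index=["r1", "r2"],
    columns=["b", "d", "e"],
)

# The Series index (b, d, e) is matched to the columns and the
# subtraction is repeated for every row.
first_row = frame.iloc[0]
diff = frame - first_row

# Labels found on only one side ("d", "e", "f") produce the union
# of both sets of labels, filled with NaN where there is no match.
partial = pd.Series([1.0, 2.0], index=["b", "f"])
unioned = frame + partial

# Matching on the row index requires the method form with axis=0.
by_rows = frame.sub(frame["d"], axis=0)
```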
d. Index
pandas Index objects are responsible for holding axis labels and other metadata (such as axis names). When you build a Series or DataFrame, any array or other sequence of labels that you pass is converted internally to an Index. Index objects are immutable, which is what allows them to be shared safely among multiple data structures.
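A minimal sketch of the immutability described above: assigning to an element of an Index raises a TypeError, while reusing the same Index for a second Series works without any risk of one object changing the labels of the other. The data is invented.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
idx = s.index

# In-place modification is rejected.
try:
    idx[1] = "z"
    mutated = True
except TypeError:
    mutated = False

# The same labels can be handed to another structure.
other = pd.Series([10.0, 20.0, 30.0], index=idx)
same_labels = other.index.equals(idx)
```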
Primary Index object | Description
Index | The most general Index object; represents axis labels as a NumPy array of Python objects
Int64Index | Specialized Index for integer values
MultiIndex | A hierarchical Index object that represents multiple levels of indexing on a single axis; can be seen as an array of tuples
DatetimeIndex | Stores nanosecond timestamps (represented with NumPy's datetime64 dtype)
PeriodIndex | Specialized Index for Period data (time spans)
2.2.d.1 Primary Index objects
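As a rough sketch, each kind of Index in the table can be created as follows. The labels and dates are invented. Note that in pandas 2.0 and later, integer labels are held in a plain Index with an int64 dtype rather than in a separate Int64Index class, so the example only checks the dtype.

```python
import pandas as pd

plain = pd.Index(["a", "b", "c"])            # general-purpose Index
ints = pd.Index([1, 2, 3])                   # integer labels, int64 dtype

# Hierarchical index: each entry is a tuple with one value per level.
multi = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])

# Timestamps and time spans.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
periods = pd.period_range("2024-01", periods=3, freq="M")

print(multi.nlevels)      # 2
print(dates[0])           # 2024-01-01 00:00:00
print(periods[0])         # 2024-01
```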
Function | Description
Index([x, y, ...]) | Creates an Index
append(Index) | Concatenates another Index object, producing a new Index
diff(Index) | Computes the set difference, producing a new Index (named difference in later pandas releases)
intersection(Index) | Computes the set intersection
union(Index) | Computes the set union
isin(Index) | Checks whether each value is contained in the passed collection; returns a Boolean array
delete(i) | Deletes the element at position i, producing a new Index
drop(str) | Deletes the passed values, producing a new Index
insert(i, str) | Inserts an element at position i, producing a new Index
is_monotonic | True when each element is greater than or equal to the previous element (a property, so no parentheses)
is_unique | True when the Index has no duplicate values (a property, so no parentheses)
unique() | Computes the array of unique values in the Index
2.2.d.2 Commonly used Index functions
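The Index functions above, sketched on two small indexes. The example assumes a recent pandas, so it calls difference instead of diff and is_monotonic_increasing instead of is_monotonic. Every call returns a new object and leaves the original Index unchanged.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"])
other = pd.Index(["c", "d"])

appended = idx.append(other)          # a, b, c, c, d
difference = idx.difference(other)    # a, b
common = idx.intersection(other)      # c
combined = idx.union(other)           # a, b, c, d
membership = idx.isin(other)          # [False, False, True]

deleted = idx.delete(0)               # remove by position: b, c
dropped = idx.drop("b")               # remove by value: a, c
inserted = idx.insert(1, "z")         # a, z, b, c

# Properties, accessed without parentheses.
unique_flag = appended.is_unique             # False, "c" appears twice
sorted_flag = idx.is_monotonic_increasing    # True

distinct = appended.unique()          # a, b, c, d
```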
Advanced 16th Course Python Module pandas