pandas: a data analysis library built on NumPy
pandas data structures: Series, DataFrame
Series: a one-dimensional array-like object with data labels (it can also be thought of as a dictionary mapping labels to values)
Attributes: values, index
Missing data detection: pd.isnull(), pd.notnull(), also available as instance methods of Series objects
Both the Series object and its index have a name attribute, which ties in closely with other key pandas functionality
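The Series notes above can be tried out with a small sketch. The data and the names "population" and "state" are made up for illustration and are not from the book.

```python
import numpy as np
import pandas as pd

# A Series pairs a one-dimensional array of values with an index of labels.
obj = pd.Series([4.0, 7.0, np.nan, 3.0], index=["d", "b", "a", "c"])

values = obj.values   # the underlying array of values
labels = obj.index    # the Index object holding the labels

# Dictionary-like behavior: look up by label, test label membership.
b_value = obj["b"]
has_a = "a" in obj

# Missing-data detection, as top-level functions and as an instance method.
missing_mask = pd.isnull(obj)
present_mask = pd.notnull(obj)
same_mask = obj.isnull()

# Both the Series and its index carry a name attribute.
obj.name = "population"
obj.index.name = "state"
```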
DataFrame: a tabular data structure; both rows and columns are indexed
Get a DataFrame column: dictionary-style notation or attribute access (frame2['state'] / frame2.state)
Get a DataFrame row: the ix indexing field (ix was removed in pandas 1.0; use loc for labels or iloc for positions)
A column returned by indexing is a view on the underlying data, not a copy (under Copy-on-Write, the default from pandas 3.0, modifying it no longer changes the DataFrame); call the Series copy method to copy the column explicitly
The DataFrame's index and columns also have a name attribute, which you can set yourself
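A minimal sketch of the DataFrame access patterns above. It assumes a recent pandas, so it uses loc and iloc for row access in place of ix; the frame contents and the names "row" and "field" are invented for illustration.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 2.4],
}
frame2 = pd.DataFrame(data, index=["one", "two", "three"])

# Column access: dictionary-style notation or attribute access.
by_key = frame2["state"]
by_attr = frame2.state

# Row access by label (loc) or by integer position (iloc).
row_by_label = frame2.loc["two"]
row_by_position = frame2.iloc[1]

# Take an explicit, independent copy of a column and modify only the copy.
pop_copy = frame2["pop"].copy()
pop_copy["one"] = 0.0

# The index and the columns each have a name attribute.
frame2.index.name = "row"
frame2.columns.name = "field"
```

Note that attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute; "pop", for instance, is also a DataFrame method, so that column is read with frame2["pop"].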
Index objects: a pandas Index object holds the axis labels and other metadata (such as the axis name). When you build a Series or DataFrame, any array or other sequence of labels you pass is converted to an Index. Index objects are immutable.
Index methods and properties
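As a rough illustration of Index behavior, the sketch below checks immutability and tries two set-like Index methods (intersection and union). The labels are arbitrary.

```python
import pandas as pd

obj = pd.Series(range(3), index=["a", "b", "c"])
index = obj.index

# The list of labels passed at construction has become an Index.
is_index = isinstance(index, pd.Index)

# An Index is immutable: item assignment raises TypeError.
try:
    index[1] = "d"
    mutated = True
except TypeError:
    mutated = False

# Index objects support membership tests and set-like operations.
has_b = "b" in index
other = pd.Index(["b", "c", "d"])
common = index.intersection(other)   # labels present in both
combined = index.union(other)        # labels present in either
```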
Basic functionality
Reindexing: reindex() creates a new object conformed to a new index
Dropping entries from an axis: drop()
Indexing, selection, and filtering: ix (removed in pandas 1.0; use loc or iloc)
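The three operations above might look like this in practice. This is a sketch on toy data, and it assumes a recent pandas in which loc and iloc take the place of ix.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3], index=["d", "b", "a"])

# reindex returns a new object conformed to the new index; labels that
# were absent get NaN, or fill_value when one is given.
obj2 = obj.reindex(["a", "b", "c", "d"])
obj3 = obj.reindex(["a", "b", "c", "d"], fill_value=0.0)

# drop returns a new object without the given labels.
without_b = obj.drop("b")

frame = pd.DataFrame(
    {"one": [0, 4, 8], "two": [1, 5, 9], "three": [2, 6, 10]},
    index=["Ohio", "Colorado", "Utah"],
)
no_utah = frame.drop("Utah")          # drop a row
no_two = frame.drop("two", axis=1)    # drop a column

# Label-based selection with loc, position-based selection with iloc,
# and filtering with a boolean mask.
cell = frame.loc["Colorado", "two"]
first_two_rows = frame.iloc[:2]
big_rows = frame[frame["three"] > 5]
```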
Arithmetic operations and data alignment
pandas can perform arithmetic between objects with different indexes; the result index is the union of the two, and labels that do not overlap get NA values
Fill values in arithmetic methods: fill_value
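To make the alignment rule concrete, here is a small sketch with two Series whose indexes only partly overlap. The values are arbitrary.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Labels are aligned before adding. "a" and "d" each appear in only one
# operand, so the plain sum is NaN at those labels.
plain_sum = s1 + s2

# The add method with fill_value substitutes 0 for the missing operand.
filled_sum = s1.add(s2, fill_value=0)
```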
Operations between DataFrame and Series: broadcasting
By default, arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and broadcasts down the rows. To match on the rows and broadcast across the columns instead, you must use one of the arithmetic methods (for example sub) and pass the axis to match on
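Both broadcasting directions can be compared side by side. The sketch below uses an illustrative 4x3 frame and subtracts first a row, then a column.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.0).reshape(4, 3),
    columns=["b", "d", "e"],
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# Default: the Series index is matched against the columns and the
# subtraction is broadcast down the rows.
first_row = frame.iloc[0]
minus_row = frame - first_row

# To match on the row index and broadcast across the columns,
# use the arithmetic method and pass axis=0.
d_column = frame["d"]
minus_col = frame.sub(d_column, axis=0)
```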
Function application and mapping
NumPy ufuncs (element-wise array methods) also work on pandas objects
The apply() method of DataFrame applies a function to the one-dimensional arrays formed by each column or each row
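A short sketch of both ideas: a ufunc applied to a whole DataFrame, and apply with a helper function. The helper name value_range is hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    [[-1.0, 2.0], [3.0, -4.0], [-5.0, 6.0]],
    columns=["x", "y"],
    index=["a", "b", "c"],
)

# A NumPy ufunc is applied element by element and keeps the labels.
absolute = np.abs(frame)

# apply passes each column (the default) or each row (axis=1) to the
# function as a one-dimensional Series.
def value_range(s):
    return s.max() - s.min()

range_per_column = frame.apply(value_range)
range_per_row = frame.apply(value_range, axis=1)
```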
Sorting and ranking
Sorting:
sort_index() sorts by row or column labels (in lexicographic order)
sort_index(by=...) sorts by the values in one or more columns (in current pandas: sort_values(by=...))
A Series is sorted by value with its order() method (in current pandas: sort_values())
Ranking:
rank()
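The sorting and ranking calls can be sketched as follows. The example assumes a recent pandas, so it uses sort_values where the notes mention sort_index(by=...) and order(); the data is invented.

```python
import pandas as pd

frame = pd.DataFrame(
    {"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]},
    index=["three", "one", "four", "two"],
)

# Sort by row labels, or by column labels with axis=1.
by_row_label = frame.sort_index()
by_col_label = frame.sort_index(axis=1)

# Sort by the values in one or more columns.
by_values = frame.sort_values(by=["a", "b"])

# Sort a Series by value.
s = pd.Series([7, -5, 7, 4], index=["w", "x", "y", "z"])
s_sorted = s.sort_values()

# rank assigns ranks starting at 1. Ties share the mean rank by default;
# method="first" breaks ties by order of appearance.
ranks = s.rank()
first_seen = s.rank(method="first")
```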
Axis indexes with duplicate values
The is_unique property of an index tells you whether its values are unique
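A quick check of is_unique on an index with repeated labels. The sketch also shows how selection behaves with duplicates, a detail that goes slightly beyond the line above.

```python
import pandas as pd

obj = pd.Series(range(5), index=["a", "a", "b", "b", "c"])

# is_unique is a property, so it is read without parentheses.
unique_labels = obj.index.is_unique

# Selecting a duplicated label returns a Series;
# selecting a unique label returns a scalar.
a_values = obj["a"]
c_value = obj["c"]
```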
Summarizing and computing descriptive statistics
sum()
mean()
describe()
Descriptive and summary statistics methods
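The reductions listed above can be tried on a small frame with some missing values. The numbers are arbitrary, and the axis and skipna arguments are included to show how NA values are handled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

# Reductions skip NA values by default and work column-wise.
col_sums = df.sum()

# axis=1 reduces across the columns, giving one value per row.
row_sums = df.sum(axis=1)

# skipna=False makes any NA in a row propagate to the result.
row_means_strict = df.mean(axis=1, skipna=False)

# describe produces several summary statistics in one call.
summary = df.describe()
```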
Correlation and covariance
Some summary statistics, such as correlation and covariance, are computed from pairs of arguments: the corr and cov methods of Series and DataFrame
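A sketch of the pairwise statistics on toy columns chosen so the results are easy to verify by hand: y is exactly 2x, and z falls as x rises. The corrwith call goes slightly beyond the notes above.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
})

# Series methods take the second Series as an argument.
xy_corr = df["x"].corr(df["y"])
xy_cov = df["x"].cov(df["y"])

# DataFrame methods return the full pairwise matrix.
corr_matrix = df.corr()
cov_matrix = df.cov()

# corrwith correlates every column with one Series.
z_vs_all = df.corrwith(df["z"])
```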
Unique values, value counts, and membership
Unique values: the unique() method
Value counts: the value_counts() method computes how often each value occurs in a Series
Membership: isin performs a vectorized set-membership test and can be used to select a subset of the data in a Series or a DataFrame column
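These three methods can be tried on a single short Series of letters, chosen arbitrarily for this sketch.

```python
import pandas as pd

obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

# unique returns the distinct values in order of first appearance.
uniques = obj.unique()

# value_counts returns a Series of frequencies indexed by value.
counts = obj.value_counts()

# isin builds a boolean mask that can be used to filter the data.
mask = obj.isin(["b", "c"])
subset = obj[mask]
```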
Handling missing data
Filtering out missing data: dropna
For DataFrame objects, dropna by default drops any row that contains a missing value; dropna(how='all') drops only the rows in which every value is NA
To drop columns in the same way, pass axis=1
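The dropna variants described above can be compared on one small frame. The data is illustrative, and an all-NA column is added partway through to demonstrate axis=1.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([
    [1.0, 6.5, 3.0],
    [1.0, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [np.nan, 6.5, 3.0],
])

# Default: drop every row that contains at least one NA.
complete_rows = data.dropna()

# how="all": drop only the rows in which every value is NA.
not_all_na_rows = data.dropna(how="all")

# Add a column that is entirely NA, then drop such columns with axis=1.
data[3] = np.nan
not_all_na_cols = data.dropna(axis=1, how="all")
```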
Filling in missing data: fillna
Passing a constant: every NA is replaced with that value
Passing a dictionary: each column is filled with its own value
A new object is returned by default; pass inplace=True to modify the existing object in place
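A minimal sketch of the three fillna usages above. The column names and fill values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, np.nan],
})

# A constant replaces every NA; the original frame is left untouched.
filled_const = df.fillna(0)

# A dictionary gives each column its own fill value.
filled_by_col = df.fillna({"a": -1.0, "b": 99.0})

# inplace=True modifies the object it is called on.
df_inplace = df.copy()
df_inplace.fillna(0, inplace=True)
```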
Hierarchical indexing: the basis for data reshaping and group-based operations (such as pivot tables)
stack and unstack
For a DataFrame, either axis can have a hierarchical index
Summarizing by level: many descriptive and summary statistics on DataFrame and Series accept a level option (removed in pandas 2.0; use groupby(level=...) instead)
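The sketch below builds a hierarchically indexed Series and DataFrame, reshapes with unstack and stack, and summarizes by level. It assumes a recent pandas, so the level summary is written with groupby(level=...) rather than a level argument on sum. The level names key1, key2, state, and color are illustrative.

```python
import pandas as pd

# A Series with a two-level index.
data = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
)

# Partial indexing on the outer level.
a_part = data["a"]

# unstack pivots the inner level into columns; stack reverses it.
wide = data.unstack()
restacked = wide.stack()

# A DataFrame with a hierarchical index on both axes.
frame = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Summarize by level: group on one index level and aggregate.
sum_by_key2 = frame.groupby(level="key2").sum()
```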
Using columns as the row index: set_index() moves one or more columns into the row index; reset_index() does the opposite, moving the index levels back into columns
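A small round trip through set_index and reset_index on an invented frame. The drop=False variant is included to show that the columns can be kept.

```python
import pandas as pd

frame = pd.DataFrame({
    "a": [0, 1, 2, 3],
    "b": [7, 6, 5, 4],
    "c": ["one", "one", "two", "two"],
    "d": [0, 1, 0, 1],
})

# set_index moves the named columns into a (hierarchical) row index.
indexed = frame.set_index(["c", "d"])

# drop=False keeps the columns in the frame as well.
kept = frame.set_index(["c", "d"], drop=False)

# reset_index moves the index levels back into ordinary columns.
restored = indexed.reset_index()
```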
[Reading notes] Python for Data Analysis (5): Getting Started with pandas