First of all, the author of pandas is also the author of this book.
With NumPy, the object we work with is the matrix.
pandas is built on top of NumPy. The object pandas works with is a two-dimensional table (tabular, spreadsheet-like), and the difference from a matrix is that a two-dimensional table carries metadata.
Using this metadata as the index is more convenient, whereas NumPy only has the positional index implied by the shape. The essence is the same, though, so most operations are shared.
The two-dimensional table we meet most often is the table in a relational database, which has column names and row numbers; these are the metadata.
Of course you can use abstract matrices to do statistics on these two-dimensional tables, but using pandas is more convenient.
Introduction to pandas Data Structures
Series
A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.
A simple way to think of it is as a dictionary, or a one-dimensional table. When no index is explicitly specified, the integers 0 through N-1 are automatically used as the index.
Here you can simply replace the default index with your own labels to generate a new Series.
You might point out that NumPy, without any explicit index, can also reach the data through the positional index given by its shape; the index here is essentially the same thing as NumPy's integer index.
So the operations that work on NumPy arrays apply to pandas as well.
At the same time, as said above, a Series is really a dictionary, so you can also initialize one from a Python dictionary.
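A minimal sketch of the three ways of building a Series described above. The values and labels are made up for illustration, and the `pd` alias is just the usual import convention.

```python
import pandas as pd

# Without an explicit index, labels 0..N-1 are generated automatically.
obj = pd.Series([4, 7, -5, 3])
print(list(obj.index))          # [0, 1, 2, 3]

# Supplying labels gives a Series indexed by those labels.
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj2["a"])                # -5

# NumPy-style operations keep the index attached to the values.
doubled = obj2 * 2
positive = obj2[obj2 > 0]

# A Python dict initializes a Series directly: keys become the index.
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)
print("Texas" in obj3)          # True
```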
DataFrame
A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
If you have used R, the data frame should be familiar; in fact, pandas imitates some of R's functionality to a certain extent.
So if you can do statistics in Python as easily as in R, is there still any need to use R?
The Series above is a dictionary or a one-dimensional table;
a DataFrame is a two-dimensional table and can also be seen as a dictionary of Series.
When only column names are specified, the row labels are generated automatically.
You can also specify the row labels. Here a debt column is added, but it has no data, so it is NaN.
You can assign a value to debt.
To select a row, use ix (ix was removed in pandas 1.0; loc is the label-based replacement).
You can also create a DataFrame from nested dictionaries. A DataFrame is really a dictionary of Series, and each Series is itself a dictionary, so the result is a nested dictionary.
It can be transposed, like a NumPy matrix.
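The DataFrame steps above can be sketched as follows. The state and population figures are illustrative placeholders. One assumption to flag: the row lookup uses `loc` rather than `ix`, because `ix` no longer exists in current pandas releases.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# Column names come from the dict keys; row labels default to 0..N-1.
frame = pd.DataFrame(data)

# Explicit row labels, plus a "debt" column with no data, so it is all NaN.
frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["one", "two", "three", "four", "five"],
)
all_nan = frame2["debt"].isna().all()

# Assign a value to the debt column.
frame2["debt"] = 16.5

# Select a row by label (loc stands in for the removed ix).
row = frame2.loc["three"]

# Nested dicts: outer keys become columns, inner keys become row labels.
pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)

# Transpose, as with a NumPy matrix.
transposed = frame3.T
```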
Essential functionality
Here's a look at the functions pandas provides to make these data structures convenient to work with.
Reindexing
A critical method on pandas objects is reindex, which means to create a new object with the data conformed to a new index.
In effect, it changes the index.
Adding e: the new label has no data, so it is NaN by default; pass fill_value=0 to fill in 0 instead.
You can also specify a fill method with the method parameter.
You can choose to fill forward (ffill) or backward (bfill).
For a two-dimensional table, you can reindex the index and the columns at the same time.
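A short sketch of the reindex behaviours just listed, using made-up values. It shows the default NaN for a new label, `fill_value`, forward filling with `method="ffill"`, and reindexing rows and columns together.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# New label "e" has no data: NaN by default, or fill_value if given.
with_nan = obj.reindex(["a", "b", "c", "d", "e"])
with_zero = obj.reindex(["a", "b", "c", "d", "e"], fill_value=0)

# Forward fill carries the last seen value into the gaps.
obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
filled = obj3.reindex(range(6), method="ffill")

# For a DataFrame, rows and columns can be reindexed in one call.
frame = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)
frame2 = frame.reindex(index=["a", "b", "c", "d"],
                       columns=["Texas", "Utah", "California"])
```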
The parameters of reindex include index, method, fill_value, limit, level, and copy.
Dropping entries from an axis
Use axis to specify the dimension: for a two-dimensional table, rows are 0 and columns are 1.
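As a small illustration of the axis argument, here is a sketch with `drop` on a made-up table. `drop` returns a new object and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# axis=0 (the default) drops rows by label.
fewer_rows = data.drop(["Colorado", "Ohio"], axis=0)

# axis=1 drops columns by label.
fewer_cols = data.drop("two", axis=1)
```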
Indexing, selection, and filtering
Basically the same as in NumPy.
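A brief sketch of the NumPy-like selection styles, on made-up data. One difference worth noticing is that slicing by label includes the end point. The `loc` and `iloc` accessors are the current pandas spelling for label-based and position-based selection.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=["a", "b", "c", "d"])

by_label = obj["b"]            # label lookup
by_mask = obj[obj < 2]         # boolean mask, as in NumPy
by_slice = obj["b":"c"]        # label slices include the end point

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

cols = data[["three", "one"]]          # select columns
rows = data[data["three"] > 5]         # filter rows with a mask
cell = data.loc["Colorado", "two"]     # label-based row and column
first = data.iloc[0, 0]                # position-based, like ndarray[0, 0]
```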
Arithmetic and data alignment
Data alignment and automatic filling are where pandas is more convenient.
In [136]: df1 = DataFrame(np.arange(12.).reshape((3, 4)), columns=list('abcd'))
In [137]: df2 = DataFrame(np.arange(20.).reshape((4, 5)), columns=list('abcde'))
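You can see that, by default, values are added only where both DataFrames have an entry; everywhere else the result is NaN.
I think that in most situations you want a value that exists in only one of them to still be counted, that is, to initialize the missing side to 0.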
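The difference between the two behaviours can be sketched with the same two frames. The plain `+` operator leaves NaN wherever a label is missing on one side, while `add` with `fill_value=0` treats the missing side as 0.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)), columns=list("abcd"))
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)), columns=list("abcde"))

# Plain +: labels present in only one frame give NaN.
plain = df1 + df2

# add with fill_value=0: a missing entry is treated as 0, so the value
# that exists in one frame is kept.
filled = df1.add(df2, fill_value=0)
```

Here column e and row 3 exist only in df2, so they are entirely NaN in `plain` but carry df2's values in `filled`.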
In addition to add, it supports sub, div, and mul.
Summarizing and Computing Descriptive Statistics
pandas provides a number of statistical functions, as R does.
Conveniently, it also provides describe, similar to what R offers.
On non-numeric data, describe reports count, unique, top, and freq instead.
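A minimal sketch of the statistical helpers on made-up data. Note that the reductions skip NaN by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

col_sums = df.sum()            # per column, NaN skipped
row_sums = df.sum(axis=1)      # per row
summary = df.describe()        # count, mean, std, min, quartiles, max

# On non-numeric data describe reports count, unique, top and freq.
obj = pd.Series(["a", "a", "b", "c"] * 4)
text_summary = obj.describe()
```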
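The summary methods include count, describe, min, max, idxmin, idxmax, quantile, sum, mean, median, var, std, skew, kurt, cumsum, cumprod, diff, and pct_change.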
Correlation and covariance
Correlation coefficients and covariance between MSFT and IBM
The correlation coefficient matrix and covariance matrix can also be obtained.
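A sketch of both forms, the pairwise value and the full matrix. The return figures below are hypothetical numbers invented for illustration and are not real MSFT or IBM data.

```python
import pandas as pd

# Hypothetical daily returns, made up for illustration only.
returns = pd.DataFrame({
    "MSFT": [0.010, -0.020, 0.015, 0.003, -0.007],
    "IBM": [0.008, -0.015, 0.010, 0.001, -0.004],
})

# Correlation coefficient and covariance between two columns.
pair_corr = returns["MSFT"].corr(returns["IBM"])
pair_cov = returns["MSFT"].cov(returns["IBM"])

# Full correlation and covariance matrices over all columns.
corr_matrix = returns.corr()
cov_matrix = returns.cov()
```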
Unique Values, Value Counts, and Membership
In [217]: obj = Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
In [218]: uniques = obj.unique()
In [219]: uniques
Out[219]: array([c, a, d, b], dtype=object)
In [220]: obj.value_counts()
Out[220]:
c    3
a    3
b    2
d    1
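The membership part of this heading is not shown in the transcript above. As a sketch on the same values, `isin` builds a boolean mask that can then be used to filter the Series.

```python
import pandas as pd

obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

mask = obj.isin(["b", "c"])    # True where the value is in the given set
subset = obj[mask]
counts = obj.value_counts()
```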
Handling Missing Data
pandas provides some utility functions for handling missing data.
Of these, fillna is the most involved.
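A short sketch of the common missing-data helpers on a made-up frame. The choice of calls here is an assumption about which tools the notes refer to: `isna`, `dropna`, `fillna`, and `ffill`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

missing = df.isna()                       # boolean mask of missing cells
complete_rows = df.dropna()               # drop rows with any NaN
not_all_missing = df.dropna(how="all")    # drop rows that are entirely NaN
zeros = df.fillna(0)                      # one fill value everywhere
per_column = df.fillna({"a": -1.0, "b": 0.0})   # a fill value per column
carried = df.ffill()                      # carry the last valid value forward
```

Forward filling cannot fill a leading gap, so the first entries of column b stay NaN in `carried`.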
Hierarchical Indexing
Hierarchical indexing is an important feature of pandas enabling you to have multiple (two or more) index levels on an axis. Somewhat abstractly, it provides a way for you to work with higher dimensional data in a lower dimensional form.
You can use a multi-level index, which is essentially equivalent to adding a dimension, so it amounts to using a low-dimensional structure to represent higher-dimensional data.
unstack and stack are supported for converting to and from the multidimensional form.
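A minimal sketch of a two-level index and the round trip through `unstack` and `stack`, using made-up values.

```python
import pandas as pd

# A Series with two index levels: outer letters, inner integers.
data = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=[["a", "a", "b", "b", "c", "c"], [1, 2, 1, 2, 1, 2]],
)

inner = data["b"]              # all inner-level entries under "b"
table = data.unstack()         # inner level becomes the columns
back = table.stack()           # and back to a two-level Series
```

The one-dimensional Series and the 3 x 2 table hold the same data; the second index level plays the role of the extra dimension.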