Data structures and indexes are essential material for anyone getting started with pandas. This article explains them in detail; after reading it, you should have a clear understanding of both.
I. Introduction to the data structures
Pandas has two very important data structures: the Series and the data frame (DataFrame). A Series is similar to a one-dimensional array in NumPy: besides supporting the functions and methods available for one-dimensional arrays, it lets you retrieve data through index labels and aligns automatically on its index. A DataFrame is similar to a two-dimensional array in NumPy. NumPy array functions and methods generally work on it as well, and it has other flexible uses that are described later.
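As a rough illustration of the comparison above, the following sketch shows a Series accepting a NumPy function while also supporting lookup by label, and a small data frame built from a two-dimensional array. The variable names and values are made up for this example.

```python
import numpy as np
import pandas as pd

# A Series wraps a one-dimensional array and attaches an index to it.
values = np.array([3, 1, 2])
s = pd.Series(values, index=["x", "y", "z"])

# NumPy functions and array-style methods still work on a Series.
total = np.sum(s)      # sum of the values
largest = s.max()      # same meaning as ndarray.max()

# Unlike a plain array, a value can also be looked up by its label.
y_value = s["y"]

# A DataFrame is the two-dimensional counterpart, with row and column labels.
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])
shape = df.shape
```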
1. Creating a Series
There are three main ways to create a Series:
1) Create a Series from a one-dimensional array
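import numpy as np
import pandas as pd
arr1 = np.arange(10)   # one-dimensional array holding 0 through 9
arr1
type(arr1)             # numpy.ndarray
s1 = pd.Series(arr1)   # wrap the array in a Series
s1
type(s1)               # pandas Series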
2) Create a Series from a dictionary
dic1 = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
dic1
type(dic1)             # dict
s2 = pd.Series(dic1)   # the dictionary keys become the index labels
s2
type(s2)               # pandas Series
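To make the role of the dictionary keys concrete, here is a minimal sketch, with made-up data, of how the keys become the index and what happens when an explicit index is passed alongside the dictionary.

```python
import pandas as pd

prices = {"a": 10, "b": 20, "c": 30}   # hypothetical example data
s = pd.Series(prices)

# The dictionary keys become the index labels.
labels = list(s.index)

# Passing index= selects and orders the keys; a label that is missing
# from the dictionary ("z" here) gets a missing value (NaN).
s_sub = pd.Series(prices, index=["c", "a", "z"])
```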
3) Create a Series from a row or column of a DataFrame
We cover this later, because the next section introduces how to create a data frame.
2. Creating a DataFrame
There are three main ways to create a data frame:
1) Create a data frame from a two-dimensional array
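arr2 = np.array(np.arange(12)).reshape(4, 3)   # 12 values arranged as 4 rows by 3 columns
arr2
type(arr2)                # numpy.ndarray
df1 = pd.DataFrame(arr2)  # rows and columns get default integer labels
df1
type(df1)                 # pandas DataFrame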
2) Create a data frame from a dictionary
Two kinds of dictionaries can be used to create a data frame: a dictionary of lists and a nested dictionary.
# Dictionary of lists: each key becomes a column
dic2 = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8],
        'c': [9, 10, 11, 12], 'd': [13, 14, 15, 16]}
dic2
type(dic2)
df2 = pd.DataFrame(dic2)
df2
type(df2)
# Nested dictionary: outer keys become columns, inner keys become row labels
dic3 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
        'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
        'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}
dic3
type(dic3)
df3 = pd.DataFrame(dic3)
df3
type(df3)
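The difference between the two dictionary forms is easiest to see by inspecting the resulting row and column labels. The sketch below uses small made-up dictionaries rather than the ones in this article.

```python
import pandas as pd

# Dictionary of lists: keys become columns, rows get a default 0..n-1 index.
from_lists = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Nested dictionary: outer keys become columns, inner keys become row labels.
from_nested = pd.DataFrame({"one": {"a": 1, "b": 2}, "two": {"a": 3, "b": 4}})

list_columns = list(from_lists.columns)
list_rows = list(from_lists.index)
nested_columns = list(from_nested.columns)
nested_rows = list(from_nested.index)

# Transposing swaps the two, if the inner keys should be the columns instead.
flipped = from_nested.T
```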
3) Create a data frame from another data frame
df4 = df3[['one', 'three']]   # a list of column names returns a DataFrame
df4
type(df4)
s3 = df3['one']               # a single column name returns a Series
s3
type(s3)
II. Data indexes
Careful readers may have noticed that, whether for a Series or a data frame, the leftmost part of the displayed object is always something that is not part of the original data. What is it? It is the index, which we introduce next.
In my view, the index of a Series or data frame has two main uses: retrieving target data by position (index value) or by index label, and automatically aligning calculations and operations on Series or data frames. Let's look at each in turn.
1. Getting data by position or index label
s4 = pd.Series(np.array([1, 1, 2, 3, 5, 8]))
s4
If you do not specify an index for a Series, it automatically gets an integer index that starts at 0. You can view the index of a Series through its index attribute:
s4.index
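For reference, a small sketch of what that default index looks like. It rebuilds the same values under a separate name so it can be run on its own.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8])

# With no index given, pandas attaches a RangeIndex that counts from 0.
default_index = s.index
is_range = isinstance(default_index, pd.RangeIndex)
positions = list(default_index)
```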
Now let's set a custom index for the Series:
s4.index = ['a', 'b', 'c', 'd', 'e', 'f']
s4
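Once the Series has an index, data can be retrieved either by position or by index label:
s4.iloc[3]                  # by position (.iloc avoids ambiguity with labels)
s4['e']                     # by label
s4.iloc[[1, 3, 5]]          # several positions
s4[['a', 'b', 'd', 'f']]    # several labels
s4.iloc[:4]                 # positional slice, end position excluded
s4['c':]                    # label slice from 'c' to the end
s4['b':'e']                 # label slice, end label 'e' included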
Note: when you slice by index label, the value at the end label is included in the result! A one-dimensional array cannot be accessed by label at all, which is one way a Series differs from a one-dimensional array.
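The note above can be checked with a short sketch. It uses the explicit .loc and .iloc accessors, one for labels and one for positions, and shows that a plain NumPy array rejects a label.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8], index=list("abcdef"))

by_label = s.loc["b":"e"]     # labels b, c, d, e: the end label is included
by_position = s.iloc[1:4]     # positions 1, 2, 3: the end position is excluded

# A plain one-dimensional array has no labels, so a label lookup fails.
arr = np.array([1, 1, 2, 3, 5, 8])
try:
    arr["e"]
    array_accepts_labels = True
except IndexError:
    array_accepts_labels = False
```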
2. Automatic Alignment
When you perform arithmetic on two Series, the index shows its value: the operation is aligned automatically on the index labels.
s5 = pd.Series(np.array([10, 15, 20, 30, 55, 80]),
               index=['a', 'b', 'c', 'd', 'e', 'f'])
s5
s6 = pd.Series(np.array([12, 11, 13, 15, 14, 16]),
               index=['a', 'c', 'g', 'b', 'd', 'f'])
s6
s5 + s6   # values are matched by label, not by position
s5 / s6
Because s5 has no label g and s6 has no label e, the result contains two missing values (NaN). Note that the arithmetic is aligned on the index labels of the two Series, rather than simply adding or dividing the values position by position. For data frames, alignment applies not only to the row index but also to the column index (the variable names).
A data frame also has indexes. Because it generalizes the two-dimensional array, it has a column index as well as a row index. Data frame indexing is more powerful than Series indexing; that material is covered in the section on data queries.
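A minimal sketch of alignment on both axes follows. The Series and data frames here are small made-up examples, not the s5 and s6 defined earlier, so the block runs on its own.

```python
import pandas as pd

# Series: values are matched by label; labels present on only one side give NaN.
left = pd.Series([10, 15, 20], index=["a", "b", "e"])
right = pd.Series([12, 11, 13], index=["a", "g", "b"])
total = left + right   # a and b are summed; e and g become NaN

# Data frames: alignment happens on the row index and the column index together.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])
df_b = pd.DataFrame({"y": [10, 20], "z": [30, 40]}, index=["r2", "r3"])
df_sum = df_a + df_b   # only cell (r2, y) exists in both frames

n_missing_series = int(total.isna().sum())
n_present_frame = int(df_sum.notna().sum().sum())
```

Only the cell whose row label and column label appear in both frames holds a number; every other cell in the combined result is NaN.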
Article from: Data man
A 10-minute introduction to pandas data structures and indexes