A 10-Minute Introduction to pandas Data Structures and Indexes

Source: Internet
Author: User

Data structures and indexes are essential material for anyone getting started with pandas. This article explains them in detail; after reading it, you should have a clear understanding of both.

I. Introduction to data structures

pandas has two very important data structures: the Series and the DataFrame. A Series is similar to a one-dimensional NumPy array. In addition to the functions and methods available for one-dimensional arrays, it lets you retrieve data by index label and aligns data automatically on the index. A DataFrame is similar to a two-dimensional NumPy array. NumPy array functions and methods also work on it, and it offers other flexible features that are described later.
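
The comparison with NumPy arrays can be illustrated with a minimal sketch. The labels "x", "y", "z" and the column names are made up for illustration and do not come from the article.

```python
import numpy as np
import pandas as pd

# A Series wraps a one-dimensional array and attaches an index to it.
values = np.array([1.0, 4.0, 9.0])
s = pd.Series(values, index=["x", "y", "z"])  # hypothetical labels

# NumPy functions apply element-wise and the index is preserved.
roots = np.sqrt(s)

# A DataFrame is two-dimensional: it has a row index and column labels.
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])

print(roots)
print(df.shape)
```

The point of the sketch is that the array functions keep working, while the labels travel with the data.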

1. Creating a Series

There are three main ways to create a Series:

1) Create a Series from a one-dimensional array

import numpy as np

import pandas as pd

arr1 = np.arange(10)  # one-dimensional array holding 0 through 9

arr1

type(arr1)

s1 = pd.Series(arr1)  # a default integer index is generated

s1

type(s1)

2) Create a Series from a dictionary

dic1 = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}

dic1

type(dic1)

s2 = pd.Series(dic1)  # the dictionary keys become the index labels

s2

type(s2)
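
As a small illustrative sketch of the dictionary route: the keys become the index labels, and passing an explicit `index` selects and reorders entries. The `prices` dictionary and the label "z" are hypothetical.

```python
import pandas as pd

prices = {"a": 10, "b": 20, "c": 30}  # hypothetical data

# The dictionary keys become the index labels.
s = pd.Series(prices)
labels = list(s.index)

# Passing index= selects and reorders entries; a label that is
# missing from the dictionary ("z" here) is filled with NaN.
s_sub = pd.Series(prices, index=["c", "a", "z"])

print(s)
print(s_sub)
```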

3) Create a Series from a row or column of a DataFrame

This is covered further below, because creating a DataFrame is introduced next.

2. Creating a DataFrame

There are three main ways to create a DataFrame:

1) Create a DataFrame from a two-dimensional array

arr2 = np.array(np.arange(12)).reshape(4, 3)  # 12 values arranged as 4 rows by 3 columns

arr2

type(arr2)

df1 = pd.DataFrame(arr2)

df1

type(df1)
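
By default the rows and columns are numbered from 0. As a sketch, labels can also be supplied at creation time through the `columns` and `index` parameters; the names used here are invented for illustration.

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(4, 3)

# Hypothetical labels: three column names and four row names.
df = pd.DataFrame(
    arr,
    columns=["x", "y", "z"],
    index=["r1", "r2", "r3", "r4"],
)

print(df)
print(df.loc["r2", "y"])  # row "r2" holds 3, 4, 5, so this is 4
```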

2) Create a DataFrame from a dictionary

Two kinds of dictionary can be used to create a DataFrame: a dictionary of lists and a nested dictionary.

dic2 = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8],

        'c': [9, 10, 11, 12], 'd': [13, 14, 15, 16]}

dic2

type(dic2)

df2 = pd.DataFrame(dic2)  # each key becomes a column

df2

type(df2)

dic3 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},

        'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},

        'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}

dic3

type(dic3)

df3 = pd.DataFrame(dic3)  # outer keys become columns, inner keys become row labels

df3

type(df3)
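
The nested-dictionary case can be sketched on a smaller example. The outer keys become column labels and the inner keys become row labels; `DataFrame.from_dict` with `orient="index"` gives the opposite layout. The `nested` dictionary is illustrative only.

```python
import pandas as pd

nested = {"one": {"a": 1, "b": 2}, "two": {"a": 3, "b": 4}}  # hypothetical data

# Outer keys -> columns, inner keys -> row index.
df = pd.DataFrame(nested)

# orient="index" treats the outer keys as row labels instead.
df_rows = pd.DataFrame.from_dict(nested, orient="index")

print(df)
print(df_rows)
```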

3) Create a DataFrame from another DataFrame

df4 = df3[['one', 'three']]  # a list of column names returns a DataFrame

df4

type(df4)

s3 = df3['one']  # a single column name returns a Series

s3

type(s3)
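
This is also the third way of creating a Series mentioned earlier: taking a row or a column from a DataFrame. A minimal sketch, using a small made-up frame, shows both directions with `[]` for a column and `.loc` for a row.

```python
import pandas as pd

# Hypothetical frame with row labels a, b, c.
df = pd.DataFrame({"one": [1, 2, 3], "three": [7, 8, 9]}, index=["a", "b", "c"])

col = df["one"]    # a column: Series indexed by the row labels
row = df.loc["b"]  # a row: Series indexed by the column labels
sub = df[["one"]]  # a list with one name still returns a DataFrame

print(type(col), type(row), type(sub))
```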

II. Data indexes

Attentive readers may have noticed that, for both a Series and a DataFrame, the leftmost part of the printed object always shows something that is not part of the original data. What is it? It is the index, which we introduce next.
In my view, the index of a Series or DataFrame has two major uses. The first is to retrieve data by position or by index label. The second is to align data automatically by index when a Series or DataFrame takes part in a calculation. Let's look at each in turn.

1. Get data by position or index label

s4 = pd.Series(np.array([1, 1, 2, 3, 5, 8]))

s4

If you do not specify an index for a Series, one is generated automatically, starting at 0. You can view it through the index attribute:

s4.index
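
As a brief sketch of what the automatically generated index looks like: in current pandas versions it is a `RangeIndex`, which stores only start, stop, and step rather than every value.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8])

idx = s.index                # the default index generated by pandas
positions = idx.tolist()     # materialize it as a plain Python list

print(idx)
print(positions)
```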

Now let's set a custom index for the Series:

s4.index = ['a', 'b', 'c', 'd', 'e', 'f']

s4

Once the Series has a labeled index, data can be retrieved either by position or by index label:

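s4.iloc[3]  # by position; plain s4[3] on a labeled Series relies on a deprecated positional fallback

s4['e']  # by label

s4.iloc[[1, 3, 5]]  # several positions

s4[['a', 'b', 'd', 'f']]  # several labels

s4[:4]  # positional slice, end excluded

s4['c':]  # label slice

s4['b':'e']  # label slice, end label included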

Note: When you slice by index label, the value at the end label is included in the result. A plain one-dimensional array cannot be accessed by label at all, which is one way a Series differs from a one-dimensional array.
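
The difference between the two kinds of slice can be sketched with the explicit accessors `.iloc` (position) and `.loc` (label), using the same values as s4:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8], index=list("abcdef"))

# Positional slice: the end position is excluded, as in NumPy.
by_pos = s.iloc[1:4]       # positions 1, 2, 3 -> labels b, c, d

# Label slice: the end label is included.
by_label = s.loc["b":"e"]  # labels b, c, d, e

print(by_pos)
print(by_label)
```

Using `.loc` and `.iloc` makes the intent explicit, which helps when the index labels are themselves integers.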

2. Automatic alignment

When you perform arithmetic on two Series, the index shows its value: the operands are aligned automatically by label.

s5 = pd.Series(np.array([10, 15, 20, 30, 55, 80]),

               index=['a', 'b', 'c', 'd', 'e', 'f'])

s5

s6 = pd.Series(np.array([12, 11, 13, 15, 14, 16]),

               index=['a', 'c', 'g', 'b', 'd', 'f'])

s6

s5 + s6  # values are matched by label, not by position

s5 / s6

Because s5 has no label g and s6 has no label e, the result contains two missing values (NaN). Note that the arithmetic aligns the two Series by index label rather than simply adding or dividing the values position by position. For DataFrames, alignment applies not only to the row index but also to the column index (the variable names).
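
The alignment can be checked label by label in a short sketch. It also shows `Series.add` with `fill_value=0`, which is not used in the article but is one way to avoid NaN when a label exists on only one side.

```python
import pandas as pd

s5 = pd.Series([10, 15, 20, 30, 55, 80], index=["a", "b", "c", "d", "e", "f"])
s6 = pd.Series([12, 11, 13, 15, 14, 16], index=["a", "c", "g", "b", "d", "f"])

# Values are paired by label: "b" is 15 + 15, not 15 + 11.
total = s5 + s6

# Labels present on only one side ("e" and "g") give NaN.
missing = total[total.isna()].index.tolist()

# Optional: treat a missing operand as 0 instead of producing NaN.
filled = s5.add(s6, fill_value=0)

print(total)
print(sorted(missing))
print(filled)
```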

A DataFrame also has an index. Because a DataFrame generalizes the two-dimensional array, it has both a row index and a column index, which makes its indexing more powerful than that of a Series. That topic is covered in the discussion of data queries.
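
Alignment on both axes can be sketched with two small frames that overlap in only one row label and one column label. The frames and their labels are invented for illustration.

```python
import pandas as pd

# Hypothetical frames: they share row "r2" and column "y" only.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])
df_b = pd.DataFrame({"y": [10, 20], "z": [30, 40]}, index=["r2", "r3"])

# The result covers the union of row labels and of column labels;
# only the cell (r2, y) exists in both frames, so only it has a value.
result = df_a + df_b

print(result)
```

Here the cell at row "r2", column "y" is 4 + 10, and every other cell is NaN.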

Article from: Data man
