What is Pandas DataFrame

Source: Internet
Author: User
1. What is DataFrame

A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can hold a different value type (numeric, string, Boolean, etc.). A DataFrame has both a row index and a column index, and it can be regarded as a dictionary of Series that all share the same index.
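As a minimal sketch of this description (assuming pandas is imported as pd; the Boolean column is_ohio is a hypothetical addition used only for illustration), the following builds a small frame with three value types and checks that a column is a Series sharing the frame's row index:

```python
import pandas as pd

# Small illustrative frame; 'is_ohio' is a made-up column added to show a Boolean type.
frame = pd.DataFrame({
    "state": ["Ohio", "Nevada"],   # strings
    "pop": [1.5, 2.4],             # floats
    "is_ohio": [True, False],      # Booleans
})

# Selecting one column gives a Series.
state = frame["state"]
print(type(state).__name__)

# The column shares the row index of the whole frame.
print(state.index.equals(frame.index))

# Each column keeps its own dtype.
print(frame.dtypes)
```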

2. DataFrame features

Row-oriented and column-oriented operations on a DataFrame are treated roughly symmetrically.

The data in the DataFrame is stored in one or more two-dimensional blocks (instead of lists, dictionaries or other one-dimensional data structures).

3. Create DataFrame

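The most common way is to pass in a dictionary of equal-length lists or NumPy arrays. The examples in this article assume that DataFrame and Series have been imported from pandas and that NumPy has been imported as np: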

In [33]: data={'state':['Ohio','Ohio','Ohio','Nevada','Nevada'],'year':[2000,2001,2002,2001,2002], 'pop':[1.5,1.7,3.6,2.4,2.9]}

In [34]: frame=DataFrame(data)

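#The resulting DataFrame is given an index automatically (as with Series). In the pandas version shown here the columns come out in sorted order; recent pandas versions keep the dictionary's insertion order instead: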

In [35]: frame

Out[35]:
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002

4. Specify column order

#Use columns to specify the column order

In [36]: DataFrame(data,columns=['year','state','pop'])

Out[36]:
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
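The two constructor calls above can be reproduced as a self-contained script. This is a sketch that assumes pandas is imported as pd rather than importing DataFrame directly; the names frame and reordered are chosen here for illustration:

```python
import pandas as pd

# Same data as in the article: equal-length lists keyed by column name.
data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# Without an explicit index, rows are labelled 0..n-1.
frame = pd.DataFrame(data)
print(list(frame.index))

# The columns argument fixes the column order explicitly.
reordered = pd.DataFrame(data, columns=["year", "state", "pop"])
print(list(reordered.columns))
```

Passing columns explicitly is the safest way to get a predictable order, because the default order has differed between pandas versions.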

5. NA value

As with Series, if a requested column is not found in the data, it appears in the result filled with NA values:

In [37]: DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five'])

Out[37]:
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
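A short sketch of the same behaviour, assuming pandas is imported as pd. The result is stored under the name frame2, which is the name the later sections use for this frame:

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# 'debt' is not a key in data, so the column is created and filled with NaN.
frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["one", "two", "three", "four", "five"],
)

# Every value in the new column is missing.
print(frame2["debt"].isna().all())
```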

6. Dictionary-style (or attribute) access

You can retrieve a column of the DataFrame as a Series using either dictionary-style notation or attribute access:

In [39]: frame['state'] #or frame.state

Out[39]:
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object

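7. Selecting rows with loc (formerly ix)

Note that the returned Series has the same index as the original DataFrame, and its name attribute has been set accordingly. Rows can also be retrieved by label or by position. The original example used the ix indexer, which was deprecated in pandas 0.20 and removed in pandas 1.0; use loc for labels and iloc for integer positions. Here frame2 is the DataFrame constructed in section 5, with the 'debt' column and the index 'one' through 'five':

In [44]: frame2.loc['one']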

Out[44]:
year     2000
state    Ohio
pop       1.5
debt      NaN
Name: one, dtype: object
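The sketch below shows row selection by label and by position on current pandas, using loc and iloc. It assumes pandas is imported as pd and rebuilds a shortened version of frame2 so that it runs on its own:

```python
import pandas as pd

# Shortened version of frame2 from the article (first two rows only).
frame2 = pd.DataFrame(
    {"year": [2000, 2001], "state": ["Ohio", "Ohio"], "pop": [1.5, 1.7]},
    columns=["year", "state", "pop", "debt"],
    index=["one", "two"],
)

# Label-based selection: the row whose index label is 'one'.
row_by_label = frame2.loc["one"]

# Position-based selection: the first row, whatever its label is.
row_by_position = frame2.iloc[0]

# Both are Series indexed by the column names and named after the row label.
print(row_by_label.name, row_by_position.name)
print(list(row_by_label.index))
```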

8. Modify the column by assignment

Columns can be modified by assignment. For example, you can assign a scalar value or an array of values to the empty 'debt' column:

In [45]: frame2['debt']=16.5 # or frame2.debt=16.5, which works because the column already exists

In [46]: frame2

Out[46]:
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5

In [50]: frame2.debt=np.arange(5.)

In [51]: frame2

Out[51]:
       year   state  pop  debt
one    2000    Ohio  1.5   0.0
two    2001    Ohio  1.7   1.0
three  2002    Ohio  3.6   2.0
four   2001  Nevada  2.4   3.0
five   2002  Nevada  2.9   4.0

When assigning a list or array to a column, its length must match the length of the DataFrame. If the assigned value is a Series, its labels are aligned with the index of the DataFrame, and every row that has no matching label is filled with a missing value:

In [52]: val=Series([-1.2,-1.5,-1.7],index=['two','four','five'])

In [53]: frame2['debt']=val

In [54]: frame2

Out[54]:
       year   state  pop  debt
one    2000    Ohio  1.5   NaN
two    2001    Ohio  1.7  -1.2
three  2002    Ohio  3.6   NaN
four   2001  Nevada  2.4  -1.5
five   2002  Nevada  2.9  -1.7
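The three kinds of assignment described in this section can be sketched as follows. The frame is a shortened, three-row stand-in for frame2, and the extra label 'four' in val is included deliberately to show that labels absent from the frame are ignored; pandas and NumPy are assumed to be imported as pd and np:

```python
import numpy as np
import pandas as pd

# Three-row stand-in for frame2 with an empty 'debt' column.
frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002]},
    columns=["year", "debt"],
    index=["one", "two", "three"],
)

# 1. A scalar is broadcast to every row.
frame2["debt"] = 16.5
after_scalar = frame2["debt"].tolist()

# 2. An array must have exactly one value per row.
frame2["debt"] = np.arange(3.0)
after_array = frame2["debt"].tolist()

# A list of the wrong length is rejected with a ValueError.
try:
    frame2["debt"] = [1.0, 2.0]
    length_error = None
except ValueError as exc:
    length_error = exc

# 3. A Series is aligned on labels: 'two' matches, 'four' is not in the
#    frame and is dropped, and rows without a match become NaN.
val = pd.Series([-1.2, -1.5], index=["two", "four"])
frame2["debt"] = val
print(frame2)
```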

9. Deleting columns with del

Assigning a value to a non-existent column creates a new column. The keyword del is used to delete columns:

In [55]: frame2['eastern']=frame2.state=='Ohio'

In [56]: frame2

Out[56]:
       year   state  pop  debt  eastern
one    2000    Ohio  1.5   NaN     True
two    2001    Ohio  1.7  -1.2     True
three  2002    Ohio  3.6   NaN     True
four   2001  Nevada  2.4  -1.5    False
five   2002  Nevada  2.9  -1.7    False

In [57]: del frame2['eastern']

In [58]: frame2.columns

Out[58]: Index(['year','state','pop','debt'], dtype='object')

Warning: In the pandas version shown here, the column returned by indexing is a view of the underlying data, not a copy, so any in-place modification of the returned Series is reflected in the source DataFrame. Newer pandas releases that use Copy-on-Write behave differently, so do not rely on this. To get an independent column, copy it explicitly with the Series copy method.
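A brief sketch of adding a column, deleting it, and taking an explicit copy, assuming pandas is imported as pd. The frame is a shortened stand-in for frame2, and the replacement value 'Texas' is arbitrary:

```python
import pandas as pd

# Shortened stand-in for frame2.
frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2001], "state": ["Ohio", "Ohio", "Nevada"]},
    index=["one", "two", "four"],
)

# Assigning to a column name that does not exist creates the column.
frame2["eastern"] = frame2["state"] == "Ohio"
eastern_values = frame2["eastern"].tolist()

# del removes the column again.
del frame2["eastern"]
print(list(frame2.columns))

# An explicit copy is independent of the frame: changing the copy
# leaves the original column untouched.
state_copy = frame2["state"].copy()
state_copy.iloc[0] = "Texas"
print(frame2.loc["one", "state"])
```

Calling copy makes the intent explicit and gives the same result whether or not the installed pandas version returns views from column indexing.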

10. Nested dictionary

Another common form of input is a nested dictionary (that is, a dictionary of dictionaries):

In [62]: pop={'Nevada':{2001:2.4,2002:2.9},'Ohio':{2000:1.5,2001:17,2002:3.6}}

#If you pass it to DataFrame, the keys of the outer dictionary become the columns and the keys of the inner dictionaries become the row index:

In [63]: frame3=DataFrame(pop)

In [64]: frame3

Out[64]:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4  17.0
2002     2.9   3.6

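The keys of the inner dictionaries are merged to form the final index, which appears in sorted order in the output above (newer pandas versions may not sort it; call sort_index() if the order matters). If an index is specified explicitly, that index is used instead: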

In [66]: DataFrame(pop,index=[2001,2002,2003])

Out[66]:
      Nevada  Ohio
2001     2.4  17.0
2002     2.9   3.6
2003     NaN   NaN

A dictionary of Series is handled in much the same way:

In [68]: pdata={'Ohio':frame3['Ohio'][:-1],'Nevada':frame3['Nevada'][:2]}

In [69]: DataFrame(pdata)

Out[69]:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4  17.0
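The nested-dictionary construction can be sketched as a standalone script, assuming pandas is imported as pd. The values are copied from the article's pop dictionary; sort_index() is called explicitly here as a precaution so that the row order does not depend on the pandas version:

```python
import pandas as pd

# Outer keys become columns, inner keys become row labels.
pop = {
    "Nevada": {2001: 2.4, 2002: 2.9},
    "Ohio": {2000: 1.5, 2001: 17, 2002: 3.6},
}

# Union of the inner keys forms the index; missing cells become NaN.
frame3 = pd.DataFrame(pop).sort_index()
print(frame3)

# An explicit index selects and orders the rows; 2003 has no data at all.
explicit = pd.DataFrame(pop, index=[2001, 2002, 2003])
print(explicit)

# A dictionary of Series works the same way (labels are aligned).
pdata = {"Ohio": frame3["Ohio"][:-1], "Nevada": frame3["Nevada"][:2]}
from_series = pd.DataFrame(pdata)
print(from_series)
```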

11. Transpose

The T attribute transposes the DataFrame, swapping its rows and columns:

In [65]: frame3.T

Out[65]:
        2000  2001  2002
Nevada   NaN   2.4   2.9
Ohio     1.5  17.0   3.6
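A small sketch of the transpose, assuming pandas is imported as pd and reusing the article's pop values (sort_index() is an added precaution to fix the row order):

```python
import pandas as pd

pop = {
    "Nevada": {2001: 2.4, 2002: 2.9},
    "Ohio": {2000: 1.5, 2001: 17, 2002: 3.6},
}
frame3 = pd.DataFrame(pop).sort_index()

# .T swaps the axes: years become columns, states become rows.
transposed = frame3.T
print(transposed.shape)
print(transposed.loc["Ohio", 2000])

# Transposing twice gives back the original layout.
print(transposed.T.equals(frame3))
```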

12. Index object

Pandas index objects are responsible for managing axis labels and other metadata (such as axis names, etc.).

Index objects are immutable, so users cannot modify them.

Immutability is very important, because only in this way can Index objects be safely shared among multiple data structures.

Note: Although most users do not need to know much about Index objects, they are an important part of the pandas data model.
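Immutability can be observed directly. In this sketch (pandas imported as pd; the labels 'a', 'b', 'c' are arbitrary), item assignment on an Index raises a TypeError, while methods such as map return a new Index and leave the original unchanged:

```python
import pandas as pd

frame = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
index = frame.index

# Assigning to an element of an Index is not allowed.
try:
    index[1] = "d"
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# Operations that look like modifications build a new Index instead.
upper = index.map(str.upper)
print(list(index))
print(list(upper))
```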