DataFrame Structure and Common Operations

Source: Internet
Author: User
Keywords: dataframe, pandas dataframe, dataframe structure
1. Introduction
A DataFrame unifies two or more Series into a single data structure. Each Series then represents a named column of the DataFrame. Instead of each column having its own index, the DataFrame provides a single index, and the data in all columns is aligned to that master index.
In other words, a DataFrame is a table-like structure made up of multiple Series, and each Series is one column of the DataFrame.
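The alignment described above can be sketched as follows. The column names and values are invented for illustration; the point is only that two Series with different index labels end up sharing one index.

```python
import pandas as pd

# Two Series with partly different index labels (illustrative data).
temperature = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])
humidity = pd.Series([40, 55], index=["tue", "wed"])

# Each Series becomes a named column; rows are aligned on the shared index.
df = pd.DataFrame({"temperature": temperature, "humidity": humidity})
print(df)
```

The "humidity" column has no value for "mon", so that cell is NaN.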

2. Related operations
a. Create
pd.DataFrame()
The data parameter is commonly one of:
1). A two-dimensional array;
2). A list of Series;
3). A dictionary whose values are Series.

a.1 Two-dimensional array
import pandas as pd
import numpy as np

# Each array becomes one row of the DataFrame.
s1 = np.array([1, 2, 3, 4])
s2 = np.array([5, 6, 7, 8])
df = pd.DataFrame([s1, s2])
print(df)
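The example above builds the table from a list of one-dimensional arrays. A minimal sketch of the same idea with a single two-dimensional array is shown here; the row and column labels are hypothetical and only there to make the output easier to read.

```python
import numpy as np
import pandas as pd

# A single 2-D array: 2 rows by 4 columns.
data = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

# The column and row labels here are made up for illustration.
df = pd.DataFrame(data, columns=["w", "x", "y", "z"], index=["r1", "r2"])
print(df)
```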

a.2 List of Series (the result is the same as with the two-dimensional array)
import pandas as pd
import numpy as np

# Each Series becomes one row of the DataFrame.
s1 = pd.Series(np.array([1, 2, 3, 4]))
s2 = pd.Series(np.array([5, 6, 7, 8]))
df = pd.DataFrame([s1, s2])
print(df)




a.3 Dictionary whose values are Series
import pandas as pd
import numpy as np

# Each dictionary key becomes a column name; each Series becomes that column.
s1 = pd.Series(np.array([1, 2, 3, 4]))
s2 = pd.Series(np.array([5, 6, 7, 8]))
df = pd.DataFrame({"a": s1, "b": s2})
print(df)



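Note: If the Series passed at creation have different lengths, the result is sized to fit the longest one, and any position where a Series has no value for the corresponding index is filled with NaN. (A dictionary of plain arrays of unequal length raises a ValueError instead.)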
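A short sketch of this behavior, reusing the Series style from the examples above but with one Series deliberately shortened:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.array([1, 2, 3, 4]))
s2 = pd.Series(np.array([5, 6]))  # shorter: no values at index 2 and 3

# As columns (dictionary of Series): column "b" has NaN in rows 2 and 3.
by_column = pd.DataFrame({"a": s1, "b": s2})
print(by_column)

# As rows (list of Series): row 1 has NaN in columns 2 and 3.
by_row = pd.DataFrame([s1, s2])
print(by_row)
```

Because NaN is a floating-point value, the affected column or row is stored as floats rather than integers.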

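b. Properties
b.1 .columns: the label (key) of each column
b.2 .shape: the shape as a tuple (a, b), where a is the number of rows (the index length) and b is the number of columns
b.3 .index; .values: the row index; the values as a two-dimensional array
b.4 .head(); .tail(): methods that return the first or last rows (5 by default)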
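The attributes and methods listed above can be tried on a small table. This sketch uses the same A/B/C example data that appears in the later sections.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

print(df.columns)   # column labels: A, B, C
print(df.shape)     # (4, 3): 4 rows, 3 columns
print(df.index)     # row labels: a RangeIndex from 0 to 3
print(df.values)    # the data as a 4x3 NumPy array
print(df.head(2))   # first 2 rows (default is 5)
print(df.tail(2))   # last 2 rows (default is 5)
```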
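c. if-then operation
c.1 Use .loc[] (the .ix[] indexer originally used here was deprecated in pandas 0.20 and removed in 1.0)
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Where A > 1, set B to -1.
df.loc[df.A > 1, 'B'] = -1
print(df)

General form: df.loc[condition, columns to modify] = new value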

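c.2 Use numpy.where
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# New column "then": 1 where A < 3, otherwise 0.
df["then"] = np.where(df.A < 3, 1, 0)
print(df)

General form: np.where(condition, value if true, value if false)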
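The two branches of np.where do not have to be numbers. As a small sketch on the same data, they can be strings or other columns; the new column names "label" and "picked" are invented for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# Scalars for both branches: label each row by the value of A.
df["label"] = np.where(df["A"] < 3, "low", "high")

# Array-like branches: take B where A is even, otherwise fall back to C.
df["picked"] = np.where(df["A"] % 2 == 0, df["B"], df["C"])
print(df)
```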

d. Select rows of a DataFrame by condition
d.1 Index directly with df[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Keep only the rows where A >= 2.
df = df[df.A >= 2]
print(df)




d.2 Use .loc[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Keep only the rows where A > 2.
df = df.loc[df.A > 2]
print(df)


(There are too many selection methods to list them all here.)
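A few further selection styles, sketched on the same example data. These are only a sample, not a complete list, and the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
both = df[(df["A"] >= 2) & (df["B"] < 8)]

# Keep rows whose value is in a given set.
members = df[df["A"].isin([1, 4])]

# Express the condition as a string.
queried = df.query("A > 2")

print(both)
print(members)
print(queried)
```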

e. Grouping
e.1 Use groupby to form groups
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
# For each animal, find the size of its heaviest member.
group = df.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])
print(group)
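The same result can also be reached without apply. This is an alternative sketch, not the approach used above: it first finds the row label of the heaviest member of each group, then looks those rows up.

```python
import pandas as pd

df = pd.DataFrame({"animal": "cat dog cat fish dog cat cat".split(),
                   "size": list("SSMMMLL"),
                   "weight": [8, 10, 11, 1, 20, 12, 12],
                   "adult": [False] * 5 + [True] * 2})

# Row label of the heaviest animal in each group (first one wins on ties).
heaviest = df.groupby("animal")["weight"].idxmax()

# Look those rows up and index the result by animal.
result = df.loc[heaviest, ["animal", "size"]].set_index("animal")["size"]
print(result)
```

Two cats share the maximum weight of 12; idxmax returns the first of them, so the tie does not produce duplicate rows.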



e.2 Use get_group to get one of the groups
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

group = df.groupby("animal")
# Retrieve only the rows whose animal is "cat".
cat = group.get_group("cat")
print(cat)
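Before calling get_group it can help to see which groups exist. The following sketch inspects the grouped object from the example above using its groups attribute and by iterating over it.

```python
import pandas as pd

df = pd.DataFrame({"animal": "cat dog cat fish dog cat cat".split(),
                   "size": list("SSMMMLL"),
                   "weight": [8, 10, 11, 1, 20, 12, 12],
                   "adult": [False] * 5 + [True] * 2})

grouped = df.groupby("animal")

# Group keys mapped to the row labels that belong to each group.
print(grouped.groups)

# Iterating yields (key, sub-DataFrame) pairs, one per group.
for name, sub in grouped:
    print(name, len(sub))

# get_group returns the sub-DataFrame for a single key.
cat = grouped.get_group("cat")
print(cat)
```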

