1. Introduction
A DataFrame unifies two or more Series into a single data structure. Each Series then represents a named column of the DataFrame, and instead of each column having its own index, the DataFrame provides a single index to which the data in all columns is aligned.
In other words, a DataFrame is a table-like structure composed of multiple Series, and each Series is called a column of the DataFrame.
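To make the "single index" idea concrete, here is a minimal sketch. The labels and values are made up for illustration; the point is only that rows are matched by index label rather than by position.

```python
import pandas as pd

# Two Series whose labels appear in different orders (illustrative data).
price = pd.Series([1.5, 2.0, 0.5], index=["apple", "pear", "plum"])
stock = pd.Series([30, 10, 20], index=["plum", "apple", "pear"])

# Each Series becomes a named column; values are aligned by label.
df = pd.DataFrame({"price": price, "stock": stock})
print(df)

# The "apple" row carries the apple values from both Series.
print(df.loc["apple", "price"], df.loc["apple", "stock"])
```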
2. Related operations
a. Create
pd.DataFrame()
Parameters (the data passed in can be):
1). A two-dimensional array;
2). A list of Series;
3). A dictionary whose values are Series;
a.1 Two-dimensional array
import pandas as pd
import numpy as np
s1=np.array([1,2,3,4])
s2=np.array([5,6,7,8])
# Each array becomes one row of the DataFrame
df=pd.DataFrame([s1,s2])
print(df)
a.2 List of Series (the result is the same as for the two-dimensional array)
import pandas as pd
import numpy as np
s1=pd.Series(np.array([1,2,3,4]))
s2=pd.Series(np.array([5,6,7,8]))
# Each Series becomes one row of the DataFrame
df=pd.DataFrame([s1,s2])
print(df)
a.3 Dictionary whose values are Series
import pandas as pd
import numpy as np
s1=pd.Series(np.array([1,2,3,4]))
s2=pd.Series(np.array([5,6,7,8]))
# Each key becomes a column name, each Series a column
df=pd.DataFrame({"a":s1,"b":s2})
print(df)
Note: If the Series (or rows) passed at creation have different lengths, any position whose index has no corresponding value is filled with NaN.
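The NaN behaviour can be seen with a small sketch. The names shorter and longer are chosen here for illustration only.

```python
import pandas as pd

shorter = pd.Series([1, 2])        # index 0, 1
longer = pd.Series([5, 6, 7, 8])   # index 0, 1, 2, 3

df = pd.DataFrame({"a": shorter, "b": longer})
print(df)

# Index 2 and 3 have no value in column "a", so those cells are NaN.
print(df["a"].isna().tolist())
```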
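b. Properties
b.1 .columns: the labels (keys) of the columns
b.2 .shape: a tuple (a, b), where a is the number of rows (the index length) and b is the number of columns
b.3 .index; .values: return the index; return the values as a two-dimensional array
b.4 .head(); .tail(): return the first rows; return the last rows (5 by default)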
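A short sketch of these properties on a small DataFrame (the data is the same illustrative table used in the sections that follow):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

print(df.columns.tolist())  # column labels
print(df.shape)             # (number of rows, number of columns)
print(df.index.tolist())    # row labels
print(df.values)            # the data as a two-dimensional NumPy array
print(df.head(2))           # first 2 rows
print(df.tail(2))           # last 2 rows
```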
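c. If-then operations
c.1 Use .loc[] (.ix[] was deprecated in pandas 0.20.0 and removed in 1.0.0)
import pandas as pd
df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})
# Where A > 1, set B to -1
df.loc[df.A>1,'B']= -1
print(df)
df.loc[condition, columns to assign]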
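c.2 Use numpy.where
import pandas as pd
import numpy as np
df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})
# New column "then": 1 where A < 3, otherwise 0
df["then"]=np.where(df.A<3,1,0)
print(df)
np.where(condition, value if true, value if false)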
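The "then" and "else" arguments do not have to be scalars; they can also be columns of the same length. A minimal sketch, where the column name signed_B is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Where A < 3 keep B, otherwise take the negative of B.
df["signed_B"] = np.where(df.A < 3, df.B, -df.B)
print(df)
```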
d. Select rows by condition
d.1 Direct indexing with df[]
import pandas as pd
df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})
# Keep only the rows where A >= 2
df=df[df.A>=2]
print(df)
d.2 Use .loc[]
import pandas as pd
df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})
# Keep only the rows where A > 2
df=df.loc[df.A>2]
print(df)
(There are many other ways, too many to list them all here.)
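As a sketch of a few of those other ways, the following uses combined boolean conditions, isin and query. The variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
both = df[(df.A >= 2) & (df.B < 8)]

# Keep rows whose value appears in a given list.
listed = df[df.A.isin([1, 4])]

# Write the same combined condition as a string expression.
queried = df.query("A >= 2 and B < 8")

print(both)
print(listed)
print(queried)
```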
e. Grouping
e.1 Use groupby to form groups
import pandas as pd
df = pd.DataFrame({'animal':'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
# For each animal, get the size of the individual with the largest weight
group=df.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])
print(group)
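The same question can also be answered without apply, by asking each group for the row label of its maximum weight and then looking those rows up. This is only an alternative sketch; the names heaviest_rows and result are made up here.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12]})

# Row label of the heaviest individual in each group (the first one on ties).
heaviest_rows = df.groupby("animal")["weight"].idxmax()

# Look those rows up and keep the size, indexed by animal.
result = df.loc[heaviest_rows, ["animal", "size"]].set_index("animal")["size"]
print(result)
```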
e.2 Use get_group to get one of the groups
import pandas as pd
df = pd.DataFrame({'animal':'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
group=df.groupby("animal")
# Rows of the original DataFrame whose animal is "cat"
cat=group.get_group("cat")
print(cat)
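To see all groups rather than just one, the grouped object can be iterated. A minimal sketch, where the dictionary sizes is introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12]})

groups = df.groupby("animal")

# Iterating yields (group key, sub-DataFrame) pairs.
sizes = {}
for name, sub in groups:
    sizes[name] = len(sub)
    print(name, len(sub))

# get_group returns the sub-DataFrame for a single key.
cat = groups.get_group("cat")
print(cat)
```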