A DataFrame is a data structure in the Python pandas library. It is a two-dimensional, table-like structure with labeled rows and columns.
Build a DataFrame.
import pandas
import numpy as np
from pandas import DataFrame
# df = DataFrame([[1, 2, 3], [4, 5, 6]])
df = DataFrame(data=np.arange(16).reshape(4, 4), columns=list('abcd'))
DataFrame(data=np.arange(16).reshape(4, 4), columns=list('abcd'), index=list('ABCD'))
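As a quick check of the two constructor calls above, the following sketch builds both frames and inspects their shape and labels. The name df_labeled is introduced here for illustration only; the original leaves the second frame unassigned.

```python
import numpy as np
from pandas import DataFrame

# Default integer index 0..3, columns a..d
df = DataFrame(data=np.arange(16).reshape(4, 4), columns=list('abcd'))

# Same data with explicit row labels A..D (df_labeled is a hypothetical name)
df_labeled = DataFrame(
    data=np.arange(16).reshape(4, 4),
    columns=list('abcd'),
    index=list('ABCD'),
)

print(df.shape)                # (4, 4)
print(list(df.index))          # [0, 1, 2, 3]
print(list(df_labeled.index))  # ['A', 'B', 'C', 'D']
```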
Data selection
df[0:1] selects the first row. df[0] does not select by position: it looks for a column labeled 0 and raises a KeyError if there is none
df['a'] selects column a, by column name
df.loc[0] selects the row with index label 0. df.loc[0:3] selects rows labeled 0, 1, 2 and 3 (label slices include the end point). df.loc['A'] selects the row with index label A
df.loc[['A'], 'a'] selects rows and columns together, by label
df.iloc[0, 2] selects by position only: the first row and third column
df.iloc[0:3, 1:3] selects by position slices. It can select a block or a single element
df.at[1, 'a'] selects by label; only a single element can be selected
df.iat[1, 2] selects by position; only a single element can be selected
df.ix[1] selected rows by position and df.ix['A'] selected rows by index label. .ix was deprecated in pandas 0.20 and removed in pandas 1.0; use .loc for labels and .iloc for positions instead
df.ix[1, 'a'] selected rows and columns at the same time, mixing position and label, for one element or several. The current equivalent is df.loc[1, 'a'] or df.iloc[1, 0]
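The selection rules above can be tried out with a minimal sketch like the one below. It rebuilds the 4x4 frames from the construction step; df_labeled and the result variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('abcd'))
df_labeled = pd.DataFrame(np.arange(16).reshape(4, 4),
                          columns=list('abcd'), index=list('ABCD'))

first_row = df[0:1]                # row slice by position: DataFrame with 1 row
col_a = df['a']                    # column by name: Series
loc_slice = df.loc[0:3]            # label slice, end point included: 4 rows
iloc_slice = df.iloc[0:3, 1:3]     # position slice, end point excluded: 3 rows x 2 columns
one_by_label = df.at[1, 'a']       # single element by label
one_by_pos = df.iat[1, 2]          # single element by position
row_A = df_labeled.loc['A']        # row with index label A: Series
cell = df_labeled.loc[['A'], 'a']  # rows and column together: one-element Series

# df[0] looks for a column labeled 0, which does not exist in this frame
try:
    df[0]
    raised = False
except KeyError:
    raised = True

print(one_by_label, one_by_pos, raised)  # 4 6 True
```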
Data selected from df is generally still a DataFrame (or a Series), not a plain array of values
Use df.values to get the underlying values as a NumPy array
df.columns outputs the column labels
df.index outputs the row index
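A short sketch of the difference between a selection and its raw values, using the same 4x4 frame as before (the variable names are chosen for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('abcd'))

sub = df.iloc[0:2, 0:2]   # a selection is still a DataFrame
arr = sub.values          # the underlying values as a NumPy array
cols = list(df.columns)   # column labels
idx = list(df.index)      # row index labels

print(type(sub).__name__, type(arr).__name__)  # DataFrame ndarray
print(arr.tolist())                            # [[0, 1], [4, 5]]
```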
df.describe() displays summary statistics for each numeric column (count, mean, standard deviation, minimum, quartiles, maximum)
df.info() displays basic information about the data (index, column types, non-null counts, memory usage)
df.count() df.mean() df.max() df.min() compute statistics for each column
df.head(10) outputs the first 10 rows. df.tail(10) outputs the last 10 rows
df.isnull().sum() counts the number of null values in the table by column
df.where(df>10).count() counts by column the number of elements in the table greater than 10
df.groupby('y').count() groups rows by the values in column y and counts the non-null entries of each other column per group
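The statistics calls above can be sketched on a small frame. The data here is hypothetical: a numeric column x with one missing value and a grouping column y, chosen only to make the counts easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'y' is the grouping column, 'x' has one missing value
df = pd.DataFrame({
    'x': [5.0, 12.0, np.nan, 20.0],
    'y': ['u', 'v', 'u', 'v'],
})

null_counts = df.isnull().sum()        # nulls per column
x_mean = df['x'].mean()                # missing values are skipped

# Restrict to the numeric column: comparing the string column 'y' with 10 would fail
num = df[['x']]
over_10 = num.where(num > 10).count()  # elements greater than 10, per column

group_counts = df.groupby('y').count() # non-null 'x' entries per group
summary = df.describe()                # summary statistics for numeric columns

print(null_counts['x'], over_10['x'])  # 1 2
print(group_counts['x'].to_dict())     # {'u': 1, 'v': 2}
```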
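df[df['price'] < 7.2] displays only the rows that meet the condition (compare against the number 7.2 rather than the string '7.2' when price is a numeric column)
df.where(df['price'] < 7.2) keeps all rows, and the rows that do not meet the condition are filled with NaN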
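To compare boolean filtering with where, here is a minimal sketch. The frame is hypothetical, and price is assumed to be a numeric column.

```python
import pandas as pd

# Hypothetical data; 'price' is assumed to be numeric
df = pd.DataFrame({'item': ['p', 'q', 'r'], 'price': [6.5, 7.2, 8.0]})

filtered = df[df['price'] < 7.2]      # keeps only the matching rows
masked = df.where(df['price'] < 7.2)  # same shape; non-matching rows become NaN

print(filtered.shape)  # (1, 2)
print(masked.shape)    # (3, 2)
```

Boolean indexing shrinks the frame, while where preserves its shape, which is useful when the result must stay aligned with the original index.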