DataFrame is a data structure in pandas, a Python library. It is similar to a table. This post introduces some common DataFrame operations.
First, create a DataFrame, select columns, and delete columns
import pandas as pd
lst=[2,3,5] #represents one row of df
df=pd.DataFrame(data=[lst,lst],columns=['col1','col2','col3']) #Generate a DataFrame from a list of rows
df=pd.DataFrame(data={'col1':[2]*2,'col2':[3]*2,'col3':[5]*2}) #Generate the same DataFrame from a dictionary of columns
df[['col1','col2']] #Select the first and second columns
df[1:][['col1','col2']] #Select the first and second columns from the second row to the last row
df.drop('col1', axis=1) #Return a copy without the first column; df itself is unchanged
df.drop(['col1','col2'], axis=1) #Return a copy without the first and second columns
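The creation, selection, and deletion steps above can be collected into one small runnable sketch. The names `df_list`, `df_dict`, `subset`, `tail`, and `dropped` are chosen here only for illustration. The point worth checking is that `drop` returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import pandas as pd

row = [2, 3, 5]  # one row of the DataFrame

# Build the same table two ways: from a list of rows and from a dict of columns.
df_list = pd.DataFrame(data=[row, row], columns=["col1", "col2", "col3"])
df_dict = pd.DataFrame(data={"col1": [2] * 2, "col2": [3] * 2, "col3": [5] * 2})

# Passing a list of column names returns a DataFrame with just those columns.
subset = df_list[["col1", "col2"]]

# Slice rows by position first (second row onward), then pick columns.
tail = df_list[1:][["col1", "col2"]]

# drop returns a new DataFrame; df_list still has all three columns afterwards.
dropped = df_list.drop(["col1", "col2"], axis=1)

print(df_list.equals(df_dict))
print(list(dropped.columns))
print(list(df_list.columns))
```

To make the deletion permanent, assign the result back, for example `df_list = df_list.drop("col1", axis=1)`.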
Second, operate on one or more columns
1. Use map to operate on a column
df['col'] = df['col1'].map(lambda x: x**2) #Generate a column that is the square of the first column
2. Use apply to operate on one or more columns
df.index = pd.date_range('20190101', periods=len(df)) #Replace the original index with a daily date index; periods must equal the number of rows
df['col'] = df.apply(lambda x: x['col1']*x['col2'], axis=1) #Overwrite the 'col' column with the row-wise product of 'col1' and 'col2'
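As a minimal sketch of the difference between the two calls: `map` works on a single column (a Series) element by element, while `apply` with `axis=1` passes each row to the function. The column names `squared`, `product_apply`, and `product_vec` are made up for this example. The last line shows the vectorized form, which gives the same result as the row-wise `apply` for simple arithmetic.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 2], "col2": [3, 3], "col3": [5, 5]})

# The date index must have exactly as many entries as the DataFrame has rows.
df.index = pd.date_range("20190101", periods=len(df))

# map: element-wise transformation of one column.
df["squared"] = df["col1"].map(lambda x: x ** 2)

# apply with axis=1: the lambda receives one row at a time.
df["product_apply"] = df.apply(lambda row: row["col1"] * row["col2"], axis=1)

# Vectorized equivalent of the apply call above.
df["product_vec"] = df["col1"] * df["col2"]

print(df)
```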
Third, compute the moving average
df['MA'] = df['col'].rolling(window=3, center=False).mean() #Mean of the current row and the two rows before it; rows without a full window of 3 are NaN
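To see what the rolling window does, here is a small sketch on a five-element Series (the data and the names `ma` and `ma_partial` are illustrative, not from the original example). With `window=3` the first two positions have no full window and come out as NaN. Passing `min_periods=1` is one way to get a value from the first row onward.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Trailing window of 3: positions 0 and 1 lack a full window and are NaN.
ma = s.rolling(window=3, center=False).mean()

# min_periods=1 averages whatever is available in the window so far.
ma_partial = s.rolling(window=3, min_periods=1).mean()

print(ma)
print(ma_partial)
```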
Fourth, shift a column up or down
df = pd.DataFrame({'id':[1,1,1,2,2,3],'value':[1,2,3,4,5,6]})
df['value_shift'] = df.groupby('id')['value'].shift(1) #Group by the id column and shift the value column down by 1 row within each group
df['value_shift_1'] = df.groupby('id')['value'].shift(-1) #Group by the id column and shift the value column up by 1 row within each group
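The grouped shift can be sketched as follows. Values never cross from one `id` group into another, so the first row of each group has no previous value and the last row has no next value. The column names `prev`, `next`, and `diff` are illustrative. The `diff` column shows one common use, the change from the previous row within a group.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3], "value": [1, 2, 3, 4, 5, 6]})

# Shift down by one row inside each id group: previous value, NaN at group start.
df["prev"] = df.groupby("id")["value"].shift(1)

# Shift up by one row inside each id group: next value, NaN at group end.
df["next"] = df.groupby("id")["value"].shift(-1)

# Change relative to the previous row of the same group.
df["diff"] = df["value"] - df["prev"]

print(df)
```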
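Fifth, standardize columns
from sklearn import preprocessing
df = pd.DataFrame({'id':[1,1,1,2,2,3],'value1':[1,2,3,4,5,6],'value2':[1,3,4,3,7,2]})
value=df[['value1','value2']]
scaler=preprocessing.StandardScaler().fit(value) #StandardScaler standardizes each column (feature) independently, so columns need no transpose
value_scale=scaler.transform(value) #value_scale is a NumPy array with shape (6,2)
value_T=value.transpose() #To standardize each row instead, transpose first; value_T is still a DataFrame
value_T_scale=preprocessing.StandardScaler().fit_transform(value_T) #value_T_scale is a NumPy array with shape (2,6)
value_row_scale=value_T_scale.transpose() #Transpose back to shape (6,2)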
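For reference, column-wise standardization can also be written in pandas alone. This is a sketch of the same calculation, not the scikit-learn implementation: subtract each column's mean and divide by its population standard deviation (`ddof=0`), which is the convention `StandardScaler` uses. The name `value_scaled` is chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "value1": [1, 2, 3, 4, 5, 6],
    "value2": [1, 3, 4, 3, 7, 2],
})
value = df[["value1", "value2"]]

# Per-column z-score: mean() and std() operate on each column by default.
# ddof=0 selects the population standard deviation.
value_scaled = (value - value.mean()) / value.std(ddof=0)

# After scaling, every column has mean close to 0 and standard deviation close to 1.
print(value_scaled.mean())
print(value_scaled.std(ddof=0))
```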
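#Sometimes you need the reshape method of np.array:
import numpy as np
y=df[['value1']].to_numpy() #Convert to a NumPy array first, because a DataFrame has no reshape method; y.shape=(6,1)
y=y.reshape(1,-1) #y.shape=(1,6)
y=y.reshape(-1,1) #y.shape=(6,1)
y=np.repeat(0,len(y)) #Generate a 1-D array of zeros with shape (6,)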
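The shapes involved can be checked with a short sketch. It assumes the column is first converted with `to_numpy()`, since `reshape` is a NumPy array method. The names `row`, `col`, `zeros_1d`, and `zeros_2d` are illustrative. `np.zeros_like` is shown as one way to get zeros with the same two-dimensional shape as `y`, whereas `np.repeat(0, len(y))` gives a flat array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value1": [1, 2, 3, 4, 5, 6]})

# Double brackets keep the result two-dimensional: one column, six rows.
y = df[["value1"]].to_numpy()

row = y.reshape(1, -1)    # -1 lets NumPy infer the length: shape (1, 6)
col = row.reshape(-1, 1)  # back to a single column: shape (6, 1)

zeros_1d = np.repeat(0, len(y))  # flat array of six zeros
zeros_2d = np.zeros_like(y)      # zeros with the same shape as y

print(y.shape, row.shape, col.shape, zeros_1d.shape, zeros_2d.shape)
```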
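Sixth, assign values to a column
df = pd.DataFrame({'id':[1,1,1,2,2,3],'value':[1,2,3,4,5,6]})
value=[11,22,33]
df.loc[df.index[0:3],'value']=value #Overwrite the first three rows of the existing 'value' column
df.loc[df.index[0:3],'value0']=value #Create a new 'value0' column; rows that are not assigned are NaN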
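A short sketch of the two assignments, to show the difference between writing into an existing column and creating a new one with `loc`. The variable name `new_values` is chosen for this example. In the new column, the rows outside the selection are left as NaN.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3], "value": [1, 2, 3, 4, 5, 6]})
new_values = [11, 22, 33]

# Existing column: only the first three rows are overwritten.
df.loc[df.index[0:3], "value"] = new_values

# Missing column: loc creates it, fills the selected rows, and leaves the rest NaN.
df.loc[df.index[0:3], "value0"] = new_values

print(df)
```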
Seventh, count how often each element appears in a list
lst=['a','a','a','b','c','c','b','e','f','a','a','c']
cnt = pd.Series(lst).value_counts() #Series indexed by element, sorted by count in descending order
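The counting step can be sketched and cross-checked against the standard library's `collections.Counter`, which is used here only as a comparison. The name `share` is illustrative. `normalize=True` returns relative frequencies instead of raw counts.

```python
from collections import Counter

import pandas as pd

lst = ["a", "a", "a", "b", "c", "c", "b", "e", "f", "a", "a", "c"]

# Counts per distinct element, most frequent first.
cnt = pd.Series(lst).value_counts()

# Relative frequencies: each count divided by the total number of elements.
share = pd.Series(lst).value_counts(normalize=True)

# Both approaches agree on the counts.
print(cnt.to_dict() == dict(Counter(lst)))
print(cnt)
```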