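1. Warnings when reading a file:
The low_memory parameter: if you do not specify dtype, pandas has to infer the type of every column. With the default low_memory=True it parses the file internally in chunks to keep memory use down, so the same column can be inferred as different types in different chunks, which triggers a mixed-types DtypeWarning. With low_memory=False pandas parses the whole file in one pass and infers each column's type from all of its values, at the cost of more memory. Passing dtype explicitly avoids the inference altogether.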
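To make the difference concrete, here is a minimal sketch. The sample data and the column names case_no and amount are made up for illustration; only read_csv and its low_memory and dtype parameters come from pandas itself.

```python
import io
import pandas as pd

# Hypothetical sample: case_no looks numeric but should stay a string.
csv_text = "case_no,amount\n001,10.5\n002,20.0\n"

# Let pandas infer the types from the whole file in one pass.
# case_no is parsed as integers, so the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# State the types explicitly so nothing has to be guessed.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"case_no": str, "amount": float})

print(inferred["case_no"].tolist())
print(typed["case_no"].tolist())
```

When the column types are known in advance, passing dtype is usually the more predictable choice; low_memory=False is the quick fix when they are not.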
2. When reading large-scale data with pandas, read it in blocks and concatenate the pieces
import pandas as pd
# Read each pre-split file separately
cuiji1_1 = pd.read_csv('Cuiji1_1.csv', low_memory=False)
cuiji1_2 = pd.read_csv('Cuiji1_2.csv', low_memory=False)
cuiji2 = pd.read_csv('Cuiji2.csv', low_memory=False)
cuiji3 = pd.read_csv('Cuiji3.csv', low_memory=False)
cuiji4 = pd.read_csv('Cuiji4.csv', low_memory=False)
cuiji5 = pd.read_csv('Cuiji5.csv', low_memory=False)
cuiji6 = pd.read_csv('Cuiji6.csv', low_memory=False)
cuiji7 = pd.read_csv('Cuiji7.csv', low_memory=False)
cuiji8 = pd.read_csv('Cuiji8.csv', low_memory=False)
cuiji9 = pd.read_csv('Cuiji9.csv', low_memory=False)
# Stack the pieces row-wise into a single DataFrame
frames = [cuiji1_1, cuiji1_2, cuiji2, cuiji3, cuiji4, cuiji5, cuiji6, cuiji7, cuiji8, cuiji9]
cuiji = pd.concat(frames)
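When the data sits in one large file instead of several pre-split ones, read_csv can do the block reading itself through its chunksize parameter. The sketch below is only an illustration: the in-memory CSV and the chunk size of 4 are assumptions, not values from the original notes.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "case_no,phone\n" + "".join(f"{i},{1000 + i}\n" for i in range(10))

# With chunksize, read_csv returns an iterator of DataFrames
# holding at most 4 rows each.
chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunks.append(chunk)

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
combined = pd.concat(chunks, ignore_index=True)
print(len(chunks), combined.shape)
```

In a real job each chunk could be filtered or aggregated inside the loop, so that the full file never has to be held in memory at once.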
3. Handling missing values with pandas
cuiji.dropna(axis=0, how='any', inplace=True)
Here axis=0 deletes every row that contains a missing value, while axis=1 deletes columns instead. how='any' means one missing value is enough for the row (or column) to be dropped.
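A small sketch of the two axis settings, using a made-up three-row table (the column names are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 lacks a phone, row 2 lacks an amount.
df = pd.DataFrame({
    "case_no": [1, 2, 3],
    "phone": ["a", None, "c"],
    "amount": [1.0, 2.0, np.nan],
})

# axis=0: drop every row that has at least one missing value.
rows_dropped = df.dropna(axis=0, how="any")

# axis=1: drop every column that has at least one missing value.
cols_dropped = df.dropna(axis=1, how="any")

print(rows_dropped)
print(cols_dropped)
```

Without inplace=True, dropna returns a new DataFrame and leaves the original untouched, which is often easier to reason about.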
4. Delete duplicate rows
cuiji.drop_duplicates(inplace=True)
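By default drop_duplicates compares all columns. The sketch below, on made-up data, also shows the subset parameter for the case where only some columns should decide what counts as a duplicate:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical in every column.
df = pd.DataFrame({
    "case_no": [1, 1, 2, 2],
    "phone": ["a", "a", "b", "c"],
})

# Drop rows that repeat an earlier row in all columns.
deduped = df.drop_duplicates()

# Keep only the first row seen for each case_no.
by_case = df.drop_duplicates(subset=["case_no"], keep="first")

print(deduped)
print(by_case)
```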
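5. Use pandas to process date and time formats
import time
cuiji['create_date'] = pd.to_datetime(cuiji['create_date'].apply(lambda x: time.strftime("%Y-%m-%d", time.localtime(x))))
This assumes create_date holds Unix timestamps in seconds. time.localtime converts them using the machine's local time zone, and "%Y-%m-%d" keeps only the date part.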
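As an alternative sketch, pd.to_datetime can convert Unix timestamps directly with unit="s", which avoids the row-by-row apply. Note the assumption: this variant interprets the values as UTC, whereas time.localtime uses the local time zone, so the two can differ by a day near midnight. The sample values and the helper column names are invented for the example.

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds.
df = pd.DataFrame({"create_date": [0, 86400, 1500000000]})

# Vectorized conversion; the timestamps are read as UTC.
df["create_dt"] = pd.to_datetime(df["create_date"], unit="s")

# Drop the time of day and keep midnight of the same date.
df["create_day"] = df["create_dt"].dt.normalize()

print(df)
```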
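6. Apply a function to transform column values into the desired form
cj['freq'] = cj.apply(lambda x: round((x.iloc[8].days + 1) / x.iloc[7], 2), axis=1)
For example, to compute the freq column we take the number of days in the 9th column (a time difference), add 1, divide by the 8th column, and round to 2 decimal places. Positions are 0-based, so x.iloc[8] is the 9th column and x.iloc[7] is the 8th; plain x[8] does the same positional lookup in older pandas but is deprecated in recent versions. The result of apply is assigned to the new column on the left, and axis=1 means the function is applied to each row.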
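The same calculation reads more clearly with column names than with positions. In the sketch below the names num and span are hypothetical stand-ins for the 8th and 9th columns; the second version computes the same values without apply, which is usually faster on large tables.

```python
import pandas as pd

# Hypothetical columns: a count and a time span per case.
df = pd.DataFrame({
    "num": [3, 4],
    "span": pd.to_timedelta([3, 9], unit="D"),
})

# Row-wise version: axis=1 passes each row to the function.
df["freq"] = df.apply(
    lambda row: round((row["span"].days + 1) / row["num"], 2), axis=1
)

# Vectorized version operating on whole columns at once.
df["freq_vec"] = ((df["span"].dt.days + 1) / df["num"]).round(2)

print(df)
```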
7. For the warning "A value is trying to be set on a copy of a slice from a DataFrame"
This means that we obtained a subset of the DataFrame through some query operation and then tried to add or modify a column on that subset. pandas cannot tell whether the subset is a view of the original data or a copy, so it warns that the assignment may not do what we intend (SettingWithCopyWarning).
The solution is simple: make an explicit copy of the subset.
For example:
cuiji_filter = cuiji[(cuiji.create_date >= tstart) & (cuiji.create_date < tend)]  # case reminder pool
cuiji_filter = cuiji_filter.copy()
Before the copy, pandas still treats cuiji_filter as a slice of the original DataFrame; after the copy it is an independent DataFrame, so columns can be added to it safely.
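A self-contained sketch of the pattern, with invented dates and a hypothetical flag column, showing that the copy can be modified without touching the original:

```python
import pandas as pd

# Hypothetical data and date range.
cuiji = pd.DataFrame({
    "case_no": [1, 2, 3],
    "create_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
})
tstart, tend = pd.Timestamp("2020-01-15"), pd.Timestamp("2020-03-15")

# Filter and copy in one step so the result is independent.
cuiji_filter = cuiji[
    (cuiji.create_date >= tstart) & (cuiji.create_date < tend)
].copy()

# Adding a column now affects only the copy.
cuiji_filter["flag"] = 1

print(cuiji_filter)
print(cuiji.columns.tolist())
```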
8. Conditional filtering operations
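cuiji_filter = cuiji[(cuiji.create_date >= tstart) & (cuiji.create_date < tend)]  # case reminder pool
Note that with attribute access such as cuiji.create_date, the column name is written without quotation marks. Each condition needs its own parentheses, because & binds more tightly than the comparison operators.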
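The following sketch, on made-up dates, shows that attribute access and quoted bracket access build the same boolean mask, and how the mask selects rows:

```python
import pandas as pd

# Hypothetical data and date range.
df = pd.DataFrame({
    "case_no": [1, 2, 3, 4],
    "create_date": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]
    ),
})
tstart, tend = pd.Timestamp("2020-02-01"), pd.Timestamp("2020-04-01")

# Attribute access: no quotes around the column name.
mask_attr = (df.create_date >= tstart) & (df.create_date < tend)

# Bracket access: the column name is a quoted string.
mask_item = (df["create_date"] >= tstart) & (df["create_date"] < tend)

# Rows where the mask is True are kept.
filtered = df[mask_item]
print(filtered)
```

Bracket access is the safer habit when a column name contains spaces or collides with a DataFrame method name.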
9. Relatively simple: to see the first n rows of data, we only need
print(cuiji_filter.head(n))
10. Group-by queries in pandas:
grouped = cuiji_filter.groupby(cuiji_filter['case_no'])
The argument to groupby is the column we want to group by, the same as GROUP BY in SQL.
Note: if you print the type of grouped, you will find that it is not a DataFrame but a GroupBy object (DataFrameGroupBy);
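A short sketch on invented data that checks the type and shows one way to look inside the groups, by iterating over (key, sub-DataFrame) pairs:

```python
import pandas as pd

# Hypothetical data with two cases.
df = pd.DataFrame({
    "case_no": ["a", "a", "b"],
    "phone": ["1", "2", "3"],
})

grouped = df.groupby("case_no")

# The result is a lazy grouping object, not a DataFrame.
print(type(grouped).__name__)

# Iterating yields the group key and the rows belonging to it.
sizes = {}
for key, group in grouped:
    sizes[key] = len(group)

print(sizes)
```

Nothing is computed until an aggregation such as size(), sum() or an iteration is requested on the grouping object.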
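11. How to turn GroupBy statistics into a DataFrame
grouped = cuiji_filter.groupby(cuiji_filter['case_no'])
cjnum = grouped.size()  # count the collection records per case_no; returns a Series indexed by case_no
cjnum = pd.DataFrame(cjnum)  # one-column DataFrame; the count column is named 0
cjnum = cjnum.reset_index()  # turn the case_no index back into an ordinary column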
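The same result can be sketched more compactly with Series.reset_index(name=...), which also gives the count column a readable name. The name cj_count and the sample data are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data with two cases.
df = pd.DataFrame({
    "case_no": ["a", "a", "b"],
    "phone": ["1", "2", "3"],
})

# size() returns a Series indexed by case_no; reset_index(name=...)
# turns it into a two-column DataFrame in one step.
cjnum = df.groupby("case_no").size().reset_index(name="cj_count")

print(cjnum)
```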
12. How to convert GroupBy results to dict format
groupphone = cuiji_filter['phone'].groupby(cuiji_filter['case_no'])
gp = dict(list(groupphone))  # keys are case_no values, values are the matching phone Series
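A runnable sketch of the conversion on made-up data. The last step, turning each Series into a plain list, is an optional addition and not part of the original recipe:

```python
import pandas as pd

# Hypothetical data with two cases.
df = pd.DataFrame({
    "case_no": ["a", "a", "b"],
    "phone": ["1", "2", "3"],
})

groupphone = df["phone"].groupby(df["case_no"])

# list(groupphone) yields (key, Series) pairs, which dict() accepts.
gp = dict(list(groupphone))

# Optional: convert each Series to a plain Python list.
phones = {key: series.tolist() for key, series in gp.items()}

print(phones)
```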
13. DataFrame join (merge) operation
cj = pd.merge(cuiji_filter, cjnum, on=['case_no'])  # inner join on case_no by default
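To show what the merge produces, here is a small sketch with two invented tables; the column name cj_count is a placeholder. Every row of the left table picks up the count for its case_no.

```python
import pandas as pd

# Hypothetical detail table and per-case counts.
left = pd.DataFrame({
    "case_no": ["a", "a", "b"],
    "phone": ["1", "2", "3"],
})
counts = pd.DataFrame({
    "case_no": ["a", "b"],
    "cj_count": [2, 1],
})

# how="inner" is the default: keep only keys present in both tables.
cj = pd.merge(left, counts, on=["case_no"], how="inner")

print(cj)
```

Passing how="left" instead would keep every row of the left table even when its case_no has no match on the right.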