A summary of some pandas usage methods

Source: Internet
Author: User

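1. Errors when reading a file:

The low_memory parameter: if you do not specify dtype, pandas has to infer the type of each column. With the default low_memory=True, the file is parsed internally in chunks and the type is inferred chunk by chunk. This uses less memory while parsing, but a column can end up with mixed types, which triggers a DtypeWarning. If you specify low_memory=False, the whole file is read before the types are determined, which avoids mixed types at the cost of more memory. Specifying dtype explicitly avoids the inference altogether.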
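The effect of specifying dtype can be sketched as follows. This is a minimal illustration, not the original author's data: the CSV content and the column names (case_no, phone, amount) are assumptions made up for the example.

```python
import io
import pandas as pd

# Hypothetical CSV content; the column names are illustrative only.
csv_text = "case_no,phone,amount\n001,13800000000,12.5\n002,13900000000,7.0\n"

# Without dtype, pandas infers the types: case_no is parsed as an integer
# and its leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# With an explicit dtype, the listed columns are read as strings as-is.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"case_no": str, "phone": str})

print(inferred.dtypes)
print(typed.dtypes)
```

Identifier-like columns such as case numbers or phone numbers are usually safer read as strings, since arithmetic on them is meaningless.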

2. When reading large amounts of data with pandas, read it in blocks

Here the data is split across several CSV files, which are read one by one and then concatenated:

import pandas as pd

cuiji1_1 = pd.read_csv('cuiji1_1.csv', low_memory=False)
cuiji1_2 = pd.read_csv('cuiji1_2.csv', low_memory=False)
cuiji2 = pd.read_csv('cuiji2.csv', low_memory=False)
cuiji3 = pd.read_csv('cuiji3.csv', low_memory=False)
cuiji4 = pd.read_csv('cuiji4.csv', low_memory=False)
cuiji5 = pd.read_csv('cuiji5.csv', low_memory=False)
cuiji6 = pd.read_csv('cuiji6.csv', low_memory=False)
cuiji7 = pd.read_csv('cuiji7.csv', low_memory=False)
cuiji8 = pd.read_csv('cuiji8.csv', low_memory=False)
cuiji9 = pd.read_csv('cuiji9.csv', low_memory=False)
frames = [cuiji1_1, cuiji1_2, cuiji2, cuiji3, cuiji4, cuiji5, cuiji6, cuiji7, cuiji8, cuiji9]
cuiji = pd.concat(frames)
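When the data sits in a single large file rather than in pre-split files, a similar effect can be had with the chunksize parameter of read_csv. The sketch below is a different technique from the one above, shown on a small in-memory CSV whose contents are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical data: 10 rows generated in memory instead of a real file.
csv_text = "case_no,phone\n" + "".join(f"{i},1380000{i:04d}\n" for i in range(10))

# With chunksize, read_csv returns an iterator that yields DataFrames
# of at most 4 rows each instead of one big DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)
chunks = [chunk for chunk in reader]

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
combined = pd.concat(chunks, ignore_index=True)
print(len(chunks), combined.shape)
```

In practice each chunk would typically be filtered or aggregated inside the loop, so that only the reduced result is kept in memory.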

3. Handling missing values with pandas

cuiji.dropna(axis=0, how='any', inplace=True)

Here axis=0 means that rows containing missing values are deleted; axis=1 deletes the columns that contain missing values instead.
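The difference between the two axis values can be seen on a small frame. The data below is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing phone and one missing amount.
df = pd.DataFrame({
    "case_no": [1, 2, 3],
    "phone": ["a", None, "c"],
    "amount": [1.0, 2.0, np.nan],
})

# axis=0: drop every row that has at least one missing value.
rows_dropped = df.dropna(axis=0, how="any")

# axis=1: drop every column that has at least one missing value.
cols_dropped = df.dropna(axis=1, how="any")

print(rows_dropped)
print(cols_dropped)
```

Without inplace=True, dropna returns a new DataFrame and leaves df unchanged.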

4. Delete duplicate rows

cuiji.drop_duplicates(inplace=True)
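By default drop_duplicates compares all columns. As a small sketch on invented data, the subset parameter restricts the comparison to chosen columns:

```python
import pandas as pd

# Hypothetical frame: rows 0 and 1 are identical, rows 2 and 3 share a case_no.
df = pd.DataFrame({"case_no": [1, 1, 2, 2], "phone": ["a", "a", "b", "c"]})

# Compare all columns: only the exact duplicate (row 1) is removed.
full = df.drop_duplicates()

# Compare only case_no and keep the first row of each case.
by_case = df.drop_duplicates(subset=["case_no"], keep="first")

print(full)
print(by_case)
```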

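5. Processing date and time formats with pandas

import time

cuiji['create_date'] = pd.to_datetime(cuiji['create_date'].apply(lambda x: time.strftime("%Y-%m-%d", time.localtime(x))))

Here time.localtime converts each Unix timestamp (in seconds) to local time, time.strftime formats it as a date string, and pd.to_datetime turns the strings into datetime values.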
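An alternative that avoids the per-row apply is to let pd.to_datetime interpret the integers directly. This is a sketch on made-up timestamps, and it is not fully equivalent: unit="s" interprets the values as UTC, whereas time.localtime uses the machine's local time zone.

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds.
df = pd.DataFrame({"create_date": [0, 86400, 1500000000]})

# unit="s" treats the integers as seconds since 1970-01-01 00:00:00 UTC.
df["created"] = pd.to_datetime(df["create_date"], unit="s")

# normalize() sets the time of day to midnight, keeping only the date part.
df["created_day"] = df["created"].dt.normalize()

print(df)
```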

6. Apply a function to a column to convert its values into the desired form

cj['freq'] = cj.apply(lambda x: round((x.iloc[8].days + 1) / x.iloc[7], 2), axis=1)

For example, we need to calculate the value of the freq column, which is the 9th column (a time difference, taken in days plus one) divided by the 8th column, rounded to two decimals. The result of apply is assigned to the new column on the left; axis=1 means the function is applied to each row.
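The same calculation can be written with column names instead of positions, which is less fragile when columns are reordered. The column names span and n below are hypothetical, as is the data.

```python
import pandas as pd

cj = pd.DataFrame({
    "span": pd.to_timedelta([9, 2], unit="D"),  # hypothetical time-difference column
    "n": [4, 3],                                # hypothetical count column
})

# Row-wise version: the lambda receives one row at a time because axis=1.
cj["freq"] = cj.apply(lambda row: round((row["span"].days + 1) / row["n"], 2), axis=1)

# Vectorized version of the same formula, usually much faster on large frames.
cj["freq_vec"] = ((cj["span"].dt.days + 1) / cj["n"]).round(2)

print(cj)
```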

7. The warning "A value is trying to be set on a copy of a slice from a DataFrame"

This means that we obtained a subset of the DataFrame through some query operation and then tried to add a column or assign values to that subset. pandas cannot tell whether the subset is a view of the original DataFrame or a copy of it, so it warns that the assignment may not do what we expect.

The solution is simple: make an explicit copy of the subset.

For example:

cuiji_filter = cuiji[(cuiji.create_date >= tstart) & (cuiji.create_date < tend)]  # pool of collection records
cuiji_filter = cuiji_filter.copy()

Before the copy, cuiji_filter may be only a view, that is, a window onto part of the original DataFrame. After the copy operation it is an independent DataFrame of its own.
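A minimal sketch of the pattern, on invented data: the filter and the copy are done in one step, and a new column is then added to the copy. Whether the warning appears without the copy depends on the pandas version, so the example only shows the safe form.

```python
import pandas as pd

# Hypothetical data standing in for the collection records.
cuiji = pd.DataFrame({"case_no": [1, 2, 3], "amount": [10, 20, 30]})

# Filter and copy in one step, so the result is independent of cuiji.
cuiji_filter = cuiji[cuiji["amount"] >= 20].copy()

# Adding a column now affects only the copy.
cuiji_filter["flag"] = True

print(cuiji_filter)
print(cuiji.columns.tolist())
```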

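8. Conditional filtering operations

cuiji_filter = cuiji[(cuiji.create_date >= tstart) & (cuiji.create_date < tend)]  # pool of collection records

Note that inside the brackets, when we access a column as an attribute such as cuiji.create_date, the column name is not quoted. Each condition must be wrapped in its own parentheses before the conditions are combined with &.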
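The attribute form and the quoted bracket form select the same rows, as this small sketch on invented dates shows. The values of tstart and tend are assumptions for the example.

```python
import pandas as pd

cuiji = pd.DataFrame({
    "case_no": [1, 2, 3],
    "create_date": pd.to_datetime(["2017-01-01", "2017-02-01", "2017-03-01"]),
})
tstart = pd.Timestamp("2017-01-15")  # hypothetical window start (inclusive)
tend = pd.Timestamp("2017-03-01")    # hypothetical window end (exclusive)

# Attribute access: no quotes around the column name.
by_attr = cuiji[(cuiji.create_date >= tstart) & (cuiji.create_date < tend)]

# Bracket access: the column name is a quoted string.
by_key = cuiji[(cuiji["create_date"] >= tstart) & (cuiji["create_date"] < tend)]

print(by_attr)
```

The bracket form also works for column names that contain spaces or clash with DataFrame method names, where attribute access does not.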

9. To view the first n rows of data, we only need

print(cuiji_filter.head(n))

10. Group-by operations in pandas:

grouped = cuiji_filter.groupby(cuiji_filter['case_no'])

The argument to groupby is the column we want to group by; the operation corresponds to GROUP BY in SQL.

Note: if you print the type of grouped, you will find that it is not a DataFrame but a GroupBy object.
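A GroupBy object does nothing until it is aggregated or iterated. The sketch below, on invented data, iterates over it to show that each group is itself a DataFrame; passing the column name as a string is used here as a shorter equivalent of passing the column itself.

```python
import pandas as pd

# Hypothetical records: case "a" has two rows, case "b" has one.
cuiji_filter = pd.DataFrame({"case_no": ["a", "a", "b"], "phone": ["1", "2", "3"]})

grouped = cuiji_filter.groupby("case_no")
print(type(grouped).__name__)

# Iterating yields (group key, sub-DataFrame) pairs.
sizes = {key: len(group) for key, group in grouped}
print(sizes)
```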

11. How to turn GroupBy statistics into a DataFrame

grouped = cuiji_filter.groupby(cuiji_filter['case_no'])
cjnum = grouped.size()  # count the number of collection records per case_no
cjnum = pd.DataFrame(cjnum)
cjnum = cjnum.reset_index()
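The steps above can also be condensed, since reset_index on the Series returned by size() already produces a DataFrame. In this sketch the data and the column name cj_count are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical records: case "a" has two rows, case "b" has one.
cuiji_filter = pd.DataFrame({"case_no": ["a", "a", "b"], "phone": ["1", "2", "3"]})

# size() counts rows per group; reset_index(name=...) turns the result into
# a DataFrame and gives the count column a readable name.
cjnum = cuiji_filter.groupby("case_no").size().reset_index(name="cj_count")

print(cjnum)
```

Naming the count column avoids ending up with a column labelled 0, which is what the unnamed Series produces otherwise.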

12. How to convert the results of groupby to dict format

groupphone = cuiji_filter['phone'].groupby(cuiji_filter['case_no'])
gp = dict(list(groupphone))
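The resulting dict maps each case_no to a Series of that case's phone values. The sketch below, on invented data, shows this and additionally converts each Series to a plain list, which is an extra step not in the original.

```python
import pandas as pd

# Hypothetical records: case "a" has two phones, case "b" has one.
cuiji_filter = pd.DataFrame({"case_no": ["a", "a", "b"], "phone": ["1", "2", "3"]})

groupphone = cuiji_filter["phone"].groupby(cuiji_filter["case_no"])

# list(groupphone) is a list of (key, Series) pairs, so dict() accepts it directly.
gp = dict(list(groupphone))

# Optional extra step: plain Python lists instead of Series.
phones = {key: series.tolist() for key, series in gp.items()}
print(phones)
```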

13. DataFrame join operation

cj = pd.merge(cuiji_filter, cjnum, on=['case_no'])
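As a small illustration on invented data, merging the per-case counts back onto the records attaches the count to every row of that case. The column name cj_count and the choice of how="left" are assumptions for this sketch; the default of pd.merge is an inner join.

```python
import pandas as pd

# Hypothetical records and per-case counts.
cuiji_filter = pd.DataFrame({"case_no": ["a", "a", "b"], "phone": ["1", "2", "3"]})
cjnum = pd.DataFrame({"case_no": ["a", "b"], "cj_count": [2, 1]})

# how="left" keeps every row of cuiji_filter, even if a case had no match in cjnum.
cj = pd.merge(cuiji_filter, cjnum, on="case_no", how="left")

print(cj)
```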
