This article shares an example of using Python to read a CSV file, remove a column, and write the result to a new file. I hope it is a useful reference and helps you get a better grasp of Python.
Both solutions below are existing approaches found on the web.
Scenario Description:
A data file saved as text has three comma-separated columns: user_id, plan_id, mobile_id. The goal is to produce a new file that contains only mobile_id and plan_id.
Solutions
Solution 1: open the input file with Python's built-in open(), process the data line by line in a for loop, and write each result to the new file.
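To try either solution you need an input file in this layout. The sketch below writes one with a header row and made-up values, so the runs can be reproduced at different sizes. The helper name make_sample_csv and the value ranges are hypothetical, not taken from the original data.

```python
import csv
import random

def make_sample_csv(path, n_rows, seed=0):
    """Write a hypothetical test file with columns user_id,plan_id,mobile_id."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", newline="") as f:
        # lineterminator="\n" keeps plain Unix line endings (csv defaults to "\r\n")
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["user_id", "plan_id", "mobile_id"])
        for i in range(n_rows):
            # Made-up values: sequential user ids, a few plan ids, 11-digit numbers
            writer.writerow([i, rng.randint(1, 5), 13000000000 + rng.randint(0, 99999999)])
```

For example, make_sample_csv('input.csv', 270000) produces a file with about as many records as the first test below.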
The code is as follows:
```python
def readwrite1(input_file, output_file):
    # Input columns: user_id,plan_id,mobile_id -> output columns: mobile_id,plan_id
    with open(input_file, 'r') as f, open(output_file, 'w') as out:
        for line in f:
            a = line.rstrip('\n').split(',')
            out.write(a[2] + ',' + a[1] + '\n')
```
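Splitting on ',' by hand breaks if a field ever contains a quoted comma, and it relies on the column positions. A variant of Solution 1 that uses the standard-library csv module instead (a swapped-in technique, not the article's code) handles quoting and selects the columns by header name. The function name readwrite_csv is hypothetical.

```python
import csv

def readwrite_csv(input_file, output_file, columns=("mobile_id", "plan_id")):
    """Copy only the named columns from input_file to output_file."""
    with open(input_file, newline="") as f, open(output_file, "w", newline="") as out:
        reader = csv.DictReader(f)  # uses the first row as the header
        # extrasaction="ignore" silently drops columns not listed (e.g. user_id)
        writer = csv.DictWriter(out, fieldnames=list(columns),
                                extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

Because the columns are looked up by name, the same function keeps working if the input file's column order changes.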
Solution 2: use pandas to read the data into a DataFrame, select the needed columns, and write them to the new file directly with DataFrame.to_csv().
The code is as follows:
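```python
import pandas as pd

def readwrite2(input_file, output_file):
    # header=0: use the first row as column names
    data = pd.read_csv(input_file, header=0, sep=',')
    # Keep only mobile_id and plan_id; index=False omits the row-index column
    data[['mobile_id', 'plan_id']].to_csv(output_file, sep=',', header=True, index=False)
```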
In terms of code, the pandas version is clearer.
Now let's compare how fast the two run.
```python
import time

def getRunTimes(fun, input_file, output_file):
    # Wall-clock time in milliseconds before and after running fun
    begin_time = int(round(time.time() * 1000))
    fun(input_file, output_file)
    end_time = int(round(time.time() * 1000))
    print("Read and write run time:", (end_time - begin_time), "ms")

getRunTimes(readwrite1, input_file, output_file)   # process the raw lines directly
getRunTimes(readwrite2, input_file, output_file1)  # read and write via a DataFrame
```
Read and write run time: 976 ms
Read and write run time: 777 ms
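time.time() reads the system clock, which can be adjusted while a benchmark runs, and a single run is easily disturbed by other processes. A minimal variant of getRunTimes that uses time.perf_counter() (the standard library's clock for measuring short durations) and keeps the best of several runs might look like this; the name get_run_time_ms and the repeat parameter are hypothetical.

```python
import time

def get_run_time_ms(fun, *args, repeat=3):
    """Run fun(*args) `repeat` times and return the best wall-clock time in ms."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()  # high-resolution clock for short intervals
        fun(*args)
        elapsed_ms = (time.perf_counter() - start) * 1000
        best = min(best, elapsed_ms)  # the minimum is least affected by background noise
    return best
```

Usage: print("Read and write run time:", round(get_run_time_ms(readwrite1, input_file, output_file)), "ms"). Taking the best run also means both functions are compared with a warm file cache, rather than whichever ran first paying for the cold read.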
input_file contains about 270,000 records, and the DataFrame version is somewhat faster than the for loop. Would the difference be more noticeable with more data?
To find out, the number of records in input_file was increased. The results (run times in ms) are as follows:
| input_file records | readwrite1 (ms) | readwrite2 (ms) |
|---|---|---|
| 270,000 | 976 | 777 |
| 550,000 | 1989 | 1509 |
| 1,100,000 | 4312 | 3158 |
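Judging from these results, the DataFrame version (readwrite2) runs about 1.26 to 1.37 times as fast as the for loop (readwrite1), i.e. it takes roughly 20% to 27% less time, and the gap widens slightly as the data grows.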
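The ratios behind this conclusion can be recomputed from the measured times. This sketch uses the article's own numbers and, for each input size, reports how many times faster readwrite2 is and what fraction of the run time it saves; the helper name compare is hypothetical.

```python
# (records, readwrite1 ms, readwrite2 ms): the article's measurements
results = [
    (270_000, 976, 777),
    (550_000, 1989, 1509),
    (1_100_000, 4312, 3158),
]

def compare(t1, t2):
    """Return (speedup of t2 over t1, fraction of time saved by t2)."""
    return t1 / t2, 1 - t2 / t1

for n, t1, t2 in results:
    speedup, saved = compare(t1, t2)
    print(f"{n:>9,} records: {speedup:.2f}x faster, {saved:.0%} less time")
```

This prints 1.26x / 20%, 1.32x / 24% and 1.37x / 27% for the three sizes, which is where the "about 30%" figure comes from depending on whether speed or time saved is measured.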