Python reads a CSV file, removes a column, and then writes a new file

Source: Internet
Author: User

Two ways to solve this problem, both taken from existing solutions on the Internet, are compared below.

Scenario Description:

A data file saved as text has three columns: user_id, plan_id, mobile_id. The goal is to produce a new file that contains only mobile_id and plan_id.

Solutions

Solution One: open the file with Python's built-in open(), process the data line by line in a for loop, and write each result to the new file.

The code is as follows:

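    def readwrite1(input_file, output_file):
        # Read every line into memory, then keep two of the three columns.
        with open(input_file, "r") as f, open(output_file, "w") as out:
            for line in f.readlines():
                a = line.rstrip("\n").split(",")
                # Columns are user_id, plan_id, mobile_id: write mobile_id, plan_id.
                out.write(a[2] + "," + a[1] + "\n")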
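Splitting each line on "," breaks if a value ever contains a quoted comma, and hard-coded indexes break if the column order changes. The standard csv module handles both by selecting columns by header name. This is a minimal sketch, assuming the file has a header row; readwrite_csv is a hypothetical name, not part of the original code, and it was not included in the author's timing.

```python
import csv


def readwrite_csv(input_file, output_file, columns=("mobile_id", "plan_id")):
    """Copy only the named columns from input_file to output_file.

    Assumes input_file has a header row such as user_id,plan_id,mobile_id.
    Any column not listed in `columns` (here user_id) is dropped.
    """
    with open(input_file, newline="") as src, open(output_file, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" makes DictWriter skip keys that are not in
        # fieldnames (e.g. user_id) instead of raising ValueError.
        writer = csv.DictWriter(dst, fieldnames=list(columns), extrasaction="ignore")
        writer.writeheader()
        for row in reader:  # streams one row at a time instead of readlines()
            writer.writerow(row)
```

Note that csv.writer ends lines with "\r\n" by default; pass lineterminator="\n" to DictWriter if plain Unix line endings are required.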

Solution Two: use pandas to read the data into a DataFrame, select the needed columns, and write them to the new file directly with the DataFrame's to_csv method.

The code is as follows:

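    import pandas as pd

    def readwrite2(input_file, output_file):
        # Load the whole file into a DataFrame; the first row is the header.
        date_1 = pd.read_csv(input_file, header=0, sep=",")
        # Select the two wanted columns by name and write them without the index.
        date_1[["mobile_id", "plan_id"]].to_csv(output_file, sep=",", header=True, index=False)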
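pandas can also drop the unwanted column while parsing, through the usecols parameter of read_csv, which may save memory and some parsing work on large files. A minimal sketch, assuming a header row with these column names; readwrite_usecols is a hypothetical name and was not part of the author's timing.

```python
import pandas as pd


def readwrite_usecols(input_file, output_file, columns=("mobile_id", "plan_id")):
    """Parse only the wanted columns, then write them in the requested order."""
    # usecols tells the parser to skip user_id instead of loading it and
    # discarding it afterwards. It ignores element order, so the resulting
    # columns follow the file's order, not the order given here.
    df = pd.read_csv(input_file, usecols=list(columns))
    # Select explicitly so the output column order follows `columns`.
    df[list(columns)].to_csv(output_file, index=False)
```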

From a code perspective, the pandas version is clearer.

Now let's compare their execution efficiency!

    import time

    def getRunTimes(fun, input_file, output_file):
        # Wall-clock time in milliseconds before and after the call
        begin_time = int(round(time.time() * 1000))
        fun(input_file, output_file)
        end_time = int(round(time.time() * 1000))
        print("Read and write run time:", end_time - begin_time, "ms")

    getRunTimes(readwrite1, input_file, output_file)   # process the data directly with a loop
    getRunTimes(readwrite2, input_file, output_file1)  # read and write via a DataFrame
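The timing above measures each function once with time.time(). A slightly more robust sketch uses time.perf_counter(), a high-resolution clock meant for short durations, and keeps the best of several runs, which may reduce noise from effects such as the operating system caching input_file after the first read. get_run_time_ms and best_of are hypothetical helpers, not part of the original code.

```python
import time


def get_run_time_ms(fun, *args):
    """Run fun(*args) once and return the elapsed time in milliseconds.

    time.perf_counter() is monotonic and high-resolution, unlike time.time(),
    which can jump if the system clock is adjusted.
    """
    start = time.perf_counter()
    fun(*args)
    return (time.perf_counter() - start) * 1000.0


def best_of(fun, *args, repeat=3):
    """Hypothetical helper: the fastest of `repeat` runs, in milliseconds."""
    return min(get_run_time_ms(fun, *args) for _ in range(repeat))
```

Usage, with the functions and file names from the article: print("readwrite1:", round(best_of(readwrite1, input_file, output_file)), "ms").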

Read and write run time: 976 ms
Read and write run time: 777 ms

With input_file at about 270,000 records, the DataFrame version is already somewhat faster than the for loop. Does the gap become more obvious as the amount of data grows?
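To repeat a scaling test like this without the author's data, an input file of any size can be generated. A minimal sketch; make_test_file and its random values are hypothetical placeholders, not the data used for the timings in this article.

```python
import random


def make_test_file(path, n_rows, seed=0):
    """Hypothetical helper: write n_rows of fake user_id,plan_id,mobile_id rows."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same file
    with open(path, "w", newline="") as f:
        f.write("user_id,plan_id,mobile_id\n")
        for user_id in range(n_rows):
            plan_id = rng.randint(1, 50)
            mobile_id = rng.randint(13000000000, 13999999999)
            f.write(f"{user_id},{plan_id},{mobile_id}\n")
```

For example, make_test_file("input_270k.csv", 270_000) produces a file of roughly the first test size.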

To find out, the number of records in input_file was increased and the test repeated, with the following results:

Input_file size (W = 10,000 records) | readwrite1 (ms) | readwrite2 (ms)
27W                                  | 976             | 777
456                                  | 1989            | 1509
110W                                 | 4312            | 3158

Judging from these results, the DataFrame version runs roughly 26% to 37% faster than the for loop (about 30% overall), and its advantage grows slightly as the file gets larger.
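How large the gain looks depends on how it is measured: the speed-up factor (loop time divided by DataFrame time) is larger than the fraction of run time saved. The arithmetic can be checked with a few lines of Python using the timings from the table; compare is a hypothetical helper name.

```python
def compare(t_loop_ms, t_pandas_ms):
    """Return (speed-up factor, fraction of run time saved) for two timings."""
    speedup = t_loop_ms / t_pandas_ms
    saved = 1 - t_pandas_ms / t_loop_ms
    return speedup, saved


# Timings from the table: input size -> (readwrite1 ms, readwrite2 ms)
results = {"27W": (976, 777), "456": (1989, 1509), "110W": (4312, 3158)}

for size, (t_loop, t_pandas) in results.items():
    speedup, saved = compare(t_loop, t_pandas)
    print(f"{size}: {speedup:.2f}x faster, {saved:.0%} less time")
# 27W: 1.26x faster, 20% less time
# 456: 1.32x faster, 24% less time
# 110W: 1.37x faster, 27% less time
```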
