Python: read a CSV file, remove a column, and write a new file

Source: Internet
Author: User
This article shares a worked example of reading a CSV file in Python, removing a column, and writing the result to a new file. It should be a useful reference for anyone looking to get a better grasp of Python.

The two approaches below are both existing solutions commonly found on the web.

Scenario Description:

There is a data file saved as plain text with three columns: user_id, plan_id, mobile_id. The goal is to produce a new file containing only mobile_id and plan_id.

Solutions

Scenario one: open the input and output files directly with Python, process the data line by line in a for loop, and write the selected columns to the new file.

The code is as follows:


def readwrite1(input_file, output_file):
    f = open(input_file, 'r')
    out = open(output_file, 'w')
    for line in f.readlines():
        a = line.split(",")
        # keep the first two columns of each row
        x = a[0] + "," + a[1] + "\n"
        out.writelines(x)
    f.close()
    out.close()
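For comparison, the same loop can be written more idiomatically with the standard csv module and context managers, so both files are closed even if an error occurs. This is a sketch, not the article's code; the helper name and the keep_indices parameter are my own:

```python
import csv

def drop_column_csv(input_path, output_path, keep_indices=(0, 1)):
    """Copy a CSV, keeping only the columns at keep_indices, in that order."""
    with open(input_path, newline='') as src, \
         open(output_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow([row[i] for i in keep_indices])
```

Passing keep_indices=(2, 1) would produce the mobile_id,plan_id layout described in the scenario.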

Scenario two: use pandas to read the data into a DataFrame, select the wanted columns, and write them to the new file with the DataFrame's to_csv function.

The code is as follows:


import pandas as pd

def readwrite2(input_file, output_file):
    date_1 = pd.read_csv(input_file, header=0, sep=',')
    date_1[['mobile_id', 'plan_id']].to_csv(output_file, sep=',', header=True, index=False)
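pandas can also select columns at read time via the usecols parameter of read_csv, so the dropped column is never loaded into memory at all. A sketch of this variant (the function name is mine; column names follow the scenario):

```python
import pandas as pd

def readwrite_usecols(input_file, output_file):
    # Only the listed columns are parsed; user_id is skipped entirely.
    df = pd.read_csv(input_file, usecols=['mobile_id', 'plan_id'])
    # usecols does not guarantee output order, so reorder explicitly.
    df[['mobile_id', 'plan_id']].to_csv(output_file, index=False)
```

For wide files with many unwanted columns, this can save both time and memory compared with reading everything and then slicing.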

From a code perspective, the pandas version has the clearer logic.

Now let's compare execution speed.


import time

def getRunTimes(fun, input_file, output_file):
    begin_time = int(round(time.time() * 1000))
    fun(input_file, output_file)
    end_time = int(round(time.time() * 1000))
    print("read and write run time:", (end_time - begin_time), "ms")

getRunTimes(readwrite1, input_file, output_file)    # process the data directly
getRunTimes(readwrite2, input_file, output_file1)   # read/write via DataFrame

read and write run time: 976 ms

read and write run time: 777 ms

With about 270,000 rows in input_file, the DataFrame version is already somewhat faster than the for loop. Does the gap grow as the data gets larger?

To find out, increase the number of records in input_file and rerun the test:

Rows in input_file    readwrite1 (ms)    readwrite2 (ms)
270,000               976                777
550,000               1,989              1,509
1,100,000             4,312              3,158

Judging from these results, the DataFrame version is roughly 30% faster than the plain loop.
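To reproduce a comparison like this on your own machine, a self-contained benchmark can be sketched as follows. It generates synthetic data and uses time.perf_counter rather than the article's time.time; absolute numbers will differ from the table above, and the function names here are mine:

```python
import os
import tempfile
import time

import pandas as pd

def bench(fn, *args):
    """Time a single call to fn in milliseconds."""
    t0 = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - t0) * 1000

def loop_version(src, dst):
    # Plain-Python loop: keep the first two columns of each line.
    with open(src) as f, open(dst, 'w') as out:
        for line in f:
            a = line.rstrip('\n').split(',')
            out.write(a[0] + ',' + a[1] + '\n')

def pandas_version(src, dst):
    pd.read_csv(src)[['plan_id', 'mobile_id']].to_csv(dst, index=False)

# Synthetic input: header plus N rows (small here; scale N up to compare).
d = tempfile.mkdtemp()
src = os.path.join(d, 'in.csv')
with open(src, 'w') as f:
    f.write('user_id,plan_id,mobile_id\n')
    for i in range(10_000):
        f.write(f'{i},{i},{i}\n')

t_loop = bench(loop_version, src, os.path.join(d, 'out1.csv'))
t_pd = bench(pandas_version, src, os.path.join(d, 'out2.csv'))
print(f'loop: {t_loop:.1f} ms, pandas: {t_pd:.1f} ms')
```

Note that pandas carries fixed startup overhead (parsing headers, building the DataFrame), so on very small files the plain loop may win; its advantage shows up as row counts grow.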
