[DJANGO] Excel a hundred thousand of line data fast Import Database research

Source: Internet
Author: User
Tags import database

First paste the original import data code:

8Importos Os.environ.setdefault ("Django_settings_module","www.settings") " "when the Django version is greater than or equal to 1.7, you need to add the following two lines of import djangodjango.setup () or throw an error Django.core.exceptions.AppRegistryNotReady: Models aren ' t loaded yet." "ImportDjangoifDjango. VERSION >= (1, 7):#automatically determine versionDjango.setup () fromArrears.modelsImportD072QFImportXlrd#Excel reading Tools fromDatetimeImportdatetime fromXlrdImportXldate_as_tupleImport TimeImportRandom time1=time.time ()#data= xlrd.open_workbook (' 11.xlsx ') Open fileWith Xlrd.open_workbook ('11.xlsx') as data:PrintU"read the end of the file and start importing!"time2=time.time () Table= Data.sheet_by_index (0)#Get WorksheetsTime3 =Time.time () n=1x= y = Z =0 worklist= []     forLineinchRange (N,table.nrows):#nrows = table.nrows #行数 ncols = table.ncols #列数 print sh.row_values (rownum)row =table.row_values (line)ifRow#to see if a row value is empty             forIinch[0,1,2,4,28,30,32]:                ifType (row[i]) = =Float:row[i]=Int (row[i])ifD072Qf.objects.filter (Acct_month = row[0],serv_id=row[1]). Exists ():#determines whether the row value is duplicated in the databasex = x + 1#Repeat value Count            Else: Worklist.append (D072QF (Acct_month=row[0],serv_id=row[1],acc_nbr=row[2],user_name=row[3],acct_code=row[4], Acct_name=row[5],product_name=row[6],current_charge=row[7],one_charge=row[8], Two_charge=row[9],three_charge=row[10],four_charge=row[11],five_charge=row[12], Six_charge=row[13],seven_charge=row[14],eight_charge=row[15],nine_charge=row[16], Ten_charge=row[17],eleven_charge=row[18],twelve_charge=row[19],oneyear_charge=row[20], Threeyear_charge=row[21],upthreeyear_charge=row[22],all_qf=row[23],morethree_qf=row[24], aging=row[25],serv_state_name=row[26],mkt_chnl_name=row[27],mkt_chnl_id=row[28], Mkt_region_name=row[29],mkt_region_id=row[30],mkt_grid_name=row[31],mkt_grid_id=row[32], prod_addr=row[33])) y= y + 1#non-repeating count        Else: Z= z + 1#empty row value Countn = n + 1ifn% 9999 = =0:d072qf.objects.bulk_create (worklist) worklist=[] Time.sleep (Random.random ())#let the CPU rest at random 0 <= N < 1.0 s            Print "import successfully once!"            Print 'data import succeeded, import'+str (y) +'Bar, repeat'+STR (x) +'Bar, there'+str (z) +'Act empty!'Time4=time.time ()Print "time to read files"+str (TIME2-TIME1) +"seconds, time to import data"+str (Time4-time3) +"seconds!"


This code currently does not all a hundred thousand of rows of data into the database, only spent 1 hours to import 50,000 rows of data into which, after more and more slowly, mainly slow in the Excel table to 70,000 rows of data, reading the data in Excel is very slow, the overall impact on the import speed for several reasons:

1, has been the use of XLRD import xls format file, if the file has a hundred thousand of lines, just read the file will take 200 seconds, if you change to CSV, it hardly takes time

2, this line of code will also affect the speed, especially when the data in the database is large: if D072Qf.objects.filter (Acct_month = row[0],serv_id=row[1]). Exists (): # Determines whether the row value is duplicated in the database

3, if a one-time dictionary add a hundred thousand of rows of data, on the Windows CPU is not suffering! So the recommended 10,000 data import once, empty the list

Post-Improvement Code:

Optimization part: use CSV format; Cancel check duplicate data statement; Import data every 50,000

#Coding:utf-8Importos Os.environ.setdefault ("Django_settings_module","www.settings") " "when the Django version is greater than or equal to 1.7, you need to add the following two lines of import djangodjango.setup () or throw an error Django.core.exceptions.AppRegistryNotReady: Models aren ' t loaded yet." "ImportDjangoifDjango. VERSION >= (1, 7):#automatically determine versionDjango.setup () fromArrears.modelsImportD072QFImport TimeImportrandomtime1=Time.time () F= Open ('11.csv')PrintU"read the end of the file and start importing!"time2=time.time () worklist=[]next (f)#move a file marker to the next liney =0n= 1 forLineinchF:row= Line.replace ('"',"')#replace NULL in the dictionary with therow = Row.split (';')#to slice a string by;y = y + 1worklist.append (D072QF (Acct_month=row[0],serv_id=row[1],acc_nbr=row[2],user_name=row[3],acct_code=row[4], Acct_name=row[5],product_name=row[6],current_charge=row[7],one_charge=row[8], Two_charge=row[9],three_charge=row[10],four_charge=row[11],five_charge=row[12], Six_charge=row[13],seven_charge=row[14],eight_charge=row[15],nine_charge=row[16], Ten_charge=row[17],eleven_charge=row[18],twelve_charge=row[19],oneyear_charge=row[20], Threeyear_charge=row[21],upthreeyear_charge=row[22],all_qf=row[23],morethree_qf=row[24], aging=row[25],serv_state_name=row[26],mkt_chnl_name=row[27],mkt_chnl_id=row[28], Mkt_region_name=row[29],mkt_region_id=row[30],mkt_grid_name=row[31],mkt_grid_id=row[32], prod_addr=row[33])) n= n + 1ifn%50000==0:Printn D072Qf.objects.bulk_create (worklist) worklist=[] Time3=time.time ()Print "time to read files"+str (TIME2-TIME1) +"seconds, time to import data"+str (time3-time2) +"seconds!"Time3=time.time ()PrintnD072Qf.objects.bulk_create (worklist)Print "time to read files"+str (TIME2-TIME1) +"seconds, time to import data"+str (time3-time2) +"seconds!"worklist= []Print "successfully imported data"+str (y) +"Strip"f.close ()

It was a surprise!!!, only took 73 seconds.

Python 2.7.10 (default, May, 09:40:32) [MSC v.1500 32bit (Intel)] on Win32type"Copyright","credits" or "license ()"  forMore information.>>> ================================ RESTART ================================>>>read the end of the file and start importing!50000reading a file takes 0.0 seconds and importing data takes 34.3279998302 seconds!100000reading a file takes 0.0 seconds and importing data takes 67.3599998951 seconds!138400it takes 0.0 seconds to read the file, and 73.4379999638 seconds to import the data! Successfully imported Data 138,399 article>>>

[DJANGO] Excel a hundred thousand of line data fast Import Database research

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.