[DJANGO] dozens of thousands of lines of data in excel can be quickly imported into the database, and dozens of thousands of lines of django can be imported.
First paste the original data import code:
8 import OS. environ. setdefault ("DJANGO_SETTINGS_MODULE", "www. settings ") ''' when the Django version is greater than or equal to 1.7, add the following two statements: import djangodjango. setup () otherwise, the error django will be thrown. core. exceptions. appRegistryNotReady: Models aren't loaded yet. '''import djangoif django. VERSION> = (1, 7): # automatically determines the VERSION of django. setup () from arrears. models import D072Qfimport xlrd # excel read tool from datetime import datetimefrom xlrd import xldate_as_tupleimport ti Meimport random time1 = time. time () # data = xlrd.open_workbook('11.xlsx') open the file with xlrd.open_workbook('11.xlsx') as data: print u ". The file is read and imported! "Time2 = time. time () table = data. sheet_by_index (0) # obtain the worksheet time3 = time. time () n = 1 x = y = z = 0 WorkList = [] for line in range (n, table. nrows): # nrows = table. nrows # number of rows ncols = table. ncols # Number of columns print sh. row_values (rownum) row = table. row_values (line) if row: # Check whether the row value is empty for I in [, 32]: if type (row [I]) = float: row [I] = int (row [I]) if D072Qf. objects. filter (acct_month = row [0], serv_id = row [1]). exists (): # determine whether the row value is repeated in the database x = x + 1 # repeated value count else: WorkList. append (D072Qf (acct_month = row [0], serv_id = row [1], acc_nbr = row [2], user_name = row [3], acct_code = row [4], acct_name = row [5], product_name = row [6], current_charge = row [7], one_charge = row [8], two_charge = row [9], three_charge = row [10], four_charge = row [11], five_charge = row [12], six_charge = row [13], seven_charge = row [14], eight_charge = row [15], nine_charge = row [16], t En_charge = row [17], eleven_charge = row [18], twelve_charge = row [19], oneyear_charge = row [20], threeyear_charge = row [21], upthreeyear_charge = row [22], all_qf = row [23], morethree_qf = row [24], aging = row [25], serv_state_name = row [26], mkt_chnl_name = row [27], mkt_chnl_id = row [28], mkt_region_name = row [29], mkt_region_id = row [30], mkt_grid_name = row [31], mkt_grid_id = row [32], prod_addr = row [33]) y = y + 1 # non-repeated count else: z = z + 1 # null Row value count n = n + 1 if n % 9999 = 0: D072Qf. objects. bulk_create (WorkList) WorkList = [] time. sleep (random. random () # Let the Cpu rest randomly 0 <= n <1.0 s print "Import successful once! "Print 'data imported successfully, '+ str (y) +', repeated '+ str (x) +', with '+ str (z) + 'Action blank! 'Time4 = time. time () print "file read time consumed" + str (time2-time1) + "seconds, data import time consumed" + str (time4-time3) + "seconds!"
Currently, this code does not completely import tens of thousands of rows of data into the database. It takes only one hour to import 50 thousand rows of data into the database, and the subsequent steps become slower and slower, it is mainly slow when the excel table reaches about 70 thousand rows of data, and reading data in excel is slow. There are several reasons that affect the import speed:
1. I have always used xlrd to import xls files. If there are tens of thousands of lines in the file, it will take 200 seconds to read the file. If it is changed to csv, it will take almost no time.
2. This line of statements in the Code also affects the speed, especially when the data in the database is large: if D072Qf. objects. filter (acct_month = row [0], serv_id = row [1]). exists (): # determine whether the row value is repeated in the database
3. If tens of thousands of lines of data are added to the dictionary at a time, the cpu of windows will be overwhelmed! Therefore, we recommend that you clear the list after importing 10 thousand data records.
Improved code:
Optimization part: the csv format is used; check duplicate data statement is canceled; data is imported every 50 thousand
# Coding: UTF-8 import OS. environ. setdefault ("DJANGO_SETTINGS_MODULE", "www. settings ") ''' when the Django version is greater than or equal to 1.7, add the following two statements: import djangodjango. setup () otherwise, the error django will be thrown. core. exceptions. appRegistryNotReady: Models aren't loaded yet. '''import djangoif django. VERSION> = (1, 7): # automatically determines the VERSION of django. setup () from arrears. models import D072Qf import timeimport randomtime1 = time. time () f = open('11.csv ') print u "after reading the file, start Import! "Time2 = time. time () WorkList = [] next (f) # Move the file tag to the next line y = 0n = 1for line in f: row = line. replace ('"','') # replace "in the dictionary with null row = row. split (';') # Press; to segment the string y = y + 1 WorkList. append (D072Qf (acct_month = row [0], serv_id = row [1], acc_nbr = row [2], user_name = row [3], acct_code = row [4], acct_name = row [5], product_name = row [6], current_charge = row [7], one_charge = row [8], two_charge = row [9], three_charge = row [10], four_charge = row [11], Five_charge = row [12], six_charge = row [13], seven_charge = row [14], eight_charge = row [15], nine_charge = row [16], ten_charge = row [17], eleven_charge = row [18], twelve_charge = row [19], oneyear_charge = row [20], threeyear_charge = row [21], upthreeyear_charge = row [22], all_qf = row [23], morethree_qf = row [24], aging = row [25], serv_state_name = row [26], mkt_chnl_name = row [27], mkt_chnl_id = row [28], mkt_region_name = row [29], mkt_region_id = ro W [30], mkt_grid_name = row [31], mkt_grid_id = row [32], prod_addr = row [33]) n = n + 1 if n % 50000 = 0: print n D072Qf. objects. bulk_create (WorkList) WorkList = [] time3 = time. time () print "file read time consumed" + str (time2-time1) + "seconds, data import time consumed" + str (time3-time2) + "seconds! "Time3 = time. time () print nD072Qf. objects. bulk_create (WorkList) print "file read time consumed" + str (time2-time1) + "seconds, data import time consumed" + str (time3-time2) + "seconds! "WorkList = [] print" imported data "+ str (y) +" bar "f. close ()
The results are amazing !!!, It takes 73 seconds
Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32Type "copyright", "credits" or "license () "for more information. >>> ================================== RESTART ======== ==========================================>> end of reading the file, start import! 50000 reading files takes 0.0 seconds and importing data takes 34.3279998302 seconds! 100000 reading files takes 0.0 seconds and importing data takes 67.3599998951 seconds! 138400 reading files takes 0.0 seconds and importing data takes 73.4379999638 seconds! 138399 data records imported successfully >>>