[DJANGO] Quickly importing tens of thousands of rows of Excel data into the database with Django



First, here is the original import code:

# coding: utf-8
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "www.settings")
'''
When the Django version is 1.7 or later, the following two statements are required:
    import django
    django.setup()
Otherwise the error django.core.exceptions.AppRegistryNotReady:
Models aren't loaded yet. is raised.
'''
import django
if django.VERSION >= (1, 7):  # check the Django version automatically
    django.setup()

from arrears.models import D072Qf
import xlrd  # Excel reading library
import time
import random

time1 = time.time()
with xlrd.open_workbook('11.xlsx') as data:
    print u"File read, starting the import!"
    time2 = time.time()
    table = data.sheet_by_index(0)  # get the first worksheet
    time3 = time.time()
    n = 1
    x = y = z = 0
    WorkList = []
    for line in range(n, table.nrows):  # table.nrows is the number of rows
        row = table.row_values(line)
        if row:  # skip empty rows
            for i in (0, 32):  # cast float cells (IDs read as e.g. 123.0) back to int
                if type(row[i]) == float:
                    row[i] = int(row[i])
            # check whether the row already exists in the database
            if D072Qf.objects.filter(acct_month=row[0], serv_id=row[1]).exists():
                x = x + 1  # duplicate count
            else:
                WorkList.append(D072Qf(
                    acct_month=row[0], serv_id=row[1], acc_nbr=row[2],
                    user_name=row[3], acct_code=row[4], acct_name=row[5],
                    product_name=row[6], current_charge=row[7], one_charge=row[8],
                    two_charge=row[9], three_charge=row[10], four_charge=row[11],
                    five_charge=row[12], six_charge=row[13], seven_charge=row[14],
                    eight_charge=row[15], nine_charge=row[16], ten_charge=row[17],
                    eleven_charge=row[18], twelve_charge=row[19],
                    oneyear_charge=row[20], threeyear_charge=row[21],
                    upthreeyear_charge=row[22], all_qf=row[23], morethree_qf=row[24],
                    aging=row[25], serv_state_name=row[26], mkt_chnl_name=row[27],
                    mkt_chnl_id=row[28], mkt_region_name=row[29], mkt_region_id=row[30],
                    mkt_grid_name=row[31], mkt_grid_id=row[32], prod_addr=row[33]))
                y = y + 1  # new-row count
        else:
            z = z + 1  # empty-row count
        n = n + 1
        if n % 9999 == 0:
            D072Qf.objects.bulk_create(WorkList)
            WorkList = []
            time.sleep(random.random())  # rest the CPU for 0 <= t < 1.0 s
            print "One batch imported successfully!"
    D072Qf.objects.bulk_create(WorkList)  # flush the rows left over from the last batch

print 'Import finished: ' + str(y) + ' new rows, ' + str(x) + ' duplicates, ' + str(z) + ' blank rows!'
time4 = time.time()
print "Reading the file took " + str(time2 - time1) + " seconds, importing the data took " + str(time4 - time3) + " seconds!"


In practice this code never finished importing the full dataset: the first 50 thousand rows alone took about an hour, and each batch after that got slower, crawling once the Excel sheet reached roughly 70 thousand rows. Several factors limit the import speed:

1. xlrd is used to read the xls file, and with tens of thousands of rows just reading the file takes around 200 seconds; exported to csv, the same read takes almost no time at all (a timing sketch follows this list).

2. This statement in the code also drags the import down, more and more so as the table grows, because it fires one query per row: if D072Qf.objects.filter(acct_month=row[0], serv_id=row[1]).exists(): # check whether the row already exists in the database (a cheaper alternative is sketched after this list).

3. Accumulating tens of thousands of model instances in the list at once overwhelms the CPU on Windows, so it is better to flush to the database and clear the list every 10 thousand records or so (a batch_size variant is sketched after the improved code below).
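
To put a number on reason 1, here is a minimal timing sketch. It assumes the workbook 11.xlsx plus a semicolon-delimited export 11.csv of the same data, as used by the improved code below:

# -*- coding: utf-8 -*-
# Timing sketch: read the same data once with xlrd and once with csv.
# Assumes '11.xlsx' and a semicolon-delimited export '11.csv'.
import csv
import time
import xlrd

start = time.time()
book = xlrd.open_workbook('11.xlsx')
sheet = book.sheet_by_index(0)
xls_rows = [sheet.row_values(i) for i in range(sheet.nrows)]
print "xlrd: %d rows in %.2f seconds" % (len(xls_rows), time.time() - start)

start = time.time()
with open('11.csv', 'rb') as f:  # binary mode for the csv module on Python 2
    csv_rows = list(csv.reader(f, delimiter=';'))
print "csv:  %d rows in %.2f seconds" % (len(csv_rows), time.time() - start)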
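
If the duplicate check from reason 2 cannot simply be dropped, it does not have to cost one query per row. A sketch of an alternative, assuming the same D072Qf model: fetch the existing (acct_month, serv_id) pairs once and test membership in memory.

from arrears.models import D072Qf

def filter_new_rows(rows):
    """Keep only rows whose (acct_month, serv_id) is not in the database,
    using a single query up front instead of one .exists() per row."""
    existing = set(D072Qf.objects.values_list('acct_month', 'serv_id'))
    fresh = []
    for row in rows:
        # note: csv fields are strings; cast the key parts here if the
        # model stores these columns as integers
        key = (row[0], row[1])
        if key not in existing:
            existing.add(key)  # also catches duplicates within the file itself
            fresh.append(row)
    return fresh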

Improved code:

Optimizations: the input is read as csv instead of xls; the duplicate-check statement is dropped; the data is flushed to the database every 50 thousand rows.

# coding: utf-8
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "www.settings")
'''
When the Django version is 1.7 or later, the following two statements are required:
    import django
    django.setup()
Otherwise the error django.core.exceptions.AppRegistryNotReady:
Models aren't loaded yet. is raised.
'''
import django
if django.VERSION >= (1, 7):  # check the Django version automatically
    django.setup()

from arrears.models import D072Qf
import time

time1 = time.time()
f = open('11.csv')
print u"File read, starting the import!"
time2 = time.time()
WorkList = []
next(f)  # skip the header line
y = 0
n = 1
for line in f:
    row = line.replace('"', '').strip()  # drop the quote characters and the trailing newline
    row = row.split(';')  # split the line on ;
    y = y + 1
    WorkList.append(D072Qf(
        acct_month=row[0], serv_id=row[1], acc_nbr=row[2],
        user_name=row[3], acct_code=row[4], acct_name=row[5],
        product_name=row[6], current_charge=row[7], one_charge=row[8],
        two_charge=row[9], three_charge=row[10], four_charge=row[11],
        five_charge=row[12], six_charge=row[13], seven_charge=row[14],
        eight_charge=row[15], nine_charge=row[16], ten_charge=row[17],
        eleven_charge=row[18], twelve_charge=row[19],
        oneyear_charge=row[20], threeyear_charge=row[21],
        upthreeyear_charge=row[22], all_qf=row[23], morethree_qf=row[24],
        aging=row[25], serv_state_name=row[26], mkt_chnl_name=row[27],
        mkt_chnl_id=row[28], mkt_region_name=row[29], mkt_region_id=row[30],
        mkt_grid_name=row[31], mkt_grid_id=row[32], prod_addr=row[33]))
    n = n + 1
    if n % 50000 == 0:
        print n
        D072Qf.objects.bulk_create(WorkList)
        WorkList = []
        time3 = time.time()
        print "Reading the file took " + str(time2 - time1) + " seconds, importing the data took " + str(time3 - time2) + " seconds!"

time3 = time.time()
print n
D072Qf.objects.bulk_create(WorkList)  # flush the rows left over from the last batch
print "Reading the file took " + str(time2 - time1) + " seconds, importing the data took " + str(time3 - time2) + " seconds!"
WorkList = []
print "Imported " + str(y) + " rows"
f.close()
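
One more note on the manual 50 thousand-row batching: on Django 1.5 and later, bulk_create itself accepts a batch_size argument that controls how many rows go into each INSERT, so the modulo bookkeeping is mainly about capping memory rather than statement size. A minimal sketch under that assumption:

# Assumes Django >= 1.5: bulk_create splits the INSERTs itself.
# WorkList must still fit in memory, so for very large files the
# periodic flush-and-clear above is still worth keeping.
D072Qf.objects.bulk_create(WorkList, batch_size=5000)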

The result is striking: the whole import now takes 73 seconds.

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
File read, starting the import!
50000
Reading the file took 0.0 seconds, importing the data took 34.3279998302 seconds!
100000
Reading the file took 0.0 seconds, importing the data took 67.3599998951 seconds!
138400
Reading the file took 0.0 seconds, importing the data took 73.4379999638 seconds!
Imported 138399 rows
>>>

 
