[DJANGO] Quickly importing tens of thousands of rows of Excel data into the database with Django



First, here is the original import code:

# coding: utf-8
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "www.settings")
'''
When the Django version is 1.7 or later, the following two statements are required:
    import django
    django.setup()
Otherwise the error django.core.exceptions.AppRegistryNotReady:
Models aren't loaded yet. is raised.
'''
import django
if django.VERSION >= (1, 7):  # check the Django version automatically
    django.setup()

from arrears.models import D072Qf
import xlrd  # Excel reading library
import time
import random

time1 = time.time()
with xlrd.open_workbook('11.xlsx') as data:
    print u"File read, starting the import!"
    time2 = time.time()
    table = data.sheet_by_index(0)  # get the first worksheet
    time3 = time.time()
    n = 1
    x = y = z = 0
    WorkList = []
    for line in range(n, table.nrows):  # table.nrows is the number of rows
        row = table.row_values(line)
        if row:  # skip empty rows
            for i in (0, 32):  # cast float cells (IDs read as e.g. 123.0) back to int
                if type(row[i]) == float:
                    row[i] = int(row[i])
            # check whether the row already exists in the database
            if D072Qf.objects.filter(acct_month=row[0], serv_id=row[1]).exists():
                x = x + 1  # duplicate count
            else:
                WorkList.append(D072Qf(
                    acct_month=row[0], serv_id=row[1], acc_nbr=row[2],
                    user_name=row[3], acct_code=row[4], acct_name=row[5],
                    product_name=row[6], current_charge=row[7], one_charge=row[8],
                    two_charge=row[9], three_charge=row[10], four_charge=row[11],
                    five_charge=row[12], six_charge=row[13], seven_charge=row[14],
                    eight_charge=row[15], nine_charge=row[16], ten_charge=row[17],
                    eleven_charge=row[18], twelve_charge=row[19],
                    oneyear_charge=row[20], threeyear_charge=row[21],
                    upthreeyear_charge=row[22], all_qf=row[23], morethree_qf=row[24],
                    aging=row[25], serv_state_name=row[26], mkt_chnl_name=row[27],
                    mkt_chnl_id=row[28], mkt_region_name=row[29], mkt_region_id=row[30],
                    mkt_grid_name=row[31], mkt_grid_id=row[32], prod_addr=row[33]))
                y = y + 1  # new-row count
        else:
            z = z + 1  # empty-row count
        n = n + 1
        if n % 9999 == 0:
            D072Qf.objects.bulk_create(WorkList)
            WorkList = []
            time.sleep(random.random())  # rest the CPU for 0 <= t < 1.0 s
            print "One batch imported successfully!"
    D072Qf.objects.bulk_create(WorkList)  # flush the rows left over from the last batch

print 'Import finished: ' + str(y) + ' new rows, ' + str(x) + ' duplicates, ' + str(z) + ' blank rows!'
time4 = time.time()
print "Reading the file took " + str(time2 - time1) + " seconds, importing the data took " + str(time4 - time3) + " seconds!"


In practice this code never finished importing the full dataset: the first 50 thousand rows alone took about an hour, and each batch after that got slower, crawling once the Excel sheet reached roughly 70 thousand rows. Several factors limit the import speed:

1. xlrd is used to read the xls file, and with tens of thousands of rows just reading the file takes around 200 seconds; exported to csv, the same read takes almost no time at all (a timing sketch follows this list).

2. This statement in the code also drags the import down, more and more so as the table grows, because it fires one query per row: if D072Qf.objects.filter(acct_month=row[0], serv_id=row[1]).exists(): # check whether the row already exists in the database (a cheaper alternative is sketched after this list).

3. Accumulating tens of thousands of model instances in the list at once overwhelms the CPU on Windows, so it is better to flush to the database and clear the list every 10 thousand records or so (a batch_size variant is sketched after the improved code below).
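
To put a number on reason 1, here is a minimal timing sketch. It assumes the workbook 11.xlsx plus a semicolon-delimited export 11.csv of the same data, as used by the improved code below:

# -*- coding: utf-8 -*-
# Timing sketch: read the same data once with xlrd and once with csv.
# Assumes '11.xlsx' and a semicolon-delimited export '11.csv'.
import csv
import time
import xlrd

start = time.time()
book = xlrd.open_workbook('11.xlsx')
sheet = book.sheet_by_index(0)
xls_rows = [sheet.row_values(i) for i in range(sheet.nrows)]
print "xlrd: %d rows in %.2f seconds" % (len(xls_rows), time.time() - start)

start = time.time()
with open('11.csv', 'rb') as f:  # binary mode for the csv module on Python 2
    csv_rows = list(csv.reader(f, delimiter=';'))
print "csv:  %d rows in %.2f seconds" % (len(csv_rows), time.time() - start)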
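
If the duplicate check from reason 2 cannot simply be dropped, it does not have to cost one query per row. A sketch of an alternative, assuming the same D072Qf model: fetch the existing (acct_month, serv_id) pairs once and test membership in memory.

from arrears.models import D072Qf

def filter_new_rows(rows):
    """Keep only rows whose (acct_month, serv_id) is not in the database,
    using a single query up front instead of one .exists() per row."""
    existing = set(D072Qf.objects.values_list('acct_month', 'serv_id'))
    fresh = []
    for row in rows:
        # note: csv fields are strings; cast the key parts here if the
        # model stores these columns as integers
        key = (row[0], row[1])
        if key not in existing:
            existing.add(key)  # also catches duplicates within the file itself
            fresh.append(row)
    return fresh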

Improved code:

Optimizations: the input is read as csv instead of xls; the duplicate-check statement is dropped; the data is flushed to the database every 50 thousand rows.

# coding: utf-8
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "www.settings")
'''
When the Django version is 1.7 or later, the following two statements are required:
    import django
    django.setup()
Otherwise the error django.core.exceptions.AppRegistryNotReady:
Models aren't loaded yet. is raised.
'''
import django
if django.VERSION >= (1, 7):  # check the Django version automatically
    django.setup()

from arrears.models import D072Qf
import time

time1 = time.time()
f = open('11.csv')
print u"File read, starting the import!"
time2 = time.time()
WorkList = []
next(f)  # skip the header line
y = 0
n = 1
for line in f:
    row = line.replace('"', '').strip()  # drop the quote characters and the trailing newline
    row = row.split(';')  # split the line on ;
    y = y + 1
    WorkList.append(D072Qf(
        acct_month=row[0], serv_id=row[1], acc_nbr=row[2],
        user_name=row[3], acct_code=row[4], acct_name=row[5],
        product_name=row[6], current_charge=row[7], one_charge=row[8],
        two_charge=row[9], three_charge=row[10], four_charge=row[11],
        five_charge=row[12], six_charge=row[13], seven_charge=row[14],
        eight_charge=row[15], nine_charge=row[16], ten_charge=row[17],
        eleven_charge=row[18], twelve_charge=row[19],
        oneyear_charge=row[20], threeyear_charge=row[21],
        upthreeyear_charge=row[22], all_qf=row[23], morethree_qf=row[24],
        aging=row[25], serv_state_name=row[26], mkt_chnl_name=row[27],
        mkt_chnl_id=row[28], mkt_region_name=row[29], mkt_region_id=row[30],
        mkt_grid_name=row[31], mkt_grid_id=row[32], prod_addr=row[33]))
    n = n + 1
    if n % 50000 == 0:
        print n
        D072Qf.objects.bulk_create(WorkList)
        WorkList = []
        time3 = time.time()
        print "Reading the file took " + str(time2 - time1) + " seconds, importing the data took " + str(time3 - time2) + " seconds!"

time3 = time.time()
print n
D072Qf.objects.bulk_create(WorkList)  # flush the rows left over from the last batch
print "Reading the file took " + str(time2 - time1) + " seconds, importing the data took " + str(time3 - time2) + " seconds!"
WorkList = []
print "Imported " + str(y) + " rows"
f.close()
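
One more note on the manual 50 thousand-row batching: on Django 1.5 and later, bulk_create itself accepts a batch_size argument that controls how many rows go into each INSERT, so the modulo bookkeeping is mainly about capping memory rather than statement size. A minimal sketch under that assumption:

# Assumes Django >= 1.5: bulk_create splits the INSERTs itself.
# WorkList must still fit in memory, so for very large files the
# periodic flush-and-clear above is still worth keeping.
D072Qf.objects.bulk_create(WorkList, batch_size=5000)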

The result is striking: the whole import now takes 73 seconds.

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
File read, starting the import!
50000
Reading the file took 0.0 seconds, importing the data took 34.3279998302 seconds!
100000
Reading the file took 0.0 seconds, importing the data took 67.3599998951 seconds!
138400
Reading the file took 0.0 seconds, importing the data took 73.4379999638 seconds!
Imported 138399 rows
>>>

 
