Excel a hundred thousand of line data fast Import Database research (go, the next method to see if it is OK)

Source: Internet
Author: User
Tags import database

First paste the original import data code:

8 Import OS Os.environ.setdefault ("Django_settings_module", "www.settings") "DJANGO version greater than or equal to 1.7, need to add the following two lines import Djangodjango.setup () otherwise throws an error django.core.exceptions.AppRegistryNotReady:Models aren ' t loaded yet. Import Djangoif Django. VERSION >= (1, 7): #自动判断版本 Django.setup () from arrears.models import d072qfimport xlrd #excel读工具from datetime import D Atetimefrom xlrd Import xldate_as_tupleimport timeimport random time1 = Time.time () #data = Xlrd.open_workbook (' 11.xlsx ')    Open file with Xlrd.open_workbook (' 11.xlsx ') as Data:print u "read file end, start import!" Time2 = time.time () Table = Data.sheet_by_index (0) #获取工作表 Time3 = Time.time () n=1 x = y = z = 0 worklist =        [] for line in range (n,table.nrows): #nrows = table.nrows #行数 ncols = table.ncols #列数 print sh.row_values (rownum) row = Table.row_values (line) if row: #查看行值是否为空 for i in [0,1,2,4,28,30,32]: if Type (row[ I]) = = Float:row[i] = Int (row[i]) if d072qf.objEcts.filter (Acct_month = row[0],serv_id=row[1]). Exists (): #判断该行值是否在数据库中重复 x = x + 1 #重复值计数 Else:                                   Worklist.append (D072QF (acct_month=row[0],serv_id=row[1],acc_nbr=row[2],user_name=row[3],acct_code=row[4),                                   ACCT_NAME=ROW[5],PRODUCT_NAME=ROW[6],CURRENT_CHARGE=ROW[7],ONE_CHARGE=ROW[8],                                   TWO_CHARGE=ROW[9],THREE_CHARGE=ROW[10],FOUR_CHARGE=ROW[11],FIVE_CHARGE=ROW[12],                                   SIX_CHARGE=ROW[13],SEVEN_CHARGE=ROW[14],EIGHT_CHARGE=ROW[15],NINE_CHARGE=ROW[16],                                   TEN_CHARGE=ROW[17],ELEVEN_CHARGE=ROW[18],TWELVE_CHARGE=ROW[19],ONEYEAR_CHARGE=ROW[20],                                   THREEYEAR_CHARGE=ROW[21],UPTHREEYEAR_CHARGE=ROW[22],ALL_QF=ROW[23],MORETHREE_QF=ROW[24],                                   AGING=ROW[25],SERV_STATE_NAME=ROW[26],MKT_CHNL_NAME=ROW[27],MKT_CHNL_ID=ROW[28], Mkt_region_name=row[29],MKT_REGION_ID=ROW[30],MKT_GRID_NAME=ROW[31],MKT_GRID_ID=ROW[32], prod_addr=row[33]))        y = y + 1 #非重复计数 else:z = z + 1 #空行值计数 n = n + 1            If n% 9999 = = 0:d072qf.objects.bulk_create (worklist) worklist = []            Time.sleep (Random.random ()) #让Cpu随机休息0 <= N < 1.0 s print "Import successful once!"                 print ' data import succeeded, import ' +str (y) + ' bar, repeat ' +str (x) + ' bar, have ' +str (z) + ' behavior empty! ' Time4 = Time.time () print "Read file time" +str (time2-time1) + "seconds, import Data Time" +STR (Time4-time3) + "Seconds!"


This code currently does not all a hundred thousand of rows of data into the database, only spent 1 hours to import 50,000 rows of data into which, after more and more slowly, mainly slow in the Excel table to 70,000 rows of data, reading the data in Excel is very slow, the overall impact on the import speed for several reasons:

1, has been the use of XLRD import xls format file, if the file has a hundred thousand of lines, just read the file will take 200 seconds, if you change to CSV, it hardly takes time

2, this line of code will also affect the speed, especially when the data in the database is large: if D072Qf.objects.filter (Acct_month = row[0],serv_id=row[1]). Exists (): # Determines whether the row value is duplicated in the database

3, if a one-time dictionary add a hundred thousand of rows of data, on the Windows CPU is not suffering! So the recommended 10,000 data import once, empty the list

Post-Improvement Code:

Optimization part: use CSV format; Cancel check duplicate data statement; Import data every 50,000

#coding: Utf-8 import os os.environ.setdefault ("Django_settings_module", "www.settings") ' DJANGO When the version is greater than or equal to 1.7, you need to add the following two clauses import djangodjango.setup () otherwise it will throw an error django.core.exceptions.AppRegistryNotReady:Models aren ' t Loaded yet. " Import Djangoif Django. VERSION >= (1, 7): #自动判断版本 Django.setup () from arrears.models import D072QF import Timeimport randomtime1 = Time.time ( f = open (' 11.csv ') print u "read file end, start import!" Time2 = Time.time () worklist = []next (f) #将文件标记移到下一行y = 0n = 1for in F:row = Line.replace (' "', ') #将字典中的" replace the empty RO w = row.split (';') #按; Slice the string y = y + 1 worklist.append (D072QF (acct_month=row[0],serv_id=row[1],acc_nbr=row[2],user _NAME=ROW[3],ACCT_CODE=ROW[4], acct_name=row[5],product_name=row[6],current_charge=row[7 ],ONE_CHARGE=ROW[8], Two_charge=row[9],three_charge=row[10],four_charge=row[11],five_cha RGE=ROW[12], Six_charge=row[13],seven_charge=row[14],eight_charge=rOW[15],NINE_CHARGE=ROW[16], ten_charge=row[17],eleven_charge=row[18],twelve_charge=row[1 9],ONEYEAR_CHARGE=ROW[20], Threeyear_charge=row[21],upthreeyear_charge=row[22],all_qf=ro W[23],MORETHREE_QF=ROW[24], aging=row[25],serv_state_name=row[26],mkt_chnl_name=row[27], MKT_CHNL_ID=ROW[28], mkt_region_name=row[29],mkt_region_id=row[30],mkt_grid_name=row[31]        , mkt_grid_id=row[32], prod_addr=row[33])) n = n + 1 if N%50000==0:print n D072Qf.objects.bulk_create (worklist) worklist = [] Time3 = Time.time () print "Read file time" +str 2-TIME1) + "seconds, import Data Time" +STR (time3-time2) + "Seconds!" Time3 = Time.time () print nD072Qf.objects.bulk_create (worklist) print "Time to read file" +str (time2-time1) + "seconds, import data time consuming" +STR ( time3-time2) + "Seconds!"  worklist = []print "Successfully imported data" +str (y) + "bar" f.close ()

It was a surprise!!!, only took 73 seconds.

Excel a hundred thousand of line data fast Import Database research (go, the next method to see if it is OK)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.