A couple of days ago I wrote a data conversion program that converts data originally stored in binary format into a SQLite database; the total is more than 10 million records. The project targets the .NET Framework 3.5 SP1, so I used Entity Framework to simplify the database operations. Entity Framework saves some boilerplate, but its efficiency turned out to be intolerable: a full run of the program took more than 40 minutes. With Entity Framework, I first generated a model from the database and then saved the data like this:
using (GameSetEntities gse = new GameSetEntities(CreateConnection()))
{
    foreach (var ticket in tickets)
    {
        gse.AddToShuffledTicketSet(ticket);
    }
    gse.SaveChanges(true);
}
A ticket is one record to be inserted into the database. I process the 10-million-plus records in batches of 5000: each batch of 5000 records is added to the GameSetEntities (the data context) and then committed.
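The batching scheme itself is independent of Entity Framework. A minimal sketch (the `batched` helper and the record counts are illustrative, not from the original program) of splitting a large record stream into commit batches of 5000:

```python
def batched(records, batch_size=5000):
    """Yield successive lists of at most batch_size records."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# 12,500 records split into two full batches and one partial batch
sizes = [len(b) for b in batched(range(12_500))]  # [5000, 5000, 2500]
```

Each yielded batch corresponds to one add-and-commit cycle in the conversion program.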
Using a Stopwatch, I measured the time to insert each batch of 5000 records into the database: 1.1 seconds on average. With roughly 2000 batches, (2000 * 1.1) / 60 is about 36 minutes; in other words, more than 80% of the program's total running time is spent on database insertion. An average of 1.1 seconds per 5000 records is too slow, and I began to doubt whether Entity Framework is efficient. Let's see whether plain ADO.NET code is faster. The following is the hand-written database code:
using (DbConnection conn = DbProviderFactories.GetFactory("System.Data.SQLite").CreateConnection())
{
    conn.ConnectionString = "Data Source = " + m_datasourceFile + ";Version = 3";
    using (SQLiteCommand insertRngCmd = (SQLiteCommand)conn.CreateCommand())
    {
        insertRngCmd.CommandText =
            @"INSERT INTO shuffled_tickets (ticket_idx, seed, win_credits, [timestamp], redeemed, prog_levels)
              VALUES (@ticket_idx, @seed, @win_credits, @timestamp, @redeemed, @prog_levels)";
        conn.Open();
        foreach (var ticket in tickets)
        {
            insertRngCmd.Parameters.Clear(); // rebind the parameters for each record
            insertRngCmd.Parameters.AddWithValue("@ticket_idx", ticket.ticket_idx);
            insertRngCmd.Parameters.AddWithValue("@seed", ticket.seed);
            insertRngCmd.Parameters.AddWithValue("@win_credits", ticket.win_credits);
            insertRngCmd.Parameters.AddWithValue("@timestamp", ticket.timestamp);
            insertRngCmd.Parameters.AddWithValue("@redeemed", ticket.redeemed);
            insertRngCmd.Parameters.AddWithValue("@prog_levels", ticket.prog_levels);
            insertRngCmd.ExecuteNonQuery();
        }
    }
}
When I ran this code, it was even slower: 20 seconds for 5000 records. So the problem was evidently not Entity Framework itself.
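The reason this version is slow is that, without an explicit transaction, every single INSERT is committed on its own. The same pattern can be reproduced with Python's stdlib sqlite3 module (used here only to illustrate the SQLite behavior; the table is a simplified stand-in for shuffled_tickets):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode: each
# INSERT is its own implicit transaction, as in the ADO.NET code above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE shuffled_tickets (ticket_idx INTEGER, seed INTEGER)")
conn.execute("INSERT INTO shuffled_tickets VALUES (1, 42)")
# No transaction is left open: the row was already committed durably.
autocommitted = not conn.in_transaction
conn.close()
```

Against an on-disk database (unlike the `:memory:` one above), each of those implicit per-statement commits forces its own flush to disk, which is where the 20 seconds go.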
After some googling, I found a thread in the sqlite.net ADO provider forum: "fastest bulk inserts". The last example, described as the "fastest universal way to insert data using standard ADO.NET constructs", claims "100,000 inserts on my machine in 1.4 seconds". If 100,000 records can be inserted in 1.4 seconds, my 5000 should not take 20 seconds (Entity Framework managed 1.1 seconds, so it presumably batches inserts internally); something had to be wrong with my code. After carefully reading the example, I found the main difference was the use of a transaction: the example commits all 100,000 records in a single transaction, while my code used no explicit transaction, so every insert was committed individually. So I changed my code to commit inside a transaction:
using (DbConnection conn = DbProviderFactories.GetFactory("System.Data.SQLite").CreateConnection())
{
    conn.ConnectionString = "Data Source = " + m_datasourceFile + ";Version = 3";
    using (SQLiteCommand insertRngCmd = (SQLiteCommand)conn.CreateCommand())
    {
        insertRngCmd.CommandText =
            @"INSERT INTO shuffled_tickets (ticket_idx, seed, win_credits, [timestamp], redeemed, prog_levels)
              VALUES (@ticket_idx, @seed, @win_credits, @timestamp, @redeemed, @prog_levels)";
        conn.Open();
        using (DbTransaction transaction = conn.BeginTransaction())
        {
            foreach (var ticket in tickets)
            {
                insertRngCmd.Parameters.Clear(); // rebind the parameters for each record
                insertRngCmd.Parameters.AddWithValue("@ticket_idx", ticket.ticket_idx);
                insertRngCmd.Parameters.AddWithValue("@seed", ticket.seed);
                insertRngCmd.Parameters.AddWithValue("@win_credits", ticket.win_credits);
                insertRngCmd.Parameters.AddWithValue("@timestamp", ticket.timestamp);
                insertRngCmd.Parameters.AddWithValue("@redeemed", ticket.redeemed);
                insertRngCmd.Parameters.AddWithValue("@prog_levels", ticket.prog_levels);
                insertRngCmd.ExecuteNonQuery();
            }
            transaction.Commit(); // one commit for the whole batch
        }
    }
}
With the transaction, inserting 5000 records drops from 1.1 seconds (Entity Framework) to about 0.09 seconds, a dramatic improvement.
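The effect is easy to reproduce outside of .NET. The following sketch, using Python's stdlib sqlite3 against an on-disk database, times the same batch of inserts with and without an explicit transaction (absolute timings will differ from the figures above and vary by machine; the row counts here are reduced to keep the demo quick):

```python
import os
import sqlite3
import tempfile
import time

def insert_rows(db_path, use_transaction, n_rows=500):
    """Insert n_rows rows, per-statement commits vs one transaction."""
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE t (ticket_idx INTEGER, seed INTEGER)")
    start = time.perf_counter()
    if use_transaction:
        conn.execute("BEGIN")       # one transaction for the whole batch
    for i in range(n_rows):
        conn.execute("INSERT INTO t VALUES (?, ?)", (i, i))
    if use_transaction:
        conn.execute("COMMIT")      # a single flush instead of one per row
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    os.remove(db_path)              # fresh file for the next run
    return count, elapsed

path = os.path.join(tempfile.mkdtemp(), "tickets.db")
slow_n, slow_t = insert_rows(path, use_transaction=False)
fast_n, fast_t = insert_rows(path, use_transaction=True)
```

On a real disk file the transactional variant is typically faster by one to two orders of magnitude, matching the 20 s vs 0.09 s gap observed above.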
But why does adding a transaction make such a big difference? I flipped through the SQLite documentation, and its "Database Speed Comparison" page explains it:
Under "Test 1: 1000 INSERTs" it says: "In this test, each SQL statement is a separate transaction so the database file must be opened and closed and the cache must be flushed 1000 times."
And under the INSERTs-in-a-transaction test it says: "When all the INSERTs are put in a transaction, SQLite no longer has to close and reopen the database or invalidate its cache between each statement."
Putting these two passages together: without a transaction, every SQLite INSERT is its own transaction and pays its own open/close and cache-flush I/O cost, whereas a single transaction pays that I/O cost only once, at commit.
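As a closing note, most SQLite bindings also offer a prepared batch form that combines both optimizations, one prepared statement plus one commit. In Python's stdlib sqlite3 (again only an illustration of the technique, not the original C# program) this is `executemany` inside a single transaction:

```python
import sqlite3

rows = [(i, i * 31) for i in range(5000)]   # stand-in ticket records
conn = sqlite3.connect(":memory:")          # default mode: implicit transaction
conn.execute("CREATE TABLE shuffled_tickets (ticket_idx INTEGER, seed INTEGER)")
# executemany reuses one prepared INSERT statement for the whole batch
conn.executemany("INSERT INTO shuffled_tickets VALUES (?, ?)", rows)
conn.commit()                               # single commit for all 5000 rows
count = conn.execute("SELECT COUNT(*) FROM shuffled_tickets").fetchone()[0]
conn.close()
```

The same idea in ADO.NET is preparing the parameterized command once and reusing it for every row inside one transaction, which is exactly what the corrected code above does.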