The efficiency of a large number of SQLite Inserts


Over the past two days I have been working on a data conversion program that converts data originally stored in a binary format into an SQLite database. There are more than 10 million records in total. The project targets the .NET Framework 3.5 SP1, so I used Entity Framework directly to simplify the database access. Entity Framework does save some trouble, but its performance turned out to be intolerable: a full run of the program took more than 40 minutes. With Entity Framework, I first generated a model from the database and then saved the data as follows:

using (GameSetEntities gse = new GameSetEntities(CreateConnection()))
{
    foreach (var ticket in tickets)
    {
        gse.AddToShuffledTicketSet(ticket);
    }
    gse.SaveChanges(true);
}

ticket is the record I want to insert into the database. I split the 10-million-plus records into batches of 5000 for processing: each batch of 5000 records is added to GameSetEntities (the DataContext) and then submitted, roughly as sketched below.
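For illustration, the loop might look roughly like the following sketch. It is not code from the original program: LoadNextBatch and ShuffledTicket are assumed names standing in for the helper that reads up to 5000 records from the binary file and for the generated entity type.

// Sketch of the batching loop; LoadNextBatch and ShuffledTicket are assumed names.
const int batchSize = 5000;
List<ShuffledTicket> batch = LoadNextBatch(batchSize);   // reads up to 5000 tickets from the binary file
while (batch.Count > 0)
{
    using (GameSetEntities gse = new GameSetEntities(CreateConnection()))
    {
        foreach (var ticket in batch)
        {
            gse.AddToShuffledTicketSet(ticket);
        }
        gse.SaveChanges(true);   // one submit per 5000-record batch
    }
    batch = LoadNextBatch(batchSize);
}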

Using a Stopwatch, I measured the time it takes to insert each batch of 5000 records into the database: about 1.1 seconds on average. With roughly 2080 batches, (2080 × 1.1) / 60 ≈ 38 minutes, which means more than 80% of the program's total running time is spent on database insertion. An average of 1.1 seconds to insert 5000 records is simply too slow, and I began to doubt whether Entity Framework was the culprit. Let's see whether plain ADO.NET code is any faster. Here is the hand-written database access code:

 

using (DbConnection conn = DbProviderFactories.GetFactory("System.Data.SQLite").CreateConnection())
{
    conn.ConnectionString = "Data Source = " + m_datasourceFile + ";Version = 3";
    using (SQLiteCommand insertRngCmd = (SQLiteCommand)conn.CreateCommand())
    {
        insertRngCmd.CommandText = @"INSERT INTO shuffled_tickets (ticket_idx, seed, win_credits, [timestamp], redeemed, prog_levels)
                                     VALUES (@ticket_idx, @seed, @win_credits, @timestamp, @redeemed, @prog_levels)";
        conn.Open();

        foreach (var ticket in tickets)
        {
            insertRngCmd.Parameters.AddWithValue("@ticket_idx", ticket.ticket_idx);
            insertRngCmd.Parameters.AddWithValue("@seed", ticket.seed);
            insertRngCmd.Parameters.AddWithValue("@win_credits", ticket.win_credits);
            insertRngCmd.Parameters.AddWithValue("@timestamp", ticket.timestamp);
            insertRngCmd.Parameters.AddWithValue("@redeemed", ticket.redeemed);
            insertRngCmd.Parameters.AddWithValue("@prog_levels", ticket.prog_levels);

            insertRngCmd.ExecuteNonQuery();
        }
    }
}

When this code is run, inserting 5000 records takes 20 seconds, which is even slower... So the problem is apparently not Entity Framework itself.

After some googling, I found a thread titled "fastest bulk inserts" in the SQLite.NET ADO provider forum. The last example, described as the "fastest universal way to insert data using standard ADO.NET constructs", mentions "100,000 inserts on my machine in 1.4 seconds". If 100,000 records can be inserted in 1.4 seconds, then 5000 records should certainly not take me 20 seconds (Entity Framework needs only 1.1 seconds; presumably it batches the insert operations internally), so something had to be wrong with my code. After carefully reading the example, I found that the main difference is the use of a transaction: the example commits 100,000 records inside a single transaction, whereas my code uses no explicit transaction at all, so every insert is committed individually. So I changed my code to commit inside a transaction:

using (DbConnection conn = DbProviderFactories.GetFactory("System.Data.SQLite").CreateConnection())
{
    conn.ConnectionString = "Data Source = " + m_datasourceFile + ";Version = 3";
    using (SQLiteCommand insertRngCmd = (SQLiteCommand)conn.CreateCommand())
    {
        insertRngCmd.CommandText = @"INSERT INTO shuffled_tickets (ticket_idx, seed, win_credits, [timestamp], redeemed, prog_levels)
                                     VALUES (@ticket_idx, @seed, @win_credits, @timestamp, @redeemed, @prog_levels)";
        conn.Open();
        var transaction = conn.BeginTransaction();

        foreach (var ticket in tickets)
        {
            insertRngCmd.Parameters.AddWithValue("@ticket_idx", ticket.ticket_idx);
            insertRngCmd.Parameters.AddWithValue("@seed", ticket.seed);
            insertRngCmd.Parameters.AddWithValue("@win_credits", ticket.win_credits);
            insertRngCmd.Parameters.AddWithValue("@timestamp", ticket.timestamp);
            insertRngCmd.Parameters.AddWithValue("@redeemed", ticket.redeemed);
            insertRngCmd.Parameters.AddWithValue("@prog_levels", ticket.prog_levels);

            insertRngCmd.ExecuteNonQuery();
        }
        transaction.Commit();
    }
}

With the transaction, the insertion time for 5000 records drops from 1.1 seconds (the Entity Framework figure) to about 0.09 seconds, a huge improvement; at that rate the roughly 2080 batches need only about (2080 × 0.09) / 60 ≈ 3 minutes of insert time in total.
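As an aside, the same transactional insert is usually written a little more defensively, with the transaction in a using block so that an exception during the loop rolls the whole batch back. The sketch below is my own variant of the code above, not the original program; it also clears the parameter collection on every iteration.

// Defensive variant of the transactional batch insert (a sketch, not the original code).
// Disposing an uncommitted SQLiteTransaction rolls the batch back if an insert throws.
using (SQLiteConnection conn = new SQLiteConnection("Data Source = " + m_datasourceFile + ";Version = 3"))
{
    conn.Open();
    using (SQLiteTransaction transaction = conn.BeginTransaction())
    using (SQLiteCommand cmd = conn.CreateCommand())
    {
        cmd.Transaction = transaction;
        cmd.CommandText = @"INSERT INTO shuffled_tickets (ticket_idx, seed, win_credits, [timestamp], redeemed, prog_levels)
                            VALUES (@ticket_idx, @seed, @win_credits, @timestamp, @redeemed, @prog_levels)";
        foreach (var ticket in tickets)
        {
            cmd.Parameters.Clear();   // start each row with a clean parameter list
            cmd.Parameters.AddWithValue("@ticket_idx", ticket.ticket_idx);
            cmd.Parameters.AddWithValue("@seed", ticket.seed);
            cmd.Parameters.AddWithValue("@win_credits", ticket.win_credits);
            cmd.Parameters.AddWithValue("@timestamp", ticket.timestamp);
            cmd.Parameters.AddWithValue("@redeemed", ticket.redeemed);
            cmd.Parameters.AddWithValue("@prog_levels", ticket.prog_levels);
            cmd.ExecuteNonQuery();
        }
        transaction.Commit();   // only reached if every insert succeeded
    }
}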

But why does adding a transaction make such a big difference? I flipped through the SQLite documentation, and its "Database Speed Comparison" page explains it:

Under "Test 1: 1000 INSERTs" it says: "... In this test, each SQL statement is a separate transaction so the database file must be opened and closed and the cache must be flushed 1000 times ..."

And under the test of INSERTs in a transaction it says: "... When all the INSERTs are put in a transaction, SQLite no longer has to close and reopen the database or invalidate its cache between each statement ..."

These two sentences tell the story: without an explicit transaction, each INSERT statement runs as its own transaction, so SQLite repeats the file open/close and cache-flush I/O for every record; with all the inserts wrapped in one transaction, that I/O cost is paid only once per batch.
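To make the effect easy to reproduce, here is a small self-contained sketch of my own (not from the original project) that inserts 5000 rows into a throwaway table, once without and once with an explicit transaction, and prints the elapsed time for each. The demo_rows table and the demo.db connection string are made up for the demo.

using System;
using System.Data.SQLite;      // the System.Data.SQLite ADO.NET provider
using System.Diagnostics;

class SQLiteTransactionDemo
{
    static void Main()
    {
        using (var conn = new SQLiteConnection("Data Source=demo.db;Version=3"))
        {
            conn.Open();
            using (var create = conn.CreateCommand())
            {
                create.CommandText = "CREATE TABLE IF NOT EXISTS demo_rows (id INTEGER, val TEXT)";
                create.ExecuteNonQuery();
            }

            Console.WriteLine("Without transaction: {0} ms", InsertRows(conn, 5000, false));
            Console.WriteLine("With transaction:    {0} ms", InsertRows(conn, 5000, true));
        }
    }

    // Inserts `count` rows with a parameterized command, optionally inside a single
    // explicit transaction, and returns the elapsed time in milliseconds.
    static long InsertRows(SQLiteConnection conn, int count, bool useTransaction)
    {
        var sw = Stopwatch.StartNew();
        SQLiteTransaction tx = useTransaction ? conn.BeginTransaction() : null;
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "INSERT INTO demo_rows (id, val) VALUES (@id, @val)";
            var id = cmd.Parameters.AddWithValue("@id", 0);
            var val = cmd.Parameters.AddWithValue("@val", "");
            for (int i = 0; i < count; i++)
            {
                id.Value = i;              // reuse the same parameters, only the values change
                val.Value = "row " + i;
                cmd.ExecuteNonQuery();     // autocommits when no explicit transaction is open
            }
        }
        if (tx != null)
        {
            tx.Commit();                   // one commit for the whole batch
            tx.Dispose();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}

Running it should show the same pattern as the numbers above: the autocommit loop is dramatically slower than the single-transaction one.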
