Simple test of hundreds of millions of data records in a single table

Source: Internet
Author: User
This time, we conducted a stress test on mysql for a single table containing hundreds of millions of data records. The relationship between tables is simple. there are only two int fields, user_id and company_id, and indexes are added. The python script randomly inserts 100 million, 500 million, and million-1 E data into the same table, and records the data insertion time. Let's take a look at the data writing: python

This time, we conducted a stress test on mysql for a single table containing hundreds of millions of data records. The relationship between tables is simple. there are only two int fields, user_id and company_id, and indexes are added. The python script randomly inserts 100 million, 500 million, and million-1 E data into the same table, and records the data insertion time. Let's take a look at the data writing: python

This time, we conducted a stress test on mysql for a single table containing hundreds of millions of data records.
The relationship between tables is simple. there are only two int fields, user_id and company_id, and indexes are added.
The python script randomly inserts 100 million, 500 million, and million-1 E data into the same table, and records the data insertion time.
Let's take a look at the data writing:
Empty conversion of python scripts:
Idling 100 W: 0.14 s
Idling 1000 W: 1.74 s
295.11 million data records inserted at a time: s
Insert another 1000 W based on W, round-robin until 1E data is written, record the time consumed by each insert.

As you can see, as the amount of data increases, the time for inserting million pieces of data is still relatively stable, and the ups and downs are not large.

In the end, we are concerned about how fast the query speed can be in a single table with Level B data.

The following describes how to use a python client script to simulate random query of 1E data items. the random query is a python random function. The development machine is public due to limited machine resources,

Therefore, I am not too afraid to take up too much resources for stress testing and use the Python thread module concurrently.

This query test uses three methods:

1. a single process performs a random query of 1000 operations on the database, executes 100 operations, and records the time of each operation.

2. 1000 concurrent threads and 2000 threads perform 1000 random queries on the database, recording each time

3. use mysql official software mysqlslap to operate databases

First:

The results of a single query of 1000 times run 100 times. it is found that the time fluctuation is large, which may be related to the inserted data hash,

There are still a lot of data with the same user_id, between 20-. this data is not very common for online businesses. Therefore, this data does not affect the final result.

Type 2: 1000 concurrent threads perform random 1000 queries on the database,

1000 threads: the slowest time is 8 s, and the processing capability is 125/s;

2000 threads: the slowest time is 10 s, and the processing capability is 100/s;

Method 3: Test mysqlslap

Start 2000 threads and execute SELECT * FROM user_company_test_5000 WHERE user_id = 7432 query

The average processing time is 8.76 s, and 229 queries can be processed per second.

Use the official mysqlslap for testing, which is significantly different from the testing results of python scripts,

There are two reasons for speculation:

1: mysqlslap directly uses socket to connect to Mysql. Therefore, it does not affect the results except mysql processing time and network request time.

2: mysqlslap can only specify SQL statements, but it is not possible to query data randomly. The data in the Test table is dispersed unevenly, which is also a reason.

Data of mysqlslap can only be considered as the best case, but the second python script is closer to the production environment. 1000 queries of data are also random queries,

Therefore, the second method can be used as a basis for the production environment.

Let's take a look at the batch query. the IN statement can contain up to 50 values.

Well, I only opened 200 threads, with the slowest time of 93 s and the fastest time of 46 s. It can be said to be miserable .. For batch query,

Split the statement into multiple statements. If IN is used, services will inevitably be affected.

Conclusion:

After communicating with dba, theoretically it is normal to support 5000 queries per second, but I used mysqlslap to process million data records per table.

In the test, the processing capacity of 2000 clients is only about 700 per second,

From the second data point of view, when the single-host client reaches 2000, it can still process about 100 queries per second.

Source: http://www.imsiren.com/archives/995

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.