Test the MySQL Group's concurrent access solution with hundreds of millions of data records

Last Update:2017-01-13 Source: Internet

Author: User

Tags http post unique id intel core i5

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The test process is as follows: MySQL is tested from 10, 20, 50, and 100 threads in four application scenarios. The test results show that: for scenario 1) general concurrent access can meet the requirements; for scenario 2) and 3) response time in minutes, respectively at 1-3 minutes and 10 minutes; scenario 4) An exception is often thrown, and the response time is about 30 minutes based on the exception point.

Test Environment

Hardware environment:

Localhost: CPU: Intel Core I5, clock speed: 3.10 GB, Memory: 4 GB

MySQL Cluster: 9 servers

Software environment:

Localhost: Win7, jdk 1.8

MySQL Cluster: MySQL5.6.25 (Community Edition)

Data scale:

Data entry: a month's stock data, with more than 0.2 billion million records. The table structure is about 50 fields. For details, see the table structure below.

Table structure:

Drop table if exists 'taq _ 201504 ';
Create table 'taq _ 201504 '(
'Seccode' varchar (6) not null,
'Seccname' varchar (20) not null,
'Tdate' varchar (10) not null,
'Ttime' varchar (6) not null,
'Lastclose' decimal (19,3) default null,
'OP' decimal (19,3) default null,
'CP' decimal (19,3) default null,
'Tq' decimal (19,3) default null,
'TM 'decimal (19,3) default null,
'TT' decimal (18, 0) default null,
'CQ 'decimal (18, 0) default null,
'CM 'decimal (19,3) default null,
'CT 'decimal (19,3) default null,
'Hip 'decimal (19,3) default null,
'Lop' decimal (19,3) default null,
'Syl1' decimal (19,3) default null,
'Syl2' decimal (19,3) default null,
'Rf1' decimal (19,3) default null,
'Rf2' decimal (19,3) default null,
'BS 'varchar (18) default null,
'S5 'decimal (19,3) default null,
'S4 'decimal (19,3) default null,
'S3' decimal (19,3) default null,
'S2' decimal (19,3) default null,
'S1 'decimal (19,3) default null,
'B1 'decimal (19,3) default null,
'B2' decimal (19,3) default null,
'B3' decimal (19,3) default null,
'B4 'decimal (19,3) default null,
'B5 'decimal (19,3) default null,
'Sv5' decimal (20, 0) default null,
'Sv4' decimal (20, 0) default null,
'Sv3' decimal (15, 0) default null,
'Sv2' decimal (15, 0) default null,
'Sv1' decimal (15, 0) default null,
'Bv1 'decimal (15, 0) default null,
'Bv2' decimal (15, 0) default null,
'Bv3 'decimal (15, 0) default null,
'Bv4 'decimal (15, 0) default null,
'Bv5 'decimal (15, 0) default null,
'Bsratio 'decimal (19,3) default null,
'Spd 'decimal (19,3) default null,
'Rpd 'decimal (19,3) default null,
'Depth1' decimal (20, 3) default null,
'Depth2' decimal (20, 3) default null,
'Unix 'bigint (20) default null,
'Market' varchar (4) default null,
KEY 'seccode' ('seccode', 'tdate', 'ttime ')
)/*! 50100 TABLESPACE ts_cloudstore storage disk */ENGINE = ndbcluster default charset = utf8;

Table pai_201504

Note: 'seccode', 'tdate', and 'ttime' are composite indexes stored in memory.

Performance testing

In this article, we will test MySQL from 10, 20, 50, and 100 threads in four application scenarios.

1) scenario 1

Query access to services on a certain day, that is, interactive access by multiple threads is as follows:

Select CP from pai_201504 where SECCODE = '20160301' and tdate = '20160301 ';

That is to say, each five threads execute one query. Taking 10 threads as an example, two threads will execute one of the statements. Other threads are the same as each other and will not go into details. The test results are shown in table-1.

The result of table-1 shows that in the case of 20-50 concurrent threads, 1-3 fields are queried by day, and the data response time is about 3 s. However, in the case of a large number of concurrent threads (100 threads), the time required for data display increases significantly. It should be pointed out that although the test can reflect some problems, the time required to increase switching between multiple threads is also the cause of the increase in latency.

2) scenario 2

Return query results in batches for the range of a field, that is, concurrent access by multiple threads is as follows:

Select CP from pai_201504 where SECCODE in ('20160301', '20160301', '20160301', '20160301', '20160301', '20160301 ')

The test results are shown in table-2.

TABLE-2 shows that the testing time has reached the minute level. In addition, for abnormal situations in the table, the crux of the problem is that multi-threaded testing is used on a single machine, so limited by the storage space of the machine tested in this article. The author believes that the limit of the MySQL database is not reached.

3) scenario 3

Specify a specific field to query the entire table, that is, multi-threaded concurrent access is as follows:

Select cp from pai_201504 where ttime = '201312'

The test results are shown in table-3. It should be noted that this article did not test 50 or 100 threads, but it was not started because it took too long.

The results of table-3 show that the processing time of fields increases gradually with the increase of the number of fields, which has reached the minute level and basically reached the 10-minute level.

4) scenario 4

Specify specific fields to query the entire table, that is, multi-threaded concurrent access is as follows:

Select cp from pai_201504 where tdate = '20160301' and ttime = '20160301'

The test results are shown in table-4.

Table-4 shows that specifying multiple fields in a query increases the query time. It should be noted that because the preceding SQL statement takes a lot of time to scan the database during query, Got error 4008 'receive from NDB failed' from NDBCLUSTER. Therefore, table-4 is in red, the exception is thrown too early, so there is a deviation in the statistical time. The author believes that, because each query takes about half an hour, data access times out, which causes the above exception to a large extent. In addition, the database is accessed concurrently on a single machine, the time of inter-thread switching will also have a certain impact on it.

Summary

From the test results in the above 4 scenarios, when the returned data volume is relatively small, multi-threaded access to MySQL can meet user needs. When the accessed data volume is large, multi-threaded access can meet the connection requirements. However, when presented to users, the processing time is several minutes. The more fields and data are returned, the more time is consumed.

High-concurrency generic search engine architecture design for hundreds of millions of data

Recently, I have designed the following latest search engine architecture. Now I have written a beta version of "search query interface" and "index update interface. According to the test, a general pc with a 4-3.6 GHz dual-core CPU and 2 GB memory is equipped with 70 million Index records, the average query speed of the "search query interface" is 0.0XX seconds (the query speed has reached the level of search engines such as Baidu, Google, Sogou, and Yahoo China. For details, refer to "Appendix 2" at the end of the article "), it also supports up to 5000 concurrent connections, while the index update interface performs data analysis, enters the queue, and returns information to the user throughout the entire process, up to 1500 Requests/Sec.

The "queue controller" is the core. It needs to control the reading of queues, update the MySQL master table and incremental table, update the data storage layer of the search engine, Tokyo Tyrant, and quasi-real-time (within 1 minute) update the incremental index of Sphinx and merge the indexes of Sphinx regularly. I expect to write a beta version this week.

Illustration:

1. Search and query interface:

① The Web application server uses http post/GET to pass the search keyword and other conditions to the search. php interface of the search engine server;

② ③ Search. php uses the Sphinx API (based on the latest Sphinx 0.9.9-rc1 API, I changed a C language PHP extension sphinx. so), query the sphsf-index service, and obtain the unique ID of the search engine that meets the query conditions (15-bit search unique ID: First 5 Category ID + last 10 original data table primary key ID) list;

④ Search. php uses these IDCs as keys and uses Memcache protocol to retrieve the text data corresponding to IDCs from mget in Tokyo Tyrant at a time.

7. search. php highlights the search result set by query conditions and returns the result to the Web application server in JSON or XML format.

2. Index update interface:

(1) the Web application server uses http post/GET to notify the search server of the update. php interface of the content to be added, deleted, or updated;

(2) update. php writes the received information to the TT high-speed queue (a queue system based on Tokyo Tyrant );

Note: the speed of these two steps can exceed 1500 requests/second, and the search index update call of 60 million PV can be responded.

3. Search indexes and data storage control:

(I) the "queue controller" daemon reads information cyclically from the TT high-speed queue (50 messages each time until the end );

(Ii) the "queue controller" writes the read information to the data storage layer of the search engine, Tokyo Tyrant;

(3) the "queue controller" asynchronously writes the read information to the MySQL master table (this master table is partitioned by 5 million records and used only for permanent data backup );

(4) the "queue controller" writes read information to the MySQL incremental table;

(V) in one minute, the "queue controller" triggers Sphinx to update the incremental index. The indexer of Sphinx will use the MySQL incremental table as the data source to create the incremental index. The incremental index of sphenders corresponds to the MySQL incremental table used as the data source;

(6) the "queue controller" stops reading information from the TT high-speed queue for a short time every 3 hours, and triggers sphenders to merge incremental indexes into the primary Index (this process is very fast ), at the same time, the MySQL incremental table is cleared (this ensures that the number of records in the MySQL incremental table is always from several thousand to several 100,000, greatly accelerating the update speed of the Sphinx incremental index ), then retrieve data from the TT high-speed queue and write the data to the MySQL incremental table.

Open source software used in this architecture:

1. Sph00000.9.9-rc1

2. Tokyo Tyrant 1.1.9

3. MySQL 5.1.30

4. Nginx 0.7.22

5. PHP 5.2.6

Self-developed procedures for this architecture:

1. search query interface (search. php)

2. Index update interface (update. php)

3. Queue controller

4. Sph1_0.9.9-PHP extension of rc1 API (sph1_. so)

5. High-speed queue system based on Tokyo Tyrant

Appendix 1: Comparison results of MySQL FullText, Lucene search, and Sphinx search by third parties:

1. Query speed:

MySQL FullText is the slowest, Lucene and Sphinx query speed is equal, and Sphinx is slightly dominant.

2. Indexing speed:

The index creation speed of Sphinx is the fastest, 9 times faster than Lucene. Therefore, sphtracing is very suitable for quasi-real-time search engines.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More