SQL Server high concurrency and big data storage scenarios
As the user base grows and daily activity spikes at peak times, database processing performance faces a huge challenge. Below I share the database optimization scheme we used on a real platform with a peak of 100,000+ concurrent users, in the hope that we can discuss it and learn from each other!
Case study: a game platform.
1. Resolving high concurrency
When the number of client connections reaches its peak, how the service side maintains and handles those connections is out of scope here. When many write requests hit the database at once, they must insert into multiple tables, and some of those tables accumulate tens of millions of rows per day. As the data piles up, the traditional synchronous way of writing data is clearly no longer viable. Testing showed that asynchronous insertion improves throughput considerably, though it sacrifices some read freshness in return.
There are many ways to implement asynchronous writes. The current approach is a job that runs at a fixed interval (5 min, 10 min, set per requirement) and transfers data from a staging table into the real table:
1. The original table A is the table actually used for reads.
2. Create tables B and C with the same structure as A to stage the data transfer; the synchronization flow is C -> B -> A.
3. Create job Job1 to synchronize the data, along with a table that records Job1's running state. The key during synchronization is to check Job1's current state: if B's data is currently being synchronized to A, incoming data from the server is written to C first, then imported into B, and transferred to A on the next job run. See Figure 1:
Figure 1
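To make the job concrete, here is a minimal T-SQL sketch of the Job1 transfer step. The names (dbo.TableA/TableB/TableC, dbo.JobState, dbo.JobErrorLog, and the Col1/Col2/CreatedAt columns) are hypothetical placeholders; only the C -> B -> A flow and the state flag come from the scheme above.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- Mark the B -> A synchronization as running; writers are assumed to
    -- check this flag and divert their inserts to C while it is set.
    UPDATE dbo.JobState SET Status = 1 WHERE JobName = 'Job1';

    -- Move staged data up one level: B -> A, then C -> B.
    INSERT INTO dbo.TableA (Col1, Col2, CreatedAt)
    SELECT Col1, Col2, CreatedAt FROM dbo.TableB;

    TRUNCATE TABLE dbo.TableB;

    INSERT INTO dbo.TableB (Col1, Col2, CreatedAt)
    SELECT Col1, Col2, CreatedAt FROM dbo.TableC;

    TRUNCATE TABLE dbo.TableC;

    UPDATE dbo.JobState SET Status = 0, LastRunAt = GETDATE()
    WHERE JobName = 'Job1';

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    -- Log the failure so the monitoring procedure below can alert on it.
    INSERT INTO dbo.JobErrorLog (JobName, ErrorMessage, LoggedAt)
    VALUES ('Job1', ERROR_MESSAGE(), GETDATE());
END CATCH;
```

Because the whole transfer runs in one transaction, readers of A never see a half-finished batch, and truncating B and C is cheap since they are pure staging tables.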
At the same time, to be foolproof and to make troubleshooting easier, there should be a stored procedure that records job results across the whole database instance and checks each job's outcome shortly after it runs. If an unexpected failure occurs, the relevant people should be notified promptly, for example by writing to an e-mail and SMS queue table that a TCP notification program reads and sends on a schedule.
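One possible shape for that check is sketched below. dbo.NotifyQueue is a hypothetical queue table the TCP notification program would poll; msdb.dbo.sysjobs, msdb.dbo.sysjobhistory, and agent_datetime() are standard SQL Server Agent objects.

```sql
-- Queue an alert for any Agent job that failed in the last 10 minutes.
INSERT INTO dbo.NotifyQueue (Channel, Subject, Body, CreatedAt)
SELECT 'email',
       'Job failed: ' + j.name,
       h.message,
       GETDATE()
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h
  ON h.job_id = j.job_id
WHERE h.run_status = 0      -- 0 = Failed
  AND h.step_id = 0         -- the job-outcome row, not individual steps
  AND msdb.dbo.agent_datetime(h.run_date, h.run_time)
      > DATEADD(MINUTE, -10, GETDATE());
```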
Note: when the day's data reaches dozens of GB and the table must also serve query requests (partitioning is covered below), one way to relieve the pressure is:
B can be synchronized to multiple servers at the same time to spread the query load and reduce resource contention, because the resources of a single database instance are limited. An insert, for example, takes an exclusive lock on the inserted row (located via the clustered index) plus intent locks at the page and table level, and SQL Server needs memory to maintain those locks in proportion to the data volume, so concurrent operations end up competing for resources. Therefore separate reads and writes as far as possible, and shard by whatever rules the business model allows; on this platform the priority is to guarantee that data is inserted successfully.
Queries over big data inevitably consume a lot of resources. When a bulk delete is unavoidable, for example, replace it with a loop that deletes in batches (say 2,000 rows at a time), so that a single statement does not hang the whole database and trigger unpredictable bugs; in practice this is effective and feasible, at the cost of some storage space. You can also split high-volume columns out into a new table according to query needs. All of this should be designed per business scenario and requirement; aim for an appropriate solution rather than a flashy one.
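A sketch of that batched delete, assuming a hypothetical dbo.TableA with a CreatedAt column and a 5-day retention rule:

```sql
-- Delete in small batches so each one commits quickly and releases its
-- locks, instead of one huge DELETE that can block the whole database.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (2000) FROM dbo.TableA
    WHERE CreatedAt < DATEADD(DAY, -5, GETDATE());

    SET @rows = @@ROWCOUNT;   -- 0 when nothing is left to delete
END
```

Each iteration holds locks only briefly and keeps transaction-log growth bounded, which is why the loop does not drag the rest of the instance down.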
2. Solving the storage problem
When a single table grows by dozens of GB per day, the storage scheme naturally has to improve. Here is our own plan, which has held the line under this data explosion, using our environment as the example:
We have table A, which gains 30 GB of new data per day. On top of the asynchronous synchronization described above, and for tables whose data cannot simply be purged, the table can be partitioned into filegroups, with the filegroups assigned to different disks to reduce contention for I/O resources and keep existing workloads running normally. The requirement is to retain 5 days of historical data:
1. A job creates the partition scheme from a partition function, e.g. partitioning on a UserID or Time column;
2. After the table is partitioned, queries can quickly locate a single partition via the corresponding index;
3. A job merges expired partitions, switching their data out into a table with identical structure and indexes, and then clears that table's data (see the sketch after Figure 2).
Figure 2
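The maintenance job from step 3 might look like the sketch below. All names are hypothetical; it assumes TableA is partitioned by day through a RANGE RIGHT function pf_Daily on scheme ps_Daily, and that dbo.StagingA is an empty table with identical structure and indexes on the same filegroup as the oldest partition (a requirement for SWITCH).

```sql
DECLARE @oldestBoundary DATETIME = DATEADD(DAY, -5, CONVERT(DATE, GETDATE()));
DECLARE @nextBoundary   DATETIME = DATEADD(DAY,  1, CONVERT(DATE, GETDATE()));

-- 1. Switch the oldest partition out of TableA (metadata-only, near-instant).
ALTER TABLE dbo.TableA
    SWITCH PARTITION $PARTITION.pf_Daily(@oldestBoundary) TO dbo.StagingA;

-- 2. Merge away the now-empty boundary so the partition count stays constant.
ALTER PARTITION FUNCTION pf_Daily() MERGE RANGE (@oldestBoundary);

-- 3. Clear the staged data (archive it first if the business requires).
TRUNCATE TABLE dbo.StagingA;

-- 4. Prepare tomorrow's partition on the next filegroup in the rotation.
ALTER PARTITION SCHEME ps_Daily NEXT USED [FG_Day1];
ALTER PARTITION FUNCTION pf_Daily() SPLIT RANGE (@nextBoundary);
```

SWITCH and MERGE are metadata operations, so the 5-day window slides without physically moving 30 GB of rows.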
Use SQL query tracing to track down queries that take a long time, and inspect the type and granularity of the locks currently held on the instance through SQL Server's own tools, such as the sp_lock stored procedure or the sys.dm_tran_locks view (or a helper such as dblockinfo).
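For the lock check, a grouped query over sys.dm_tran_locks gives a quick picture of which modes and granularities dominate in the current database:

```sql
SELECT resource_type,          -- granularity: OBJECT, PAGE, KEY, RID, ...
       request_mode,           -- mode: S, X, IS, IX, U, ...
       request_status,         -- GRANT, WAIT, or CONVERT
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY resource_type, request_mode, request_status
ORDER BY lock_count DESC;
```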
From there, drill down to the specific query statements or stored procedures and remedy them at the source.
Of course, opinions will differ; this is simply what has worked for us.