What should I do if SQL server needs to store 10 million pieces of data?

Last Update:2018-12-05 Source: Internet

Author: User

Tags sql server query

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

What should I do if SQL server needs to store 10 million pieces of data?

The company has a large business system with a large amount of data. Each of the major items of data exceeds 10 million records. I have been using sqlserver for a long time. I have never been able to find a good way to efficiently store and process data. If you have any high opinions and experience, please give me some advice! Thank you! (The amount of daily active data goes up to 1 million.) based on the Application Experience in Oracle, more than 2,000,000 records are stored in a table, so partition the data table. If you want to use SQL Server, You can partition data horizontally by month, that is, one table per month. For historical data, you can remove it from the current database disk system and save it to tape, CD, and other media. Of course, it may be difficult to use and maintain data. The suggestions on data storage and I/O are briefly described above.

In my experience, when I was dealing with daily securities trading data, I asked the JOB to create a table every day, each of which stores the data of the day. The daily data volume is also millions of data, no problem. The database record of the maximum data volume is maintained by sqlserver, which is no problem with the data volume. I have compared sqlserver and oracle, 0.2 billion, more than 50 million, and the query speed is similar. Disadvantages: sqlserver is worse than oracle in terms of index maintenance and automatic column addition. oracle can use partitioned tables to simplify index maintenance and improve query speed. sqlserver can also create indexes, however, you can only sort the indexes of the entire table. It is difficult to sort out the indexes. When I set up identity on SQL Server, I did not succeed. It is said that it is abnormal and the machine performance is very good ~~~ Let's do some experiments.

1. Avoid IO conflicts a. Multiple File Groups B. Separate data files (mdf/ndf) and log files (ldf) c. Use raid5 if necessary

2. database optimization. optimize the table structure B. creating a reasonable index for a large table includes the (index/partition) view and c. often perform DB Backup and set the DB mode to simple under certain conditions (if it is a production server, it will not work again :() d. specify a default value for the size of log files and data files. Otherwise, frequent growth may affect the performance and be stored separately.

3. Hardware optimized memory, processor, etc. You can add partitions to improve the performance and ease of maintenance of database partitions. By splitting a large table into a smaller single table, queries that only access a small amount of data can be executed faster, because fewer data needs to be scanned. In addition, you can perform maintenance tasks faster (such as rebuilding indexes or backing up tables ). You can physically place a table on a disk drive without splitting the table. For example, placing a table on a physical drive and placing the relevant table on a separate drive can improve query performance because when you execute a query involving a connection between tables, multiple headers read data at the same time. You can use Microsoft SQL Server 2000 file groups to specify disks on which tables are placed.

Hardware partitioning designs a database to use an available hardware architecture. Examples of hardware partitions include: multi-processor that allows multi-thread execution, so that many queries can be executed at the same time. In other words, you can execute the query components on a multi-processor at the same time, so that a single query is faster. For example, each table referenced in a query can be scanned by different threads at the same time. RAID (Redundant Array of Independent Disks) devices allow data to be striped across multiple disk drives, allowing more read/write heads to read data at the same time, allowing faster data access. Stripe-driven tables in multiple drives are generally faster than scanning the same table stored on one drive. In other words, separating tables from related tables stored on different drives can significantly improve the query performance of those tables. A horizontal partition divides a table into multiple tables. Each table contains the same number of columns and fewer rows. For example, you can partition a table with billions of rows into 12 tables horizontally. Each small table represents the data of one month in a specific year. Any query that requires specific month data only references the table of the corresponding month. How to partition a table horizontally depends on how data is analyzed.

Partition a table to make the query reference as few tables as possible. Otherwise, you must use too many UNION queries to merge tables logically, which weakens the query performance. For more information about querying tables with horizontal partitions, see view usage. A common method is to partition data horizontally based on the period/usage. For example, a table may contain data of the last five years, but only regularly accesses data of the current year. In this case, you can partition the data into five tables, each containing only one year of data. Vertical partitions segment a table into multiple tables. Each table contains fewer columns. Two vertical partitions are normalization and row splitting.

Normalization is a standard database process. This process deletes redundant columns from the table and places them in the secondary table. The secondary table is linked to the primary table based on the relationship between the primary key and the foreign key. Row splitting vertically divides the original table into multiple tables that only contain fewer columns. Each logical row in the split table matches the same logical row in other tables. For example, the row 10 in each split table is joined to recreate the original row. Like horizontal partitions, vertical partitions enable queries to scan less data, thus improving query performance.

For example, if a table contains seven columns and only the first four columns of the table are referenced, you can split the last three columns of the table into a single table. The vertical partition operation should be considered with caution, because the data in multiple partitions needs to be queried in the join table, and if the partition is very large, it may affect the performance. For more information, see use partition view to split data in large tables into smaller Member tables. Partitions data between member tables based on the range of data values in one column. The data range of each member table is defined in the CHECK constraint specified for the partition column.

Define a view to combine the selected Member tables into a single result set using union all. After the SELECT statement that references this view specifies the Search Condition for the partition column, the query optimizer uses the CHECK constraint to define which member table contains the rows. For example, the sales table that records sales in 1998 is partitioned into 12 Member tables, and each month is a member table. Each member TABLE defines constraints in the OrderMonth column: create table May1998Sales (OrderID int primary key, CustomerID int not null, OrderDate datetime null check (DATEPART (yy, OrderDate) = 1998 ), orderMonth int check (OrderMonth = 5), DeliveryDate datetime null, CHECK (DATEPART (mm, OrderDate) = OrderMonth )) the application that fills in May1998Sales must ensure that all rows are 5 in the OrderMonth column and that the order date specifies a date of January 1, May 1998. The constraints defined on this table are enforced. Define a view to use union all to select data from ALL 12 Member tables as a single result set: create view Year1998Sales as select * FROM Jan1998Sales union all select * FROM Feb1998Sales union all select * FROM Mar1998Sales union all select * FROM Apr1998Sales union all select * FROM May1998Sales union all select * FROM Jun1998Sales UNION ALL SELECT * FROM Jul1998Sales union all select * FROM Aug1998Sales union all select * FROM Sep1998Sales Union all select * FROM Oct1998Sales union all select * FROM Nov1998Sales union all select * FROM Dec1998Sales for example, the following SELECT statement queries information for a specific month. SELECT * FROM Year1998Sales WHERE OrderMonth IN (64892) AND CustomerID = the SQL Server Query Optimizer identifies that the search condition IN this SELECT statement only references rows IN the May1998Sales AND Jun1998Sales tables, thus, the search scope is limited to these tables.

The CHECK constraint is not required when the partition view returns the correct results. However, if the CHECK constraint is not defined, the query optimizer must search for all tables instead of only tables that meet the search criteria on the partition column. If the CHECK constraint is not used, the view operates in the same way as any other view with union all. The query optimizer cannot make any assumptions about the values stored in different tables, or skip the search for the tables defined in the view. If all the Member tables referenced by the partition view are on the same server, this view is a local partition view. If the member table is on multiple servers, this view is a distributed partition view. The distributed partition view can be used to process the workload of a database that distributes systems across a group of servers. The partition view makes it easier to maintain the member table independently. For example, at the end of a stage, you can change the partition view definition of the current result to add the latest stage and remove the earliest stage. You can change the partition view definition of the previous results to add the stage just removed from the current result view.

You can also update the previous result view to delete or archive the earliest stages contained in the view. After data is inserted into the partition view, you can use sp_executesql system stored procedures to create an INSERT statement with an execution plan that has a higher chance of reuse in systems with many concurrent users. Create a partition view to horizontally connect partition data in a group member table between one or more servers, making the data look like a table. Microsoft SQL Server 2000 distinguishes between the local partition view and the distributed partition view. In the local partition view, all participating tables and views reside on the same SQL Server instance. In the distributed partition view, at least one participating table resides on different (remote) servers. In addition, SQL Server 2000 distinguishes updatable partition views from views that act as read-only copies of basic tables.

The distributed partition view can be used to implement Database Server consortium. A consortium is a group of separately managed servers, but they collaborate to share the processing load of the system. This mechanism of forming Database Server consortium through partition data enables you to expand a group of servers to support the processing needs of large multi-layer Web sites. For more information, see designing a database federation server. Before implementing the partition view, a horizontal partition table must be created. The original table is divided into several smaller Member tables. Each member table contains the same number of columns as the original table, and each column has the same characteristics (such as data type, size, and sorting rules) as the corresponding columns in the original table ). If you are creating a distributed partition view, each member table is located on a different Member Server. To maximize the location transparency, the names of member databases on each member server should be the same, but this is not required. For example, server1.mermerdb, Server2.CustomerDB, and Server3.CustomerDB. After the member table is designed, each table stores a horizontal area of the original table based on the key value range. The key value range is based on the data value in the partition column. The value range in each member table is forced by the CHECK constraint on the partition column, and the ranges cannot overlap. For example, the value range of a table cannot be set to 1 to 200000, while that of another table ranges from 150000 to 300000, because it is unclear which table contains values between 150000 and 200000. For example, a Customer table is being partitioned into three tables. The CHECK constraints for these tables are: -- On Server1: create table Customer_33 ............

This article is taken from the lingxiao web tribe

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More