Server|sqlserver| creates this article detailing the process of implementing partitioned tables and the functionality that helps to complete this process. The logical process is as follows:
Figure: Steps to create a partitioned table or index
determine if an object partition should be
Although partitioning can bring many benefits, it also increases the administrative costs and complexity of implementing objects, which may outweigh the gains. In particular, you may not need to partition a table for a smaller table or for tables that currently meet performance and maintenance requirements. The sales scenarios mentioned earlier use partitions to ease the burden of moving rows and data, but when deciding whether to implement partitions, you should consider whether or not your scenario has this burden.
determine partition keys and number of partitions
If you are trying to improve the performance and manageability of large subsets of data, and you have defined access patterns, you can use range partitioning to reduce data contention while reducing the maintenance of read-only data that does not require partitioning. To determine the number of partitions, you should first evaluate the existence of logical groupings and patterns in your data. If you typically only work with only a handful of these set sets at a time, you should define a scope to isolate the query so that it processes only the appropriate data (that is, only specific partitions are processed).
determine if more than one filegroup should be used
To help optimize performance and maintenance, you should use file groups to isolate data. The number of filegroups is partly determined by hardware resources: In general, the number of filegroups is best the same as the number of partitions, and these filegroups are usually located on different disks. However, this applies primarily to systems that intend to analyze the entire dataset. If you have multiple CPU,SQL servers, you can process multiple partitions in parallel, greatly reducing the overall time to handle a large number of complex reports and analyses. In this case, you get the benefits of parallel processing and the migration and removal of partitions in partitioned tables.
Create a file group
If you need to place a partitioned table for multiple files for better I/O balance, you need to create at least one filegroup. A filegroup can consist of one or more files, and each partition must be mapped to a filegroup. A filegroup can be used by multiple partitions, but in order to better manage the data (for example, for more precise backup control), you should design the partition table so that only the data or logical groupings of relevant data are in the same filegroup. Using ALTER DATABASE, you can add a logical filegroup name and then add a file. To create a filegroup named 2003Q3 for the AdventureWorks database, use ALTER DATABASE as follows:
ALTER DATABASE AdventureWorks ADD FILEGROUP [2003Q3]
After the filegroup is created, use ALTER DATABASE to add the file to the filegroup.
ALTER DATABASE AdventureWorks
ADD FILE
(NAME = N ' 2003q3 ',
FILENAME = N ' C:\AdventureWorks\2003Q3.ndf ',
SIZE = 5MB,
MAXSIZE = 100MB,
FileGrowth = 5MB)
To FILEGROUP [2003Q3]
You can create a table for a file by specifying a filegroup in the ON clause of the CREATE table. However, you cannot create a table for more than one filegroup if the table is not partitioned. To create a table for a filegroup, use the ON clause of the CREATE table. To create a partitioned table, you must first determine the functional mechanism of the partition. The criteria for partitioning are logically separated from the table in the form of partition functions. This partition function exists as a table-independent definition, and this physical separation will help because it can be used by multiple objects. Therefore, the first step in partitioning a table is to create a partition function.
to create a partition function for a range partition
A range partition must be defined using boundary conditions. Also, even if a table is restricted by a CHECK constraint, the value of any boundary of that range cannot be eliminated. To allow data to be moved to the table on a regular basis, you need to create the last empty partition.
In a range partition, you first define a boundary point: if there are five partitions, define four boundary point values and specify whether each value is the upper boundary (left) of the first partition or the bottom boundary (right) of the second partition. There is always an empty partition, depending on left or right, because the partition does not have a well-defined boundary point.
Specifically, if the first value (or boundary condition) of the partition function is ' 20001001 ', the value in the boundary partition will be:
For left
The first partition is all data less than or equal to ' 20001001 '
The second partition is all data greater than ' 20001001 '
For right
The first partition is all data less than ' 20001001 '
The second partition is all data greater than or equal to ' 20001001 '
Because the scope partition may be defined in datetime data, you must understand its meaning. Using datetime has some meaning: you always specify both a date and a time. A date that does not define a time value represents a 12:00 a.m with a time section of "0". If left is used in conjunction with such data, the date is October 1 12:00 a.m. Data will be in the first partition, while other data for October will be in the second partition. Logically, it is best to use the start value in conjunction with right, and the end value with left. The following three clauses create a logically identical partition structure:
RANGE left for VALUES (' 20000930 23:59:59.997 ',
' 20001231 23:59:59.997 ',
' 20010331 23:59:59.997 ',
' 20010630 23:59:59.997 ')
Or
RANGE right for VALUES (' 20001001 00:00:00.000 ', ' 20010101 00:00:00.000 ', ' 20010401 00:00:00.000 ', ' 20010701 00:00:00.000 ')
Or
RANGE right for VALUES (' 20001001 ', ' 20010101 ', ' 20010401 ', ' 20010701 ')
Attention: The use of datetime data types here does add some complexity, but you need to ensure that the correct boundary conditions are set. Note the simplicity of using right because the default time is 12:00:00.000 a.m. For left, the increase in complexity is due to the precision of the datetime data type. 23:59:59.997 must be selected because datetime data does not guarantee millisecond-level precision. In contrast, the accuracy of datetime data is within 3.33 milliseconds. The exact time value of using 23:59:59.999 is not possible because the value is rounded to the nearest time value, the 12:00:00.000 a.m of the next day. Because of this rounding, boundaries will not be defined correctly. For datetime data, you must be extra cautious about the millisecond values that are explicitly provided.
Attention: Partition functions also allow functions to be part of the definition of partition functions. You can use DATEADD (ms,-3, ' 20010101 ') instead of using ' 20001231 23:59:59.997 ' to define the time explicitly.
To store One-fourth of the Orders data in four active partitions (one calendar quarter for each partition), and create a fifth partition for future use (or as a placeholder for moving and moving data in a partitioned table), use the left partition function in conjunction with the following four boundary conditions:
CREATE PARTITION FUNCTION ORDERDATERANGEPFN (datetime)
As
RANGE left for VALUES (' 20000930 23:59:59.997 ',
' 20001231 23:59:59.997 ',
' 20010331 23:59:59.997 ',
' 20010630 23:59:59.997 ')
Remember that defining four boundary points creates five partitions. Check the dataset created by this partition by viewing the following dataset:
Boundary point ' 20000930 23:59:59.997 ' as left (set mode):
The leftmost partition will contain all values less than or equal to ' 20000930 23:59:59.997 '
Boundary point ' 20001231 23:59:59.997 ':
The second partition will contain all values greater than ' 20000930 23:59:59.997 ' but less than or equal to ' 20001231 23:59:59.997 '
Boundary point ' 20010331 23:59:59.997 ':
The third partition will contain all values greater than ' 20001231 23:59:59.997 ' but less than or equal to ' 20010331 23:59:59.997 '
Boundary point ' 20010630 23:59:59.997 ':
The fourth partition will contain all values greater than ' 20010331 23:59:59.997 ' but less than or equal to ' 20010630 23:59:59.997 '
Finally, the fifth partition will contain all values greater than ' 20010630 23:59:59.997 '.
Create a partition schema
After you create a partition function, you must associate it with the partition schema to direct the partition to a specific filegroup. When you define a partition schema, you must specify a filegroup for each partition, even if multiple partitions are in the same filegroup. For the previously created range partition (ORDERDATERANGEPFN), there are five partitions, and the last empty partition is created in the PRIMARY filegroup. Because this partition will never contain data, you do not need to specify a special location.
CREATE PARTITION SCHEME Orderdatepscheme
As
PARTITION ORDERDATERANGEPFN
to ([2000q3], [2000q4], [2001q1], [2001Q2], [PRIMARY])
Attention: If all partitions are in the same filegroup, you can use the following simpler syntax:
CREATE PARTITION SCHEME Orderdatepscheme
As
PARTITION ORDERDATERANGEPFN
All to ([PRIMARY])
Create a partition table
Once you have defined partition functions (logical structures) and partition schemas (physical structures), you can create tables to take advantage of them. The table defines the schema that should be used, and the schema defines the function. To combine these three, you must specify the columns where the partition function should be applied. A range partition is always mapped to only one column in a table, which should match the data type of the boundary condition defined in the partition function. Also, you should add a CHECK constraint if the table should explicitly restrict the dataset (rather than from negative infinity to positive infinity).
CREATE TABLE [dbo]. [Ordersrange]
(
[PurchaseOrderID] [INT] Not NULL,
[EmployeeID] [INT] Null
[VendorID] [INT] Null
[Taxamt] [Money] Null
[Freight] [Money] Null
[SubTotal] [Money] Null
[Status] [tinyint] Not NULL,
[RevisionNumber] [tinyint] Null
[ModifiedDate] [DateTime] Null
[Shipmethodid] [tinyint] Null
[ShipDate] [DateTime] Not NULL,
[OrderDate] [DateTime] Not NULL
CONSTRAINT Ordersrangeyear
CHECK ([OrderDate] >= ' 20030701 '
and [OrderDate] <= ' 20040630 11:59:59.997 '),
[TotalDue] [Money] Null
)
On Orderdatepscheme (OrderDate)
Go
Indexing: Partitioning?
By default, indexes created in partitioned tables also use the same partition schema and partition columns. If this is the case, the index is aligned with the table. Although not required, aligning a table with its index can make management easier, especially for sliding window scenarios.
For example, to create a unique index, the partitioning column must be a key column, which ensures that the appropriate partitions are validated to ensure the uniqueness of the index. Therefore, if you need to partition a table on one column, and you must create a unique index on another column, the tables and indexes will not be aligned. In this case, you can partition the index in a unique column (if it is a unique key for multiple columns, or it can be any key column), or you cannot partition it at all. Note that when you move and move data in a partitioned table, you must delete and create the index.
Attention: If you plan to load tables with existing data and immediately add indexes to them, you can usually get better performance by loading into an unpartitioned, indexed table, and then creating a partitioned index after the data is loaded. By defining the cluster index for the partition schema, you can partition the table more efficiently after the data is loaded. This is also a good way to partition an existing table. To create the same table as the unpartitioned table and to create the same cluster index as the partitioned cluster index, replace the ON clause in the creation table with a filegroup target location. The cluster index is then created for the partition schema after the data is loaded.