Use partitions in SQL Server 2000 Data Warehouse

Source: Internet
Author: User

The partition view joins the horizontal partition data from a group of Members to make the data look like it is from the same table. SQL Server 2000 distinguishes the local partition view from the distributed partition view. In the local partition view, all related tables and views reside on the same instance of SQL Server. In the distributed partition view, at least one table in the relevant table resides on another (remote) server. We recommend that you do not use the distributed partition view for Data Warehouse applications.

Vector Data Warehouse is built around fact (scalar) and vector. It is physically represented as a star architecture and a snowflake architecture. There are very few completely non-orthogonal flat tables that contain facts and vectors at the same time. As vector architecture is the most common relational data warehouse structure, This article focuses on the partitions of such architectures. The following suggestions also apply to other general data warehouse architectures.

Advantages of partitioning

Data pruning:

Many data warehouse administrators regularly archive outdated data. For example, a clickstream data warehouse may only keep detailed data online for three to four months. Other common rules may be to keep the data online for 13 months, 37 months, or 10 years. When the old data is not in the activity window, it is archived and deleted from the database. This type of tumble window structure is a common practice of big data warehouses.

If no partition table exists, a large DELETE statement is required for the process of deleting old data from the database. For example:

       
        DELETE FROM fact_tableWHERE date_key < 19990101
       

The overhead of executing this statement is very large, which may take more time than the loading process of the same table. On the contrary, for a partition table, the Administrator re-defines the union all view to exclude the oldest table and then deletes the table from the database (assuming that the table is backed up ), this process can be completed almost instantly.

We will discuss later that the cost for maintaining partition tables is also high. If data pruning is the only reason for using partitions, the designer should consider deleting old data in a table that has never been partitioned by means of data decomposition. Run a script to delete 1000 rows (run the "set rowcount 1000" command) consecutively in a low-priority process until all the data to be deleted is deleted. This technology can be effectively used in large systems and is more direct than creating necessary partition management systems. Based on the load volume and system usage, this technology is suitable for some systems and should be considered for benchmarking on the system.

Loading Speed:

The fastest way to load data is to load data to empty tables or tables without indexes. By loading to a small partition table, the efficiency of the gradient loading process will be greatly improved.

Maintainability:

Once a partitioned data warehouse has been built, the entire system will become easy to maintain. Maintenance activities (including loading data, backing up and restoring tables) can be executed in parallel, which can greatly improve performance. The process of gradually filling the multi-dimensional dataset of the downstream data stream can be accelerated and simplified.

Query speed:

The query speed should not be used as a reason to partition a data warehouse relational database. The query performance of fact tables with and without partitions is similar. In a correctly designed partitioned database, the relational engine only includes the partitions required for parsing the query in the query plan. For example, if the database is partitioned by month and the query condition is January 2000, the query plan only includes partitions in January 2000. The query results are executed correctly on the partitioned table, which is roughly the same as the partitioned table with the clustered index on the partition key.

Disadvantages of partitioning

Complexity:

The main disadvantage of partitioning is that the administrator needs to create an application to manage partitions. Before you design, test, and test run applications to manage partitions, it is inappropriate to put a data warehouse that uses horizontal partitions in a relational database into formal operation. One of the purposes of this article is to discuss issues related to partition management applications and design decisions.

Query Design constraints:

To achieve optimal query performance, all queries should put the conditions directly on the filter key in the fact table. Queries that place constraints on the second table (for example, a table with a date vector) will include all partitions.

Factors to consider during design:

Vector Data Warehouse is built around fact (scalar) and vector. It is physically represented as a star architecture and a snowflake architecture. There are very few completely non-orthogonal flat tables that contain facts and vectors at the same time. In typical cases, the Vector Data Warehouse administrator only partitions fact tables, which has almost no benefit in partitioning vector tables. In some cases, partitioning a large vector table that contains more than 10 million members may be advantageous. You can also partition a non-vector relational data warehouse. The general point of view in this article is still applicable.

Effective partition plans can be formulated only when the system architecture and design objectives are fully taken into account. Even with the same architecture design, a relational data warehouse that is only used to fill the service analysis multi-dimensional dataset may adopt a partition structure different from the data warehouse directly queried by analysts. The system with a rolling window must be partitioned by time, while other systems are not necessarily.

If the data warehouse includes an analytic service multi-dimensional dataset, Microsoft recommends that the partitions in the relational data warehouse and analytic Service database should be in a parallel structure. The maintenance application is simplified: The application creates a new table in the relational database while creating a new multi-dimensional dataset partition. The Administrator only needs to master one partition policy. However, one application may also have a good reason to partition two databases in different ways. The only reduction is the complexity of database maintenance applications.

Partition design overview

Partition tables in the SQL Server database can use the partition view that can be updated or queried (cannot be updated. In both cases, table partitions are created by the CHECK constraint that each partition contains the correct data. An updatable partition view supports INSERT (UPDATE or DELETE) operations on the view, and pushes the operation to the correct basic table. This is helpful, but data warehouse applications usually need to Perform Batch loading, which cannot be executed through views. The following table summarizes the requirements, advantages, and disadvantages of updatable and queryable partition views.

Microsoft recommends that you define the primary key and design the fact table as a local partition Union view (on a single server. In most cases, this definition generates updatable partition views, but the data warehouse maintenance application should be designed to directly load most data to Member tables in batches (instead of through views ).

  • 1
  • 2
  • 3
  • 4
  • Next Page
[Content navigation]
Page 1st: Use partitions in SQL Server 2000 Data Warehouse Page 2nd: Use partitions in SQL Server 2000 Data Warehouse
Page 3rd: Use partitions in SQL Server 2000 Data Warehouse Page 4th: Use partitions in SQL Server 2000 Data Warehouse

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.