Using partitions in the Microsoft SQL Server 2000 Data Warehouse

Source: Internet
Author: User
Tags microsoft sql server one table requires


Overview



This article discusses the role of data partitioning in data warehouses. Relational data warehouses and Analysis Services cubes support data partitioning. The logical concept of partitioning is the same in the two engines of Microsoft®sql Server™: Horizontal partitioning of data through keys, such as dates. In a relational database, partitions are implemented by creating separate physical tables, such as creating a table for each month's data, and defining a federated view of a member table. Similarly, the profiling services in SQL Server Enterprise Edition support explicit cube partitions. In relational databases and online analytical processing (OLAP) engines, the complexity of physical storage is not visible to users of analysis.



Advantages of Data Warehouse partitioning:



Greatly shorten the query time.



Reduce the load time and improve the maintainability of the database.



Resolves data pruning issues that occur when old data is deleted from the active database.



This technology requires the creation of a data phased application that is more complex than a non-partitioned system. This article describes the best way to design, implement, and maintain a horizontal partitioned data warehouse.



Because effective zoning planning can greatly improve query performance, we strongly recommend that you partition a large analysis service system. Although partitioning a relational data warehouse is an effective solution for some specific data warehouse maintenance issues, you are not generally recommended to do so.



Using partitions in SQL Server 2000 relational data warehouses



Partitioned views join horizontal partitioned data from a group of members so that the data looks like it comes from the same table. SQL Server 2000 distinguishes between local partitioned views and distributed partitioned views. In the local partitioned view, all related tables and views reside on the same instance of SQL Server. In a distributed partitioned view, at least one table in a related table resides on one of the other (remote) servers. It is recommended that you do not use distributed partitioned views for data warehouse applications.



Vector data warehouses are built around facts (scalars) and vectors, and are physically represented as star and snowflake architectures, with very few fully orthogonal planar tables that contain both facts and vectors. Because vector schemas are the most common relational data warehouse structures, this article focuses on the partitioning of such schemas. The following recommendations apply to other common Data warehouse architectures as well.



Advantages of Partitioning



Data pruning



Many data Warehouse administrators regularly archive stale data. For example, a click Stream Data Warehouse might keep detailed data online for only 3-4 months. Other common rules may be to keep online for 13 months, 37 months, or 10, and archive and delete old data from the database when it is not in the active window. This scrolling window structure is a common practice in large data warehouses.



In the absence of partitioned tables, the process of deleting old data from the database requires a large DELETE statement, for example:



DELETE FROM fact_table
WHERE date_key < 19990101



The cost of executing this statement can be very large and may take more time than the load process of the same table. Instead, for partitioned tables, the administrator redefined the UNION all view to exclude the oldest table and then remove the table from the database (assuming that the table was backed up), which can be done almost instantaneously.



We will discuss later that the cost of maintaining partitioned tables is also high. If data pruning is the only reason for partitioning, designers should consider deleting old data in a table that has never been partitioned in the form of data decomposition. Run a script that deletes 1000 rows at a time (with the "SET ROWCOUNT 1000" command) on a low priority process until all the data you want to delete is deleted. The technology can be used effectively on large systems, more directly than creating the necessary zoning management system. Depending on the load and system usage, the technology is appropriate for some systems and should be considered for benchmarking on the system.





Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.