A three-horse carriage that pulls query performance in a data warehouse

Source: Internet
Author: User
Tags advantage

Objective

In the field of data warehousing, query performance is an important performance indicator for customers, whether in production systems or POC (Proof of Concept) performance tests. Good query performance lays the foundation for the efficient operation of various data warehouse applications. As for query performance, it is well known that the main performance bottleneck is from system I/O, so this paper, from the point of view of physical design of Data Warehouse, expounds three key technologies which affect query performance, and demonstrates the effect of performance improvement with benchmark test tpc-h.

On the partition database, table partitioning and multidimensional Clustering (MDC), DeveloperWorks has many excellent articles on its basic principles and characteristics respectively, this article will not repeat. This article focuses on the impact of the three technologies on query performance in physical design.

Theory

Partitioned databases (database partitioning Feature)

The share-nothing architecture in a partitioned database allocates heavy and time-consuming system I/O jobs to each node in the cluster, combining the SAN (Storage area Network) storage network to take full advantage of the I/O performance of the disk controller and the bandwidth of the storage network.

In order to balance the I/O busy level of each node, the balanced data distribution, it is particularly important. The distribution of data depends on the data itself and the selection of database partitioning keys, and the selection of database partitioning keys should follow the following guidelines:

1. A single, more decentralized column;

2. Columns often used in connection (join);

In the case of a balanced distribution of data, it is possible to avoid a node being overloaded with I/O due to processing too much data, thus becoming the bottleneck of the whole cluster.

In a stand-alone database environment, query processing can only take advantage of the system resources (CPU, MEMORY,I/O), when the data stored in a single Zhang, BI (Business Intelligence) query usually need access to most of the data in the table, such as for the query SQL1, In a stand-alone environment, the physical distribution of the data is shown in Figure 1. In the absence of an index, the database needs to scan the entire large table to query for eligible records, and it is not difficult to imagine the impact of busy I/O on the overall performance of the query as required for scanning large tables.

Listing 1. Query SQL1

1

2

3

SELECT C_name, C_total_spend, c_loyalty_tier from CUSTOMER

where c_region = ' North America ' and C_month= ' March ' and C_type= ' VIPs '

(Note: The blue triangle represents the data that matches the query criteria, i.e. c_region= ' North America ' and C_month= ' March ' and C_type= ' VIPs ')

Figure 1. Physical distribution of data in stand-alone database environment

With a multiple-partition database, the data can be distributed evenly across the nodes in the cluster, although the BI query still needs to scan the entire large table and read most of the data, but the query can be parallel to all nodes, as shown in Figure 2.

Figure 2. Physical distribution of data in partitioned database environment

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.