DB2 9.5 database partition management and application practices (1)

Source: Internet
Author: User

This article mainly introduces what is DB2 database partition and why database partition is used. The following uses Balanced Warehouse E7100 as an example to introduce the basic methods and application practices of database partition management.

DB2 database Partitioning is provided by the DB2 Enterprise Edition DPF (Data Partitioning Feature) option, which is mainly used to support large-scale Data processing and high concurrency Data access. DB2 database partitions adopt the Share-nothing architecture. databases are decomposed into independent partitions in a non-shared environment. Each partition has its own resources, such as memory, CPU and disk as well as your own data, indexes, configuration files and transaction logs. Database partitions are sometimes called nodes or database nodes. As shown in:

Figure 1. DB2 database partition example

Data can be hashed to different partitions through the Hash algorithm. Each partition is only responsible for processing its own data. After the user sends an SQL operation, the connected partition is called the Coordinate Node, which is responsible for processing user requests, based on the Partition key, the user's request is divided into multiple subtasks and processed in parallel by different partitions. Finally, the execution results of different partitions are summarized and returned to the user. The corresponding partitions are transparent.

In DB2, database partitions can be deployed in clusters or MPP environments, that is, database partitions are distributed on different machines. Database partitions can also be deployed on the same SMP machine, partitions on the same machine are called logical partitions. At the same time, we can deploy multiple partitions in a cluster or MPP environment, and deploy multiple logical partitions on each cluster or MPP node.

DB2 database partitions provide powerful scalability. Because the Share-nothing architecture is adopted, each partition (node) only processes the part of its data, and the partitions are as independent as possible, which reduces the competition for sharing resources between nodes, allows the database to be effectively scaled to support a larger data size and more user access. DB2 database partitions provide scale up and scale out capabilities. Vertical Scaling is achieved by adding physical resources such as cpu, disk, and memory of machines. horizontal scaling is achieved by adding physical machines. DB2 supports up to 1000 partitions. When planning DB2 database partitions, we need to consider whether to expand by adding logical partitions or physical partitions. If a physical machine has multiple CPUs, its physical resources can be shared by multiple partitions. We can add logical partitions to achieve expansion; if the physical resources on a physical machine cannot meet the application requirements, we need to increase the scalability by adding machines, that is, physical partitions.

DB2 database partitions also provide powerful parallel processing capabilities. First, it provides an inter-partition parallelism interval-based parallel mechanism. The hash algorithm is used to divide database requests into multiple tasks for parallel execution in different partitions. At the same time, it provides a parallel mechanism in the intra-partition parallelism partition to divide tasks into different subtasks and execute them in parallel on different CPUs. In addition, we can also use inter-partition parallelism and intra-partition parallelism to achieve full parallel processing capabilities. DB2 database query operations, backup, restore, load, and other utilities and I/O operations can greatly improve their performance through the preceding parallel processing capabilities. As shown in:

Figure 2. Example of parallel processing of DB2 database partitions

Why database partitioning?

Database partitioning brings you the following benefits:

Query scalability

This is one of the main reasons for using database partitions. Dividing a large database into multiple small databases can improve query performance, because each database partition has a small part of its own data. Suppose you want to scan 0.1 billion records. For a single-partition database, this scan requires the Database Manager to scan 0.1 billion records independently. If you make the database system into 50 partitions, and the 0.1 billion records are evenly allocated to the 50 partitions, then the database manager of each database partition will only scan 2 million records.

Architecture restrictions

In DB2 V8 and earlier versions, the maximum number of tables in a non-partitioned database depends on the page size. 4 K pages support a maximum of 64 GB, and 32 K pages support a maximum of 512 GB of data. Table and table space size restrictions are imposed on each partition. Therefore, dividing a database into N partitions can increase the maximum size of a table to N times the maximum size of a single partition table. The memory may also be a limitation, especially in the 32 operating system environment. Because each database partition manages and has its own resources, this restriction can be overcome through the database partition.

Database loading Performance

Database partitions can load data to all database partitions in parallel, greatly reducing the loading time of a single table. This is especially important for systems that require extremely high data loading time, as in real-time business intelligence systems.

Database maintenance performance

Distributing databases to multiple database partition servers can speed up system maintenance, because each operation runs on a subset of data managed by the partition, in this way, the database partition can further reduce the index creation time and the time for collecting statistics, because runstats only runs on one database partition, reducing the reorg time.

Backup/recovery performance

Partitioning a database to a different database server can greatly reduce the database backup time, which is often important to determine whether to use a database partition. DB2 implements parallel backup and recovery operations by allocating independent processes or threads to each tablespace. In the backup of the partitioned database environment, the backup of each partition is independent. By backing up database partitions in parallel, You can greatly reduce the time required to back up the entire database.

Logs

In highly active systems, the performance of database logs may limit the overall throughput of the system. In a partitioned database environment, each partition has its own set of logs. When a large number of insert, update, and delete operations are performed, multiple database partitions can improve the performance, because logs are written in parallel on each database partition, in addition, each single partition requires fewer logs to be recorded.

DB2 provides near-linear scalability as data volumes or processors and partitions increase. However, whether database partitions provide the most benefits depends on the workload of processing, the maximum table size, and other factors.

When to use database partitioning?

The basic principle for designing database partitions is to distribute large tables to all partitions as much as possible to improve the parallel processing capability. Small tables are placed in as few partitions as possible, generally, it is recommended to put it on a single partition; Reduce the communication between partitions as much as possible. In addition to the advantages of partitions mentioned in the previous section, we should also consider whether to adopt database partitions Based on the partition design principles:

An ideal scenario for selecting database partitions is to execute a statement like "select count (*) from big_table. If you place the table on all partitions, each partition can calculate the number of rows on the table, and send the total number (subtotal) to the coordinated partition to calculate the sum, the communication cost here is negligible compared to the work done on each partition.

Another ideal scenario is that a large table is connected to several very small and rarely updated tables. A large table is partitioned, and a small table is copied to each partition, so that the join operation can be performed.

Partitions are not suitable for ad hoc query environments that involve many large tables and a variety of tables and columns during connection. In those cases, it is difficult or impossible to select the partition key of the table, making it impossible for all the large queries to run without many interval communication.

Which of the following statements cannot be used in a single partition. In this case, the overhead of interval-based communication is much higher than the local execution of these statements. If partitions are used (especially across multiple physical systems ), the response time will deteriorate significantly.

Most of the workloads and some specific tasks are in the middle of the two extremes discussed just now, where prototype is needed to study the impact of using partitions.


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.