Analysis of DB2 partition database

Last Update:2018-03-01 Source: Internet

Author: User

Tags db2

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. Concept Description

DB2 the database partition is provided by the DB2 Enterprise version of the DPF (Data partitioning Feature) option, which is used primarily for partitioning (logical or physical) to distribute large databases that provide the necessary scalability and utilizes a non-shared (shared-nothing) structure. Databases are decomposed into separate partitions in a non-shared environment, each with its own resources, such as memory, CPU and disk, and its own data, indexes, configuration files, and transaction logs. Database partitions are sometimes referred to as nodes or database nodes. With DPF "divide-and-conquer" processing, scalability can be enhanced in a single server (scale-up) or cross-server cluster (scale-out).

One of the most obvious reasons to use DPF is to improve the performance of query workloads and insert/update/delete operations. DPF can also overcome the architectural limitations of some DB2. For example, in DB2, for a page size of 4 KB, the maximum size of a table is 8 GB, and the maximum size of a table is one GB for the size of a page, and for a page size of KB, the maximum size of the table is one GB; for a page size of KB, the table's The maximum size is GB. In DB2, the size limit for table and table spaces is defined by each partition. Partitioning the database across multiple partitions will allow you to increase the maximum size of the table based on the number of partitions in your environment.

DB2 database Partition instance diagram:

2. The effect of DPF on database performance

the data is through a hash algorithm are allowed to be hashed into different partitions, each partition is only responsible for processing its own data. After the user issues a SQL operation, the connected partition is called coordinate Node, which handles the user's request and decomposes the user's request into multiple subtasks according to the Partition key (partition key) to be processed in parallel by different partitions. Finally, the execution results of different partitions are aggregated back to the user, and the partition is transparent to the application.

In DB2, a database partition can be deployed on a cluster or MPP (multiple single CPU machines, a DB2 instance with multiple partition, 1 partition on each machine), which means that the database partitions are distributed across different machines The database partition can also be deployed on the same SMP (a machine with multiple CPUs, a DB2 instance with multiple partitions, where the number of partitions does not exceed the number of existing CPUs) on the machine, partitions on the same machine are called logical partitions. At the same time, we can deploy multiple partitions in a clustered or MPP environment and deploy multiple logical partitions on each node of the cluster or MPP.

There are several benefits of using database partitioning, which are briefly described below:

Query extensibility

This is one of the most important reasons for using database partitioning. Splitting a large database into smaller databases can improve the performance of queries because each database partition has its own subset of data. Suppose now scan 100,000 records, for a single partition of the database, the scan operation requires the database Manager to scan 100,000 records independently, if the database system into 10 partitions, and the average allocation of these 100,000 records on the 10 partitions, Then the database manager for each database partition scans only 100,000 records.

Schema Limitations

The largest table for a non-partitioned database depends on the page size, and 4K pages support up to gb,32k pages with a maximum support of up to GB of data. Table and table space size limits are limits on each partition, so dividing the database into N partitions increases the maximum size of the table to n times the maximum size of a single partitioned table. Memory can also be a limitation, especially in 32-bit operating system environments, because each database partition manages and has its own resources, so this limitation can be overcome through database partitioning.

Database load Performance

Database partitioning can load data in parallel to all database partitions, greatly reducing the load time of single tables, which is especially important for systems like real-time business intelligence systems that require particularly high data loading times.

Database Maintenance Performance

Spreading the database across multiple database partition servers can speed up system maintenance because each operation runs on a subset of the data managed by the partition, which further reduces the time to create the index through the database partition and reduces the time to collect statistics because runstats only runs on one database partition , reducing the time of Table reorganization (reorg).

Backup/Restore Performance

Partitioning a database onto a different database server can significantly reduce the time it takes to make a database backup, which is important to decide whether to use the database partition. DB2 enables parallel processing of backup and restore operations by assigning separate processes or threads to each table space. In a backup of a partitioned database environment, backups for each partition are independent, and backing up the database partition in parallel can significantly reduce the time to back up the entire database.

Log

In highly active systems, the performance of the database log may limit the overall throughput of the system. In a partitioned database environment, each partition has its own set of logs. When a large number of INSERT, UPDATE, and delete operations, multiple database partitions can improve performance because the logs are written in parallel on each database partition, and fewer logs are required per partition.

DB2 with the increase in data volumes or processors and partitions, you can provide near-linear scalability, but whether the database partition provides the most benefit depends on the workload being processed, the size of the largest table, and other factors. At present, the data warehouse of our project also uses the database partition, because the data volume is large, and the business to the CPU demand is also relatively big, but the machine is older, the stand-alone cannot increase the CPU, and the one instance realizes several partitions, therefore uses the other kind of SMP Cluster (multiple machines with multiple CPUs, one set of DB2 Instance with multiple partition, with more than one partition on each machine)

3. Comparison of DB2 partitions with Oracle

Oracle Partitioning	DB2 partition	Oracle 10g syntax	DB2 V9 syntax
interval partition (range partitioning)	Table Zoning (Table partitioning)	PARTITION by RANGE	PARTITION by RANGE
Hash partition (hash partitioning)	database partition (db partitioning)	PARTITION by HASH	Distribute by HASH
List partition (partitioning)	With build list partition (Table partitioning with Generated Column)	PARTITION by LIST	PARTITION by RANGE
Not supported	Multidimensional Clustering (Multidimensional clustering)	No	ORGANIZE by DIMENSION

In the case of database partitioning, the following is a comparison of the DB2 database partition with the Oracle Hash partitioning feature:

DB2 partition

TD style= "Border-color:windowtext windowtext windowtext CurrentColor; Border-style:solid solid Solid none; border-width:1px 1px 1px medium; Border-image:none 100%/1/0 stretch; -moz-border-top-colors:none; -moz-border-left-colors:none; -moz-border-bottom-colors:none; -moz-border-right-colors:none; padding:0px 7px; "width=" 234 "valign=" top ">

	Oracle partition
Partition schema	share-nothing	Share-disk
Partitioning properties	Each CPU has a private memory area and private disk space, and two CPUs cannot access the same disk space, and the communication between the CPUs is connected over the network.	Each CPU uses its own private memory area and accesses all disk systems directly through the internal communication mechanism.
difference	Scalability--increasing the physical nature of the database increases the computational resource (that is, the database partition)	cannot be increased by adding physical partition to the database
Statement example	Partition_tablename Table select Partition_ ID field as partition key	Hash_tablename tables are hashed by Hash_part field, and each partition is placed in the tablespace Tbsp1 and tbsp2 in a circular manner.
Statement example	create table Partition_tablename (partition_id ID Not null partition_id varchar ( 20 not null ) in Tbsp_parts distribute by HASH (partition_id);	CREATE TABLE Hash_tablename (Hash_part ID, hash_id varchar2()) PARTITION by HASH (hash_id) (Partition P1 tablespace tbsp1, partition P2 tablespace tbsp2);

4. Summary

The purpose of this sharing is to briefly introduce the concept of DB2 partitioning and compare it with Oracle's partitioning. Partitioned databases facilitate the performance of query workloads and DML operations. If the amount of data is small, the performance improvement is not very obvious, so the partitioned database is generally used in a large amount of data, query the requirements of the database more frequently. In fact, the use of Oracle good or DB2 good and there is no absolute choice. For example, Oracle employs a fully open strategy that enables customers to choose the most appropriate solution for their developers, and DB2 is best suited for massive data, and is the most widely used enterprise-wide, scalable and parallel. Like Guangdong's old fire soup is very tasty, very nourishing, the Chinese people may like to drink, but foreigners feel that their soup is very sticky is called soups, China this is called water! So, what you need is the best.

Analysis of DB2 partition database

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More