Analysis of DB2 partition database

Source: Internet
Author: User
Tags db2

1. Concept Description

DB2 the database partition is provided by the DB2 Enterprise version of the DPF (Data partitioning Feature) option, which is used primarily for partitioning (logical or physical) to distribute large databases that provide the necessary scalability and utilizes a non-shared (shared-nothing) structure. Databases are decomposed into separate partitions in a non-shared environment, each with its own resources, such as memory, CPU and disk, and its own data, indexes, configuration files, and transaction logs. Database partitions are sometimes referred to as nodes or database nodes. With DPF "divide-and-conquer" processing, scalability can be enhanced in a single server (scale-up) or cross-server cluster (scale-out).

One of the most obvious reasons to use DPF is to improve the performance of query workloads and insert/update/delete operations. DPF can also overcome the architectural limitations of some DB2. For example, in DB2, for a page size of 4 KB, the maximum size of a table is 8 GB, and the maximum size of a table is one GB for the size of a page, and for a page size of KB, the maximum size of the table is one GB; for a page size of KB, the table's The maximum size is GB. In DB2, the size limit for table and table spaces is defined by each partition. Partitioning the database across multiple partitions will allow you to increase the maximum size of the table based on the number of partitions in your environment.

DB2 database Partition instance diagram:

2. The effect of DPF on database performance

the data is through a hash algorithm are allowed to be hashed into different partitions, each partition is only responsible for processing its own data. After the user issues a SQL operation, the connected partition is called coordinate Node, which handles the user's request and decomposes the user's request into multiple subtasks according to the Partition key (partition key) to be processed in parallel by different partitions. Finally, the execution results of different partitions are aggregated back to the user, and the partition is transparent to the application.

In DB2, a database partition can be deployed on a cluster or MPP (multiple single CPU machines, a DB2 instance with multiple partition, 1 partition on each machine), which means that the database partitions are distributed across different machines The database partition can also be deployed on the same SMP (a machine with multiple CPUs, a DB2 instance with multiple partitions, where the number of partitions does not exceed the number of existing CPUs) on the machine, partitions on the same machine are called logical partitions. At the same time, we can deploy multiple partitions in a clustered or MPP environment and deploy multiple logical partitions on each node of the cluster or MPP.

There are several benefits of using database partitioning, which are briefly described below:

Query extensibility

This is one of the most important reasons for using database partitioning. Splitting a large database into smaller databases can improve the performance of queries because each database partition has its own subset of data. Suppose now scan 100,000 records, for a single partition of the database, the scan operation requires the database Manager to scan 100,000 records independently, if the database system into 10 partitions, and the average allocation of these 100,000 records on the 10 partitions, Then the database manager for each database partition scans only 100,000 records.

Schema Limitations

The largest table for a non-partitioned database depends on the page size, and 4K pages support up to gb,32k pages with a maximum support of up to GB of data. Table and table space size limits are limits on each partition, so dividing the database into N partitions increases the maximum size of the table to n times the maximum size of a single partitioned table. Memory can also be a limitation, especially in 32-bit operating system environments, because each database partition manages and has its own resources, so this limitation can be overcome through database partitioning.

Database load Performance

Database partitioning can load data in parallel to all database partitions, greatly reducing the load time of single tables, which is especially important for systems like real-time business intelligence systems that require particularly high data loading times.

Database Maintenance Performance

Spreading the database across multiple database partition servers can speed up system maintenance because each operation runs on a subset of the data managed by the partition, which further reduces the time to create the index through the database partition and reduces the time to collect statistics because runstats only runs on one database partition , reducing the time of Table reorganization (reorg).

Backup/Restore Performance

Partitioning a database onto a different database server can significantly reduce the time it takes to make a database backup, which is important to decide whether to use the database partition. DB2 enables parallel processing of backup and restore operations by assigning separate processes or threads to each table space. In a backup of a partitioned database environment, backups for each partition are independent, and backing up the database partition in parallel can significantly reduce the time to back up the entire database.

Log

In highly active systems, the performance of the database log may limit the overall throughput of the system. In a partitioned database environment, each partition has its own set of logs. When a large number of INSERT, UPDATE, and delete operations, multiple database partitions can improve performance because the logs are written in parallel on each database partition, and fewer logs are required per partition.

DB2 with the increase in data volumes or processors and partitions, you can provide near-linear scalability, but whether the database partition provides the most benefit depends on the workload being processed, the size of the largest table, and other factors. At present, the data warehouse of our project also uses the database partition, because the data volume is large, and the business to the CPU demand is also relatively big, but the machine is older, the stand-alone cannot increase the CPU, and the one instance realizes several partitions, therefore uses the other kind of SMP Cluster (multiple machines with multiple CPUs, one set of DB2 Instance with multiple partition, with more than one partition on each machine)

3. Comparison of DB2 partitions with Oracle
Oracle Partitioning DB2 partition Oracle 10g syntax DB2 V9 syntax
interval partition (range partitioning)        Table Zoning (Table partitioning) PARTITION by RANGE PARTITION by RANGE
Hash partition (hash partitioning) database partition (db partitioning)        PARTITION by HASH Distribute by HASH
List partition (partitioning) With build list partition (Table partitioning with Generated Column) PARTITION by LIST PARTITION by RANGE
Not supported Multidimensional Clustering (Multidimensional clustering) No ORGANIZE by DIMENSION

In the case of database partitioning, the following is a comparison of the DB2 database partition with the Oracle Hash partitioning feature:

DB2 partition

TD style= "Border-color:windowtext windowtext windowtext CurrentColor; Border-style:solid solid Solid none; border-width:1px 1px 1px medium; Border-image:none 100%/1/0 stretch; -moz-border-top-colors:none; -moz-border-left-colors:none; -moz-border-bottom-colors:none; -moz-border-right-colors:none; padding:0px 7px; "width=" 234 "valign=" top ">

Oracle partition

Partition schema

share-nothing

Share-disk

Partitioning properties

Each CPU has a private memory area and private disk space, and two CPUs cannot access the same disk space, and the communication between the CPUs is connected over the network.

Each CPU uses its own private memory area and accesses all disk systems directly through the internal communication mechanism.

difference

Scalability--increasing the physical nature of the database increases the computational resource (that is, the database partition)

cannot be increased by adding physical partition to the database

Statement example

Partition_tablename Table select Partition_ ID field as partition key

Hash_tablename tables are hashed by Hash_part field, and each partition is placed in the tablespace Tbsp1 and tbsp2 in a circular manner.

create table Partition_tablename
  (partition_id ID Not null     partition_id varchar ( 20 not null )
       in Tbsp_parts
  distribute by   HASH (partition_id);

CREATE TABLE Hash_tablename
(Hash_part ID,
hash_id varchar2())
PARTITION by HASH (hash_id)
(Partition P1 tablespace tbsp1,
partition P2 tablespace tbsp2);

4. Summary

The purpose of this sharing is to briefly introduce the concept of DB2 partitioning and compare it with Oracle's partitioning. Partitioned databases facilitate the performance of query workloads and DML operations. If the amount of data is small, the performance improvement is not very obvious, so the partitioned database is generally used in a large amount of data, query the requirements of the database more frequently. In fact, the use of Oracle good or DB2 good and there is no absolute choice. For example, Oracle employs a fully open strategy that enables customers to choose the most appropriate solution for their developers, and DB2 is best suited for massive data, and is the most widely used enterprise-wide, scalable and parallel. Like Guangdong's old fire soup is very tasty, very nourishing, the Chinese people may like to drink, but foreigners feel that their soup is very sticky is called soups, China this is called water! So, what you need is the best.


Analysis of DB2 partition database

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.