A temperature-tiered data management solution stores frequently accessed data on the fastest storage (hot data), data with relatively low access frequency on slower storage (warm data), and rarely accessed data on the enterprise's slowest storage (cold data). Developing such a solution requires a set of key performance indicators (KPIs) to measure the "temperature" of the data and to support the operational and business decisions that involve that data. To demonstrate this solution, assume that your data is managed in segments in the following categories:
✓ Data within 31 days
✓ Data from 31 to 90 days
✓ Data from 91 to 180 days
✓ Data from 181 days to 365 days
✓ Data of more than 365 days
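These age bands can be sketched as a simple classification function. The boundaries below mirror the list above; the function name and the band labels beyond hot/warm/cold are illustrative only, not DB2 terminology:

```python
from datetime import date, timedelta

# Age bands from the list above; the "cool" and "dormant" labels are
# illustrative extensions of the hot/warm/cold idea.
BANDS = [
    (31, "hot"),     # within 31 days
    (90, "warm"),    # 31 to 90 days
    (180, "cool"),   # 91 to 180 days
    (365, "cold"),   # 181 to 365 days
]

def temperature_band(last_access: date, today: date) -> str:
    """Classify a row by the age of its last access."""
    age_days = (today - last_access).days
    for limit, label in BANDS:
        if age_days <= limit:
            return label
    return "dormant"  # more than 365 days

today = date(2024, 6, 30)
print(temperature_band(today - timedelta(days=10), today))   # hot
print(temperature_band(today - timedelta(days=400), today))  # dormant
```

In practice the classification would be driven by the monitoring metrics discussed later, rather than by a single last-access date.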
Assume that data that is two years old is accessed less frequently than data that is 90 days old. Although this assumption may be reasonable, learning more about how often the data is actually accessed and changed may lead to different business decisions. For example, if data that has not changed in the past six months suddenly has 100 rows modified by an extract, transform, and load (ETL) cycle in one month, you may not want to take any action. If, however, 10,000 rows were changed by the ETL cycle in that month, you may want to consider one or more of the following measures:
Perform some form of sampling to determine whether further analysis is necessary.
Re-run the relevant reports.
Investigate the ETL process to find out why such a major change occurred.
Refresh the affected summary tables and materialized query tables (MQTs).
Use IBM DB2 High Performance Unload to process the affected data or the entire table.
Back up the table space.
Reorganize the data (or only the indexes).
Run the runstats utility.
Perform some form of storage management or archiving.
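The decision described above — ignore a small number of ETL changes to otherwise static data, but act when the volume is large — can be sketched as a simple threshold check. The function name and the 1,000-row threshold are illustrative assumptions, not part of DB2:

```python
def etl_change_action(rows_changed: int, months_since_last_change: int,
                      threshold: int = 1000) -> str:
    """Decide whether a burst of ETL changes to static data needs attention.

    The 1,000-row threshold is an arbitrary example; in practice it would
    be tuned per table from historical change rates.
    """
    if months_since_last_change >= 6 and rows_changed >= threshold:
        return "investigate"  # e.g. sample, re-run reports, check the ETL process
    return "no action"

print(etl_change_action(100, 6))     # no action
print(etl_change_action(10_000, 6))  # investigate
```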
Determining which data has changed, together with the quantity and frequency of those changes, provides valuable input to operational and business decisions. This article shares some available metrics that help you understand the frequency, quantity, and percentage of changes to your data, and the actions those changes might prompt.
Develop key performance indicators
Figure 1 shows a bar chart for a table in which, for December, both the access frequency and the change rate are high.
Figure 1: access frequency and change frequency
Understanding access frequency, change rate, and other useful measurements
When activated, the lightweight, always-available monitoring metrics in DB2 provide fast and simple measurement reports that can be used to develop a business view of data access patterns and data change activity. These metrics are collected once activated and can be stored in user-defined tables for further analysis.
Table metrics
Figure 2 lists some key metrics, which can be obtained for each table and for each range partition of a table through the MON_GET_TABLE table function:
Number of times the table or range partition was accessed
Number of rows read (table or range partition)
Number of rows inserted (table or range partition)
Number of rows updated (table or range partition)
Number of rows deleted (table or range partition)
Number of rows updated where the update did not change any column value (table or range partition)
Table space in which the table or range partition resides
Figure 2: activity measurement for tables and range partitions
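Because these counters are cumulative, answering rate questions means storing snapshots of the MON_GET_TABLE output at intervals and taking deltas between them. A minimal sketch of that delta calculation follows; the snapshot layout and field names are assumptions for illustration, not a DB2 format:

```python
# Assumed layout: one cumulative counter snapshot per collection time,
# per table, as stored in a user-defined table.
snapshot_1 = {"rows_inserted": 5_000, "rows_updated": 12_000, "rows_deleted": 300}
snapshot_2 = {"rows_inserted": 6_500, "rows_updated": 15_000, "rows_deleted": 300}

def change_rate(earlier: dict, later: dict, hours: float) -> dict:
    """Per-hour change rate between two cumulative counter snapshots."""
    return {k: (later[k] - earlier[k]) / hours for k in earlier}

rates = change_rate(snapshot_1, snapshot_2, hours=24)
print(rates)  # rows_updated changed at 125.0 rows per hour over the day
```

The same delta approach answers the "how many rows in a given week" style of question: the period is simply the interval between the two stored snapshots.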
These measurements help you answer the following questions:
How many rows in a table have changed? What is the rate of change over a given period (the interval between the calls to the table function whose results you store)?
How many "new" rows were inserted in a given week?
How many UPDATE statements did not result in an actual update when executed (the sixth metric in figure 2)?
What is the total number of rows updated in a table space?
How many rows were deleted in a specific period?
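The question about UPDATE statements that change nothing can be expressed as a simple percentage from the counters in figure 2. The function and parameter names below are illustrative, not DB2 column names:

```python
def noop_update_pct(rows_updated: int, rows_updated_no_change: int) -> float:
    """Percentage of updated rows where no column value actually changed."""
    if rows_updated == 0:
        return 0.0
    return 100.0 * rows_updated_no_change / rows_updated

print(noop_update_pct(15_000, 3_000))  # 20.0
```

A high percentage here can point at ETL jobs that rewrite rows unnecessarily, generating log and I/O activity without changing the data.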
Index metrics
While index metrics do not provide data temperature information, they can still be used, through the MON_GET_INDEX table function, to explain index utilization and index performance, thereby completing your picture of the data. Figure 3 lists a subset of those metrics:
Number of index scans
Number of index-only scans
Number of updates to key columns
Number of updates to include columns
Number of index jump scans
Number of page splits
Figure 3: subset of index utilization measurement
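The counters in figure 3 can be combined into simple utilization indicators. For example, the fraction of index scans that were index-only (satisfied without reading table data pages) indicates how effectively an index covers its queries; the function and counter names here are illustrative:

```python
def index_only_ratio(index_scans: int, index_only_scans: int) -> float:
    """Fraction of index scans that never had to read table data pages."""
    if index_scans == 0:
        return 0.0
    return index_only_scans / index_scans

print(index_only_ratio(10_000, 7_500))  # 0.75
```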