A temperature-tiered data management solution stores frequently accessed data on the fastest storage (hot data), data with relatively low access frequency on slower storage (warm data), and rarely accessed data on the enterprise's slowest storage (cold data). Developing such a solution requires a set of key performance indicators (KPIs) to measure the "temperature" of the data and to support the operational and business decisions that involve that data. To demonstrate this solution, assume that your data is managed in segments in the following categories:
✓ Data within 31 days
✓ Data from 31 to 90 days
✓ Data from 91 to 180 days
✓ Data from 181 days to 365 days
✓ Data of more than 365 days
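These age bands can be sketched as a simple classification function. The boundaries below mirror the list above; the function name and the band labels beyond hot/warm/cold are illustrative only, not DB2 terminology:

```python
from datetime import date, timedelta

# Age bands from the list above; the "cool" and "dormant" labels are
# illustrative extensions of the hot/warm/cold idea.
BANDS = [
    (31, "hot"),     # within 31 days
    (90, "warm"),    # 31 to 90 days
    (180, "cool"),   # 91 to 180 days
    (365, "cold"),   # 181 to 365 days
]

def temperature_band(last_access: date, today: date) -> str:
    """Classify a row by the age of its last access."""
    age_days = (today - last_access).days
    for limit, label in BANDS:
        if age_days <= limit:
            return label
    return "dormant"  # more than 365 days

today = date(2024, 6, 30)
print(temperature_band(today - timedelta(days=10), today))   # hot
print(temperature_band(today - timedelta(days=400), today))  # dormant
```

In practice the classification would be driven by the monitoring metrics discussed later, rather than by a single last-access date.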
Assume that data that is two years old is accessed less frequently than data that is 90 days old. Although this assumption may be reasonable, learning more about how often the data is actually accessed and changed may lead to different business decisions. For example, if data that has not changed in the past six months suddenly has 100 rows modified by an extract, transform, and load (ETL) cycle in one month, you may not want to take any action. If, however, 10,000 rows were changed by the ETL cycle in that month, you may want to consider one or more of the following measures:
Perform some form of sampling to determine whether further analysis is necessary.
Re-run the relevant reports.
Investigate the ETL process to find out why such a major change occurred.
Refresh the affected summary tables and materialized query tables (MQTs).
Use IBM DB2 High Performance Unload to process the affected data or the entire table.
Back up the table space.
Reorganize the data (or only the indexes).
Run the runstats utility.
Perform some form of storage management or archiving.
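The decision described above — ignore a small number of ETL changes to otherwise static data, but act when the volume is large — can be sketched as a simple threshold check. The function name and the 1,000-row threshold are illustrative assumptions, not part of DB2:

```python
def etl_change_action(rows_changed: int, months_since_last_change: int,
                      threshold: int = 1000) -> str:
    """Decide whether a burst of ETL changes to static data needs attention.

    The 1,000-row threshold is an arbitrary example; in practice it would
    be tuned per table from historical change rates.
    """
    if months_since_last_change >= 6 and rows_changed >= threshold:
        return "investigate"  # e.g. sample, re-run reports, check the ETL process
    return "no action"

print(etl_change_action(100, 6))     # no action
print(etl_change_action(10_000, 6))  # investigate
```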
Determining which data has changed, together with the quantity and frequency of those changes, provides valuable input to operational and business decisions. This article shares some available metrics that help you understand the frequency, quantity, and percentage of changes to your data, and the actions those changes might prompt.
Develop key performance indicators
Figure 1 shows a bar chart for a table in which, for December, both the access frequency and the change rate are high.
Figure 1: access frequency and change frequency
Understanding access frequency, change rate, and other useful measurements
When activated, the lightweight, always-available monitoring metrics in DB2 provide fast and simple measurement reports that can be used to develop a business view of data access patterns and data change activity. These metrics are collected once activated and can be stored in user-defined tables for further analysis.
Table metrics
Figure 2 lists some key metrics, which can be obtained for each table and for each range partition of a table through the MON_GET_TABLE table function:
Number of times the table or range partition was accessed
Number of rows read (table or range partition)
Number of rows inserted (table or range partition)
Number of rows updated (table or range partition)
Number of rows deleted (table or range partition)
Number of rows updated where the update did not change any column value (table or range partition)
Table space in which the table or range partition resides
Figure 2: activity measurement for tables and range partitions
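Because these counters are cumulative, answering rate questions means storing snapshots of the MON_GET_TABLE output at intervals and taking deltas between them. A minimal sketch of that delta calculation follows; the snapshot layout and field names are assumptions for illustration, not a DB2 format:

```python
# Assumed layout: one cumulative counter snapshot per collection time,
# per table, as stored in a user-defined table.
snapshot_1 = {"rows_inserted": 5_000, "rows_updated": 12_000, "rows_deleted": 300}
snapshot_2 = {"rows_inserted": 6_500, "rows_updated": 15_000, "rows_deleted": 300}

def change_rate(earlier: dict, later: dict, hours: float) -> dict:
    """Per-hour change rate between two cumulative counter snapshots."""
    return {k: (later[k] - earlier[k]) / hours for k in earlier}

rates = change_rate(snapshot_1, snapshot_2, hours=24)
print(rates)  # rows_updated changed at 125.0 rows per hour over the day
```

The same delta approach answers the "how many rows in a given week" style of question: the period is simply the interval between the two stored snapshots.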
These measurements help you answer the following questions:
How many rows in a table have changed? What is the rate of change over a given period (the interval between the calls to the table function whose results you store)?
How many "new" rows were inserted in a given week?
How many UPDATE statements did not result in an actual update when executed (the sixth metric in figure 2)?
What is the total number of rows updated in a table space?
How many rows were deleted in a specific period?
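The question about UPDATE statements that change nothing can be expressed as a simple percentage from the counters in figure 2. The function and parameter names below are illustrative, not DB2 column names:

```python
def noop_update_pct(rows_updated: int, rows_updated_no_change: int) -> float:
    """Percentage of updated rows where no column value actually changed."""
    if rows_updated == 0:
        return 0.0
    return 100.0 * rows_updated_no_change / rows_updated

print(noop_update_pct(15_000, 3_000))  # 20.0
```

A high percentage here can point at ETL jobs that rewrite rows unnecessarily, generating log and I/O activity without changing the data.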
Index metrics
While index metrics do not provide data temperature information, they can still be used, through the MON_GET_INDEX table function, to explain index utilization and index performance, thereby completing your picture of the data. Figure 3 lists a subset of those metrics:
Number of index scans
Number of index-only scans
Number of updates to key columns
Number of updates to include columns
Number of index jump scans
Number of page splits
Figure 3: subset of index utilization measurement
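The counters in figure 3 can be combined into simple utilization indicators. For example, the fraction of index scans that were index-only (satisfied without reading table data pages) indicates how effectively an index covers its queries; the function and counter names here are illustrative:

```python
def index_only_ratio(index_scans: int, index_only_scans: int) -> float:
    """Fraction of index scans that never had to read table data pages."""
    if index_scans == 0:
        return 0.0
    return index_only_scans / index_scans

print(index_only_ratio(10_000, 7_500))  # 0.75
```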