Big Data Index Analysis and Data Index Analysis
2014-10-04 BaoXinjian
I. Summary
PLSQL _ performance optimization series 14_Oracle Index Anaylsis
1. Index Quality
The index quality has a direct impact on the overall performance of the database.
Good and high-quality indexes increase the database performance by an order of magnitude, while inefficient and redundant indexes make the database performance slow, even with high-end hardware configuration.
Therefore, the index must be tested and considered repeatedly at the beginning of the design.
For databases already in the production environment, we can also obtain the index quality by querying the relevant data dictionary. Through this analysis, we can guide how to improve the index performance.
2. basic guiding principles for index creation
The creation of indexes should follow the fine but few principles
Collect different combinations of all queries on the table to find out columns (or primary key columns) with the best discretization and create a single index.
Create a composite index for columns that frequently read but lack ideal discrete values
For composite indexes, the following factors should be taken into account to create a proper index column order. The following priority levels are from high to low as the leading column and the second column of the index.
- Column Usage Frequency
- Does this column frequently use "=" as a common query condition?
- Column discretization
- Order in which Composite Columns are frequently sorted
- Columns to be added as attachment Columns
Ii. Case study-Table index and Index Quality
1. query the index column information on a single table
SQL> @/home/oracle/sql/idx_info.sqlEnter value for owner: SHEnter value for table_name: SALESTable Index CL_NAM CL_POS STATUS IDX_TYP DSCD------------------------- ------------------------- -------------------- ------ -------- --------------- ----SALES SALES_CHANNEL_BIX CHANNEL_ID 1 N/A BITMAP ASC SALES_CUST_BIX CUST_ID 1 N/A BITMAP ASC SALES_PROD_BIX PROD_ID 1 N/A BITMAP ASC SALES_PROMO_BIX PROMO_ID 1 N/A BITMAP ASC SALES_TIME_BIX TIME_ID 1 N/A BITMAP ASC5 rows selected.
(1). As shown in the preceding query results, the current table TRADE_CLIENT_TBL contains four indexes. Therefore, the indexes in this table are redundant.
(2). In most cases, 6-7 indexes on a single table are ideal. Excessive indexing results in excessive resource overhead and reduces DML performance.
2. Obtain the Index Quality Information Report of the specified schema or table
SQL> @/home/oracle/sql/idx_quality.sqlEnter value for input_owner: SHEnter value for input_tbname: SALES Table Table Index Data Blks Leaf Blks Clust IndexTable Rows Blocks Index Size MB per Key per Key Factor Quality------------------------- ------------ ---------- ------------------------- ------- --------- --------- ------------ -------------SALES 918,843 1769 SALES_PROD_BIX 0 14 1 1,074 5-Excellent SALES_CUST_BIX 0 5 1 35,808 5-Excellent SALES_TIME_BIX 0 1 1 1,460 5-Excellent SALES_CHANNEL_BIX 0 23 11 92 5-Excellent SALES_PROMO_BIX 0 13 7 54 5-Excellent 5 rows selected.
(1 ). according to the index quality output from a single table, four indexes at the Poor level appear. That is to say, these indexes have a large clustering factor, which is almost close to the rows on the table.
(2). The quality of these indexes should be determined based on the index usage frequency.
(3) Clustering factors can only be improved by reorganizing table data and adjusting the order of the corresponding index columns.
Iii. Case study-index usage frequency report
Oracle provides the index monitoring feature to determine whether an index is used. In Oracle 10 Gb, the index is monitored when statistical information is collected. In Oracle 11 GB, this phenomenon no longer exists.
However, this method only provides whether the index is used. The frequency of indexing usage cannot be reflected.
The following script will get the index usage, which can well measure the index usage and determine whether these indexes can be removed or improved based on the value. \
For more information, see
1. Determine whether the index is used
SQL> @/home/oracle/sql/idx_usage_detail.sql SH 1 IndexTable name Index name Index type Size MB Index operation Executions------------------------------ ------------------------------ --------------- ----------- --------------------- ----------COSTS COSTS_PROD_BIX BITMAP 1.75 - 0 COSTS_TIME_BIX BITMAP 1.75 - 0****************************** ****************************** *************** ----------- ----------sum 3.50 0SALES SALES_CHANNEL_BIX BITMAP 1.75 - 0 SALES_CUST_BIX BITMAP 5.69 SINGLE VALUE 2 FAST FULL SCAN 1 SALES_PROD_BIX BITMAP 1.75 SINGLE VALUE 3 FAST FULL SCAN 1 SALES_PROMO_BIX BITMAP 1.75 FULL SCAN 1 SALES_TIME_BIX BITMAP 1.94 - 0****************************** ****************************** *************** ----------- ----------sum 20.31 89 rows selected.
(1). The above results list the usage frequency of indexes with the schema of SH in the current database and the index size greater than 1 MB.
(2) because the current database is a standard version and does not have the partition table function, you can see many tables ending with the arc, and the index is large, such as the index on ACC_POS_STOCK_TBL_ARC reaches 19 GB.
(3) The range scan on the primary key SALES_PROD_BIX of table SALES is the most, and the total number of use is three.
(4). For the indexes listed above that are used for a period of 0, check whether the index settings are reasonable.
(5) If the index is too large, you should consider whether to use index compression.
(6) The last list is the schema name of the report, the filter conditions of the index size, and the date on which the index is collected. Note: The sum of the size of the index column is inaccurate.
2. Summary
Two substitution variables are used, schema and index.
By default, DBA_HIST_ SQL _PLAN is not collected for the execution plans of small indexes and SQL statements that only run once or twice.
Therefore, the recommended index size input value for script execution is 100.
If you need to collect all the historical SQL Execution plans to determine whether the index is used, you need to modify statistics_level to all or the snapshot collection policy.
Collection policies have a certain impact on the system and consume a large amount of disk space. Therefore, the Prod environment should be used with caution (UAT and DEV should be fine ).
Download the script (which was compiled by Amy and borrowed)
1. idx_info. SQL http://files.cnblogs.com/eastsea/idx_info.zip
2. idx_quality. SQL http://files.cnblogs.com/eastsea/idx_quality.zip
3. idx_usage_detail. SQL http://files.cnblogs.com/eastsea/idx_usage_detail.zip
Reference: Sha mi great god Co., http://blog.csdn.net/leshami/article/details/23687137.
SQL 2000 500 solutions for querying large data volumes,
Solve specific problems
First, analyze the data query logic from the business point of view, and often find a lot of optimization space from the business.
In terms of query performance, indexes can be used technically. indexes are not necessarily related to primary keys. indexes can still be created without primary keys. Indexes are used to optimize query performance, rather than constraints such as business integrity.
If the data is incorrect due to the primary key issue, you need to find the cause from the business source.
Database index problems
If you create a BTree for a common index, you can directly locate the result specified by your search condition.
For a single query, isn't there much optimization?
However, if subqueries are involved, they should be optimized to increase,
My consideration is that we should try to narrow down the retrieved subset When retrieving indexed tables (or place the largest data query in an indexed place, extract the subset, and perform further processing ), this greatly improves the efficiency, especially when the data volume is large.
Analyze specific issues.