Big Data Index Analysis and Data Index Analysis

Source: Internet
Author: User

Big Data Index Analysis and Data Index Analysis

2014-10-04 BaoXinjian

I. Summary

PLSQL _ performance optimization series 14_Oracle Index Anaylsis

1. Index Quality

The index quality has a direct impact on the overall performance of the database.

Good and high-quality indexes increase the database performance by an order of magnitude, while inefficient and redundant indexes make the database performance slow, even with high-end hardware configuration.

Therefore, the index must be tested and considered repeatedly at the beginning of the design.

For databases already in the production environment, we can also obtain the index quality by querying the relevant data dictionary. Through this analysis, we can guide how to improve the index performance.

 

2. basic guiding principles for index creation

The creation of indexes should follow the fine but few principles

Collect different combinations of all queries on the table to find out columns (or primary key columns) with the best discretization and create a single index.

Create a composite index for columns that frequently read but lack ideal discrete values

For composite indexes, the following factors should be taken into account to create a proper index column order. The following priority levels are from high to low as the leading column and the second column of the index.

  • Column Usage Frequency
  • Does this column frequently use "=" as a common query condition?
  • Column discretization
  • Order in which Composite Columns are frequently sorted
  • Columns to be added as attachment Columns

 

Ii. Case study-Table index and Index Quality

1. query the index column information on a single table

SQL> @/home/oracle/sql/idx_info.sqlEnter value for owner: SHEnter value for table_name: SALESTable                     Index                     CL_NAM               CL_POS STATUS   IDX_TYP         DSCD------------------------- ------------------------- -------------------- ------ -------- --------------- ----SALES                     SALES_CHANNEL_BIX         CHANNEL_ID                1 N/A      BITMAP          ASC                          SALES_CUST_BIX            CUST_ID                   1 N/A      BITMAP          ASC                          SALES_PROD_BIX            PROD_ID                   1 N/A      BITMAP          ASC                          SALES_PROMO_BIX           PROMO_ID                  1 N/A      BITMAP          ASC                          SALES_TIME_BIX            TIME_ID                   1 N/A      BITMAP          ASC5 rows selected.

(1). As shown in the preceding query results, the current table TRADE_CLIENT_TBL contains four indexes. Therefore, the indexes in this table are redundant.
(2). In most cases, 6-7 indexes on a single table are ideal. Excessive indexing results in excessive resource overhead and reduces DML performance.

2. Obtain the Index Quality Information Report of the specified schema or table

SQL> @/home/oracle/sql/idx_quality.sqlEnter value for input_owner: SHEnter value for input_tbname: SALES                                 Table      Table                             Index Data Blks Leaf Blks        Clust IndexTable                             Rows     Blocks Index                     Size MB   per Key   per Key       Factor Quality------------------------- ------------ ---------- ------------------------- ------- --------- --------- ------------ -------------SALES                          918,843       1769 SALES_PROD_BIX                  0        14         1        1,074 5-Excellent                                                  SALES_CUST_BIX                  0         5         1       35,808 5-Excellent                                                  SALES_TIME_BIX                  0         1         1        1,460 5-Excellent                                                  SALES_CHANNEL_BIX               0        23        11           92 5-Excellent                                                  SALES_PROMO_BIX                 0        13         7           54 5-Excellent     5 rows selected.

(1 ). according to the index quality output from a single table, four indexes at the Poor level appear. That is to say, these indexes have a large clustering factor, which is almost close to the rows on the table.
(2). The quality of these indexes should be determined based on the index usage frequency.
(3) Clustering factors can only be improved by reorganizing table data and adjusting the order of the corresponding index columns.

Iii. Case study-index usage frequency report

Oracle provides the index monitoring feature to determine whether an index is used. In Oracle 10 Gb, the index is monitored when statistical information is collected. In Oracle 11 GB, this phenomenon no longer exists.

However, this method only provides whether the index is used. The frequency of indexing usage cannot be reflected.

The following script will get the index usage, which can well measure the index usage and determine whether these indexes can be removed or improved based on the value. \

For more information, see

 

1. Determine whether the index is used

SQL> @/home/oracle/sql/idx_usage_detail.sql SH 1                                                                                    IndexTable name                     Index name                     Index type          Size MB Index operation       Executions------------------------------ ------------------------------ --------------- ----------- --------------------- ----------COSTS                          COSTS_PROD_BIX                 BITMAP                 1.75        -                       0                               COSTS_TIME_BIX                 BITMAP                 1.75        -                       0****************************** ****************************** *************** -----------                       ----------sum                                                                                  3.50                                0SALES                          SALES_CHANNEL_BIX              BITMAP                 1.75        -                       0                               SALES_CUST_BIX                 BITMAP                 5.69 SINGLE VALUE                   2                                                                                          FAST FULL SCAN                 1                               SALES_PROD_BIX                 BITMAP                 1.75 SINGLE VALUE                   3                                                                                          FAST FULL SCAN                 1                               SALES_PROMO_BIX                BITMAP                 1.75 FULL SCAN                      1                               SALES_TIME_BIX                 BITMAP                 1.94        -                       0****************************** ****************************** *************** -----------                       ----------sum                                                                                 20.31                                89 rows selected. 

(1). The above results list the usage frequency of indexes with the schema of SH in the current database and the index size greater than 1 MB.

(2) because the current database is a standard version and does not have the partition table function, you can see many tables ending with the arc, and the index is large, such as the index on ACC_POS_STOCK_TBL_ARC reaches 19 GB.

(3) The range scan on the primary key SALES_PROD_BIX of table SALES is the most, and the total number of use is three.

(4). For the indexes listed above that are used for a period of 0, check whether the index settings are reasonable.

(5) If the index is too large, you should consider whether to use index compression.

(6) The last list is the schema name of the report, the filter conditions of the index size, and the date on which the index is collected. Note: The sum of the size of the index column is inaccurate.

2. Summary

Two substitution variables are used, schema and index.

By default, DBA_HIST_ SQL _PLAN is not collected for the execution plans of small indexes and SQL statements that only run once or twice.

Therefore, the recommended index size input value for script execution is 100.

If you need to collect all the historical SQL Execution plans to determine whether the index is used, you need to modify statistics_level to all or the snapshot collection policy.

Collection policies have a certain impact on the system and consume a large amount of disk space. Therefore, the Prod environment should be used with caution (UAT and DEV should be fine ).

 

Download the script (which was compiled by Amy and borrowed)

1. idx_info. SQL http://files.cnblogs.com/eastsea/idx_info.zip

2. idx_quality. SQL http://files.cnblogs.com/eastsea/idx_quality.zip

3. idx_usage_detail. SQL http://files.cnblogs.com/eastsea/idx_usage_detail.zip

 

 

Reference: Sha mi great god Co., http://blog.csdn.net/leshami/article/details/23687137.


SQL 2000 500 solutions for querying large data volumes,

Solve specific problems
First, analyze the data query logic from the business point of view, and often find a lot of optimization space from the business.

In terms of query performance, indexes can be used technically. indexes are not necessarily related to primary keys. indexes can still be created without primary keys. Indexes are used to optimize query performance, rather than constraints such as business integrity.

If the data is incorrect due to the primary key issue, you need to find the cause from the business source.

Database index problems

If you create a BTree for a common index, you can directly locate the result specified by your search condition.

For a single query, isn't there much optimization?
However, if subqueries are involved, they should be optimized to increase,
My consideration is that we should try to narrow down the retrieved subset When retrieving indexed tables (or place the largest data query in an indexed place, extract the subset, and perform further processing ), this greatly improves the efficiency, especially when the data volume is large.
Analyze specific issues.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.