For everyone to share a case of SQL optimization we encountered, the case is not my own responsible database, I am just a porter.
This case occurred last year, discovered because the Nets host CPU rose, the development operation found the DA, then the DA to analyze the situation, and finally locate an inefficient SQL deployment analysis. (About DA, a subdivision of DBA in the database technology department of Ping An Group)
Here is the SQL you are targeting:
SELECT /*+ index(c IDX_CALLINFO_UPDATED_DATE) */COUNT(1) FROM T_SOFTPHONE_CALLINFO CWHERE C.updated_date >= sysdate - 1 / 48 and (C.ANI like ‘%‘ || :1 or C.DANI like ‘%‘ || :2) and C.CREATED_BY = :3;
First, the following information is learned based on this SQL related table (t_softphone_callinfo).
1. This is a call-in information table (where the scene does count statistics)
2.ANI and Dani incoming variables are phone numbers, one is a landline number, one is a mobile phone number
3. The hint index used is the Time field (Updated_date) General B-tree Index, and the execution plan is also walking this index range mode, which is fine.
After a certain understanding of the appeal of SQL, Da first collected some of the following historical executions, which were concluded as follows:
By comparing the growth in recent months, it has been found that while the execution plan has not changed, the frequency of execution has increased from 1000 times per 15 minutes to about 60,000 times, and the consumption of a single logical read has increased several times, and as the volume of business and data has climbed, the indexing of such time fields has become increasingly inefficient. Become a hidden trouble in SQL, at some point in time the problem is highlighted, need to optimize the improvement.
PS: At this time the volume of nets is close to 30TB
In this respect, Da put forward some comments and questions:
1, the first, the second parameter are mobile phone number, and from the history is the input of the complete mobile phone number, why to use like, can directly change to equal sign?
2, two telephone fields have a separate index, if not like, you can optimize the above SQL statement, go to the corresponding phone number index, rewrite form similar to the following way:
SELECT (SELECT /*+ index(c IDX_SOFTPHONE_CALLINFO_ANI) */ COUNT(1) FROM T_SOFTPHONE_CALLINFO C WHERE C.updated_date >= sysdate - 1 / 48 and (C.ANI = ‘159******22‘) and C.CREATED_BY = ‘*******880‘) + (SELECT /*+ index(c IDX_SOFTPHONE_CALLINFO_DANI) */ COUNT(1) FROM T_SOFTPHONE_CALLINFO C WHERE C.updated_date >= sysdate - 1 / 48 and (C.DANI = ‘159******22‘) and C.CREATED_BY = ‘******880‘) FROM DUAL;
In this way, the original average logical reading from 30,000 can be reduced to about 200, which is a great efficiency improvement.
But then, development and operations gave new business feedback and found that things were not so simple.
Development operations: The phone number in the t_softphone_callinfo is obtained from random data, may contain 0 prefix, if you want to count all the information, can not directly use the equal sign, plus 0 and no 0, with the telephone inbound location, add 0, local no 0, You're calling 95511 on a, maybe not adding 0, and you're using the same phone number in B to call 95511, and you might add 0. After DA's troubleshooting is true, the mobile phone number even there are special substitution symbols exist.
Thinking:
In this case, it seems that the way the like can not be changed, the use of% in the field will suppress the use of the index, so that the corresponding index can not be used, how to circumvent the problem and use efficient index it? Through my own thinking and colleagues ' suggestions, combined with the current business scenario, a reliable solution is to create a function index, reverse order function Index!
The test and analysis phase immediately after the thought of the method.
1. First a function index of two corresponding fields is created
Create index NETS2DATA.IDX_SOFTPHONE_CI_ANI_REV on NETS2DATA.T_SOFTPHONE_CALLINFO(reverse(ani)) tablespace NETS2DATA parallel 8 ;Createindex NETS2DATA.IDX_SOFTPHONE_CI_DANI_REV on NETS2DATA.T_SOFTPHONE_CALLINFO(reverse(dani)) tablespace NETS2DATA parallel 8;
2. Rewrite the SQL
SELECT COUNT(1)FROM (SELECT /*+index(c IDX_SOFTPHONE_CI_ANI_REV) */ C.CALLINFO_ID FROM T_SOFTPHONE_CALLINFO C WHERE C.updated_date >= sysdate - 1 / 48 and (reverse(C.ANI) like :1 || ‘%‘) and C.CREATED_BY = :2 UNION ALL SELECT /*+index(c IDX_SOFTPHONE_CI_DANI_REV) */ C.CALLINFO_ID FROM T_SOFTPHONE_CALLINFO C WHERE C.updated_date >= sysdate - 1 / 48 and (reverse(C.DANI) like :3 || ‘%‘) and C.CREATED_BY = :4);
Performance testing down, in most scenarios, the efficiency of the improvement is very obvious, the original average hundreds of thousands of of the consumption of the basic interval to hundreds of, the original walk is the Time field index, now go to the two phone number field of the reverse order function index, and then immediately developed to arrange the first round of rectification, look forward to have a good effect.
The accompanying implementation plan is similar to the following:
Execution Plan----------------------------------------------------------plan hash value:1437385812--------------- -------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 17 | 6 (0) | 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 17 | | || * 2 | TABLE ACCESS by INDEX rowid| T_softphone_callinfo | 1 | 17 | 6 (0) | 00:00:01 | | * 3 | INDEX RANGE SCAN | Idx_callinfo_updated_date | 2 | | 4 (0) | 00:00:01 |------------------------------------------------------------------------------------------------------ ----predicate information (identified byoperation ID):---------------------------------------------------2-filter ("ani" Like '%152******96 ' and "ANI" are not NULL) 3-access ("C". " Updated_date ">[email protected]!-1) Statistics----------------------------------------------------------8 Recursive calls 0 db block gets 291086 consistent gets 0 physical reads 0 Redo size 515 Bytes sent via sql*net to client 4 Bytes received via sql*net from client 2 sql*net roundtrips to/from client 0 sorts (memory) 0 Sorts (disk) Rows processedexecution plan----------------------------------------------------------plan hash Value: 3534627589----------------------------------------------------------------------------------------------------- -| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 17 | 831K (1) | 02:46:18 | | 1 | SORT AGGREGATE | | 1 | 17 | | || * 2 | TABLE ACCESS by INDEX rowid| T_softphone_callinfo | 1 | 17 | 831K (1) | 02:46:18 | | * 3 | INDEX RANGE SCAN | Idx_softphone_ani_ant | 4989k| | 14254 (1) | 00:02:52 |------------------------------------------------------------------------------------------------------ predicate information (identified byoperation ID):---------------------------------------------------2-filter ("C". " Updated_date ">[email protected]!-1" 3-access (REVERSE ("ANI") Like ' 69******251% ') filter (REVERSE ("ANI") LIK E ' 69******251% ') Statistics----------------------------------------------------------8 Recursive calls 0 db block gets 137 consistent gets 0 physical reads 0 Redo size 515 Bytes sent via sql*n Et to client 492 bytes received via sql*net from client 2 sql*net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) rows processed
However, after a few days, there have been some episodes in the middle, development feedback Although most of the scenes are extremely efficient, there are still a small number of scenes that are inefficient and consume higher values, although those values may not have many scenes, they occasionally occur. The DA analysis immediately thought of whether the data was skewed, which would result in a small number of poor efficiency.
After guessing the situation, immediately log on to the system to see the data tilt, sure enough, some values tilt very strong, there is a 8 million, there are many 100 to 2 million field values, when taken to these extrema, the light by a single value index, the efficiency must be very poor, under:
So into a new round of thinking analysis, how to rectify can meet all the scenarios, can directly create more efficient index? In an instant found this SQL using three conditions (Updated_date,ani (Dani), created_by), development also provides the idea that in three conditions filtered out of the data is not much, this time there is a new idea, can create a composite index, by selective arrangement, Does it have an amazing effect?
Not much to say, immediately opened a new round of performance test analysis, through several combinations of composite index and single-valued index test, the specific steps do not need to say, please directly see the following test data:
Presumably after repeated performance analysis tests and experiments, combined with the above test data, you already know which way is best. Finally, we also adopted the most suitable for the scene of the transformation plan, and a round of rectification, monitoring the operation of the following days, the effect is excellent, and finally completely solve all the problems, happy, feel is a good case, to share a bit!
In the end, there may be a reunion. The reverse order index and the inverse function index are different concepts:
A reverse index is also a B-tree index, but it is physically reversed for each index key value that is saved in column order. For example, the index key is 20, with 16 binary storage This standard B-Tree index key Two bytes is c1,15, then the reverse index stored bytes is 15,C1, the purpose is to reduce the contention of the scattered index leaf block, for large concurrent insert scene is more practical, but the drawbacks are more obvious, when using the scope of the query, As the data is dispersed within different blocks, performance is reduced.
Function index is a function-based index, for some fields using special functions, if you need to use an index to establish the relevant index, in this case scenario, I need to do is to completely reverse the number (not the byte upside down), the concept is not used, more is standing in the query efficiency and scene use angle, Therefore, the comprehensive consideration is more suitable for the use of the inverse order function, and the index of the related inverse order function is established.
Experience:
This case involves a simple SQL, but requires the DA to have a solid basic skills and good business sense of smell. Today, as the database becomes smarter, and everyday operations are simpler, DBAs need to be deeply integrated with the business to perform SQL optimization and architecture design based on business features.
Use of the inverse function index for Oracle operations case