ORA-01555 Problem Analysis and Solution
Today, my colleague sent me a problem, throwing a ORA-01555 error when running a Job, hope we can see what can be found from the database layer.
The error log is as follows:
Function: EntitySQLCursor: query
Line number: 113
Time: Thu Jul 2 22:52:46 2015
Message text: (PE1-000143) Internal IO Framework Database Error, message ORA-01555: snapshot too old: rollback segment number 22 with name "_ SYSSMU22_234950861 $" too small, code 1555.
It seems that the rollback segment allocated by Oracle is too small. For this problem, because it has been a while, one way to analyze it properly is to use ash.
Based on the timestamp in the error message, it is found ~ Captured an ash report within one minute.
Because the information is more targeted, we can clearly see that some queries and dml statements are running at the database level within that minute, some are scanned by the full table, and some are scanned by the index.
Top SQL with Top Events
SQL ID |
Planhash |
Sampled # of Executions |
% Activity |
Event |
% Event |
Top Row Source |
% RwSrc |
SQL Text |
Fzn01wc5pg2dg |
1199754052 |
2 |
15.67 |
CPU + Wait for CPU |
11.75 |
Table access-FULL |
11.75 |
SELECT/* + ALL_ROWS USE_NL ("... |
|
|
|
|
Db file sequential read |
2.61 |
Table access-FULL |
2.61 |
|
|
|
|
|
Direct path read |
1.31 |
Table access-FULL |
1.31 |
|
5q2mguqdcrq4a |
421773076 |
1 |
12.01 |
Db file sequential read |
12.01 |
INDEX-RANGE SCAN |
7.05 |
Select re. L3_NET_START_TIME, R... |
A793wrq0q27c5 |
201265388 |
1 |
10.70 |
Db file sequential read |
8.09 |
DELETE |
8.09 |
Delete from RATED_EVENT WHERE... |
|
|
|
|
CPU + Wait for CPU |
1.57 |
DELETE |
1.57 |
|
|
|
|
|
Direct path read temp |
1.04 |
DELETE |
1.04 |
|
496x3fkydc1xj |
84305990 |
1 |
9.92 |
Db file sequential read |
8.62 |
INDEX-RANGE SCAN |
8.62 |
** SQL Text Not Available ** |
|
|
|
|
CPU + Wait for CPU |
1.31 |
INDEX-RANGE SCAN |
1.31 |
|
Dm1d93bw2jdzc |
2843169790 |
27 |
8.09 |
Db file sequential read |
4.70 |
INDEX-RANGE SCAN |
2.09 |
Select sk. rowid, sk. subscribe... |
|
|
|
|
CPU + Wait for CPU |
3.39 |
SELECT STATEMENT |
2.35 |
The statements for full table scan and DML statements are important.
Let's take a look at the statements for full table scan.
SELECT/* + ALL_ROWS USE_NL ("acw.control_hist") FULL ("acw.control_hist ")*/.... from "acw.control_hist" WHERE "CUR_PGM_NAME" = 'rgd 'AND "IDENTIFIER" =: 1
The statement outputs many fields, but there is only one related table. This table can be seen from the table name as a historical table, and the data volume is relatively large. view the statistics, the data volume is hundreds of millions of data records.
For such a large table, hint is used to specify full table scan, which is required in some places. You can check the index information with questions, and the primary key index starts with the IDENTIFIER Field.
From this point of view, this problem is a very obvious problem, because the use of Hint is improper, the query results of the index scan should have gone through the full table scan that consumes a lot of resources.
Of course, there is a saying in philosophy that it is reasonable to exist. In the early days, the data volume was not large, and it was necessary to scan the entire table, or this part of the logic is directly referenced from somewhere, and the hint has forgotten to change, resulting in such a problem.
When a problem occurs, there are also a variety of reasons to find the problem. Of course, this problem still happened in the end. It is more important to detect and fix it in time.
The analysis of this problem has come to an end, but the impact of dml on undo cannot be underestimated. for reference, the delete statement in the preceding table is used.
For this statement, delete involves a large partition table with more than million data records. On the premise of index-based scanning, data is cleaned up based on the timestamp. For this operation, we can consider that the current delete logic is correct. The delete operation will consume a large amount of undo resources without the impact of acw.control_hist full table scan. At this time, you also need to consider whether the current undo size fully meets the requirements of the system. Currently, the size of the undo in the database is about 17 GB, and several large partition tables are all over GB. If the time stamp specified for deletion is larger, the consumption of undo will be larger, therefore, you also need to consider the size of the undo. Based on the current situation, you can consider increasing the space of the undo.
Therefore, the analysis result of this problem is two suggestions. The first one is to improve the full table scan for the statements that should have been indexed, and standardize the use of hint. On the other hand, we recommend that you increase the size of the undo to meet the requirements of the system and increase the load of the system.