A customer said that an error was reported during data export after a database was shut down and restarted due to abnormal shutdown. The expdp could not work normally and then exited directly after an error was reported:
Processing object type SCHEMA_EXPORT/JOB
.. Exported "STATS". "T_REPORT_MONTH_TEMPS" 988.2 MB 1292221 rows
ORA-39014: one or more worker processes have exited prematurely.
ORA-39029: worker Process 1 (process name "DW01") prematurely terminated
ORA-31672: the Worker process DW01 stops unexpectedly.
Job "SYS". "SYS_EXPORT_SCHEMA_04" stopped at 23:58:10 due to a fatal error
Check the alert log at this time to find the following similar errors:
Errors in file/u01/app/oracle/admin/orcl/bdump/orcl_dw01_28608.trc:
ORA-07445: exception error: core dump [klufprd () + 321] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
From the above information, we can draw the following conclusions:
1. The write process of expdp reports an error because the log is generated as a dw process.
2. dw process error is caused by ora-07445 [klufprd () + 321] error.
3. This function is rare for [klufprd () + 321. But we can see from the first two that this must be related to the buffer cache.
So it is easy to solve this problem temporarily. Use alter system flush buffer_cache to refresh the cache and then execute the expdp operation again.
After subsequent attempts, the customer finds that, although the expdp operation still reports an error, expdp will not terminate abnormally and the subsequent export of other objects will continue.
Further analysis of the error message shows the following prompt:
* ** Session id: (2760.1968) 00:14:14. 347
Row 01808438.0 continuation
File #6 block #33784 slot 14 not found
**************************************** **********
KDSTABN_GET: 0... ntab: 1
CurSlot: 14... nrows: 14
**************************************** **********
* ** 00:14:14. 348
Ksedmp: internal or fatal error
ORA-00600: & Auml; & Uacute; ² & iquest; & acute; & iacute; & Icirc; & oacute; & acute; & uacute; & Acirc; & euml ;, 2 & Icirc; & Ecirc; & yacute;: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT/* + NESTED_TABLE_GET_REFS + */"STATS". "T_REPORT_MONTH". * FROM "STATS". "T_REPORT_MONTH"
----- Call Stack Trace -----
Obviously, the table mentioned here is exactly the table encountered by the expdp error. However, after we refresh the buffer cache, expdp can skip this table to continue exporting other objects.
According to the above information, there is an error. The customer also realized that the data file was inspected through dbv, but the file was not damaged.
Here, we should note that dbv checks physical bad blocks at the same time and is usually powerless for logical bad blocks. Of course, logical errors in blocks can be checked out for this type of block dbv.
However, from the information here, Oracle found that the required row 01808438.0 record should be found in file 6 block 33784, but not found.
Note that file 6 block 33784 is in good condition.
So what does row 01808438.0 mean?
In fact, this is the nrid, which can be understood as a persistent pointer. The first part is the rdba address, followed by the row number.
What should we do if we want to further analyze why this error occurs? It is easy to dump block 33784 and rdba 01808438 (hexadecimal) respectively. The conversion script is as follows:
SQL> SELECT dbms_utility.data_block_address_block (25199672) "BLOCK ",
2 dbms_utility.data_block_address_file (25199672) "FILE"
3 FROM dual;
BLOCK FILE
--------------------
33848 6
Row 01808438.0 is mentioned in the log error, so we will first analyze the dump of file 6 block 33848:
Block header dump: 0x01808438
Object id on Block? Y
Seg/obj: 0xc03d01 csc: 0xb37. 78b5ae28 itc: 3 flg: E typ: 1-DATA
Brn: 0 bdba: 0x1807d8a ver: 0x01 opc: 0
Inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a. 02d. 000cdc5c 0x00809c91. 6507.21 -- U-2 fsc 0x0001. 78b6a4b1
0x02 0x000a. 014.000cdd00 0x00806957. 0000d. 15 -- U-2 fsc 0x0000. 78b6ec5d
0x03 0x000a. 025.000cdd5d 0x00801e50. 0000f. 0a -- U-2 fsc 0x0000. 78b71584
Data_block_dump, data header at 0x1fb2f87c
====================
Tsiz: 0x1f80
Hsiz: 0x34
Pbl: 0x1fb2f87c
Bdba: 0x01808438
76543210
Flag = --------
Ntab = 1
Nrow = 17
Frre =-1
Fsbo = 0x34
Fseo = 0xd2
Avsp = 0x33b
Tosp = 0x33c
0xe: pti [0] nrow = 17 offs = 0
0x12: pri [0] offs = 0x1e34
......
0x30: pri [15] offs = 0x6e2
0x32: pri [16] offs = 0x583
Block_row_dump:
Tab 0, row 0, @ 0x1e34
Tl: 332 fb: -- H-F --- lb: 0x0 cc: 79
Nrid: 0x018083f8. e
Col 0: [5] c4 04 5a 27 1b
Col 1: [7] 47 59 30 32 30 30 31
Col 2: [4] c3 15 11 04
Col 3: [12] 31 38 37 33 34 32 30 30 30 36
Col 4: [12] 31 34 30 34 34 32 30 30 30 31
Col 5: [30]
......
The above is similar to the 01808438 rows of rdba address 0th, which is the first line we all understand. We can find that there is a nrid address in the row header in this line of records.
When it comes to the nrid address, this is usually a situation for row link migration. Why does it appear here?
There are several types of row migration. The most common one is actually within the block. The maximum number of columns for a single record in a block is 255. When the number of columns in a row exceeds 255, other column databases are divided into another row piece by oracle and exist in the same block (of course, it may also be stored in other blocks ).
That is to say, the row data of more than 255 columns will be divided into multiple row piece. But when we read the row data, how do we know it is a complete whole?
The answer is nrid. oracle uses nrid to concatenate multiple row piece to form a complete row data.
Let's look back at the previous error. Row 01808438.0 indicates the row 0th of the block, and the nrid address of row 0th of the block is 0x018083f8. e.
Then we went to block 0x018083f8 to find the row e record and found that the result was as follows:
Object id on Block? Y
Seg/obj: 0xc03d01 csc: 0xb37. 78bb5e9f itc: 3 flg: E typ: 1-DATA
Brn: 0 bdba: 0x1807d8a ver: 0x01 opc: 0
Inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a. 013.000csc01 0x01c02834. 6573.33 -- U-2 fsc 0x0000. 78cbf31d
0x02 0x000a. 001.000cda7a 0x0080150a. 64d3. 21 C --- 0 scn 0x0b37. 78b584df
0x03 0x000a. 01e. 000 cdade 0x00801510. 64d3. 13 C-U-0 scn 0x0b37. 78b99f21
Data_block_dump, data header at 0x2b4fc709007c
====================
Tsiz: 0x1f80
Hsiz: 0x2e
Pbl: 0x2b4fc709007c
Bdba: 0x018083f80x018083f8
76543210
Flag = --------
Ntab = 1
Nrow = 14
Frre =-1
Fsbo = 0x2e
Fsearch = 0x568
Avsp = 0x53a
Tosp = 0x53a
0xe: pti [0] nrow = 14 offs = 0
0x12: pri [0] offs = 0x1d78
0x14: pri [1] offs = 0x1c37
......
0x2a: pri [12] offs = 0x6c2
0x2c: pri [13] offs = 0x568
Block_row_dump:
Tab 0, row 0, @ 0x1d78
Tl: 520 fb: ---- L -- lb: 0x0 cc: 255
......
Tab 0, row 13, @ 0x568
Tl: 346 fb: -- H-F --- lb: 0x1 cc: 79
Nrid: 0x018083f8. c
Col 0: [5] c4 04 5a 3a 0a
Col 1: [7] 47 59 30 32 30 30 31
Col 2: [4] c3 15 11 04
......
Col 76: [1] 80
Col 77: [1] 80
Col 78: [1] 80
End_of_block_dump
End dump data blocks tsn: 6 file #: 6 minblk 33784 maxblk 33784
We can see that there is no corresponding record here. Because the last record of the block is row 13, that is, row 14th, it is also a row piece, and there is a nrid.
The nrid is 0x018083f8. c, which indicates that the block contains 33784 rows of records. And row 13 are combined into a complete row record.
In other words, there should be two rows piece for the record we reported earlier. One row piece exists, and one row piece exists in the 33784 block.
However, oracle reported the preceding error because the row piece cannot be found.
In fact, after this error is encountered, we usually think it is the index problem, which can be solved through the drop reconstruction. However, the problem here is quite special. It is said that there is a problem with the table data.
This is why the customer reports an error when rebuilding the index:
SQL> CREATE INDEX "STATS". "MONTHINDEX_STATUS2" ON "STATS". "T_REPORT_MONTH" ("TARGET_298", "UNIT_LEVEL", "TARGET_VAL", "MONTH_FLG ")
2 TABLESPACE "STATDATA ";
Create index "STATS". "MONTHINDEX_STATUS2" ON "STATS". "T_REPORT_MONTH" ("TARGET_298", "UNIT_LEVEL", "TARGET_VAL", "MONTH_FLG ")
*
Row 3 has an error:
ORA-00600: Internal error code, parameter: [kdsgrp1], [], [], [], [], [], [], []
Finally, we understand all the reasons, so it is very easy to solve the problem. You can skip the problematic record by using rowid, retrieve other data, and recreate the table.