A system has been killed frequently recently. The symptom is that all the CPU resources of the system are exhausted. The system environment is as follows:
Dual CPU and RAID 5
Oracle 8.1.7.0.1
Weblogic 1, 8.12
JAVA JDBC
Oracle and weblogic run on RedHat 7.3 respectively! I don't know what it is!
The character set is ZHS16GBK.
The system will occupy 99% of the CPU at intervals, and the two CPUs are all 0 idle. At the same time, events in v $ session_wait are SQL * Net more data to client or SQL * Net message from client, alert. everything in the log is normal, and no exceptions are found in the StatsPack Report, but the SQL * Net event waits a lot! At first, I thought it was because of a problem with the Intranet switch and the network causes database problems?
Later, the analysis found that when the Web Client queries a table's specific records, the CPU usage rose sharply, but the Web did not find this record, but this record does exist in the Database Table! If only one Web server is exhausted when querying this record, if multiple Web servers are enabled to query this record, for example, if five are enabled, each of the five server processes occupies about 20% of the CPU on average, at the same time, the two CPUs are exhausted and the system goes down! However, it is not normal to query this record in SQLPLUS!
The test for a number is as follows (X is the hidden part ):
Now query the 130XXXXXXXX number on the Web Client, and then the TOP shows:
1:22pm up 1 day, 10:15, 4 users, load average: 0.98, 0.50, 0.33
106 processes: 104 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 15.0% user, 6.0% system, 0.0% nice, 78.4% idle
CPU1 states: 90.3% user, 1.4% system, 0.0% nice, 7.2% idle
Mem: 1031044K av, 1021448K used, 9596K free, 0K shrd, 46060K buff
Swap: 2097112K av, 37424K used, 2059688K free 860096K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
5309 oracle 25 0 8832 8832 8252 R 99.0 0.8 0:59 oracle
3026 oracle 16 0 138M 138M 137M S 2.9 13.7 27:41 oracle
3028 oracle 16 0 144M 144M 143M S 2.5 14.3 28:59 oracle
3030 oracle 16 0 129M 129M 129M S 2.1 12.8 23:25 oracle
5193 oracle 15 0 1092 1092 856 S 2.1 0.1 1:39 top
5310 oracle 15 0 1092 1092 856 R 2.1 0.1 0:06 top
1234 oracle 15 0 136M 136M 135M S 0.3 13.5 1:27 oracle
3020 oracle 15 0 26004 25M 25380 S 0.3 2.5 3:48 oracle
1199 oracle 15 0 11788 11M 11256 S 0.1 1.1 0:13 oracle
1219 oracle 15 0 161M 161M 159M S 0.1 16.0 0:47 oracle
1 root 15 0 476 432 416 S 0.0 0.0 0:05 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 keventd
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU0
4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU1
5 root 15 0 0 0 0 SW 0.0 0.0 0:06 kswapd
6 root 25 0 0 0 0 SW 0.0 0.0 0:00 bdflush
7 root 15 0 0 0 0 SW 0.0 0.0 0:00 kupdated
8 root 25 0 0 0 0 SW 0.0 0.0 0:00 mdrecoveryd
16 root 15 0 0 0 0 SW 0.0 0.0 0:28 kjournald
95 root 16 0 0 0 0 SW 0.0 0.0 0:00 khubd
At this point:
SQL> select a. sid, status from v $ session a, v $ process B where a. paddr = B. addr and B. spid = 5309;
SID STATUS
------------------
29 ACTIVE
SQL> select sid, event, wait_time, seconds_in_wait, state from v $ session_wait where sid = 29;
SID EVENT WAIT_TIME SECONDS_IN_WAIT STATE
---------- ----------------------------- ---------- --------------- -------------------
29 SQL*Net more data to client -1 56 WAITED SHORT TIME
In SQLPLUS:
SQL> select rowid, tel, name from teluser_tbl where tel = '130xxxxxxxx ';
ROWID TEL NAME
-----------------------------------------------------------
AAAGHgAAFAAAU27AAA 130 xxxxxxxx xxx South China Engineering Co., Ltd.
Apparently, the "Company" is missing"
You can use PL/SQL Developer's "SQL window" to query. "company" is a garbled code:
TEL NAME
-----------------------------------------
130 xxxxxxxx xxx South China Engineering Co., Ltd.
Query in the "command window" of PL/SQL Developer. What is "after" public "? "
The problem lies in the NAME field! If you delete the garbled code behind "public", there will be no problem! The system is all right!
I used dbms_system.set_ SQL _trace_in_session to track the session with PID = 5309. No trace file is generated, that is, the SQL statement has been executed and data is being returned to the Web. The Web still cannot receive the data, server process is suspended and occupies a large amount of CPU resources.
I open seven IE Windows to query the same mobile phone number, TOP:
7:46pm up 16:40, 4 users, load average: 7.81, 7.77, 7.62
105 processes: 96 sleeping, 9 running, 0 zombie, 0 stopped
CPU0 states: 96.4% user, 3.1% system, 0.0% nice, 0.0% idle
CPU1 states: 99.0% user, 1.0% system, 0.0% nice, 0.0% idle
Mem: 1031044K av, 1021452K used, 9592K free, 0K shrd, 36812K buff
Swap: 2097112K av, 42588K used, 2054524K free 881416K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
3438 oracle 25 0 9592 9592 8988 R 32.7 0.9 21:29 oracle
3431 oracle 25 0 11900 11M 11300 R 32.3 1.1 21:57 oracle
3436 oracle 25 0 9136 9136 8552 R 30.9 0.8 20:51 oracle
3454 oracle 25 0 11916 11M 11260 R 24.5 1.1 14:39 oracle
3433 oracle 25 0 25028 24M 24376 R 23.5 2.4 16:06 oracle
3442 oracle 25 0 9092 9092 8508 R 23.5 0.8 15:51 oracle
3452 oracle 25 0 9056 9056 8472 R 23.5 0.8 14:36 oracle
All the events of these seven processes are: SQL * Net more data to client
The WEB end does not have any current situation, that is, data cannot be transmitted to the WEB !!!
The corresponding SQL is very simple, the execution plan is good, the cost is also very low, I caught the SQL with TOP-PID + v $ session, v $ process, v $ sqlarea, the execution plan is as follows:
Select rowid, t. tel, t. password, t. language, t. isfree, t. status,
T. enddate, t. endprocess, t. province, t. city, t. code, t. name,
T. address, t. postcode, t. companytel, t. familytel, t. fax, t. qq,
T. email, t. regdate, t. unregdate, t. logincount
From teluser_tbl t where t. tel = '133xxxxxxxx'
Elapsed: 00:00:00. 45
Execution Plan
----------------------------------------------------------
0 select statement Optimizer = CHOOSE
1 0 table access (by index rowid) OF 'teluser _ tbl'
2 1 INDEX (unique scan) OF 'sys _ C001736 '(UNIQUE)
Statistics
----------------------------------------------------------
0 recursive cballs
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1628 bytes sent via SQL * Net to client
425 bytes encoded ed via SQL * Net from client
2 SQL * Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Obviously, this is a good execution plan.
Dump the block of this record:
SQL> select file_id from dba_extents where segment_name = 'teluser _ TBL ';
FILE_ID
---------------
5
SQL> set serveroutput on
SQL> declare
2 a number;
3 begin
4 a: = dbms_rowid.rowid_block_number ('aaaghgaafaaau27aaa ');
5 dbms_output.put_line ();
6 end;
7/
85435
The PL/SQL process is successfully completed.
SQL> declare
2 a number;
3 begin
4 a: = dbms_rowid.rowid_row_number ('aaaghgaafaaau27aaa ');
5 dbms_output.put_line ();
6 end;
7/
0
The PL/SQL process is successfully completed.
SQL> alter system dump datafile 5 blocks 85435;
System altered.
The TRACE file is as follows (only row 0 is pasted ):
[Oracle @ sxdb udump] $ less ora_5301.trc
/Opt/ora/admin/sxit/udump/ora_5301.trc
Oracle8i Enterprise Edition Release 8.1.7.0.1-Production
With the Partitioning option
JServer Release 8.1.7.0.1-Production
ORACLE_HOME =/opt/ora/8.1.7
System name: Linux
Node name: sxdb
Release: 2.4.18-3smp
Version: #1 SMP Thu Apr 18 07:27:31 EDT 2002
Machine: i686
Instance name: sxit
Redo thread mounted by this instance: 1
Oracle process number: 45
Unix process pid: 5301, image: oracle @ sxdb (TNS V1-V3)
* ** 13:11:49. 539
* ** Session id: (19.25087) 13:11:49. 538
Start dump data blocks tsn: 4 file #: 5 minblk 85435 maxblk 85435
Buffer tsn: 4 rdba: 0x01414dbb (5/85435)
Scn: 0x0000. 058b81ce seq: 0x01: 0x02 tail: 0x81ce0601
Frmt: 0x02 chkval: 0x0000 type: 0x06 = trans data
Block header dump: 0x01414dbb
Object id on Block? Y
Seg/obj: 0x61e0 csc: 0x00. 58b81cd itc: 1:-typ: 1-DATA
Fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 xid: 0x0003. 031.20.aa4c uba: 0x00c003de. 10f4. 48 -- U-1 fsc 0x0000. 058b81ce
Data_block_dump
====================
Tsiz: 0x1fb8
Hsiz: 0x58
Pbl: 0x09724644
Bdba: 0x01414dbb
Flag = -----------
Ntab = 1
Nrow = 35
Frre =-1
Fsbo = 0x58
Fsearch = 0x312
Avsp = 0x389
Tosp = 0x389
0xe: pti [0] nrow = 35 offs = 0
0x12: pri [0] offs = 0x1edf
0x14: pri [1] offs = 0x1e13
0x16: pri [2] offs = 0x1d47
0x18: pri [3] offs = 0x1c75
0x1a: pri [4] offs = 0x1ba9
0x1c: pri [5] offs = 0x1add
0x1e: pri [6] offs = 0x1a11
0x20: pri [7] offs = 0x1945
0x22: pri [8] offs = 0x1879
0x24: pri [9] offs = 0x17ad
0x26: pri [10] offs = 0x16e2
0x28: pri [11] offs = 0x1617
0x2a: pri [12] offs = 0x154c
0x2c: pri [13] offs = 0x1481
0x2e: pri [14] offs = 0x13b6
0x30: pri [15] offs = 0x12eb
0x32: pri [16] offs = 0x1220
0x34: pri [17] offs = 0x1155
0x36: pri [18] offs = 0x0000c
0x38: pri [19] offs = 0xfc1
0x3a: pri [20] offs = 0xef6
0x3c: pri [21] offs = 0xe2d
0x3e: pri [22] offs = 0xd62
0x40: pri [23] offs = 0xc99
0x42: pri [24] offs = 0 xbce
0x44: pri [25] offs = 0xb05
0x46: pri [26] offs = 0xa39
0x48: pri [27] offs = 0x96d
0x4a: pri [28] offs = 0x8a5
0x4c: pri [29] offs = 0x7d9
0x4e: pri [30] offs = 0x70d
0x50: pri [31] offs = 0x645
0x52: pri [32] offs = 0x57d
0x54: pri [33] offs = 0x3e2
0x56: pri [34] offs = 0x312
Block_row_dump:
Tab 0, row 0, @ 0x1edf
Tl: 217 fb: -- H-FL -- lb: 0x0 cc: 50
Col 0: [11] 31 33 30 3X 3X 3X 3X 3X 3X 3X (X-is hidden by me to avoid leaking customer information)
Col 1: [6] 32 30 30 35 30 31
Col 2: [4] b9 e3 b6 AB
Col 3: [4] b7 f0 c9 bd
Col 4: [4] 30 37 36 35
Col 5: [19] c2 b7 c7 c5 bb aa c4 cf b9 a4 b3 cc d3 d0 cf de b9 AB cb -- that is, the NAME field stores 19 bytes in total, the NAME of this line of record is "XX South China Engineering Co., Ltd.", a total of 9 characters, obviously the database stores one more byte! This byte causes garbled characters!
Col 6: * NULL *
Col 7: * NULL *
Col 8: * NULL *
Col 9: * NULL *
Col 10: * NULL *
Col 11: * NULL *
Col 12: * NULL *
Col 13: [14] 32 30 35 30 31 30 35 31 31 37 31 34 33 32
Col 14: [14] 32 30 35 30 31 30 35 31 31 37 31 34 33 32
Col 15: [2] c1 02
Col 16: [1] 80
Col 17: [1] 80
Col 18: [2] c1 02
Col 19: [14] 32 30 35 30 31 30 35 31 31 37 31 34 33 32
Col 20: [14] 32 30 35 30 31 30 35 31 31 37 31 34 33 32
Col 21: [1] 80
Col 22: [1] 80
Col 23: [2] c1 04
Col 24: [1] 80
Col 25: [2] c1 06
Col 26: [2] c1 02
Col 27: [4] 30 30 30 30
Col 28: [1] 80
Col 29: * NULL *
Col 30: [14] 32 30 35 30 31 30 35 31 31 37 31 34 33 32
Col 31: [1] 80
Col 32: * NULL *
Col 33: [1] 80
Col 34: [1] 80
Col 35: * NULL *
Col 36: * NULL *
Col 37: * NULL *
Col 38: * NULL *
Col 39: * NULL *
Col 40: * NULL *
Col 41: [1] 80
Col 42: [1] 80
Col 43: [1] 80
Col 44: [1] 80
Col 45: [1] 80
Col 46: [1] 80
Col 47: [1] 80
Col 48: [14] 32 30 35 30 33 30 37 31 37 31 34 33 32
Col 49: [2] c1 0c
Because this field is not very important, update teluser_tbl set name = rtrim (name, substr (name,-1) is used at last. The character at the rightmost end is broken and the system returns to normal.
Later, the developer checked the program in detail and found that the C language did not read one byte when reading data!
So why does this Byte cause such a serious problem? The only explanation currently is the Bug in the ZHS16GBK character set. The character corresponding to this Byte in the ZHS16GBK character set conflicts with the underlying OS communication code, it causes communication faults and database faults.
Come here first, and try again later.