Case study of downtime caused by ORA-04031 errors
Today we ran into an Oracle database downtime case. The following describes the cause of the outage; by recording the cause and effect of the analysis we hope to accumulate some experience and sharpen our ability to analyze and solve problems.

Environment:
Operating System: Oracle Linux Server release 5.7, 64-bit
Database version: Oracle Database 10g Release 10.2.0.4.0 - 64bit Production

Case analysis: After receiving an alarm and checking the database, we found that the instance was down. Checking the alert log revealed the following error messages:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov 2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov 2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")
Mon Nov 2 11:43:08 2015
RECO: terminating instance due to error 4031
Mon Nov 2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:
ORA-04031: unable to allocate  bytes of shared memory ("","","","")
Instance terminated by RECO, pid = 6569
From the alert log we can see that ORA-00604 and ORA-04031 errors caused this outage (RECO: terminating instance due to error 4031):

$ oerr ora 4031
04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"
// *Cause:  More shared memory is needed than was allocated in the shared
//          pool.
// *Action: If the shared pool is out of memory, either use the
//          dbms_shared_pool package to pin large packages,
//          reduce your use of shared memory, or increase the amount of
//          available shared memory by increasing the value of the
//          INIT.ORA parameters "shared_pool_reserved_size" and
//          "shared_pool_size".
//          If the large pool is out of memory, increase the INIT.ORA
//          parameter "large_pool_size".

In general, an ORA-04031 error has two possible causes:
1. The shared pool is heavily fragmented, so no contiguous chunk of memory is available for an allocation. This is usually addressed from the application side, for example by using bind variables to reduce hard parsing.
2. The memory capacity is simply insufficient and needs to be enlarged.

This machine has 8 GB of physical memory, yet the SGA was allocated only 1168 MB, less than 2 GB, and it was exhausted almost immediately, which is hard to justify. The ASH report shows the Buffer Cache and Shared Pool sizes before and after the outage. The trace file below shows a session waiting on the 'SGA: allocation forcing component growth' event, which confirms that the SGA could not grow any further; combined with the ASH report, the Shared Pool had grown to roughly 69.6% of the SGA at that time.
SO: 0xa617d9c0, type: 4, owner: 0xa8a26c68, flag: INIT/-/-/0x00
(session) sid: 932 trans: (nil), creator: 0xa8a26c68, flag: (51) USR/- BSY/-/-/-/-/-
          DID: 0001-000A-00000003, short-term DID: 0000-0000-00000000
          txn branch: (nil)
          oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
last wait for 'SGA: allocation forcing component growth' blocking sess=0x(nil) seq=51324 wait_time=10714 seconds since wait started=0
            =0, =0, =0
Dumping Session Wait History
 for 'SGA: allocation forcing component growth' count=1 wait_time=10714
            =0, =0, =0
 for 'SGA: allocation forcing component growth' count=1 wait_time=10512
            =0, =0, =0
 for 'latch: shared pool' count=1 wait_time=892
            address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=28
            address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=51
            address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=114
            address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=120
            address=600e7320, number=d6, tries=0
 for 'latch: library cache' count=1 wait_time=33
            address=a3fa46e8, number=d7, tries=1
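To confirm how the SGA was divided up at that point, the current size of each dynamically resizable component can be queried directly. A minimal sketch using the standard 10g dynamic performance views (run as a privileged user; the output of this case is not reproduced here):

-- Overall SGA layout, including free SGA memory available for growth
SELECT name, bytes / 1024 / 1024 AS size_mb
  FROM v$sgainfo;

-- Current, minimum and maximum observed size of each auto-tuned component
SELECT component,
       current_size / 1024 / 1024 AS current_mb,
       min_size     / 1024 / 1024 AS min_mb,
       max_size     / 1024 / 1024 AS max_mb
  FROM v$sga_dynamic_components
 ORDER BY current_size DESC;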
Based on the above analysis, we can conclude that an unreasonably small SGA setting allowed the shared pool memory to be completely exhausted, pushing the SGA to its limit. Adjusting the SGA parameters is therefore the right way to solve the problem. In addition, since the database had been running normally for a long time, we also analyzed the AWR and ADDM reports and found that hard parsing on this system is quite severe. Furthermore, the following script, run over a period of time to observe changes in the shared pool, shows that it shrank and grew frequently.
SELECT start_time,
       component,
       oper_type,
       oper_mode,
       initial_size / 1024 / 1024 "INITIAL",
       final_size / 1024 / 1024 "FINAL",
       end_time
  FROM v$sga_resize_ops
 WHERE component IN ('DEFAULT buffer cache', 'shared pool')
   AND status = 'COMPLETE'
 ORDER BY start_time, component;
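Since the AWR/ADDM reports pointed to heavy hard parsing, a quick ratio check against v$sysstat can quantify it. A rough sketch; the statistic names are standard, but what counts as "too high" depends on the workload:

-- Hard parses versus total parses since instance startup;
-- a high ratio usually points to missing bind variables
SELECT (SELECT value FROM v$sysstat WHERE name = 'parse count (hard)')  AS hard_parses,
       (SELECT value FROM v$sysstat WHERE name = 'parse count (total)') AS total_parses
  FROM dual;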
You can set the database parameter SHARED_POOL_SIZE explicitly so that the shared pool can never shrink below that value and run short of memory. In addition, you can lengthen the SGA resize interval with: alter system set "_memory_broker_stat_interval" = n scope = spfile; Although the problem is solved, what really needs reflection is why SGA_MAX_SIZE was set to only 1168M in the first place, and why this was not caught during routine inspections.
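For illustration, the kind of parameter changes described above could look like the following; the sizes shown are placeholders only, not the values actually applied in this case:

-- Give the shared pool an explicit floor so ASMM cannot shrink it below this size
ALTER SYSTEM SET shared_pool_size = 600M SCOPE = BOTH;

-- Raise the SGA ceiling and target closer to the 8 GB of physical memory on the host
-- (SGA_MAX_SIZE is static, so this only takes effect after an instance restart)
ALTER SYSTEM SET sga_max_size = 4G SCOPE = SPFILE;
ALTER SYSTEM SET sga_target = 4G SCOPE = SPFILE;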