After an AIX operating system patch is installed, ASM hangs and the disk group cannot be mounted.

My colleague has a set of databases running on AIX. After an OS patch was installed, the ASM instance could not mount its disk groups at startup and hung. We then ran hanganalyze against the hung process; the hang analysis file is as follows:

Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <='rdbms ipc reply'<='enq: DD - contention'
     Chain 1 Signature Hash: 0x7bd12357
 [b] Chain 2 Signature: <='rdbms ipc reply'<='enq: DD - contention'
     Chain 2 Signature Hash: 0x7bd12357

===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (+asm.+asm)
                   os id: 5898472
              process id: 20, oracle@NG-CSDB (TNS V1-V3)
              session id: 1261
        session serial #: 1
    }
    is waiting for 'enq: DD - contention' with wait info:
    {
                      p1: 'name|mode'=0x44440006
                      p2: 'disk group'=0x0
                      p3: 'type'=0x1
            time in wait: 2 min 13 sec
           timeout after: never
                 wait id: 10
                blocking: 0 sessions
             current sql: ALTER DISKGROUP WPSDG MOUNT  /* asm agent *//* {0:0:324} */
             short stack: ksedsts()+360<-ksdxfstk()+44<-ksdxcb()+3384<-sspuser()+116<-48b8<-sskgpwwait()+32<-skgpwwait()+180<-ksliwat()+11032<-kslwaitctx()+180<-ksqcmi()+5604<-ksqgtlctx()+2904<-ksqgelctx()+660<-kfgUseDmt()+424<-kfxdrvMount()+1092<-kfxdrvEntry()+2296<-opiexe()+18568<-opiosq0()+2864<-kpooprx()+316<-kpoal8()+692<-opiodr()+720<-ttcpip()+1028<-opitsk()+1508<-opiino()+940<-opiodr()+720<-opidrv()+1132<-sou2o()+136<-opimai_real()+608<-ssthrdmain()+268<-main()+204<-__start()+112
            wait history:
              * time between current wait and wait #1: 0.000979 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 0.000099 sec
                     wait id: 9               p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000002 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000000 sec
                     wait id: 8               p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000021 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 0.000815 sec
                     wait id: 7               p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (+asm.+asm)
                   os id: 5112446
              process id: 18, oracle@NG-CSDB (TNS V1-V3)
              session id: 1135
        session serial #: 1
    }
    which is waiting for 'rdbms ipc reply' with wait info:
    {
                      p1: 'from_process'=0xd
                      p2: 'timeout'=0x7fffff7b
            time in wait: 1.626263 sec
           timeout after: 0.373737 sec
                 wait id: 86
                blocking: 2 sessions
             current sql: ALTER DISKGROUP UATDG MOUNT  /* asm agent *//* {0:0:324} */
             short stack: ksedsts()+360<-ksdxfstk()+44<-ksdxcb()+3384<-sspuser()+116<-48b8<-sskgpwwait()+32<-skgpwwait()+180<-ksliwat()+11032<-kslwaitctx()+180<-kslwait()+112<-ksarcr()+348<-ksbwcoex()+136<-kfgbSendWithPin()+564<-kfgbSendShallow()+244<-kfgDiscoverShallow()+392<-kfgGlobalOpen()+388<-kfgDiscoverDeep()+332<-kfgDiscoverGroup()+1160<-kfgFinalizeMount()+1236<-kfgscFinalize()+1552<-kfgForEachKfgsc()+272<-kfgsoFinalize()+200<-kfgFinalize()+500<-kfxdrvMount()+3436<-kfxdrvEntry()+2296<-opiexe()+18568<-opiosq0()+2864<-kpooprx()+316<-k
            wait history:
              * time between current wait and wait #1: 0.000040 sec
              1.       event: 'rdbms ipc reply'
                 time waited: 2.000011 sec
                     wait id: 85              p1: 'from_process'=0xd
                                              p2: 'timeout'=0x7fffff7d
              * time between wait #1 and #2: 0.000045 sec
              2.       event: 'rdbms ipc reply'
                 time waited: 2.000012 sec
                     wait id: 84              p1: 'from_process'=0xd
                                              p2: 'timeout'=0x7fffff7f
              * time between wait #2 and #3: 0.000038 sec
              3.       event: 'rdbms ipc reply'
                 time waited: 2.000010 sec
                     wait id: 83              p1: 'from_process'=0xd
                                              p2: 'timeout'=0x7fffff81
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (+asm.+asm)
                   os id: 4653624
              process id: 13, oracle@NG-CSDB (RBAL)
              session id: 820
        session serial #: 1
    }
    which is not in a wait:
    {
               last wait: 2 min 13 sec ago
                blocking: 3 sessions
             current sql:
             short stack: ksedsts()+360<-ksdxfstk()+44<-ksdxcb()+3384<-sspuser()+116<-48b8<-skgfrdevstat()+444<-skgfrdscdevs()+1100<-ksfddscdevs()+24<-kfk_dscv_ufs_init()+208<-kfk_discover_disks()+564<-kfdDiscoverString()+100<-kfdDiscoverShallow()+1764<-kfgbDriver()+1908<-ksbabs()+2800<-ksbrdp()+2044<-opirip()+1624<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+268<-main()+204<-__start()+112
            wait history:
              1.       event: 'CSS operation: action'
                 time waited: 0.003640 sec
                     wait id: 3               p1: 'function_id'=0x43
              * time between wait #1 and #2: 0.000428 sec
              2.       event: 'GPnP Get Item'
                 time waited: 0.091329 sec
                     wait id: 2
              * time between wait #2 and #3: 0.003109 sec
              3.       event: 'CSS initialization'
                 time waited: 0.046782 sec
                     wait id: 1
    }
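For reference, a hang analysis dump like the one above is usually produced with oradebug from a SQL*Plus session connected to the hung instance; the exact level we used was not recorded, so treat the following as a sketch (level 3 is a common choice):

$ sqlplus / as sysasm
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug hanganalyze 3
SQL> oradebug tracefile_name

Here setmypid attaches oradebug to the current server process, unlimit removes the trace file size cap, hanganalyze 3 dumps the wait chains, and tracefile_name reports where the dump was written.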

From the chains we can see that the session executing the disk group mount is waiting on 'enq: DD - contention', and that it is ultimately blocked by the ASM RBAL process, whose recent wait history is 'CSS initialization' => 'GPnP Get Item' => 'CSS operation: action'. The hang analysis file also contains the stack of the RBAL process, and the stack is usually what we need for a support search. The search turned up a document whose wait events and stack information match ours exactly: ASM Instance Hangs During The Diskgroup Mount Stage After AIX OS Patch Install (Doc ID 1633273.1). Let's look at some of the descriptions in this document. The hang is attributed to bug 16288155, which is said to be fixed in 11.2.0.3; strangely, our version is exactly 11.2.0.3. The document asks us to collect truss output for the RBAL process, as shown below:
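Before starting the truss collection, it is also worth a quick sanity check on the bug-fix claim. Assuming a reasonably recent OPatch, the Grid home inventory can be searched for a fix of bug 16288155 (the Grid home path below is hypothetical):

$ export ORACLE_HOME=/oracle/app/11.2.0/grid                    # hypothetical Grid home path
$ $ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed | grep 16288155

Note that 'opatch lsinventory -bugs_fixed' only lists fixes delivered by patches applied on top of the base release, so an empty result does not disprove a base-release fix; it only tells you whether a one-off for 16288155 has been applied. The truss collection steps from the note are reproduced next.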

1. Create a script called rbal.sh in the /home/oracle directory and add the following lines:

   #!/bin/csh
   /usr/bin/truss -o /tmp/rbal_strace.log -aefldD /bin/oracle

   chmod +x /home/oracle/rbal.sh

2. sqlplus / as sysasm
   SQL> create pfile='/tmp/xx.ora' from spfile;

3. Edit xx.ora and add the following parameter:
   _dbg_proc_startup=TRUE

4. Shut down ASM.

5. Start ASM using the following commands:
   # csh
   # setenv RBAL /home/oracle/rbal.sh
   # sqlplus / as sysasm
   SQL> startup pfile='/tmp/xx.ora'
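As an aside, if restarting ASM with the _dbg_proc_startup wrapper is inconvenient, truss on AIX can also be attached to an already running RBAL process by PID. This is not what the note prescribes, only an alternative sketch; it captures system calls from the moment of attach onward, which is usually enough for a process stuck in a loop:

$ ps -ef | grep asm_rbal | grep -v grep        # find the RBAL process id (typically named asm_rbal_<ASM SID>)
$ /usr/bin/truss -aefldD -o /tmp/rbal_truss.log -p <RBAL_PID>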

We collected the truss trace of the RBAL process as the document requires. The content is as follows:

3866976: 20447683: 0.0002: statx("/oracle/app/grid_base/diag/asm/+asm/+ASM/alert/log.xml", 0x0FFFFFFFFFFF7FD0, 176, 01) = 0
3866976: 20447683: 0.0002: statx("/oracle/app/grid_base/diag/asm/+asm/+ASM/alert/log.xml", 0x0FFFFFFFFFFF8100, 176, 0) = 0
3866976: 20447683: 0.0004: kopen("/oracle/app/grid_base/diag/asm/+asm/+ASM/alert/log.xml", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP) = 7
3866976: 20447683: 0.0002: kfcntl(7, F_SETFD, 0x0000000000000001) = 0
3866976: 20447683: kwrite(7, " < m s g   t i m e = ' 2".., 283) = 283
3866976: 20447683: 0.0002: close(7) = 0
3866976: 20447683: 0.0002: statx("/oracle/app/grid_base/diag/asm/+asm/+ASM/alert/log.xml", 0x0FFFFFFFFFFF7620, 176, 0) = 0
3866976: 20447683: 0.0002: kopen("/oracle/app/grid_base/diag/asm/+asm/+ASM/trace/alert_+ASM.log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP) = 7
3866976: 20447683: 0.0002: kfcntl(7, F_SETFD, 0x0000000000000001) = 0
3866976: 20447683: kwrite(7, " W e d   J u l   2 3   0".., 25) = 25
3866976: 20447683: kwrite(7, " R B A L   s t a r t e d".., 41) = 41 <<<<<<<<<<
3866976: 20447683: 0.0002: close(7) = 0
3866976: 20447683: 2.0001: _thread_wait(1152921504606830016) (sleeping...)
3866976: 20447683: 2.0001: _thread_wait(1152921504606830016) = 1
3866976: 20447683: 1.0002: kopen("/proc/4653566/psinfo", O_RDONLY|O_LARGEFILE) = 7
3866976: 20447683: kread(7, "\0 \0010204\001\0\0\001".., 448) = 448
3866976: 20447683: 0.0002: close(7) = 0
3866976: 20447683: 2.0001: _thread_wait(1152921504606830016) (sleeping...)
3866976: 20447683: 2.0001: _thread_wait(1152921504606830016) = 1
3866976: 20447683: 2.0001: _thread_wait(1152921504606830016) (sleeping...)
3866976: 20447683: 2.0001: _thread_wait(1152921504606830016) = 1
3866976: 20447683: 2.0001: _thread_wait(1152921504606830016) (sleeping...)

From the truss output we can see that RBAL has started and then simply sits in a sleeping state, which indicates that the RBAL process itself is not stuck in a system call. The MOS document describes RBAL hanging while accessing the disks, so our situation is different. What, then, is the problem? It is really strange. In the end we could only apply the final solution described in the note: roll back the OS patch and relink the Oracle software. An SR opened for this problem has made no particular progress, and the suspicion is a conflict that has not yet been identified. After all, the Oracle software is built first and IBM's new patches come later, which can lead to incompatibilities; after installing operating system patches, it is best to reinstall the latest database software as well.
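For completeness, the relink part of that fallback is sketched below, assuming a standalone Grid Infrastructure (Oracle Restart) home at a hypothetical path. On a Grid home the stack must be shut down first and, per the relink documentation, unlocked with roothas.pl (or rootcrs.pl in a cluster) before relinking:

# as the Grid Infrastructure owner, after the OS patch has been rolled back
# and with ASM, the listener and the databases shut down
$ export ORACLE_HOME=/oracle/app/11.2.0/grid     # hypothetical Grid home path
$ $ORACLE_HOME/bin/relink all
# then review $ORACLE_HOME/install/relink.log for errors before restarting the stack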

Reference: ASM Instance Hangs During The Diskgroup Mount Stage After AIX OS Patch Install (Doc ID 1633273.1)

Original article: After an AIX operating system patch is installed, ASM hangs and the disk group cannot be mounted. Thanks to the original author for sharing.
