The heartbeat timeout detection of ASM -- Delayed asm pst heart beats

Source: Internet
Author: User

Recently we have seen ASM disk groups dismounted with the error "Waited 15 secs for write IO to PST". The message comes from ASM's own heartbeat timeout detection: the ASM instance periodically checks whether each ASM disk responds normally. I decided to write a short summary of the problem.

The following is how Doc ID 1581684.1, ASM diskgroup dismount with "Waited 15 secs for write IO to PST", describes it:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generally this kind messages comes in ASM alertlog file on below situations,

Delayed asm pst heart beats on ASM disks in normal or high redundancy diskgroup,
Thus the ASM instance dismount the diskgroup. By default, it is 15 seconds.

By the way the heart beat delays are sort of ignored for external redundancy diskgroup.
ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
But the heart beat delays do not dismount external redundancy diskgroup directly.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From this description we can conclude the following:
1. The ASM instance periodically checks the state of the disks in each disk group and whether communication with them is normal;
2. A delayed heartbeat dismounts the disk group only in normal and high redundancy modes. With external redundancy the delay is tolerated and does not directly dismount the disk group;
3. By default, the timeout is 15 seconds. That is, if a disk has still not responded to the ASM instance after 15 seconds, the disk group is dismounted.
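
The three points above can be written down as a small decision function. This is only a toy model of the rule as summarised here, not ASM's real implementation; the names `PstHeartbeat` and `heartbeat_outcome` and the three outcome labels are made up for illustration.

```python
from dataclasses import dataclass

# Toy model of the rule summarised above; not ASM's real implementation.
DEFAULT_HBEAT_IO_WAIT_SECS = 15  # default timeout quoted in the text


@dataclass
class PstHeartbeat:
    redundancy: str     # "external", "normal" or "high"
    waited_secs: float  # how long the PST write has been outstanding


def heartbeat_outcome(hb, timeout_secs=DEFAULT_HBEAT_IO_WAIT_SECS):
    """Return the outcome the summary predicts for one heartbeat."""
    if hb.redundancy not in ("external", "normal", "high"):
        raise ValueError(f"unknown redundancy: {hb.redundancy!r}")
    if hb.waited_secs < timeout_secs:
        return "ok"
    if hb.redundancy == "external":
        # Delays are tolerated: heartbeats pause until PST revalidation succeeds.
        return "pause-heartbeats"
    # Normal or high redundancy: the delayed heartbeat leads to a dismount.
    return "dismount"


if __name__ == "__main__":
    samples = (
        PstHeartbeat("normal", 3),
        PstHeartbeat("high", 15),
        PstHeartbeat("external", 40),
    )
    for hb in samples:
        print(hb.redundancy, hb.waited_secs, "->", heartbeat_outcome(hb))
```

Passing a larger `timeout_secs` shows why raising the timeout helps on a slow storage network: the same delay that gave "dismount" at 15 seconds gives "ok" at 120.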

Every customer who hit this problem was using storage attached over a fibre network, and the error appeared when the storage network was faulty. In other words, when ASM sends its periodic check and the disk does not respond within 15 seconds, ASM considers the disk no longer accessible.
I tried to reproduce the error in a test environment. Because the test environment is a VMware virtual machine, deleting a disk at the physical layer does not trigger it: when a disk on the same host is removed abnormally, the ASM read operation immediately returns an OS-level I/O error, so there is no wait for the timeout behind "Waited 15 secs for write IO to PST".

In summary, this error occurs only with shared ASM disks reached over a storage network, not with disks local to the host. It is raised when the response to ASM's heartbeat check does not come back in time. The cause may be the storage host, the storage network, or the storage disks themselves. Either way, ASM did not receive the acknowledgement it needed and treats the disk as faulty; if enough disks are affected to threaten data integrity, ASM dismounts the disk group.
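
When checking whether an instance has been affected, it helps to pull these messages out of the ASM alert log. Below is a minimal sketch that matches the message text quoted in this article. The function name `find_pst_delays` and the sample lines are made up for illustration; real alert log entries carry more detail than shown here.

```python
import re

# Matches the message quoted in the text; the number of seconds follows the timeout.
PST_WAIT = re.compile(r"Waited (\d+) secs for write IO to PST")


def find_pst_delays(lines):
    """Yield (line_number, waited_seconds) for every PST heartbeat delay message."""
    for number, line in enumerate(lines, start=1):
        match = PST_WAIT.search(line)
        if match:
            yield number, int(match.group(1))


# Made-up sample lines, not copied from a real alert log.
SAMPLE = [
    "NOTE: some unrelated message",
    "WARNING: Waited 15 secs for write IO to PST",
    "WARNING: Waited 15 secs for write IO to PST",
]

if __name__ == "__main__":
    for number, secs in find_pst_delays(SAMPLE):
        print(f"line {number}: PST write waited {secs}s")
```

To scan a real file, pass an open file object instead of `SAMPLE`; the function only needs an iterable of lines.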

According to Doc ID 1581684.1, the error message "Waited 15 secs for write IO to PST" appears from 11.2.0.3.0 onward. The document also describes how to change the detection timeout manually, which is controlled by the hidden parameter _asm_hbeatiowait:

alter system set "_asm_hbeatiowait"=<value> scope=spfile sid='*';

(ASM/CRS must be restarted for the change to take effect.)
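
Because the parameter name is quoted and easy to mistype, one option is to generate the statement instead of typing it. The helper below is only a sketch: `hbeatiowait_statement` is a made-up name, and the check for a positive whole number is my assumption, not a limit documented by Oracle.

```python
PARAM = "_asm_hbeatiowait"  # hidden parameter named in the text


def hbeatiowait_statement(seconds):
    """Render the ALTER SYSTEM statement for a new heartbeat I/O wait."""
    # Assumption: only positive whole seconds make sense; bool is rejected explicitly
    # because Python treats True/False as integers.
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("timeout must be a positive whole number of seconds")
    return f"alter system set \"{PARAM}\"={seconds} scope=spfile sid='*';"


if __name__ == "__main__":
    print(hbeatiowait_statement(120))
```

The output is meant to be pasted into SQL*Plus on the ASM instance; the script itself never connects to anything.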

To confirm that this parameter first appears in 11.2.0.3, I queried the hidden parameters on each database version. The results are as follows:
==================== 10.2 ====================
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm%' order by ksppinm;
hidden parameter                  value
--------------------------------- ----------
_asm_acd_chunks                   1
_asm_allow_only_raw_disks         TRUE
_asm_allow_resilver_corruption    FALSE
_asm_ausize                       1048576
_asm_blksize                      4096
_asm_direct_con_expire_time       120
_asm_disk_repair_time             14400
_asm_droptimeout                  60
_asm_emulmax                      10000
_asm_emultimeout                  0
_asm_fob_tac_frequency            3
hidden parameter                  value
--------------------------------- ----------
_asm_instlock_quota               0
_asm_kfdpevent                    0
_asm_libraries                    ufs
_asm_maxio                        1048576
_asm_skip_resize_check            FALSE
_asm_stripesize                   131072
_asm_stripewidth                  8
_asm_wait_time                    18
_asmlib_test                      0
_asmsid                           asm
21 rows selected.

==================== 11.2.0.1 ====================
$ sqlplus / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;
hidden parameter                  value
--------------------------------- ----------
_asm_hbeatwaitquantum             2

==================== 11.2.0.2 ====================
$ sqlplus / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;
hidden parameter                  value
--------------------------------- ----------
_asm_hbeatwaitquantum             2

The parameter exists only from 11.2.0.3.0 onward. In other words, this timeout check on ASM disks is present only in 11.2.0.3 and later.
==================== 11.2.0.3 ====================
sys@R11203> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;
hidden parameter                  value
--------------------------------- ----------
_asm_hbeatiowait                  15
_asm_hbeatwaitquantum             2

==================== 11.2.0.4 ====================
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - Production
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;
hidden parameter                  value
--------------------------------- ----------
_asm_hbeatiowait                  15
_asm_hbeatwaitquantum             2

==================== 12.1.0.1 ====================
$ sqlplus / as sysdba
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;
hidden parameter                  value
--------------------------------- ----------
_asm_hbeatiowait                  15
_asm_hbeatwaitquantum             2

Starting with 12.1.0.2, the default value of this parameter is raised to 120 seconds.
==================== 12.1.0.2 ====================
$ sqlplus / as sysdba

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;
hidden parameter                  value
--------------------------------- ----------
_asm_hbeatiowait                  120
_asm_hbeatwaitquantum             2
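
For quick reference, the defaults seen in the query output above can be collected into a small lookup. This is a sketch that records only the versions tested in this article. `None` means the parameter was not listed on that version, and `observed_default` is a made-up helper name.

```python
# _asm_hbeatiowait defaults as observed in the query output above.
# None means the parameter did not appear in the listing for that version.
OBSERVED_DEFAULTS = {
    "10.2.0.5": None,
    "11.2.0.1": None,
    "11.2.0.2": None,
    "11.2.0.3": 15,
    "11.2.0.4": 15,
    "12.1.0.1": 15,
    "12.1.0.2": 120,
}


def observed_default(version):
    """Look up the observed default for a version such as '11.2.0.3.0'."""
    # Compare on the first four components so '11.2.0.3.0' matches '11.2.0.3'.
    key = ".".join(version.split(".")[:4])
    if key not in OBSERVED_DEFAULTS:
        raise KeyError(f"version {version} was not tested in this article")
    return OBSERVED_DEFAULTS[key]


if __name__ == "__main__":
    for version in ("11.2.0.2.0", "11.2.0.3.0", "12.1.0.2.0"):
        print(version, "->", observed_default(version))
```

Versions outside the table raise `KeyError` on purpose, since guessing a default for an untested release would defeat the point of recording what was actually measured.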

I hope this is useful. Problems like this often look simple, yet I am rarely certain of the answer until I test it, so I test first and then record the result for later reference.
