Database Split-Brain

Source: Internet
Author: User

Tags: membership, network, master, oracle, database

Oracle RAC CSS provides two background services: Group Management (GM) and Node Monitoring (NM). GM provides the group and lock services. At any point in time there is always one node in the cluster acting as the GM master node; the other nodes send their GM requests serially to the master node, and the master node broadcasts cluster membership change information to the other nodes. Group membership is synchronized every time a cluster reconfiguration occurs, and each node interprets the membership change information independently.

The Node Monitoring (NM) service is responsible for keeping node information consistent through SKGXN (skgxn-libskgxn.a, a library that provides node monitoring) and the cluster software of other vendors. In addition, NM maintains the well-known network heartbeat and disk heartbeat to confirm that nodes are still alive. When a cluster member stops producing a normal network heartbeat or disk heartbeat, NM is responsible for evicting that member from the cluster, and the evicted node is rebooted. The NM service uses the interconnect information recorded in OCR to learn which endpoints it needs to listen on and interact with, and it sends heartbeat messages over the network to the other cluster members. It also monitors the network heartbeats arriving from all other cluster members; these heartbeats occur every second. If a node's network heartbeat is not received within misscount seconds, the node is considered "dead". (By the way: in 10.2.0.1 the default misscount is 60s on Linux and 30s on other platforms, or 600s when third-party vendor clusterware is used, and 10.2.0.1 had not yet introduced disktimeout; from 10.2.0.4 onward misscount is 60s and disktimeout is 200s; from 11.2 onward misscount is 30s: CRS-4678: Successful get misscount 30 for Cluster Synchronization Services, CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.) NM is also responsible for initiating cluster reconfiguration when other nodes join or leave the cluster. In the split-brain case, NM additionally monitors the voting disk to learn about the other competing subclusters.

Subclusters deserve a short introduction. Imagine an environment with a very large number of nodes; the 128-node environment Oracle has officially built gives us room for imagination. When a network failure occurs there are many possibilities. One possibility is a total network failure, in which none of the 128 nodes can exchange network heartbeats with any other node, producing up to 128 isolated information "islands". Another possibility is a partial network failure, in which the 128 nodes are split into several parts, each containing more than one node; each such part can be called a subcluster. When this kind of network failure happens, the nodes inside a subcluster can still communicate with each other and exchange vote messages (VOTE MESG), but the subclusters, or the isolated nodes, can no longer communicate over the regular interconnect, and at that point the NM reconfiguration has to rely on the voting disk.
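For reference, the misscount and disktimeout values on a running cluster can usually be read back with crsctl; the CRS-4678 lines quoted above are the output of exactly these queries on 11.2. A minimal check, run as the clusterware software owner (a sketch; the exact message wording differs across versions):
crsctl get css misscount        # e.g. CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
crsctl get css disktimeout      # e.g. CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.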
Because NM uses the voting disk to resolve communication failures caused by network problems, it is necessary to ensure that the voting disks can be accessed normally at all times. In the normal state every node performs disk heartbeat activity; specifically, it writes its disk heartbeat information to a block on the voting disk, and this happens once per second. CSS also reads a "kill block" once per second, and when the contents of the kill block indicate that this node has been evicted from the cluster, CSS actively reboots the node.

To guarantee that the disk heartbeat and the kill-block reads above always work properly, CSS requires that at least (N/2 + 1) voting disks remain accessible to the node, which ensures that any 2 nodes always share at least one voting disk that both of them can access normally. Under normal circumstances (note: under calm, normal conditions), as long as a node can access more online voting disks than it cannot access, the node can live happily; once the voting disks it cannot access outnumber the voting disks it can access normally, the Cluster Synchronization Service process fails and the node is rebooted. So the claim that two voting disks are enough to guarantee redundancy and that there is no need for three or more is wrong. Oracle recommends having at least 3 voting disks in a cluster.

Supplement 1. Question: some readers ask whether the number of voting disks must be odd. Answer: in fact we only recommend using an odd number of vote disks; it does not have to be odd. The maximum number of vote disks in 10gR2 is 32. Question: can we use 2 or 4 vote disks? Answer: yes, but 2 and 4 are unfavorable under the hard disk heartbeat rule "at least (N/2 + 1) voting disks must be accessible by the node":
When we use 2 vote disks, no vote disk heartbeat failure can be tolerated.
When we use 3 vote disks, no more than 1 vote disk heartbeat failure can be tolerated.
When we use 4 vote disks, still no more than 1 vote disk heartbeat failure can be tolerated; the fault tolerance is the same as with 3, but because we have more vote disks the management cost and the risk introduced both grow.
When we use 5 vote disks, no more than 2 vote disk heartbeat failures can be tolerated.
When we use 6 vote disks, still no more than 2 vote disk heartbeat failures can be tolerated; as with anything beyond 5, the extra disk only introduces unreasonable management cost and risk.

Supplement 2. Question: if the network heartbeat between nodes is normal, and the number of vote disks a node can access normally is greater than the number it cannot access, for example 3 vote disks of which exactly 1 has a disk heartbeat timeout, will split-brain happen? Answer: this situation triggers neither split-brain nor the node eviction protocol (EVICTION PROTOCOL). When a single voting disk, or more generally a minority of the voting disks, suffers a disk heartbeat failure, the failure may just be a short-lived I/O error while the node accesses that voting disk, and CSS immediately marks these failed voting disks as offline. Although a certain number of voting disks are offline, we still have at least (N/2 + 1) voting disks available, which guarantees that the eviction protocol will not be invoked, so no node will be rebooted.
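To see how many voting disks a cluster actually has and whether they are online, the usual check is crsctl (a minimal sketch; the output format differs between 10.2 and 11.2, and on 11.2 it also shows the ASM disk group holding each voting file):
crsctl query css votedisk       # lists each configured voting disk and, on 11.2, its online/offline state
With N voting disks configured, the "at least (N/2 + 1) accessible" rule above means at most floor((N-1)/2) disk heartbeat failures can be tolerated, which is exactly the 0/1/1/2/2 pattern listed for N = 2 through 6.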
The Disk Ping Monitor Thread (DPMT - clssnmDiskPMT) of the Node Monitor module then repeatedly attempts to access these failed, offline voting disks. If a voting disk becomes I/O-accessible again and the data on it is verified not to be corrupted, CSS marks that voting disk online again; but if after 45s (the 45s here is derived from misscount and an internal algorithm) the relevant voting disk still cannot be accessed normally, DPMT generates a warning message in cssd.log.
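To watch for these DPMT warnings, the CSS daemon log is the place to look. The paths below are what is typically seen for a 10.2/11.1 CRS home and an 11.2 grid home respectively and should be verified for your own release (12c and later moved these logs under the ADR); substitute your node name for <hostname>:
tail -f $CRS_HOME/log/<hostname>/cssd/ocssd.log       # 10.2 / 11.1
tail -f $GRID_HOME/log/<hostname>/cssd/ocssd.log      # 11.2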

The continuous-casting server exceptions were caused by node errors on the servers (RAC is described below). The node errors mainly come from two sources: 1) the split-brain phenomenon; 2) a node losing the connection to more than half of the vote disks, which causes a node error.
1. Causes of split-brain:
When the server cannot connect and an ORA alert is raised, find the blocking sessions with:
SELECT a.owner, a.object_name, b.session_id,
       b.oracle_username, b.os_user_name, b.process,
       b.locked_mode, c.sid,
       c.serial#, c.program
FROM   all_objects a, v$locked_object b, sys.gv_$session c
WHERE  a.object_id = b.object_id
AND    b.process = c.process
AND    a.object_name = 'TAB_NAME';
Then kill the offending session:
ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;
This problem is mainly accompanied by ORA-03135 (connection lost contact). The main cause here was recent network instability: if the public network port of any host in the cluster breaks, its VIP disappears, and the resulting false-down situation is what we call split-brain.
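One way to confirm whether a VIP has actually gone offline or failed over after such a public-network problem is to check the clusterware resource state (a sketch; use whichever utility matches your version and look at the ora.<nodename>.vip resources):
crs_stat -t                     # 10g / 11.1
crsctl status resource -t       # 11.2 and later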
Split-brain phenomenon: inconsistency within a cluster caused by the nodes in the cluster being unable to communicate with each other normally.
If this occurs, Oracle RAC terminates one or more nodes to guarantee cluster consistency. The principle for deciding which nodes to terminate after a split-brain is a vote among the surviving subclusters: the subcluster with fewer nodes is terminated, and when the subclusters contain the same number of nodes, the one whose node has the larger node ID is terminated (the node with the smaller node ID survives). RAC (Real Application Clusters): simply put, Oracle provides a single application platform (somewhat like a BBS) that supports all types of applications, whether transactional or analytical. All applications share the same server and storage resources; if any server or disk fails, the system automatically takes over the failed function. Cluster: a cluster is a set of separate computers that provide services as a whole. Taking continuous casting as an example, DB1, DB2 and a disk array form a cluster (via software) that provides data services to the application.
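The node numbers that the tie-breaking rule refers to can be listed from the clusterware side with olsnodes (a sketch, assuming the clusterware is up); the same membership is also visible from any surviving instance in gv$instance:
olsnodes -n                     # prints each cluster node name together with its node number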
From the above we can see that the split-brain phenomenon can lead to service crashes (sessions being killed). Split-brain is mostly a matter of data on multiple nodes going out of sync, and the network instability described above is one cause of it.
There are also node buffer cache inconsistencies: when a cache synchronization operation runs into the split-brain phenomenon, the server cannot start. Symptom: in a two-node RAC database only one node can be started; no matter which node is started first, the other node will not start properly. The failure shows up as the instance being able to mount while alter database open hangs there without moving, with no error message; only the background process QMNC fails to start, and after a restart the alert log contains messages such as "MMNL absent for 1474 secs; Foregrounds taking over".

