CGS (Cluster Group Service) implements instance management for Oracle RAC and is responsible for the following functions:
1) Providing the heartbeat mechanism between instances
2) Reconfiguring the DB cluster when an instance leaves or joins the cluster
3) Resolving split-brain at the database level
1, Network Heartbeat
The network heartbeat at the database level is implemented by the LMON process. The LMON process of each instance periodically communicates with all remote instances over the database's private network to confirm their status. If an instance fails to respond to the network heartbeat messages sent by the other nodes for a period of time, the database cluster has to be reconfigured. The most visible symptom for the user is the ORA-29740 error.
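The timeout check described above can be sketched as follows. This is a toy model, not Oracle's implementation: the class HeartbeatTracker and its methods are hypothetical names, and the 300s threshold is simply the default mentioned later in this article for the split-brain case.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Remembers when each remote instance last answered a heartbeat."""
    timeout_s: float  # hypothetical threshold, not an Oracle parameter name
    last_seen: dict = field(default_factory=dict)

    def record_reply(self, instance_id: int, now: float) -> None:
        # Called whenever a remote instance answers over the private network.
        self.last_seen[instance_id] = now

    def unresponsive(self, now: float) -> list:
        # Instances silent for longer than the timeout would trigger
        # a reconfiguration in the scheme described above.
        return sorted(i for i, t in self.last_seen.items()
                      if now - t > self.timeout_s)


tracker = HeartbeatTracker(timeout_s=300.0)
tracker.record_reply(1, now=0.0)
tracker.record_reply(2, now=0.0)
tracker.record_reply(1, now=250.0)   # instance 1 keeps answering
stale = tracker.unresponsive(now=310.0)
print(stale)
```

Timestamps are passed in explicitly so the sketch stays deterministic; a real monitor would read a clock instead.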
2, Disk Heartbeat
The disk heartbeat at the database level works in basically the same way as the disk heartbeat at the GI level, but because the database layer has no voting disk, the implementation differs. In a RAC database, the LMON process passes the remote-node status information gathered through the network heartbeat to the CKPT process. By default, every 3s the CKPT process writes to the database's control file, adding information about the instances that the local instance can reach. This completes the disk heartbeat of the DB instance. If there is a problem with the disk heartbeat of a DB instance, the most visible symptom for the user is the ORA-00494 error.
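As a rough illustration of this mechanism, the sketch below models the control file as a dictionary that each instance updates every 3s. The record layout, the function names and the max_missed parameter are assumptions made for the example; they do not describe the real control file format.

```python
CKPT_INTERVAL_S = 3  # write interval stated in the text


def write_heartbeat(control_file: dict, instance_id: int,
                    now: float, reachable: set) -> None:
    # CKPT-like write: a timestamp plus the instances that the local
    # instance can currently reach, as reported by LMON.
    control_file[instance_id] = {"written_at": now,
                                 "reachable": sorted(reachable)}


def stale_instances(control_file: dict, now: float, max_missed: int) -> list:
    # An instance is considered stale once it has missed more than
    # max_missed consecutive writes (hypothetical rule for this sketch).
    limit = CKPT_INTERVAL_S * max_missed
    return sorted(i for i, rec in control_file.items()
                  if now - rec["written_at"] > limit)


control_file = {}
write_heartbeat(control_file, 1, now=0, reachable={2})
write_heartbeat(control_file, 2, now=0, reachable={1})
# Instance 1 keeps writing every 3s; instance 2 stops after its first write.
for t in range(CKPT_INTERVAL_S, 31, CKPT_INTERVAL_S):
    write_heartbeat(control_file, 1, now=t, reachable=set())
stale = stale_instances(control_file, now=30, max_missed=5)
print(stale)
```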
3, Local Heartbeat
The LMHB process periodically monitors LMON, LMS, LMD, LCK0 and the other important background processes related to cache fusion. If the LMHB process finds that one of these processes has not updated its state information for a period of time, it analyzes whether the problem has to be resolved by restarting the local instance.
Reconfiguration phases
Phase 1: The reconfiguration master node sends a reconfiguration message to all other nodes, and then the name service and lock-related information of each instance is frozen.
Phase 2: The new instance state bitmap is determined (at this stage the reconfiguration master holds the RR lock while it determines the bitmap). If the reconfiguration was caused by an instance leaving, that instance is recovered, and then the incarnation of the DB cluster is updated.
Phase 3: If the reconfiguration was caused by an instance leaving the DB cluster, the data belonging to that instance is deleted from the name service.
Phase 4: The name service information for the DB cluster is republished and the previously frozen lock-related operations are restored.
Phase 5: The previously frozen resources are unfrozen, and GCS and GES are notified to begin the cache fusion-related reconfiguration.
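The five phases can be walked through on a toy model of the cluster state. Everything below (ClusterState, its fields, the reconfigure function) is hypothetical and only mirrors the order of the steps listed above. It does not model the RR lock or instance recovery.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    incarnation: int
    members: set
    name_service: dict                      # instance id -> published info
    frozen: set = field(default_factory=set)
    log: list = field(default_factory=list)


def reconfigure(state: ClusterState, alive: set) -> set:
    # Phase 1: freeze the name service and lock-related information.
    state.frozen = {"name_service", "locks"}
    state.log.append("phase 1: freeze")
    # Phase 2: determine the new membership and bump the incarnation.
    departed = state.members - alive
    state.members = set(alive)
    state.incarnation += 1
    state.log.append("phase 2: new membership")
    # Phase 3: drop name service data of instances that left.
    for inst in departed:
        state.name_service.pop(inst, None)
    state.log.append("phase 3: cleanup")
    # Phase 4: republish the name service and restore lock operations.
    for inst in sorted(alive):
        state.name_service.setdefault(inst, f"instance-{inst}")
    state.frozen.discard("locks")
    state.log.append("phase 4: republish")
    # Phase 5: unfreeze what is left and notify GCS/GES.
    state.frozen.clear()
    state.log.append("phase 5: notify GCS and GES")
    return departed


state = ClusterState(incarnation=4, members={1, 2, 3},
                     name_service={1: "inst1", 2: "inst2", 3: "inst3"})
departed = reconfigure(state, alive={1, 3})
print(departed, state.incarnation, sorted(state.name_service))
```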
Types of reconfiguration
Type 1: Reconfiguration due to database startup or shutdown
Type 2: Reconfiguration due to loss of network heartbeat for one instance
Type 3: Reconfiguration due to loss of disk heartbeat for one instance
Type 4: Reconfiguration due to an important cache fusion-related background process losing its local heartbeat
A reconfiguration of type 2-4 results in the problematic instance being restarted.
Split-brain at the database level
1) A problem occurs on the private network between two instances, and after some time (300s by default) the two instances find that they cannot communicate with each other.
2) Each instance tries to obtain the RR lock. The instance that obtains the RR lock reads the instance state from the control file and determines the new cluster instance list. The instance that obtained the RR lock survives, and the other instance is evicted.
Split-brain means the same thing at the database level as at the GI level, but it is handled differently: the GI level decides based on the node number, whereas at the database level the instances compete for the RR lock.
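The competition for the RR lock can be sketched as below. This is a simplified illustration under stated assumptions: RRLock and resolve_split_brain are hypothetical names, the control file is reduced to a map from instance id to the set of instances it can reach, and the order of the contenders list stands in for the real race.

```python
import threading


class RRLock:
    """Stand-in for the RR lock: only the first caller acquires it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.holder = None

    def try_acquire(self, instance_id: int) -> bool:
        if self._lock.acquire(blocking=False):
            self.holder = instance_id
            return True
        return False


def resolve_split_brain(contenders: list, control_file: dict, rr_lock: RRLock):
    # The list order models which instance reaches the lock first.
    for inst in contenders:
        rr_lock.try_acquire(inst)
    winner = rr_lock.holder
    # The winner reads the recorded state and builds the new member list:
    # itself plus every instance it can still reach.
    survivors = {winner} | set(control_file.get(winner, set()))
    evicted = set(contenders) - survivors
    return survivors, evicted


# Two instances that can no longer see each other over the private network.
control_file = {1: set(), 2: set()}
survivors, evicted = resolve_split_brain([2, 1], control_file, RRLock())
print(survivors, evicted)
```

Here instance 2 reaches the lock first, so it survives and instance 1 is evicted. With the contenders in the opposite order the outcome is reversed.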
Oracle RAC Instance Management (Cluster Group Service)