Oracle RAC runs on top of a cluster, providing high availability, scalability, and low-cost computing power for Oracle databases. If one node in the cluster fails, Oracle continues to run on the remaining nodes. Oracle's key innovation here is a technology called Cache Fusion. Cache Fusion enables the nodes in a cluster to synchronize their memory caches efficiently over a high-speed cluster interconnect, minimizing disk I/O. Its most important advantage is that, because every node in the cluster can access all data on the shared disks, the data never needs to be partitioned across nodes. Oracle is the only vendor to offer an open-systems database with this capability.
[Figure: RAC architecture diagram]
At the core of Oracle RAC is the shared disk subsystem: every node in the cluster must have access to all data files, redo log files, control files, and parameter files. The data disks must be globally available so that all nodes can access the database. Each node has its own redo logs, but the other nodes must be able to access them so that the node can be recovered when it suffers a system failure.
I. RAC Architecture
RAC is Oracle's cluster solution at the database tier, coordinating the work of two or more database nodes.
The Cluster Manager integrates the other modules of the clustered system and provides communication between cluster nodes over the high-speed interconnect. Nodes are connected by a heartbeat link; the information carried on it is used to determine logical cluster membership, propagate node updates, and track each node's running state at any point in time, ensuring that the cluster operates normally.
Components of a RAC cluster (a toy model follows this list):
- Cluster nodes: 2 to N nodes (hosts) running Oracle Database Server.
- Private network (interconnect): RAC requires a high-speed private network between the nodes to carry inter-node communication and Cache Fusion traffic.
- Shared storage: RAC requires shared storage devices so that every node can access the data files.
- Production network: the network over which RAC provides service; both clients and applications connect through this network.
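To make the list above concrete, here is a minimal sketch in Python, purely illustrative: names such as `Node` and `RacCluster` are invented for this example and are not part of any Oracle API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    private_ip: str   # interconnect address (inter-node traffic, Cache Fusion)
    public_ip: str    # production network address (client traffic)

@dataclass
class RacCluster:
    shared_storage: str                       # storage every node can reach
    nodes: list = field(default_factory=list) # 2 to N cluster nodes

cluster = RacCluster(shared_storage="+DATA")
cluster.nodes.append(Node("rac1", private_ip="192.168.10.1", public_ip="10.0.0.1"))
cluster.nodes.append(Node("rac2", private_ip="192.168.10.2", public_ip="10.0.0.2"))
print(f"{len(cluster.nodes)} nodes sharing {cluster.shared_storage}")
```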
II. RAC Service Processes
- CRS (Cluster Ready Services), the cluster resource service
  - The basic process that manages high-availability operations within the cluster.
  - Anything managed by CRS is called a resource: databases, instances, listeners, virtual IPs, application processes, and so on.
  - CRS manages these resources based on the resource configuration information stored in the OCR.
  - When the state of a resource changes, the CRS process generates an event.
- CSS (Cluster Synchronization Services), the cluster synchronization service
  - Manages cluster node membership.
  - Controls which nodes are cluster members; a node notifies the other members when it joins or leaves the cluster, and CSS maintains the cluster configuration information accordingly.
  - A failure of this process causes the cluster node to restart.
- EVMD (Event Management), the event management service
  - The event management daemon, a background process that publishes the events created by CRS.
- ONS (Oracle Notification Service), the event publish/subscribe service
  - A publish/subscribe service for Fast Application Notification (FAN) events (sketched after this list).
- OCR (Oracle Cluster Registry), the cluster registry
  - The cluster registry file, which records configuration information about each node.
  - Stores the configuration of the various resources in the RAC cluster; similar in role to the Windows registry.
  - Stored on shared disk and shared by all instances.
  - Mirrored by default (two copies).
  - An arbitration mechanism is used when multiple nodes write to the shared disk, to avoid conflicts.
- Voting Disk, the voting disk
  - Stored on shared disk and shared by all instances.
  - Used to determine the membership relationships among instances.
  - When a node fails, a vote over the voting disk determines which instance to evict (see Section IV below).
  - Three copies by default.
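The CRS/ONS behavior described above, a daemon watching resources whose configuration lives in the OCR, generating an event on each state change, and a publish/subscribe service delivering those events, can be reduced to a toy model. This is an illustration only: the `EventBus` class and all names are invented, not the Oracle Notification Service API.

```python
from collections import defaultdict

class EventBus:
    """Toy publish/subscribe hub standing in for an ONS-style service."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

# OCR-like static configuration: the resources the daemon manages.
resources = {"db1": "ONLINE", "listener1": "ONLINE"}

bus = EventBus()
bus.subscribe("resource.state", lambda event: print("event:", event))

def on_state_change(name, new_state):
    """CRS-like behavior: a resource state change generates an event."""
    if resources[name] != new_state:
        resources[name] = new_state
        bus.publish("resource.state", {"resource": name, "state": new_state})

on_state_change("listener1", "OFFLINE")   # prints the generated event
```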
III. RAC Background Processes
Oracle RAC has unique background processes that play no role in a single-instance database. The functions of these RAC-specific background processes are described below.
LMS (Global Cache Service process) manages access to data blocks within the cluster and transmits block images between the buffer caches of different instances. It copies a data block directly from the cache of the holding instance and sends the copy to the requesting instance, ensuring that the image of a given data block appears only once across the buffer caches of all instances. LMS coordinates block access by passing messages between instances: when an instance requests a data block, that instance's LMD process sends a request for the block resource to the LMD process of the instance mastering the block, which forwards it to the LMD process of the instance currently holding the resource so that the resource is released. The LMS process of the holding instance then creates a consistent-read image of the block and ships the block to the buffer cache of the requesting instance. LMS ensures that only one instance can update a block at any given time and maintains the block's image records (including the status flags of updated blocks). RAC provides up to 10 LMS processes (0-9), and their number increases as inter-node message traffic grows.
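A toy model of the message flow just described (requesting instance, master's directory, holding instance ships a copy) might look like the following. The `GlobalResourceDirectory` class and its methods are invented for illustration; real LMS/LMD behavior (lock modes, past images, multiple LMS processes) is far more involved.

```python
class Instance:
    def __init__(self, name):
        self.name = name
        self.buffer_cache = {}     # block_id -> block contents

class GlobalResourceDirectory:
    """Stands in for the GRD: records which instance holds each block."""
    def __init__(self):
        self.holder = {}           # block_id -> Instance

    def request_block(self, block_id, requester):
        holder = self.holder.get(block_id)
        if holder is None or block_id not in holder.buffer_cache:
            block = f"read {block_id} from disk"   # no cached copy: disk I/O
        else:
            # The holder's "LMS" ships a consistent-read copy over the
            # interconnect, avoiding the disk read entirely.
            block = holder.buffer_cache[block_id]
        requester.buffer_cache[block_id] = block
        self.holder[block_id] = requester
        return block

grd = GlobalResourceDirectory()
a, b = Instance("rac1"), Instance("rac2")
grd.request_block(42, a)           # first access: simulated disk read
print(grd.request_block(42, b))    # second access: served from rac1's cache
```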
LMON (Lock Monitor process) is the global enqueue service monitor. The LMON process of each instance communicates periodically with the others to check the health of every node in the cluster. When a node fails, LMON is responsible for cluster reconfiguration, GRD recovery, and related operations. The service it provides is called Cluster Group Services (CGS).
LMON relies on two heartbeat mechanisms for its health checks (a minimal sketch follows the list):
- Network heartbeat between nodes: each node periodically sends ping packets to probe the state of the other nodes; if a response is received within the allotted time, the peer is assumed to be healthy.
- Disk heartbeat through the control file (control-file heartbeat): the CKPT process of each node updates a block of the control file every 3 seconds; this block is called the checkpoint progress record. Because the control file is shared, the instances can check whether each other's updates are timely and judge node health accordingly.
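A minimal sketch of the two checks, with invented helper names: a node passes the disk heartbeat if its checkpoint progress record in the shared control file has been refreshed within a timeout derived from the 3-second CKPT interval (the timeout multiplier here is an arbitrary choice for the example).

```python
import time

CKPT_INTERVAL = 3            # CKPT refreshes the control file every ~3 seconds
TIMEOUT = 3 * CKPT_INTERVAL  # tolerate a few missed updates (arbitrary choice)

def network_heartbeat_ok(node, ping):
    """Network heartbeat: ping the node over the interconnect."""
    return ping(node)

def disk_heartbeat_ok(node, checkpoint_progress, now=None):
    """Disk heartbeat: is the node's checkpoint progress record fresh?"""
    now = time.time() if now is None else now
    return now - checkpoint_progress[node] <= TIMEOUT

# rac1 wrote its record just now; rac2's record is 60 seconds stale.
checkpoint_progress = {"rac1": time.time(), "rac2": time.time() - 60}
print(disk_heartbeat_ok("rac1", checkpoint_progress))          # True
print(disk_heartbeat_ok("rac2", checkpoint_progress))          # False
print(network_heartbeat_ok("rac2", ping=lambda node: False))   # False: no reply
```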
LMD (Global Enqueue Service Daemon), also called the lock management daemon, is the background process responsible for managing global enqueues and the global resources that control access to data blocks. Within each instance, the LMD process services incoming remote resource requests, that is, lock requests originating from other instances in the cluster. It is also responsible for deadlock detection and for monitoring lock-conversion timeouts.
LCK (Lock process) manages non-Cache-Fusion resource requests, which are local resource requests. The LCK process manages the instance's requests for shared resources and cross-instance call operations. During recovery it builds a list of invalid lock elements and validates the lock elements. Because LMS handles the primary lock-management work, only a single LCK process exists in each instance.
DIAG (Diagnosability Daemon) captures information about process failures in a RAC environment and writes out trace information for failure analysis. The information DIAG generates is useful when working with Oracle Support to determine the cause of a failure. Each instance needs only one DIAG process.
GSD (Global Services Daemon) interacts with RAC management tools such as DBCA, SRVCTL, and OEM to carry out administrative tasks such as starting up and shutting down instances.
The GCS and GES processes maintain the state information of each data file and each cached block in the Global Resource Directory (GRD). After one instance accesses and caches data, the other instances in the cluster can obtain a corresponding block image, so they can read the data directly from an SGA cache rather than going back to disk. The GRD resides in the memory structures of every active instance, which makes the SGA of a RAC instance larger than that of a single-instance database. The remaining processes and memory structures differ little from those of a single-instance database.
IV. Split Brain
Oracle's cluster components use the voting disks to resolve cluster membership in a partitioned cluster.
For example, suppose an 8-node cluster suffers an interconnect outage such that 4 nodes can no longer communicate with the other 4. The voting disk helps determine which set of nodes should continue to run and which set should be shut down, as the sketch below illustrates.
All voting disks must be placed on shared storage accessible to every node. Configuring multiple voting disks removes the voting disk as a single point of failure without requiring external mirroring.
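As a toy illustration of the arbitration idea (not Oracle's actual eviction algorithm): the sub-cluster holding the majority of nodes survives, and on an even split a tie-breaker such as the lowest-numbered node decides. The function name and tie-breaking rule below are assumptions for the example.

```python
def surviving_partition(partitions):
    """Pick which group of nodes keeps running after a network split."""
    # Larger sub-cluster wins; ties are broken in favor of the
    # sub-cluster containing the smallest node id.
    return max(partitions, key=lambda p: (len(p), -min(p)))

# 8-node cluster split 4 / 4: the tie-breaker decides.
print(surviving_partition([{1, 2, 3, 4}, {5, 6, 7, 8}]))  # {1, 2, 3, 4}
# 5 / 3 split: the majority partition survives.
print(surviving_partition([{1, 2, 3}, {4, 5, 6, 7, 8}]))  # {4, 5, 6, 7, 8}
```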