1 RAC Concurrency
The essence of RAC is one database running on multiple computers. The database's main task is transaction processing, and RAC solves the resulting concurrency problem through a Distributed Lock Manager (DLM). Because the resources in RAC are shared, the DLM must coordinate the instances' competing access to these resources in order to guarantee data consistency. The DLM in RAC is called Cache Fusion.
In the DLM, resources are divided into two categories according to their quantity and how intensively they are accessed: Cache Fusion resources and Non-Cache Fusion resources.
Cache Fusion resources are data-block resources, including ordinary data blocks, index blocks, segment header blocks, and undo blocks.
Non-Cache Fusion resources are all non-data-block resources, including data files, control files, the data dictionary, the Library Cache, the Row Cache in the shared pool, and so on. The Row Cache holds the data dictionary and is designed to reduce disk access during statement compilation.
In Cache Fusion, each data block is mapped to a Cache Fusion resource. A Cache Fusion resource is actually a data structure, and the name of the resource is the data block address (DBA). Every data request is carried out in steps: first the block address is converted into a Cache Fusion resource name, then a request for that Cache Fusion resource is submitted to the DLM, and the DLM performs the global lock acquisition and release. Only when a process obtains the PCM lock can it proceed to the next step, that is, the instance obtains the right to use the data block.
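As a small illustration, the standard DBMS_ROWID package can show the file and block numbers that make up the DBA of the block holding a row (the table scott.emp below is only a placeholder):

  -- Derive the file number and block number (the components of the DBA)
  -- for the block that holds one row of an example table.
  SELECT dbms_rowid.rowid_relative_fno(rowid) AS file_no,
         dbms_rowid.rowid_block_number(rowid) AS block_no
    FROM scott.emp
   WHERE ROWNUM = 1;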
The first problem Cache Fusion must solve is tracking how the copies of each data block, and their states, are distributed among the cluster nodes; this is implemented through the GRD.
GRD (Global Resource Directory)
The GRD can be thought of as an internal database that records where every data block is cached in the cluster. It resides in the SGA of every instance, but each instance's SGA holds only part of the GRD; the GRD portions of all instances together make up the complete GRD.
Based on each resource's name, RAC chooses one node in the cluster as its Master node; the other nodes are called Shadow nodes. The GRD on the Master node records the usage information for that resource on all nodes, while the GRD on a Shadow node only records the usage of the resource on that node, which is essentially the PCM lock information. A PCM lock has three attributes: Mode, Role, and PI (Past Image).
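In recent versions (10g onward, as far as I know) you can get a feel for resource mastering from the v$gcspfmaster_info view, which records which instance currently masters the GCS resources of an object. This is only a sketch, not a complete picture of the GRD:

  -- Which instance masters each object's GCS resources, and how often
  -- mastership has been moved (remastered).
  SELECT data_object_id, current_master, previous_master, remaster_cnt
    FROM v$gcspfmaster_info
   WHERE ROWNUM <= 10;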
2 RAC Architecture
2.1 Changes in the SGA
Compared with a traditional single instance, the most significant change in the SGA of a RAC instance is the addition of the GRD. Data operations in Oracle are performed in the SGA in memory; unlike a traditional single instance, RAC has multiple SGAs, and a copy of any data block may exist in the SGA of any instance. RAC must know the distribution, version, and state of these copies, and the GRD is the memory area that holds this information.
Although the GRD is located in the SGA, it is unlike SGA components such as the Buffer Cache or Log Buffer, which have explicit parameters controlling their size. Each node holds only part of the GRD content, and all the nodes together form the complete GRD.
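A rough way to gauge how much GRD metadata an instance is currently holding is v$resource_limit, assuming the RAC-specific rows gcs_resources and gcs_shadows exist in your version (a sketch, not an exact measure of GRD size):

  -- Current and peak counts of GCS resources and shadow resources on this instance.
  SELECT resource_name, current_utilization, max_utilization
    FROM v$resource_limit
   WHERE resource_name IN ('gcs_resources', 'gcs_shadows');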
2.2 Changes in the background processes
Each RAC instance, like a traditional single instance, has the DBWR, LGWR, ARCn, and CKPT background processes. In addition to these, each instance adds a number of RAC-specific processes. Note that the names of these processes and the names of the services they provide differ considerably. For example, the LMS process provides the GCS service, which is awkward to remember; the reason is that the process names carry over from OPS (the predecessor of RAC before 9i), while the services were redesigned and renamed in RAC.
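A quick way to list the RAC-specific background processes actually running on an instance (a sketch; the LIKE patterns simply match the process families described below):

  -- Running RAC background processes: LMSn, LMD, LMON, LCK, DIAG, ...
  SELECT name, description
    FROM v$bgprocess
   WHERE paddr <> '00'   -- only processes that are actually running
     AND (name LIKE 'LM%' OR name LIKE 'LCK%' OR name = 'DIAG');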
2.2.1 LMSn
This process is the main process of Cache Fusion. It is responsible for transferring data blocks between instances, and the corresponding service is called GCS (Global Cache Service). The process name is derived from Lock Manager Service: starting with Oracle 9i, Oracle renamed the service to Global Cache Service, but the process name was not adjusted. The number of these processes is controlled by the parameter GCS_SERVER_PROCESSES; the default value is 2 and the valid range is 0-9.
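For example, you can check and adjust the setting as follows (a sketch; gcs_server_processes is, as far as I know, a static parameter, so a change only takes effect after a restart):

  SHOW PARAMETER gcs_server_processes;
  -- Increase the number of LMS processes; takes effect at the next startup.
  ALTER SYSTEM SET gcs_server_processes = 4 SCOPE = SPFILE SID = '*';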
2.2.2 LMD
This process is responsible for the Global Enqueue Service (GES). Specifically, it coordinates the order in which multiple instances access the same data blocks, ensuring that the instances access the data consistently. Together with the GCS service of the LMSn processes and the GRD, it forms the core of RAC's key feature, Cache Fusion.
2.2.3 LCK
This process is responsible for synchronizing access to Non-Cache Fusion resources; each instance has one LCK process.
2.2.4 LMON
The LMON processes of all instances communicate with each other periodically to check the health of every node in the cluster. When a node fails, LMON is responsible for cluster reconfiguration and GRD recovery; the service it provides is called Cluster Group Services (CGS).
LMON mainly uses two heartbeat mechanisms to perform the health check:
1) Network heartbeat between nodes: you can think of this as the nodes sending ping packets to each other at regular intervals to check node status; if a response is received within the specified time, the other node is assumed to be healthy.
2) Disk heartbeat through the control file (Controlfile Heartbeat): the CKPT process of each node updates one data block of the control file every 3 seconds. This data block is called the Checkpoint Progress Record. Because the control file is shared, the instances can check whether each other's records are being updated in time and judge node health accordingly.
2.2.5 DIAG
The DIAG process monitors the health status of the instance and records diagnostic data in the alert.log file when the instance encounters an error.
2.2.6 GSD
This process is responsible for receiving user commands from client tools such as srvctl, providing users with a management interface.
2.3 Files
Oracle's files include the binary executables, parameter files (pfile and spfile), the password file, control files, data files, online redo logs, archived logs, and backup files.
2.3.1 SPFile
This parameter file needs to be accessible by all nodes, so it must be placed on shared storage.
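A common layout (a sketch; the disk group and database name are placeholders) is for each node to keep only a tiny local pfile that points to the shared SPFILE, for example $ORACLE_HOME/dbs/initracdb1.ora containing a single line:

  SPFILE='+DATA/racdb/spfileracdb.ora'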
2.3.2 Redo Thread
In a RAC environment there are multiple instances, and each instance needs its own set of redo log files for logging. Such a set of redo logs is called a Redo Thread. In fact a single instance also has a redo thread, but the term is rarely mentioned there. Giving each instance its own redo thread is designed to avoid performance bottlenecks caused by contention for the log files.
There are two kinds of redo threads. One is private, created with the syntax: ALTER DATABASE ADD LOGFILE ... THREAD n; the other is public, created with: ALTER DATABASE ADD LOGFILE ...;. In RAC, each instance sets the THREAD initialization parameter, whose default value is 0. If this parameter is set, the instance uses the private redo thread whose number equals THREAD at startup. If the parameter is left at the default of 0, the instance selects a public redo thread after startup, and the instance uses that redo thread exclusively.
Each instance in RAC requires a redo thread; each redo log thread requires at least two redo log groups; the members of the log groups should be equal in size; each group should preferably have two or more members, and those members should be placed on different disks to avoid a single point of failure.
It is important to note that in a RAC environment, redo log groups are numbered at the level of the whole database. For example, if instance 1 has log groups 1 to 3, then the log groups of instance 2 should be numbered starting from 4 and cannot reuse the numbers 1 to 3.
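A minimal sketch of adding and enabling a second redo thread (the disk group, sizes, and the SID racdb2 are placeholders, not a definitive configuration):

  -- Create log groups 4 and 5 for thread 2, continuing the database-wide numbering.
  ALTER DATABASE ADD LOGFILE THREAD 2
    GROUP 4 ('+DATA') SIZE 100M,
    GROUP 5 ('+DATA') SIZE 100M;
  ALTER DATABASE ENABLE THREAD 2;
  -- Bind instance 2 to this private thread via its THREAD parameter.
  ALTER SYSTEM SET thread = 2 SID = 'racdb2' SCOPE = SPFILE;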
In a RAC environment, the online redo logs of all instances must be placed on shared storage, because if one node shuts down unexpectedly, one of the remaining nodes performs crash recovery, and the node performing crash recovery must be able to access the online logs of the failed node. This requirement can only be met by placing the online logs on shared storage.
2.3.3 Archived Log
Each instance in RAC produces its own archived logs, which are needed only when performing media recovery, so the archived logs do not have to be placed on shared storage; each instance can store them locally. However, if you perform archived-log backups or media recovery on a single node, that node must be able to access the archived logs of all instances. There are several ways to configure archived logs in a RAC environment:
1) Using NFS
This approach essentially archives to shared storage. For example, with 2 nodes, each node has 2 directories, ARCH1 and ARCH2, corresponding to the archived logs produced by instance 1 and instance 2. Each instance configures only one archive location and archives locally, then mounts the other node's directory locally via NFS.
2) Inter-instance archiving (CIA: Cross Instance Archive)
This approach is a variant of the previous one and is a common configuration method. The two nodes create two directories, ARCH1 and ARCH2, corresponding to the archived logs generated by instance 1 and instance 2 respectively. Each instance is configured with two archive locations: location 1 corresponds to the local archive directory, and location 2 points to the other instance (see the parameter sketch after this list).
3) Using ASM
This approach also archives to shared storage, but through the ASM provided by Oracle, which hides the complexity described above; the principle, however, is the same.
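A rough sketch of per-instance archive parameters for the cross-instance approach (the directories, TNS aliases rac1/rac2, and SIDs are placeholders, not a definitive configuration):

  -- Instance 1: archive locally and also ship archived logs to instance 2.
  ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u01/arch1' SID = 'racdb1';
  ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=rac2'        SID = 'racdb1';
  -- Instance 2: the mirror image of the above.
  ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u01/arch2' SID = 'racdb2';
  ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=rac1'        SID = 'racdb2';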
2.3.4 Undo tablespace
As with the redo logs, each instance in a RAC environment needs its own undo tablespace, configured through the parameter <instance_sid>.undo_tablespace.
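A minimal sketch of setting up a separate undo tablespace for a second instance (the datafile location, size, and the SID racdb2 are placeholders):

  CREATE UNDO TABLESPACE undotbs2 DATAFILE '+DATA' SIZE 500M AUTOEXTEND ON;
  ALTER SYSTEM SET undo_tablespace = 'UNDOTBS2' SID = 'racdb2';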
2.4 SCN (System Change Number)
The SCN is the mechanism Oracle uses to track the order in which changes occur inside the database. It can be thought of as a high-precision clock: every redo log entry, undo data block, and data block carries an SCN. Oracle's consistent read, current read, and multiversion block mechanisms all depend on the SCN.
In RAC, GCS is responsible for maintaining the SCN globally. The default is the Lamport SCN generation algorithm, whose rough principle is: the SCN is carried in all messages exchanged between nodes, and each node compares the SCN it receives with its local SCN; if the local SCN is smaller, it is advanced to the received value. If there is little traffic between the nodes, they still exchange SCNs proactively at regular intervals, so some redo is generated even when a node is idle. There is also a broadcast algorithm: after every commit, the node broadcasts the SCN to the other nodes. Although this places a certain load on the system, it ensures that every node sees the latest SCN immediately after a commit.
Each algorithm has its advantages and disadvantages: Lamport has a lower load but introduces a delay between nodes, while broadcast has a higher load but no delay. Oracle 10g RAC defaults to the broadcast algorithm, which you can see in the alert.log:
Picked broadcast on commit scheme to generate SCNs
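In 10g the choice of scheme is tied to the max_commit_propagation_delay parameter (a value of 0 means broadcast on commit, as far as I recall; the parameter was deprecated in later releases). A quick check:

  SHOW PARAMETER max_commit_propagation_delay;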
For the relationship between redo logs, checkpoints, and the SCN, see:
Http://blog.csdn.net/tianlesoftware/archive/2010/01/25/5251916.aspx
2.5 Cache Fusion, GCS, GES relationship
Cache Fusion (memory fusion) transfers data blocks between instances over the high-speed private interconnect. It is the core working mechanism of RAC and virtualizes the SGAs of all instances into one large SGA. Whenever different instances request the same data block, the block is passed between the instances over the private interconnect.
The whole of Cache Fusion consists of two services: GCS and GES. GCS is responsible for transferring data blocks between instances, and GES is responsible for lock management.
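A rough way to see how much Cache Fusion (GCS) block traffic an instance is handling (a sketch; these 10g-style statistic names may vary slightly between versions):

  -- Blocks received from and served to other instances over the interconnect.
  SELECT name, value
    FROM v$sysstat
   WHERE name IN ('gc cr blocks received', 'gc current blocks received',
                  'gc cr blocks served',   'gc current blocks served');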
To summarize:
The LMSn processes are responsible for GCS, and the LMD process for GES;
GCS and GES together maintain the GRD;
This whole set of services constitutes the DLM.