Cache Fusion Technology and Main Background Processes (IV)

Source: Internet
Author: User

Cache Fusion principle

The background processes of RAC were described earlier. To understand more deeply how they work, first look at how multiple nodes manage access to shared data files in RAC. The central concept is Cache Fusion. Cache Fusion only pays off under one precondition: transferring a block over the interconnect must be faster than reading it from disk. Otherwise there would be no point in introducing Cache Fusion. In practice, 100 MB interconnects are now very common.

What is Cache Fusion?

Cache Fusion transfers blocks between the SGAs of the nodes in the cluster over the high-speed private interconnect. It is the most central working mechanism of RAC: it virtualizes the SGAs of all instances into one large SGA. Whenever different instances request the same data block, the block is passed between the instances over the private interconnect. This avoids the inefficient approach used by OPS, which first pushed the block to disk and then re-read it into the cache of the other instance. When a block is read into the cache of an instance in a RAC environment, the block is given a lock resource (different from a row-level lock) so that other instances know the block is in use. If another instance then requests a copy of a block that is already in the cache of the first instance, the block is passed directly over the interconnect to the SGA of the requesting instance. If the in-memory block has been changed but the change has not been committed, a CR (consistent read) copy is passed instead. Whenever possible, then, data blocks move between the caches of the instances without being written back to disk, which avoids the extra I/O otherwise spent synchronizing the caches of multiple instances. The data cached by different instances can of course differ: before an instance accesses a particular block for the first time, it must either obtain the block from another instance's cache through Cache Fusion or read it from disk. GCS (Global Cache Service) and GES (Global Enqueue Service) manage the synchronization of data blocks between cluster nodes over the interconnect.
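The request path described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class names, the single `holder` directory, and the counters are invented for the example and do not correspond to Oracle internals, and locking, CR copies, and eviction are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One RAC instance with its own buffer cache (part of its SGA)."""
    name: str
    cache: dict = field(default_factory=dict)  # block_id -> block contents


class Cluster:
    """Toy model: a directory records which instance last cached each block."""

    def __init__(self, disk):
        self.disk = disk        # block_id -> contents on shared storage
        self.holder = {}        # block_id -> Instance that last cached it
        self.disk_reads = 0
        self.transfers = 0

    def request(self, inst, block_id):
        if block_id in inst.cache:          # already cached locally
            return inst.cache[block_id]
        owner = self.holder.get(block_id)
        if owner is not None:               # ship the block over the interconnect
            data = owner.cache[block_id]
            self.transfers += 1
        else:                               # no instance has it: read from disk
            data = self.disk[block_id]
            self.disk_reads += 1
        inst.cache[block_id] = data
        self.holder[block_id] = inst
        return data


cluster = Cluster(disk={42: "row data"})
a, b = Instance("inst1"), Instance("inst2")
cluster.request(a, 42)   # first access: read from disk
cluster.request(b, 42)   # second instance: transferred from inst1's cache
print(cluster.disk_reads, cluster.transfers)   # prints: 1 1
```

The second request never touches the disk, which is the saving Cache Fusion is meant to deliver.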

Here are some questions to consider:

    1. When no instance has read a block yet and the first instance reads it, how is the lock acquired, and which lock? If another instance tries to read the same block at almost the same time, how does Oracle arbitrate so that one instance reads from disk and the other obtains the block from the first instance's cache?
    2. If a block has already been read in by another instance, how does this instance determine that it exists?
    3. If an instance changes a data block, is the change passed to the other instances, or do the other instances learn of it and refresh their state?
    4. If an instance is about to swap out a block that other instances also have cached, some copies modified and some not, modified either by this instance or by others, how is this handled? How do operations such as truncating or dropping a table differ from a single instance?
    5. How should an application be designed so that RAC really helps, rather than introducing contention that weakens the system?
    6. How are locks implemented in RAC?

A lock is a resource held in the SGA of each instance and is typically used to control access to database blocks. Each instance usually holds or controls a certain number of locks associated with a range of blocks. When an instance requests a block, it must first obtain the lock for that block from the instance that currently controls it. In other words, locks are distributed across the instances, and a specific lock may have to be obtained from a different instance. These locks are not pinned to one instance, however: based on how often a lock is requested, control of it is moved to the instance that uses it most, which improves efficiency. Allocating, redistributing, and controlling these resources is expensive, which is why RAC places relatively high demands on application design. When an instance crashes or joins, resources are redistributed, and this takes a relatively long time. During normal operation, resources are redistributed to use them more efficiently; when an instance leaves or joins, redistribution is unavoidable. This information appears in the alert file. Cache Fusion and the control of these other resources require a fast interconnect, so watch the metrics for interconnect messages to gauge its traffic and latency. The earlier questions can be approached through two further concepts: the Global Cache Service and the Global Enqueue Service.
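The idea that control of a lock migrates to the instance that requests it most often can be illustrated with a small model. The modulo-based initial placement, the `remaster_threshold` parameter, and all names below are assumptions made for the sketch; the text does not describe how Oracle actually chooses or moves a master.

```python
from collections import Counter, defaultdict


class LockDirectory:
    """Toy directory: each lock resource is controlled by one master instance."""

    def __init__(self, instances, remaster_threshold=3):
        self.instances = list(instances)
        self.master = {}                       # resource id -> mastering instance
        self.requests = defaultdict(Counter)   # resource id -> requests per instance
        self.remaster_threshold = remaster_threshold

    def master_of(self, resource):
        # Assumption: the initial master is picked by a simple modulo over the id.
        if resource not in self.master:
            self.master[resource] = self.instances[resource % len(self.instances)]
        return self.master[resource]

    def request(self, requester, resource):
        master = self.master_of(resource)
        self.requests[resource][requester] += 1
        top, count = self.requests[resource].most_common(1)[0]
        # Move control to the most frequent requester once it is clearly dominant.
        if top != master and count >= self.remaster_threshold:
            self.master[resource] = top
        return master   # the instance that had to grant this request


d = LockDirectory(["inst1", "inst2"])
for _ in range(3):
    d.request("inst2", 10)    # resource 10 starts out mastered by inst1
print(d.master_of(10))        # prints: inst2
```

After the move, inst2 no longer needs a round trip to inst1 for this resource, which is the efficiency gain the paragraph refers to.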

Global Cache Service (GCS): best understood together with Cache Fusion. The global cache deals with data blocks. GCS maintains cache coherency across the global buffer cache, ensuring that an instance obtains a global lock resource whenever it wants to modify a data block, so that no other instance can modify the block at the same time. The modifying instance holds the current version of the block (containing both committed and uncommitted changes) as well as the past image (PI) of the block. If another instance also requests the block, GCS tracks which instance holds the block, which version it holds, and in which mode the block is held. The LMS process is the key component of GCS.

(Conjecture: with the current Cache Fusion, when another instance accesses a block, Oracle transfers the block and builds a copy in that instance's SGA. The main reason is probably that access from local memory is faster than access over the interconnect, so the block can be fetched quickly from local memory the next time it is needed. The drawback is that multiple copies of a data block then exist on multiple nodes, and managing them is costly. Will a later Oracle version offer a better solution, if interconnect speed permits?)

Global Enqueue Service (GES): primarily responsible for maintaining consistency within the dictionary cache and the library cache. The dictionary cache holds data dictionary information in an instance's SGA for fast access. Because the dictionary information is held in memory, a change to the dictionary on one node (such as DDL) must be propagated immediately to the dictionary caches on all nodes. GES handles this and eliminates differences between instances. For the same reason, library cache locks are taken on database objects while SQL statements that reference those objects are parsed. These locks must be maintained across instances, and GES must ensure that no deadlocks arise between multiple instances requesting access to the same object. The LMON, LCK, and LMD processes work together to implement GES. Apart from the maintenance and management of data blocks themselves (done by GCS), GES is the key service that coordinates other resources between nodes in a RAC environment.

SQL> SELECT * FROM gv$sysstat WHERE name LIKE 'gcs%' OR name LIKE 'ges%';

Here you can see the number of GCS and GES messages sent. (If the database was not created with DBCA, run the catclust.sql script with SYSDBA privileges to create the RAC-related views and tables.)
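Because gv$sysstat returns one row per instance for each statistic, it can help to add the values up across the cluster. The sketch below works on plain tuples of (inst_id, name, value); the sample rows are invented numbers for illustration only, and fetching real rows from the database is left out.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like (inst_id, name, value); not real measurements.
rows = [
    (1, "gcs messages sent", 1200),
    (1, "ges messages sent", 300),
    (2, "gcs messages sent", 800),
    (2, "ges messages sent", 500),
]


def total_by_statistic(rows):
    """Sum each statistic over all instances."""
    totals = defaultdict(int)
    for _inst_id, name, value in rows:
        totals[name] += value
    return dict(totals)


print(total_by_statistic(rows))
```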

What is high availability

Oracle Failsafe, Data Guard, and RAC are all Oracle high-availability (HA) solutions, but the three differ greatly. HA stands for high availability. HA is better thought of as an idea than as a specific technology or set of technologies, much like grid computing. High availability has long been one of the criteria used when evaluating a system; OS-level dual-machine hot standby is one example. RAC stands for Real Application Clusters, a technology for running one database on multiple hosts, that is, one database with multiple instances. Its advantages are that a cluster with good performance can be built from several modest machines, with load balancing, and that when one node fails its services automatically move to another node, without the user even noticing.

The difference between FAILSAFE and RAC

1. Operating system:

Failsafe is limited to the Windows platform and must be used together with MSCS (Microsoft Cluster Server). RAC was first released on UNIX platforms and has since been extended to Linux and Windows; it interacts with the operating system through the OSD (operating system dependent) layer. For high-end RAC deployments, UNIX remains the preferred platform.

2. System structure:

Failsafe uses a shared-nothing architecture: several servers form a cluster and are all connected to a shared disk system, but at any moment only one server can access the shared disk and provide service. Only when that server fails does another take over the shared disk. RAC uses a shared-everything architecture: every server in the cluster can access the shared disk and provide service. In other words, Failsafe can use the resources of only one server at a time, whereas RAC can use the resources of multiple servers in parallel.

3. Operating mechanism:

Each server in a Failsafe cluster has its own IP address, the cluster as a whole has an IP address, and the Failsafe group has a separate IP address (the latter two are virtual IPs; clients only need to know the cluster IP to access the database transparently). At any time only one server (the preferred owner or manager) provides service, while the remaining servers (operators) stand by. When the active server fails, another server takes over, including the Failsafe group IP and the cluster IP, and Failsafe starts the database service, the listener, and the other services on it. Clients simply reconnect, without any changes. In a RAC cluster, each server has its own IP address, instance, and so on, and each can provide service on its own; they all operate on the same database on the shared disk. When a server fails, the user only needs to modify the network configuration (for example tnsnames.ora) to reconnect to a server that is still running. When RAC is used together with TAF, even this network configuration step becomes transparent.

4. Cluster Capacity:

A Failsafe cluster usually has two nodes; on some platforms a RAC cluster can be expanded to 8 nodes.

5. Partition:

The disk holding a Failsafe database must be formatted as NTFS. RAC is more flexible: it typically requires raw devices, but several operating systems now provide a cluster file system that RAC can use directly.

To sum up, Failsafe is better suited to systems with high reliability requirements, relatively small applications, and modest performance requirements, while RAC is better suited to larger applications with relatively high requirements for reliability, scalability, and performance.

RAC and OPS differences

RAC is the successor to OPS and inherits its concepts, but RAC is entirely new and its cache mechanism is completely different from that of OPS. RAC solves the conflict that arose in OPS when two nodes wrote to the same block at the same time. As products, RAC and OPS are completely different, but they can be thought of as different versions of the same product.

Differences between dual-machine hot standby, RAC, and Data Guard

Data Guard is Oracle's remote replication technology and comes in physical and logical forms. In general it requires a separate system at a remote site. The two systems may use different hardware configurations, but their software structure must be consistent, including software version and directory layout, and the data is synchronized between them (though not truly in real time). As long as the network between the two systems is up, Data Guard serves as a disaster recovery solution. RAC, by contrast, is a local high-availability cluster in which the nodes share the load of different or identical applications, addressing problems such as poor performance and single-node failure. It consists of several identical or different servers plus a SAN (shared storage). The Oracle high availability product comparison is shown in the following table:

Communication between nodes (interconnect)

In a RAC environment, in addition to the public network, two dedicated networks are usually configured for the interconnect between nodes. In the HACMP/ES resource definition, these two dedicated networks should be defined as "private". During instance startup, RAC automatically recognizes and uses these two dedicated networks; if a public network exists, RAC also recognizes it. When RAC recognizes multiple networks, it uses the TNFF (Transparent Network Failover/Failback) feature: under TNFF, all communication between nodes goes through the first dedicated network, and the second (or third, and so on) serves as a backup if the first dedicated network fails. The communication between RAC nodes is as shown.
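The failover and failback behaviour described for TNFF amounts to "use the first dedicated network that is up, in configured order". A minimal sketch of that selection rule follows; the function name and the network names are hypothetical, and how link state is actually detected is not modeled.

```python
def active_interconnect(networks, is_up):
    """Return the first private network that is up, in configured order."""
    for net in networks:
        if is_up.get(net, False):
            return net
    return None   # no private network is available


networks = ["private1", "private2"]   # hypothetical network names
print(active_interconnect(networks, {"private1": True, "private2": True}))    # private1
print(active_interconnect(networks, {"private1": False, "private2": True}))   # private2
```

Because the list is always scanned from the start, traffic returns to the first network as soon as it is reported up again, which corresponds to the failback half of the feature.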

CLUSTER_INTERCONNECTS is an optional initialization (init.ora) parameter in Oracle RAC. It specifies which network is used for interconnect communication between nodes; if multiple networks are specified, RAC automatically load-balances across them. However, when CLUSTER_INTERCONNECTS is set, TNFF does not work, which reduces the availability of RAC: the failure of any one of the interconnect networks will cause one or more RAC nodes to fail. The internal NICs that Oracle RAC uses for the interconnect can be connected physically in two ways: through a switch, or directly with a network cable. The disadvantage of a direct connection is that when the internal NIC on one node fails, the OS reports the NIC status on both nodes as abnormal, so both instances go down. When there is a problem with the interconnect link, Oracle normally starts an arbitration mechanism to decide which instance should go down; if the instance chosen to go down happens to be the healthy one, both instances end up down. In 9i, Oracle waits for a period before starting arbitration so that the OS can report the network status. If, before the timeout, Oracle learns which instance's NIC is down, that instance is taken down and the healthy instance continues to provide service. Otherwise arbitration begins.

In summary, there are two ways to connect the interconnect between nodes:

• Connected through a switch: in general the healthy instance continues to provide service, but if the OS does not report the NIC status to Oracle in time, both nodes may still go down.

• Connected directly: both instances go down.

CSS Heartbeat

OCSSD is the most critical Clusterware process. If it fails, the system restarts. It provides the CSS (Cluster Synchronization Service). CSS monitors the state of the cluster in real time through several heartbeat mechanisms and provides basic cluster services such as split-brain protection.

CSS has two heartbeat mechanisms: the network heartbeat over the private network, and the disk heartbeat through the voting disk. Each heartbeat has a maximum delay. For the disk heartbeat this delay is called IOT (I/O Timeout); for the network heartbeat it is called MC (Misscount). Both parameters are in seconds, and by default IOT is larger than MC. The defaults are determined automatically by Oracle, and changing them is not recommended. The parameter values can be viewed with the following commands:

$ crsctl get css disktimeout

$ crsctl get css misscount
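How the two limits relate to the two heartbeats can be shown with a small check. This is a simplified sketch, not the CSS algorithm: the `HeartbeatState` type and the function are invented for the example, and the numeric values are illustrative placeholders rather than documented defaults.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatState:
    last_network_beat: float   # time of the last network heartbeat, in seconds
    last_disk_beat: float      # time of the last voting-disk heartbeat, in seconds


def heartbeat_failures(state, now, misscount, disktimeout):
    """Return which heartbeat limits have been exceeded at time `now`."""
    failures = []
    if now - state.last_network_beat > misscount:
        failures.append("network")
    if now - state.last_disk_beat > disktimeout:
        failures.append("disk")
    return failures


# Illustrative values only; read the real settings with the crsctl commands.
MISSCOUNT, DISKTIMEOUT = 30, 200
state = HeartbeatState(last_network_beat=100.0, last_disk_beat=100.0)
print(heartbeat_failures(state, 120.0, MISSCOUNT, DISKTIMEOUT))   # prints: []
print(heartbeat_failures(state, 140.0, MISSCOUNT, DISKTIMEOUT))   # prints: ['network']
```

With IOT larger than MC, a silent node trips the network limit well before the disk limit, as the second call shows.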

The communication protocols used between Oracle RAC nodes are shown in the following table.


A lock controls concurrent access to a data structure. If two processes try to modify the same data at the same time, locks are used to control the order of access and so prevent confusion and accidents. The process that holds the lock accesses the data first; the other process waits until the first has released the lock, then acquires it and proceeds. In RAC there are broadly two kinds of locks: locks between processes on the local host, and locks between processes on different hosts. The local locking mechanisms are themselves of two types, one called a lock and the other a latch.
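The basic behaviour of a local lock, where one party proceeds and the other waits for the release, can be demonstrated with threads standing in for processes. This is a generic illustration using Python's standard `threading.Lock`, not Oracle's lock or latch implementation.

```python
import threading

state = {"count": 0}
state_lock = threading.Lock()


def add_many(n):
    for _ in range(n):
        with state_lock:            # only one thread may update at a time
            state["count"] += 1     # the read-modify-write is now serialized


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state["count"])   # prints: 20000
```

Without the lock, the two read-modify-write sequences could interleave and updates could be lost.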

Global locks are the RAC locks, that is, locks between different hosts, for which Oracle uses the DLM (Distributed Lock Management) mechanism. In Oracle RAC, data is shared globally: every process sees the same data blocks, and data blocks can be passed between machines. The structure of the GRD (Global Resource Directory) is given.

Mode, Role, and N form the basic structure of a RAC lock.

    1. Mode has three values: N, S, and X.
    2. Role has two values: Local and Global.
    3. N has two values, PI and XI; generally 0 means XI and 1 means PI.
    4. Global Memory Management
    5. Database files in a RAC
    6. Consistency of read in RAC
    7. Cluster Ready Services (CRS)
    8. Global Resource Directory

Consistency management

Data consistency and concurrency describe how Oracle keeps data consistent in a multiuser database environment. In a single-user database, the user can modify data without worrying that other users are modifying the same data at the same time. In a multiuser database, however, statements in multiple concurrently executing transactions can modify the same data. Concurrently executing transactions must produce meaningful and consistent results, so control of data concurrency and data consistency is important in a multiuser database. Data concurrency means that many users can access the data at the same time; data consistency means that each user sees a consistent view of the data. The ANSI/ISO SQL standard (SQL-92) defines four transaction isolation levels, which differ in their impact on transaction performance. The isolation levels are defined in terms of three phenomena that must be prevented between concurrently executing transactions:

    1. Dirty read: a transaction reads data that has been written by another transaction but not yet committed.
    2. Non-repeatable (fuzzy) read: a transaction re-reads data it has previously read and finds that the data has been modified or deleted by another committed transaction.
    3. Phantom read: a transaction re-runs a query returning a set of rows that satisfy a search condition and finds additional rows inserted by another committed transaction.

SQL-92 defines the four isolation levels in terms of these phenomena; a transaction running at a given isolation level is allowed to experience only certain of them. The following table shows the read phenomena prevented by each isolation level.
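The relationship between the four SQL-92 isolation levels and the three phenomena can also be expressed as a lookup. The mapping below follows the SQL-92 standard definitions; the variable and function names are chosen for this example only.

```python
PHENOMENA = ("dirty read", "non-repeatable read", "phantom read")

# Phenomena that each SQL-92 isolation level still permits.
PERMITTED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def prevented(level):
    """List the phenomena that cannot occur at the given isolation level."""
    return [p for p in PHENOMENA if p not in PERMITTED[level]]


for level in PERMITTED:
    print(f"{level:17} prevents: {', '.join(prevented(level)) or 'nothing'}")
```

Each stricter level prevents one more phenomenon than the level before it, ending with SERIALIZABLE, which prevents all three.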

OCR structure

(a) The OCR KEY is a tree-shaped structure.

(b) OCR process: each node holds a replicated OCR cache, and the OCR master node is responsible for updating the OCR disk.

Oracle Clusterware background processes

The auto-start scripts are defined in /etc/inittab:

OCSSD (Cluster Synchronization Service): provides the heartbeat mechanism that monitors cluster status.

Network heartbeat

CRSD (Cluster Ready Service): provides high availability for resources, including intervention, shutdown, restart, and relocation.

Resources fall into two groups, nodeapps and database-related resources. Each node needs only one set of the former to work properly; the latter belong to the database, are not tied to a node, and there can be several of them.

EVMD: publishes the events generated by CRS and acts as the communication bridge between the CRS and CSS services.

racgimon: checks the health of the database and handles starting, stopping, and failing over database services. It holds a persistent connection and periodically checks the SGA.

OPROCD (Process Monitor Daemon): detects CPU hangs (used on non-Linux platforms).

Concurrency control in RAC

DLM: Distributed Lock Management.

    1. Non-Cache Fusion resources: data files, control files, data dictionary views, library cache, row cache.
    2. Cache Fusion resources: ordinary data blocks, index blocks, segment header blocks, UNDO blocks.
    3. GRD (Global Resource Directory): records how each data block is distributed across the cluster. It resides in the SGA, with a master node and shadow nodes.
    4. PCM lock: Mode, Role, Past Image.
    5. LMS0 (Lock Manager Service): the corresponding service is GCS (Global Cache Service). It is primarily responsible for passing data blocks between instances through Cache Fusion; the related parameter is GCS_SERVER_PROCESSES.
    6. LMD: the corresponding service is GES (Global Enqueue Service). It manages locks while blocks are being passed.
    7. LCK: synchronizes access to non-Cache Fusion resources; each instance has one such process.
    8. LMON: communicates with the other instances on a regular basis; the corresponding service is CGS (Cluster Group Service). It provides node monitoring, recording node state in the GRD as a bitmap with one flag per node (0: node shut down; 1: node running normally), and communicates periodically through the CM layer.
    9. Two heartbeat mechanisms: the network heartbeat and the control file disk heartbeat, once every 3 seconds.
    10. DIAG: monitors status and writes to the alert.log.
    11. GSD: provides a management interface for users.
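The node-status bitmap mentioned under LMON (item 8) can be illustrated with plain bit operations, one bit per node, where 1 means running and 0 means shut down. The bit layout and the helper names are assumptions for this sketch, not the actual GRD format.

```python
def set_node(bitmap, node, up):
    """Return a new bitmap with the bit for `node` set (up) or cleared (down)."""
    return bitmap | (1 << node) if up else bitmap & ~(1 << node)


def is_up(bitmap, node):
    """True if the bit for `node` is 1, meaning the node is running."""
    return ((bitmap >> node) & 1) == 1


bitmap = 0
bitmap = set_node(bitmap, 0, True)    # node 0 joins
bitmap = set_node(bitmap, 1, True)    # node 1 joins
bitmap = set_node(bitmap, 0, False)   # node 0 leaves the cluster
print(format(bitmap, "02b"), is_up(bitmap, 0), is_up(bitmap, 1))   # prints: 10 False True
```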

Main background processes for RAC

RAC reconfiguration trigger conditions


(ii) Cluster reconfiguration triggers: a node joining or leaving the cluster, triggered by NM; a network heartbeat exception caused by abnormal LMON, GCS, or GES communication, triggered by IMR (Instance Membership Reconfiguration) through the controlfile heartbeat.

RAC advantages and disadvantages

RAC benefits

(i) Multi-node load balancing

(ii) Providing high availability, fault tolerance and seamless switching, minimizing the impact of hardware and software anomalies.

(iii) Improves transaction response time through parallel execution; typically used in data analysis systems.

(iv) Increases the number of transactions per second and the number of connections through scale-out; typically used for OLTP.

(v) Saves hardware costs: several inexpensive PC servers can be used in place of minicomputers or mainframes, which also saves the corresponding maintenance costs.

(vi) Good scalability: nodes can easily be added or removed to expand hardware resources.

RAC disadvantages

(i) Management is more complex and more demanding.

(ii) With poor system planning and design, performance may be worse than on a single node.

(iii) Software costs may increase (licensing is charged per CPU).

