Introduction to Oracle RAC Clusters

Source: Internet
Author: User

With RAC, the most important thing is to understand the internal principles and the architecture. Installation itself is not very difficult, but both troubleshooting and maintenance depend on knowing the architecture and the internal principles.

Cluster categories

1. High-performance computing

Computing tasks are distributed across different compute nodes to increase the overall computing capability. These clusters rely mainly on parallel computing and are used mostly in scientific computing.

2. Load Balancing cluster (LB)

The service load is distributed as evenly and reasonably as possible across the nodes in the cluster. Each node handles part of the load, and the load is rebalanced dynamically. A load balancing algorithm is not a simple average; it optimizes the distribution based on the resources available on each node and on network conditions. Distribution plus reasonableness is therefore the core of load balancing.
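
As a rough illustration of distributing by available resources rather than by a simple average, the sketch below sends each request to the node with the largest share of free capacity. The `Node` class, the capacity figures, and the `pick_node` and `dispatch` functions are hypothetical names made up for this example; they are not part of Oracle RAC or of any load balancer product.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int    # hypothetical: max concurrent requests the node can serve
    active: int = 0  # requests currently assigned to the node

    @property
    def free_ratio(self) -> float:
        # Fraction of the node's capacity that is still unused.
        return (self.capacity - self.active) / self.capacity


def pick_node(nodes: list[Node]) -> Node:
    """Return the node with the largest share of free capacity."""
    return max(nodes, key=lambda n: n.free_ratio)


def dispatch(nodes: list[Node], requests: int) -> dict[str, int]:
    """Assign requests one at a time and return the per-node totals."""
    for _ in range(requests):
        pick_node(nodes).active += 1
    return {n.name: n.active for n in nodes}


if __name__ == "__main__":
    cluster = [Node("node1", capacity=100), Node("node2", capacity=50)]
    print(dispatch(cluster, 30))
```

With these assumed capacities, the larger node ends up with roughly twice as many requests as the smaller one, which is what distinguishes a resource-aware distribution from a plain average.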

3. High Availability (HA)

An HA cluster focuses on system availability, combining hardware and software fault tolerance to keep the overall service highly available. If a node fails, another node takes over its work.

RAC truly combines LB and HA. In a sense, only the final application (the database) can achieve true load balancing; most clusters are HA clusters only.

Special problems in a cluster environment

1. Concurrency Control

A cluster environment has shared storage, and every node in the cluster has the same access rights to it. A mechanism is therefore required to control how the nodes access the data.

In RAC, the DLM (Distributed Lock Management) mechanism controls concurrency between instances.
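
The idea behind lock-based concurrency control can be sketched with a toy lock table. This is only an illustration of shared and exclusive lock compatibility, not Oracle's DLM implementation; the class `ToyLockManager`, the instance names, and the resource name `block-42` are all hypothetical.

```python
class ToyLockManager:
    """Toy cluster-wide lock table: resource -> (mode, set of holder instances)."""

    def __init__(self) -> None:
        self._locks: dict[str, tuple[str, set[str]]] = {}

    def acquire(self, instance: str, resource: str, mode: str) -> bool:
        """Try to grant `mode` ("S" shared or "X" exclusive); True if granted."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        held = self._locks.get(resource)
        if held is None:
            # Nobody holds the resource, so any mode can be granted.
            self._locks[resource] = (mode, {instance})
            return True
        held_mode, holders = held
        # Shared locks are compatible only with other shared locks.
        if mode == "S" and held_mode == "S":
            holders.add(instance)
            return True
        return False

    def release(self, instance: str, resource: str) -> None:
        """Drop the instance's lock; remove the entry when no holder remains."""
        held = self._locks.get(resource)
        if held is None:
            return
        _, holders = held
        holders.discard(instance)
        if not holders:
            del self._locks[resource]


if __name__ == "__main__":
    dlm = ToyLockManager()
    print(dlm.acquire("inst1", "block-42", "S"))  # first reader
    print(dlm.acquire("inst2", "block-42", "S"))  # second reader
    print(dlm.acquire("inst3", "block-42", "X"))  # writer while readers hold it
```

In this sketch two instances can read the same resource at once, but a writer is refused until every reader has released its lock.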

2. Amnesia

If the cluster configuration is not stored centrally, each node keeps a local copy. While the cluster runs normally, the configuration can be modified on any node, and the changes are automatically synchronized to the other nodes.

Suppose node 1 is shut down for routine maintenance, the configuration is then modified on node 2, node 2 is shut down, and node 1 is started. Because the change made on node 2 was never synchronized to node 1, node 1 starts with the old configuration file and the change is lost.
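
The sequence can be replayed with a small simulation. It assumes a simplified model in which a change is pushed only to peers that are running at the time; `ClusterNode`, `update_config`, `simulate_amnesia`, and the `listener_port` setting are hypothetical names chosen for the example.

```python
class ClusterNode:
    def __init__(self, name, config):
        self.name = name
        self.config = dict(config)  # local copy of the cluster configuration
        self.running = True


def update_config(source, peers, key, value):
    """Apply a change on `source` and sync it to every peer that is running."""
    source.config[key] = value
    for peer in peers:
        if peer.running:
            peer.config[key] = value


def simulate_amnesia():
    """Replay the shutdown/modify/restart sequence and return both configs."""
    initial = {"listener_port": 1521}
    node1 = ClusterNode("node1", initial)
    node2 = ClusterNode("node2", initial)

    node1.running = False                                  # node 1 down for maintenance
    update_config(node2, [node1], "listener_port", 1522)   # change made on node 2
    node2.running = False                                  # node 2 shut down
    node1.running = True                                   # node 1 started again
    return node1.config, node2.config


if __name__ == "__main__":
    cfg1, cfg2 = simulate_amnesia()
    print("node1 sees:", cfg1)
    print("node2 had: ", cfg2)
```

Node 1 comes back with the value it had before the maintenance window, because nothing told it about the change made while it was down.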

3. Split brain

In a cluster, the nodes use a heartbeat mechanism to learn each other's health status so that they can coordinate their work. Suppose only the heartbeat fails while every node keeps working normally. Each node then concludes that the other nodes are down, that it is the only healthy node in the cluster, and that it should take control of the entire cluster. Because the storage is shared, this is disastrous. This situation is called split brain.

A voting algorithm can solve this problem.

Each node records its vote in the voting area through its heartbeat (one vote per node), and each node also reads the votes recorded by the other nodes.

Suppose a cluster splits into two partitions, one with three nodes and the other with two. Every node in the three-node partition sees 3 votes, and every node in the two-node partition sees 2 votes, so the two-node partition is kicked out and its nodes restart automatically. If the two partitions have the same number of nodes, the partition that reaches the voting area first survives and the other is kicked out and restarted; in this case the surviving partition is the one that contains the master node (usually the node that started first).
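
The decision rule described above can be sketched as a small function: the partition with the most votes survives, and a tie goes to the partition holding the master node. The function name `surviving_partition` and the node names are hypothetical, and the sketch models only the rule, not how Oracle Clusterware reads and writes its voting area.

```python
def surviving_partition(partitions: list[set[str]], master: str) -> set[str]:
    """Pick the partition that keeps running after a heartbeat failure.

    Each node counts as one vote, so a partition's vote count is its size.
    The largest partition wins; on a tie, the one holding `master` wins.
    """
    most_votes = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == most_votes]
    if len(candidates) == 1:
        return candidates[0]
    for p in candidates:
        if master in p:
            return p
    raise ValueError("tie between partitions and none contains the master node")


if __name__ == "__main__":
    # Three nodes against two: the larger partition survives.
    print(surviving_partition([{"n1", "n2", "n3"}, {"n4", "n5"}], master="n4"))
    # Two against two: the partition with the master node survives.
    print(surviving_partition([{"n1", "n2"}, {"n3", "n4"}], master="n3"))
```

Every partition other than the one returned would be evicted and restarted.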

4. IO fencing

When a cluster fails, it must be decided which node gains control of the cluster and which nodes are kicked out. That is the problem voting solves.

Kicking nodes out is not enough, however. An evicted node may still be running (it has merely left the cluster), so it must also be prevented from accessing the shared data. This is the problem IO fencing (IO isolation) solves.

IO fencing can be implemented in hardware or in software. Different cluster vendors use different methods, and some require hardware support (mainly, whether the storage devices support certain protocols). Oracle RAC uses a software method and directly restarts the faulty node.

Whichever method is used, the goal of IO fencing is to ensure that the faulty node can no longer access the shared data.

Some storage devices support the SCSI Reserve/Release commands. The healthy nodes use the SCSI Reserve command to lock the storage device. When the faulty node finds the storage locked, it knows that it has been kicked out of the cluster and restarts itself. This mechanism is called suicide. Clusters from Sun and Veritas, for example, use this mechanism.
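
The reserve-then-suicide behaviour can be modelled in a few lines. This is a toy simulation only: it issues no real SCSI commands, and `SharedDisk`, `FencedNode`, and `try_write` are hypothetical names invented for the example.

```python
class SharedDisk:
    """Toy stand-in for a storage device that supports reserve/release."""

    def __init__(self):
        self.reserved_by = None  # name of the node holding the reservation

    def reserve(self, node):
        """Grant the reservation if the disk is free or already held by `node`."""
        if self.reserved_by in (None, node):
            self.reserved_by = node
            return True
        return False

    def release(self, node):
        """Clear the reservation, but only for the node that holds it."""
        if self.reserved_by == node:
            self.reserved_by = None


class FencedNode:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.restarts = 0  # how many times the node restarted itself

    def try_write(self):
        """Write unless another node holds the reservation; otherwise restart."""
        if self.disk.reserved_by not in (None, self.name):
            # The disk is locked by someone else: this node has been evicted.
            self.restarts += 1
            return False
        return True


if __name__ == "__main__":
    disk = SharedDisk()
    healthy = FencedNode("node1", disk)
    faulty = FencedNode("node2", disk)

    disk.reserve(healthy.name)          # the healthy node locks the storage
    print(faulty.try_write())           # the faulty node is refused
    print("restarts:", faulty.restarts)
```

Once the healthy node holds the reservation, the faulty node's next write attempt is refused and it restarts instead of touching the shared data.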

Whether implemented in software or hardware, the general principle is the same: a healthy node notifies the faulty node in some way, and the faulty node restarts. The notification can be done in hardware or in software, with the hardware methods being the more reliable.
