DRM problems and how to disable DRM in Oracle 10g RAC

Last Update:2018-07-03 Source: Internet

Author: User

Tags metalink

In a RAC environment, Oracle uses the GRD (Global Resource Directory) to record the resource information of each RAC node. The GRD is managed by two services: GCS (Global Cache Service) and GES (Global Enqueue Service).

In RAC, each node has its own SGA and buffer cache. To keep cache resources consistent and improve performance, GCS and GES designate one instance in the RAC to manage the cache resource; that instance is the resource master.

Before 10g, the mastership of cache resources could not move between nodes unless an instance was restarted or a node was evicted from the RAC because of some other exception. DRM in 10g solves this problem: a resource can be remastered to the node that accesses that part of the data most frequently, which improves RAC performance. The full name of DRM is Dynamic Resource Mastering. Metalink Doc ID 390483.1 describes DRM in detail.

In theory, when a non-master node frequently accesses a resource, that node can be promoted to master for the resource, which reduces the amount of subsequent cross-node resource access.
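The idea can be illustrated with a toy model. This is only a sketch of affinity-based remastering in general, not Oracle's actual algorithm: the class name, the fixed observation window and the `threshold` parameter are all assumptions made for the illustration.

```python
from collections import Counter


class ToyResourceDirectory:
    """Toy model of affinity-based remastering (not Oracle's real algorithm)."""

    def __init__(self, initial_master, threshold):
        self.master = initial_master   # node currently mastering the resource
        self.threshold = threshold     # hypothetical lead in accesses needed to remaster
        self.accesses = Counter()      # accesses per node in the current window

    def record_access(self, node):
        self.accesses[node] += 1

    def end_of_window(self):
        """Remaster to the busiest node if it leads the master by the threshold."""
        remastered = False
        if self.accesses:
            busiest, count = self.accesses.most_common(1)[0]
            lead = count - self.accesses[self.master]
            if busiest != self.master and lead >= self.threshold:
                self.master = busiest
                remastered = True
        self.accesses.clear()          # start a fresh observation window
        return remastered


grd = ToyResourceDirectory(initial_master="node2", threshold=50)
for _ in range(10):
    grd.record_access("node2")
for _ in range(100):
    grd.record_access("node1")
print(grd.end_of_window(), grd.master)
```

In this run node1 leads the current master by 90 accesses, so the resource is remastered to node1. The model only captures the decision; the cost of the remastering itself is what the rest of this article is about.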

However, a well-designed RAC application should first of all do its utmost to avoid access to the same resource from multiple nodes. If no resource is accessed from more than one node, the problem DRM is meant to solve does not exist at all. Second, DRM itself consumes resources and has many bugs. On a poorly designed system, frequent DRM activity can also trigger library cache lock waits and cause an instance to hang.

More seriously, we once met the following case on a 10.2.0.3 system, a very large database in the telecom industry. Outside business hours, node 2 of the RAC ran batch jobs and cached a 40 GB table. Once the off-hours period ended, the OLTP service on node 1 needed to access that table frequently, and this is when the fault occurred. DRM stepped in, node 2 began transmitting the 40 GB of cached data in memory to node 1, and the gigabit bandwidth of the heartbeat network segment was exhausted. The RAC hung, and the hang lasted 40 minutes.

The network traffic graph, checked afterwards, showed that private network traffic stayed at a peak of 90 Mb/s throughout this period.
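A back-of-the-envelope calculation shows why moving a cache of this size over the interconnect takes so long. The sketch below assumes decimal units (1 GB = 8000 megabits) and a perfectly sustained rate. It does not try to reproduce the incident, whose figures are approximate; the helper name is hypothetical.

```python
def transfer_minutes(size_gb, rate_mbit_per_s):
    """Minutes needed to move size_gb gigabytes at a sustained rate in megabits/s."""
    size_mbit = size_gb * 8 * 1000   # 1 GB = 8000 megabits (decimal units)
    return size_mbit / rate_mbit_per_s / 60


# 40 GB at the rate read from the traffic graph, and at full gigabit line rate
for label, rate in [("90 Mb/s", 90), ("1 Gb/s line rate", 1000)]:
    print(f"{label}: {transfer_minutes(40, rate):.1f} min")
```

At a sustained 90 Mb/s, 40 GB takes about an hour to transfer, which is the same order of magnitude as the 40-minute hang described above.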

According to Metalink, the problem was indeed caused by the DRM mechanism. The final solution was to disable the DRM feature with hidden parameters:

_gc_affinity_time = 0

_gc_undo_affinity = FALSE
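One way to apply these settings is through `ALTER SYSTEM`, where hidden (underscore) parameters must be enclosed in double quotes. The sketch below only renders the statements and does not connect to a database. The helper name is hypothetical, and the use of `SCOPE=SPFILE` with `SID='*'` assumes an spfile-based RAC where the change takes effect on all instances after a restart. Verify this against the Metalink note, and consult Oracle Support before changing hidden parameters.

```python
def alter_system_statement(name, value, scope="SPFILE", sid="*"):
    """Build an ALTER SYSTEM statement; hidden parameters are double-quoted."""
    quoted = f'"{name}"' if name.startswith("_") else name
    return f"ALTER SYSTEM SET {quoted}={value} SCOPE={scope} SID='{sid}';"


# The two settings from this article that disable DRM
drm_off = {"_gc_affinity_time": 0, "_gc_undo_affinity": "FALSE"}
for name, value in drm_off.items():
    print(alter_system_statement(name, value))
```

The first statement rendered is `ALTER SYSTEM SET "_gc_affinity_time"=0 SCOPE=SPFILE SID='*';`, and the second follows the same pattern for `_gc_undo_affinity`.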

Therefore, DRM offers only theoretical relief and is of no real use in large-scale production applications. Like Oracle's own automatic load balancing for RAC, it merely looks good, and anyone who uses it can only wait for trouble. It may do no harm on a database under very light load, but I have never come across one. And besides, if the load is that light, why use RAC at all?

Technology is not our ultimate goal. Technology should serve people, and people remain the most important deciding factor.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
