DRM problems in Oracle 10g RAC and how to disable DRM

Source: Internet
Author: User
Tags: metalink

In a RAC environment, Oracle uses the GRD (Global Resource Directory) to record resource information for every RAC node. The GRD is maintained by two services: GCS (Global Cache Service) and GES (Global Enqueue Service).

In RAC, each node has its own SGA and buffer cache. To keep cached resources consistent and to improve performance, GCS and GES designate one instance in the cluster to manage each cache resource. That instance is the Resource Master.

Before 10g, the mastership of a cache resource could not move between nodes unless an instance was restarted or a node was evicted from the cluster because of some other failure. DRM in 10g addresses this: it allows a resource to be remastered to the node that accesses that data most frequently, which improves RAC performance. DRM stands for Dynamic Resource Mastering. Metalink Doc ID 390483.1 describes DRM in detail.

In theory, when a non-master node needs frequent access to a resource, it can be promoted to master for that resource, which reduces the amount of subsequent cross-node resource access.

However, a well-designed RAC application should first of all do its utmost to avoid access to the same resource from multiple nodes. If no resource is accessed from more than one node, the problem DRM is meant to solve does not exist at all. Second, DRM itself consumes resources and has many bugs. On a poorly designed system, frequent DRM activity can also trigger library cache lock waits and cause the instance to hang.

More seriously, we once hit the following case on a 10.2.0.3 system, a very large database in the telecom industry. Batch jobs running on node 2 outside business hours had cached a 40 GB table. Once business hours began, the OLTP workload on node 1 needed to access that table frequently, and at that point the failure occurred. Because DRM stepped in, node 2 started shipping the 40 GB of cached data in its memory to node 1, which exhausted the gigabit bandwidth of the heartbeat (private interconnect) network. The RAC cluster hung, and the hang lasted for 40 minutes.

The network traffic graph we checked afterwards showed that private network traffic stayed at its peak level of 90 Mb/s throughout this period.

According to Metalink, the problem was indeed caused by the DRM mechanism. The final solution was to disable the DRM feature with hidden parameters:

_gc_affinity_time = 0

_gc_undo_affinity = FALSE
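
As a rough sketch of how these two settings might be applied, the statements below assume the instances share an spfile and that the change is made as SYSDBA. In 10g these hidden parameters are static, so they only take effect after the instances are restarted. The verification query reads the X$ fixed tables, which are normally visible only to SYS. Hidden parameters are usually changed only on the advice of Oracle Support, so treat this as an illustration rather than a procedure.

```sql
-- Sketch only: assumes an spfile shared by all instances (SID='*')
-- and a SYSDBA session. Both parameters are static in 10g, so a
-- restart of the instances is needed before they take effect.
ALTER SYSTEM SET "_gc_affinity_time" = 0 SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET "_gc_undo_affinity" = FALSE SCOPE = SPFILE SID = '*';

-- After the restart, check the values in effect (run as SYS).
SELECT i.ksppinm  AS name,
       v.ksppstvl AS value
FROM   x$ksppi  i,
       x$ksppcv v
WHERE  i.indx = v.indx
AND    i.ksppinm IN ('_gc_affinity_time', '_gc_undo_affinity');
```

The parameter names must be double-quoted because they begin with an underscore.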

So DRM is a relief only in theory, and it does not deliver in real large-scale applications. Like Oracle's own Automatic Load Balancing for RAC, it merely looks good. Anyone who relies on it is just waiting for trouble. It may not matter on a database under very light load, but I have never come across one. And then again, if the load is that light, why run RAC at all?

Technology is not our ultimate goal. Technology exists to serve people, and people remain the most important deciding factor.
