Understanding DRM in Oracle 10g RAC

Source: Internet
Author: User

A summary of DRM:

1. What is DRM
DRM (Dynamic Resource Management) is a feature introduced in Oracle 10g. In an Oracle RAC environment, Oracle records resource information for each node in the GRD (Global Resource Directory), which is managed jointly by GCS (Global Cache Service) and GES (Global Enqueue Service). Because each node in a RAC has its own SGA and buffer cache, GCS and GES designate one instance in the cluster to manage a given cache resource; that instance is the resource master. This ensures consistency and high performance across all nodes' caches. Before DRM, remastering (changing the master node) happened only during reconfiguration, which is triggered automatically when an instance starts or shuts down normally, or when an abnormal node is evicted from the cluster. So if node A is the resource master for a resource, the resource stays mastered on node A until the next reconfiguration.

In theory, with DRM, when a non-master node accesses a resource frequently, it can be promoted to the master for that resource, reducing the number of subsequent cross-node access requests. For example, if a cache resource mastered on node A is frequently accessed by node B, the resource can be remastered from node A to node B.
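As an illustration (not from the original article), in 10g you can check which instance currently masters an object's cache resources, and how often DRM has remastered it, from V$GCSPFMASTER_INFO. The table name SOME_TABLE below is a placeholder:

```sql
-- Sketch: find the current master instance for an object's cache
-- resources and the number of times DRM has remastered it.
SELECT o.object_name,
       m.current_master,      -- instance number (0-based) of the master
       m.previous_master,
       m.remaster_cnt         -- how many times DRM has moved the master
  FROM v$gcspfmaster_info m
  JOIN dba_objects o
    ON o.data_object_id = m.data_object_id
 WHERE o.object_name = 'SOME_TABLE';
```

A steadily growing REMASTER_CNT for a hot object is one sign that DRM is actively moving mastership between nodes.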

However, a well-designed RAC application avoids accessing the same resource from multiple nodes in the first place; if a resource is only ever accessed from one node, there is nothing for DRM to do. Moreover, the DRM process itself consumes resources.

/* Below is an example from Laoxiong's blog: http://www.laoxiong.net/problem-caused-by-drm.html */

In one RAC system, intermittent performance problems occurred, but the system automatically returned to normal after a while.

The Top 5 Timed Events section of the AWR report showed:

Top 5 Timed Events                                               Avg %Total
~~~~~~~~~~~~~~~~~~                                              wait   Call
Event                                 Waits    Time (s)   (ms)  Time Wait Class
------------------------------ ------------ ----------- ------ ----- ----------
latch: cache buffers lru chain      774,812     140,185    181  29.7 Other
gc buffer busy                    1,356,786      61,708         13.1 Cluster
latch: object queue header ope      903,456      55,089         11.7 Other
latch: cache buffers chains         360,522      49,016    136  10.4 Concurrenc
gc current grant busy               112,970      19,893    176   4.2 Cluster

You can see that 3 of the Top 5 waits are latch-related, while the other 2 are RAC-related.
Looking at the more detailed wait data reveals other problems:
                                           %Time  Total Wait    Avg
                                                               wait   Waits
Event                            Waits     -outs    Time (s)   (ms)    /txn
----------------------------- ----------- ------ ----------- ------ -------
latch: cache buffers lru cha      774,812    N/A     140,185    181     1.9
gc buffer busy                  1,356,786      6      61,708            3.3
latch: object queue header o      903,456    N/A      55,089            2.2
latch: cache buffers chains       360,522    N/A      49,016    136     0.9
gc current grant busy             112,970      -      19,893    176     0.3
gcs drm freeze in enter serv       38,442             18,537    482     0.1
gc cr block 2-way               1,626,280      0      15,742            3.9
gc remaster                         6,741             12,397  1,839     0.0
row cache lock                     52,143      6       9,834    189     0.1

From the data above we can also see that, in addition to the Top 5 waits, there are two relatively rare wait events: "gcs drm freeze in enter server mode" and "gc remaster". Judging by their names, they are obviously DRM-related. So how do these 2 wait events correlate with the Top 5 events? The MOS document "Bug 6960699 - 'latch: cache buffers chains' Contention/ORA-481/kjfcdrmrfg: SYNC TIMEOUT/OERI[kjbldrmrpst:!master] (Doc ID 6960699.8)" mentions that DRM may indeed cause heavy "latch: cache buffers chains" and "latch: object queue header operation" waits; although the document does not mention it, "latch: cache buffers lru chain" waits caused by DRM cannot be ruled out either.
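As a rough illustration (an assumed check, not part of the original article), the presence of these waits at a given moment can be counted from V$SESSION; the event names below are written as they appear in the 10g wait interface:

```sql
-- Sketch: count sessions currently waiting on the DRM-related
-- events discussed above, to see whether a DRM window is active.
SELECT event, COUNT(*) AS sessions
  FROM v$session
 WHERE event IN ('latch: cache buffers chains',
                 'latch: object queue header operation',
                 'gcs drm freeze in enter server mode',
                 'gc remaster')
 GROUP BY event;
```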
To further verify that the performance issue was related to DRM, we monitored the trace files of the LMD background process with the tail -f command. Whenever the trace file showed DRM starting, querying the V$SESSION view revealed large numbers of "latch: cache buffers chains" and "latch: object queue header operation" wait events, along with "gcs drm freeze in enter server mode" and "gc remaster" waits; meanwhile the system load rose and users reported performance degradation. Once the DRM operation completed, these waits disappeared and system performance returned to normal.

It seems this problem can be avoided simply by turning off DRM. So how do you disable DRM? Many MOS documents mention a method that sets 2 hidden parameters:

_gc_affinity_time=0
_gc_undo_affinity=false

Unfortunately, these 2 parameters are static, which means the instances must be restarted for them to take effect.
In fact, you can set 2 other, dynamic hidden parameters instead. With the values below, DRM is not completely disabled, but it is turned off "de facto":
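For completeness, a hedged sketch of how static hidden parameters are typically changed (the SCOPE=SPFILE clause and the need to quote underscore-prefixed parameters are standard Oracle syntax; the restart requirement is as stated above):

```sql
-- Sketch: static hidden parameters can only be changed in the spfile;
-- every instance must then be restarted for the change to take effect.
ALTER SYSTEM SET "_gc_affinity_time" = 0     SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET "_gc_undo_affinity" = FALSE SCOPE = SPFILE SID = '*';
```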
_gc_affinity_limit=250
_gc_affinity_minimum=10485760

You can even set these 2 parameters to larger values. They take effect immediately; after setting them on all nodes, the system no longer performed DRM. After a long period of observation, the performance problems described in this article never reappeared.
The following is the wait event data after DRM was turned off:
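The step above can be sketched as follows (assumed commands, run as SYSDBA on each instance; SID='*' applies the value to all instances, and SCOPE=MEMORY keeps the change non-persistent):

```sql
-- Sketch: these 2 hidden parameters are dynamic, so they can be set
-- in memory on every instance without a restart.
ALTER SYSTEM SET "_gc_affinity_limit"   = 250      SCOPE = MEMORY SID = '*';
ALTER SYSTEM SET "_gc_affinity_minimum" = 10485760 SCOPE = MEMORY SID = '*';
```

Note that a MEMORY-only change is lost at the next restart; persisting it would also require SCOPE=SPFILE or SCOPE=BOTH.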

Top 5 Timed Events                                               Avg %Total
~~~~~~~~~~~~~~~~~~                                              wait   Call
Event                                 Waits    Time (s)   (ms)  Time Wait Class
------------------------------ ------------ ----------- ------ ----- ----------
CPU time                                          15,684        67.5
db file sequential read           1,138,905       5,212      5  22.4 User I/O
gc cr block 2-way                   780,224         285      0   1.2 Cluster
log file sync                       246,580         246      1   1.1 Commit
SQL*Net more data from client       296,657         236      1   1.0 Network
-------------------------------------------------------------

                                           %Time  Total Wait    Avg
                                                               wait   Waits
Event                            Waits     -outs    Time (s)   (ms)    /txn
----------------------------- ----------- ------ ----------- ------ -------
db file sequential read         1,138,905    N/A       5,212      5     3.8
gc cr block 2-way                 780,224    N/A         285      0     2.6
log file sync                     246,580      0         246      1     0.8
SQL*Net more data from clien      296,657    N/A         236      1     1.0
SQL*Net message from dblink        98,833    N/A         218      2     0.3
gc current block 2-way            593,133    N/A         218      0     2.0
gc cr grant 2-way                 530,507    N/A         154      0     1.8
db file scattered read             54,446    N/A         151      3     0.2
kst: async disk IO                  6,502    N/A         107            0.0
gc cr multi block request         601,927    N/A                  0     2.0
SQL*Net more data to clien          1,336,225  N/A                      4.5
log file parallel write           306,331    N/A                        1.0
gc current block busy               6,298    N/A                        0.0
Backup: sbtwrite2                   4,076    N/A                        0.0
gc buffer busy                     17,677      1           3            0.1
gc current grant busy              75,075    N/A           1            0.3
direct path read                   49,246    N/A           1            0.2

In summary: DRM (Dynamic Resource Management) theoretically promotes a non-master node to master, which can reduce cross-node resource access, but it can bring even more problems. Suppose a RAC cluster has two nodes, and during an idle period node 2 caches a very large table. When the business gets busy and node 1 needs to access that table: without DRM, node 1 reads it from storage; with DRM, the cached resource is found on node 2, and the blocks are transferred from node 2's cache to node 1, which consumes a lot of interconnect bandwidth and other resources.