Understanding DRM in Oracle 10g RAC

Source: Internet
Author: User

A summary of DRM:

1. What is DRM
DRM (Dynamic Resource Management) is a feature introduced in Oracle 10g. In an Oracle RAC environment, Oracle records resource information for each node in the GRD (Global Resource Directory), which is managed jointly by GCS (Global Cache Service) and GES (Global Enqueue Service). Because each node in a RAC has its own SGA and buffer cache, GCS and GES designate one instance in the cluster to manage a given cache resource; that instance is the resource master. This ensures consistency and high performance across all nodes' caches. Before DRM, remastering (changing the master node) happened only during reconfiguration, which is triggered automatically when an instance starts or shuts down normally, or when an abnormal node is evicted from the cluster. So if node A is the resource master for a resource, the resource stays mastered on node A until the next reconfiguration.

In theory, with DRM, when a non-master node accesses a resource frequently, it can be promoted to the master for that resource, reducing the number of subsequent cross-node access requests. For example, if a cache resource mastered on node A is frequently accessed by node B, the resource can be remastered from node A to node B.
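As an illustration (not from the original article), in 10g you can check which instance currently masters an object's cache resources, and how often DRM has remastered it, from V$GCSPFMASTER_INFO. The table name SOME_TABLE below is a placeholder:

```sql
-- Sketch: find the current master instance for an object's cache
-- resources and the number of times DRM has remastered it.
SELECT o.object_name,
       m.current_master,      -- instance number (0-based) of the master
       m.previous_master,
       m.remaster_cnt         -- how many times DRM has moved the master
  FROM v$gcspfmaster_info m
  JOIN dba_objects o
    ON o.data_object_id = m.data_object_id
 WHERE o.object_name = 'SOME_TABLE';
```

A steadily growing REMASTER_CNT for a hot object is one sign that DRM is actively moving mastership between nodes.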

However, a well-designed RAC application avoids accessing the same resource from multiple nodes in the first place; if a resource is only ever accessed from one node, there is nothing for DRM to do. Moreover, the DRM process itself consumes resources.

/* Below is an example from Laoxiong's blog: http://www.laoxiong.net/problem-caused-by-drm.html */

In one RAC system, intermittent performance problems occurred, but the system automatically returned to normal after a while.

The Top 5 Timed Events section of the AWR report showed:

Top 5 Timed Events                                               Avg %Total
~~~~~~~~~~~~~~~~~~                                              wait   Call
Event                                 Waits    Time (s)   (ms)  Time Wait Class
------------------------------ ------------ ----------- ------ ----- ----------
latch: cache buffers lru chain      774,812     140,185    181  29.7 Other
gc buffer busy                    1,356,786      61,708         13.1 Cluster
latch: object queue header ope      903,456      55,089         11.7 Other
latch: cache buffers chains         360,522      49,016    136  10.4 Concurrenc
gc current grant busy               112,970      19,893    176   4.2 Cluster

You can see that 3 of the Top 5 waits are latch-related, while the other 2 are RAC-related.
Looking at the more detailed wait data reveals other problems:
                                           %Time  Total Wait    Avg
                                                               wait   Waits
Event                            Waits     -outs    Time (s)   (ms)    /txn
----------------------------- ----------- ------ ----------- ------ -------
latch: cache buffers lru cha      774,812    N/A     140,185    181     1.9
gc buffer busy                  1,356,786      6      61,708            3.3
latch: object queue header o      903,456    N/A      55,089            2.2
latch: cache buffers chains       360,522    N/A      49,016    136     0.9
gc current grant busy             112,970      -      19,893    176     0.3
gcs drm freeze in enter serv       38,442             18,537    482     0.1
gc cr block 2-way               1,626,280      0      15,742            3.9
gc remaster                         6,741             12,397  1,839     0.0
row cache lock                     52,143      6       9,834    189     0.1

From the data above we can also see that, in addition to the Top 5 waits, there are two relatively rare wait events: "gcs drm freeze in enter server mode" and "gc remaster". Judging by their names, they are obviously DRM-related. So how do these 2 wait events correlate with the Top 5 events? The MOS document "Bug 6960699 - 'latch: cache buffers chains' Contention/ORA-481/kjfcdrmrfg: SYNC TIMEOUT/OERI[kjbldrmrpst:!master] (Doc ID 6960699.8)" mentions that DRM may indeed cause heavy "latch: cache buffers chains" and "latch: object queue header operation" waits; although the document does not mention it, "latch: cache buffers lru chain" waits caused by DRM cannot be ruled out either.
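As a rough illustration (an assumed check, not part of the original article), the presence of these waits at a given moment can be counted from V$SESSION; the event names below are written as they appear in the 10g wait interface:

```sql
-- Sketch: count sessions currently waiting on the DRM-related
-- events discussed above, to see whether a DRM window is active.
SELECT event, COUNT(*) AS sessions
  FROM v$session
 WHERE event IN ('latch: cache buffers chains',
                 'latch: object queue header operation',
                 'gcs drm freeze in enter server mode',
                 'gc remaster')
 GROUP BY event;
```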
To further verify that the performance issue was related to DRM, we monitored the trace files of the LMD background process with the tail -f command. Whenever the trace file showed DRM starting, querying the V$SESSION view revealed large numbers of "latch: cache buffers chains" and "latch: object queue header operation" wait events, along with "gcs drm freeze in enter server mode" and "gc remaster" waits; meanwhile the system load rose and users reported performance degradation. Once the DRM operation completed, these waits disappeared and system performance returned to normal.

It seems this problem can be avoided simply by turning off DRM. So how do you disable DRM? Many MOS documents mention a method that sets 2 hidden parameters:

_gc_affinity_time=0
_gc_undo_affinity=false

Unfortunately, these 2 parameters are static, which means the instances must be restarted for them to take effect.
In fact, you can set 2 other, dynamic hidden parameters instead. With the values below, DRM is not completely disabled, but it is turned off "de facto":
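For completeness, a hedged sketch of how static hidden parameters are typically changed (the SCOPE=SPFILE clause and the need to quote underscore-prefixed parameters are standard Oracle syntax; the restart requirement is as stated above):

```sql
-- Sketch: static hidden parameters can only be changed in the spfile;
-- every instance must then be restarted for the change to take effect.
ALTER SYSTEM SET "_gc_affinity_time" = 0     SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET "_gc_undo_affinity" = FALSE SCOPE = SPFILE SID = '*';
```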
_gc_affinity_limit=250
_gc_affinity_minimum=10485760

You can even set these 2 parameters to larger values. They take effect immediately; after setting them on all nodes, the system no longer performed DRM. After a long period of observation, the performance problems described in this article never reappeared.
The following is the wait event data after DRM was turned off:
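The step above can be sketched as follows (assumed commands, run as SYSDBA on each instance; SID='*' applies the value to all instances, and SCOPE=MEMORY keeps the change non-persistent):

```sql
-- Sketch: these 2 hidden parameters are dynamic, so they can be set
-- in memory on every instance without a restart.
ALTER SYSTEM SET "_gc_affinity_limit"   = 250      SCOPE = MEMORY SID = '*';
ALTER SYSTEM SET "_gc_affinity_minimum" = 10485760 SCOPE = MEMORY SID = '*';
```

Note that a MEMORY-only change is lost at the next restart; persisting it would also require SCOPE=SPFILE or SCOPE=BOTH.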

Top 5 Timed Events                                               Avg %Total
~~~~~~~~~~~~~~~~~~                                              wait   Call
Event                                 Waits    Time (s)   (ms)  Time Wait Class
------------------------------ ------------ ----------- ------ ----- ----------
CPU time                                          15,684        67.5
db file sequential read           1,138,905       5,212      5  22.4 User I/O
gc cr block 2-way                   780,224         285      0   1.2 Cluster
log file sync                       246,580         246      1   1.1 Commit
SQL*Net more data from client       296,657         236      1   1.0 Network
-------------------------------------------------------------

                                           %Time  Total Wait    Avg
                                                               wait   Waits
Event                            Waits     -outs    Time (s)   (ms)    /txn
----------------------------- ----------- ------ ----------- ------ -------
db file sequential read         1,138,905    N/A       5,212      5     3.8
gc cr block 2-way                 780,224    N/A         285      0     2.6
log file sync                     246,580      0         246      1     0.8
SQL*Net more data from clien      296,657    N/A         236      1     1.0
SQL*Net message from dblink        98,833    N/A         218      2     0.3
gc current block 2-way            593,133    N/A         218      0     2.0
gc cr grant 2-way                 530,507    N/A         154      0     1.8
db file scattered read             54,446    N/A         151      3     0.2
kst: async disk IO                  6,502    N/A         107            0.0
gc cr multi block request         601,927    N/A                  0     2.0
SQL*Net more data to clien          1,336,225  N/A                      4.5
log file parallel write           306,331    N/A                        1.0
gc current block busy               6,298    N/A                        0.0
Backup: sbtwrite2                   4,076    N/A                        0.0
gc buffer busy                     17,677      1           3            0.1
gc current grant busy              75,075    N/A           1            0.3
direct path read                   49,246    N/A           1            0.2

In summary: DRM (Dynamic Resource Management) theoretically promotes a non-master node to master, which can reduce cross-node resource access, but it can bring even more problems. Suppose a RAC cluster has two nodes, and during an idle period node 2 caches a very large table. When the business gets busy and node 1 needs to access that table: without DRM, node 1 reads it from storage; with DRM, the cached resource is found on node 2, and the blocks are transferred from node 2's cache to node 1, which consumes a lot of interconnect bandwidth and other resources.