Understanding DRM in Oracle 10g RAC

Source: Internet
Author: User

Summary of DRM


1. What is DRM?
Dynamic Resource Management (DRM) is a new feature of Oracle 10g. In an Oracle RAC environment, Oracle uses the GRD (Global Resource Directory) to record resource information for every node. The GRD is maintained by the GCS (Global Cache Service) and GES (Global Enqueue Service). Each node in RAC has its own SGA and buffer cache. To keep cached resources consistent across all nodes while preserving performance, GCS and GES designate one instance in the cluster to manage each cached resource. That instance is the Resource Master for the resource. Before DRM, remastering (changing the master node) happened only during reconfiguration, which is triggered automatically when an instance starts up or shuts down normally, or when a failed node is evicted from the cluster. So if node A was the Resource Master of a resource, mastership stayed on node A until the next reconfiguration.


In theory, with DRM, a non-master node that frequently accesses a resource can be promoted to master of that resource, which reduces the amount of subsequent cross-node resource access. For example, if a cached resource is accessed frequently by node B, it can be remastered from node A to node B.
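The idea can be illustrated with a toy model. This is only a sketch of the principle, not Oracle's actual remastering algorithm: the function name `pick_master` and the `ratio` threshold are hypothetical and exist only for this illustration.

```python
from collections import Counter


def pick_master(current_master, accesses, ratio=2.0):
    """Toy model: decide which node should master a resource.

    accesses: iterable of node names, one entry per access to the resource.
    ratio:    hypothetical threshold; another node takes over only if it
              accesses the resource at least `ratio` times as often as
              the current master does.
    """
    counts = Counter(accesses)
    if not counts:
        return current_master
    busiest, busiest_count = counts.most_common(1)[0]
    master_count = counts.get(current_master, 0)
    if busiest != current_master and busiest_count >= ratio * max(master_count, 1):
        return busiest  # remaster to the node doing most of the work
    return current_master


# Node B dominates access, so the resource moves from A to B.
print(pick_master("A", ["B"] * 90 + ["A"] * 10))
# Access is roughly even, so the resource stays on A.
print(pick_master("A", ["B"] * 12 + ["A"] * 10))
```

The threshold is there to show why a remastering decision needs some hysteresis: without it, a resource accessed about equally from two nodes would bounce back and forth.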

However, a well-designed RAC application should avoid accessing the same resource from multiple nodes in the first place. If a resource is accessed from only one node, there is nothing for DRM to do. In addition, the DRM process itself consumes resources.


/* The following example is from Laoxiong's ("Old Bear") website: http://www.laoxiong.net/problem-caused-by-drm.html */


In one RAC system, performance problems occurred intermittently, and the system returned to normal by itself after a while.

The Top 5 Timed Events in the AWR report:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: cache buffers lru chain      774,812     140,185    181   29.7      Other
gc buffer busy                    1,356,786      61,708     45   13.1    Cluster
latch: object queue header ope      903,456      55,089     61   11.7      Other
latch: cache buffers chains         360,522      49,016    136   10.4 Concurrenc
gc current grant busy               112,970      19,893    176    4.2    Cluster
          -------------------------------------------------------------


As you can see, three of the Top 5 events are latch waits and the other two are RAC (cluster) waits.
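As a quick sanity check on the Top 5 figures, the average wait per event can be recomputed as total wait time divided by the number of waits, and the "%Total Call Time" values can be summed per family. The sketch below hard-codes the numbers from the table above; the helper names are made up for this example.

```python
# (event, waits, total wait time in seconds, % of total call time)
# copied from the Top 5 Timed Events table.
TOP5 = [
    ("latch: cache buffers lru chain", 774_812, 140_185, 29.7),
    ("gc buffer busy", 1_356_786, 61_708, 13.1),
    ("latch: object queue header operation", 903_456, 55_089, 11.7),
    ("latch: cache buffers chains", 360_522, 49_016, 10.4),
    ("gc current grant busy", 112_970, 19_893, 4.2),
]


def avg_wait_ms(waits, seconds):
    """Average wait in milliseconds, as in AWR's 'Avg wait (ms)' column."""
    return seconds * 1000 / waits


def share_by_family(rows):
    """Sum '% Total Call Time' for latch events and for the gc events."""
    shares = {"latch": 0.0, "cluster": 0.0}
    for event, _waits, _seconds, pct in rows:
        family = "latch" if event.startswith("latch:") else "cluster"
        shares[family] += pct
    return {name: round(total, 1) for name, total in shares.items()}


for event, waits, seconds, _pct in TOP5:
    print(f"{event:<38} {round(avg_wait_ms(waits, seconds)):>4} ms")
print(share_by_family(TOP5))
```

The recomputed averages match the report (181, 45, 61, 136 and 176 ms), and the three latch events alone account for about half of the total call time.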
The more detailed wait event data reveals further problems:
                                                                  Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ----- ----------- ------- ---------
latch: cache buffers lru cha        774,812   N/A     140,185     181       1.9
gc buffer busy                    1,356,786     6      61,708      45       3.3
latch: object queue header o        903,456   N/A      55,089      61       2.2
latch: cache buffers chains         360,522   N/A      49,016     136       0.9
gc current grant busy               112,970    25      19,893     176       0.3
gcs drm freeze in enter serv         38,442    97      18,537     482       0.1
gc cr block 2-way                 1,626,280     0      15,742      10       3.9
gc remaster                           6,741    89      12,397    1839       0.0
row cache lock                       52,143     6       9,834     189       0.1


The data above shows that, besides the Top 5 wait events, there are two rarely seen wait events: "gcs drm freeze in enter server mode" and "gc remaster". Judging by their names, both are clearly related to DRM. Are these two wait events connected to the Top 5 events? The MOS document 'Bug 6960699 - "latch: cache buffers chains" contention/ORA-481/kjfcdrmrfg: sync timeout/OERI[kjbldrmrpst:!master] [ID 6960699.8]' states that DRM can indeed cause heavy "latch: cache buffers chains" and "latch: object queue header operation" waits. The document does not mention "latch: cache buffers lru chain", but it cannot be ruled out that DRM causes that wait as well.
To confirm that the performance problem was related to DRM, the trace file of the LMD background process was monitored with tail -f. Whenever the trace file showed DRM starting, a query of the v$session view found a large number of "latch: cache buffers chains" and "latch: object queue header operation" wait events, together with "gcs drm freeze in enter server mode" and "gc remaster" wait events. At the same time, the system load rose and front-end users reported degraded performance. Once DRM completed, these waits disappeared and system performance returned to normal.
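The correlation check described above can be sketched as a small tally over the events sampled from v$session. The query text is an assumption (the original article does not show the query that was used), and the sample list is made-up data for illustration, not output from the system in question.

```python
from collections import Counter

# Assumed query; EVENT and WAIT_CLASS are columns of v$session in 10g.
SAMPLE_SQL = """
SELECT event FROM v$session WHERE wait_class <> 'Idle'
"""

DRM_EVENTS = {"gcs drm freeze in enter server mode", "gc remaster"}
LATCH_EVENTS = {
    "latch: cache buffers chains",
    "latch: object queue header operation",
}


def summarize(events):
    """Count DRM-related and latch-related waits in one v$session sample."""
    counts = Counter(events)
    drm = sum(counts[e] for e in DRM_EVENTS)
    latch = sum(counts[e] for e in LATCH_EVENTS)
    return {"drm": drm, "latch": latch, "drm_active": drm > 0}


# Made-up sample, standing in for the rows returned by SAMPLE_SQL.
sample = (
    ["latch: cache buffers chains"] * 40
    + ["latch: object queue header operation"] * 25
    + ["gcs drm freeze in enter server mode"] * 12
    + ["gc remaster"] * 3
    + ["db file sequential read"] * 8
)
print(summarize(sample))
```

Running such a tally each time the LMD trace reports DRM activity, and again when it is quiet, makes the link between the two groups of waits easy to see.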
So it seems that disabling DRM is enough to avoid this problem. How can DRM be disabled? The method given in many MOS documents is to set two hidden parameters:

_gc_affinity_time=0
_gc_undo_affinity=FALSE


Unfortunately, both of these are static parameters: the instance must be restarted for them to take effect.
In fact, two other hidden parameters, which are dynamic, can achieve the same goal. Setting them to the values below does not disable DRM completely, but it does disable DRM in practice.
_gc_affinity_limit=250
_gc_affinity_minimum=10485760


You can even set these two parameters to larger values. Both take effect immediately. Once they are set on all nodes, the system no longer performs DRM, and the performance problems described in this article no longer occur.
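A minimal sketch of how the two groups of parameters might be applied, assuming the instances use an spfile. The article does not show the statements themselves, so the SCOPE and SID clauses are assumptions: SCOPE=SPFILE for the static pair (effective after restart), and SCOPE=MEMORY with SID='*' for the dynamic pair. The script only generates the statements; it does not connect to a database.

```python
# Parameter values as given in the article.
STATIC_PARAMS = {"_gc_affinity_time": "0", "_gc_undo_affinity": "FALSE"}
DYNAMIC_PARAMS = {"_gc_affinity_limit": "250", "_gc_affinity_minimum": "10485760"}

# Hidden (underscore) parameter names must be double-quoted in ALTER SYSTEM.
TEMPLATE = """ALTER SYSTEM SET "{name}"={value} SCOPE={scope} SID='*';"""


def alter_statements(params, scope):
    """Build one ALTER SYSTEM statement per parameter."""
    return [
        TEMPLATE.format(name=name, value=value, scope=scope)
        for name, value in params.items()
    ]


# Static pair: written to the spfile, needs an instance restart.
for stmt in alter_statements(STATIC_PARAMS, "SPFILE"):
    print(stmt)
# Dynamic pair: changes the running instances immediately.
for stmt in alter_statements(DYNAMIC_PARAMS, "MEMORY"):
    print(stmt)
```

With SCOPE=MEMORY the dynamic settings are lost at the next restart, so SCOPE=BOTH may be preferable if the change is meant to be permanent.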
The wait event data after DRM was disabled:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                         15,684          67.5
db file sequential read           1,138,905       5,212      5   22.4   User I/O
gc cr block 2-way                   780,224         285      0    1.2    Cluster
log file sync                       246,580         246      1    1.1     Commit
SQL*Net more data from client       296,657         236      1    1.0    Network
          -------------------------------------------------------------

                                                                  Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ----- ----------- ------- ---------
db file sequential read           1,138,905   N/A       5,212       5       3.8
gc cr block 2-way                   780,224   N/A         285       0       2.6
log file sync                       246,580     0         246       1       0.8
SQL*Net more data from clien        296,657   N/A         236       1       1.0
SQL*Net message from dblink          98,833   N/A         218       2       0.3
gc current block 2-way              593,133   N/A         218       0       2.0
gc cr grant 2-way                   530,507   N/A         154       0       1.8
db file scattered read               54,446   N/A         151       3       0.2
kst: async disk IO                    6,502   N/A         107      16       0.0
gc cr multi block request           601,927   N/A         105       0       2.0
SQL*Net more data to client       1,336,225   N/A          91       0       4.5
log file parallel write             306,331   N/A          83       0       1.0
gc current block busy                 6,298   N/A          72      11       0.0
Backup: sbtwrite2                     4,076   N/A          63      16       0.0
gc buffer busy                       17,677     1          54       3       0.1
gc current grant busy                75,075   N/A          54       1       0.3
direct path read                     49,246   N/A          38       1       0.2
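To put the improvement in numbers, the total wait times of the cluster events that appear in both reports can be compared. This sketch assumes the two AWR reports cover comparable snapshot intervals, which the article does not state; the figures are copied from the tables before and after DRM was disabled.

```python
# Total wait time in seconds, before and after DRM was disabled.
BEFORE = {
    "gc buffer busy": 61_708,
    "gc current grant busy": 19_893,
    "gc cr block 2-way": 15_742,
}
AFTER = {
    "gc buffer busy": 54,
    "gc current grant busy": 54,
    "gc cr block 2-way": 285,
}


def reduction_pct(before, after):
    """Percentage drop in total wait time, rounded to one decimal place."""
    return round(100 * (before - after) / before, 1)


for event in BEFORE:
    pct = reduction_pct(BEFORE[event], AFTER[event])
    print(f"{event:<24} {BEFORE[event]:>7} s -> {AFTER[event]:>4} s  ({pct}% less)")
```

The latch waits and the two DRM-specific events ("gcs drm freeze in enter server mode" and "gc remaster") do not appear in the second report at all.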


My own understanding: in theory, DRM (Dynamic Resource Management) promotes a non-master node to master, which can reduce cross-node resource access, but it also brings more problems. Suppose a RAC cluster has two nodes, and node 2 caches a large table during an idle period. When the workload becomes busy, node 1 needs to access that table. Without DRM, node 1 would read it from storage. With DRM, the cached resource is found on node 2 and shipped from node 2's cache to node 1, which consumes a large amount of bandwidth and other resources.
Oracle 10g RAC problems

Search online; running vipca solves this problem.

When I configured Oracle 10g RAC with VMware 2003, both nodes were set up, but checking the configuration with the CVU tool returned an error:

Hey, are you testing the Oracle RAC setup on Windows?
According to the log, the administrator passwords on the two nodes are different, so the check fails.
Find a Windows RAC manual online and follow it step by step.
Pay particular attention to the network communication settings, such as the hosts file.
Here is a reference website:
Www.oracle-base.com/..re.php
You need to understand what each step is for; otherwise, when a problem occurs, you still won't know why.
