Suggestions for RAC Optimization

Last Update:2018-12-04 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Although Oracle presales personnel always say that the scalability of RAC is transparent, as long as the application is assigned to different nodes, the system capability can be smoothly expanded.
In fact, the RAC cache fusion mechanism determines that dB cache and library cache will be brought by cache fusion in the RAC environment.
Additional overhead.
In a single instance Environment, if we want to access a cache buffer, we only need to block the related cache buffer chains, and then
Read the buffer. In the RAC environment, first, we need to hold the cache buffer chains to check whether the current version of the buffer is
In the DB cache of this instance, if it is not found, a message is sent asking other nodes whether the cache exists in other nodes. If it exists
The buffer must be transmitted from other nodes over the network.

In a single instance environment, the hit rate of DB cache is 96% and 97%, which may not affect the system much. In the RAC environment, the hit rate of DB cache is decreased by one percentage point, the overall impact on system performance is several times greater. Therefore, it is critical to reduce the negative impact of cache fusion when designing RAC applications. In a RAC environment, it is impossible to share data without any data, and data is shared more or less. Therefore, how to share data between instances is the key to solving the performance problem of RAC applications.

In terms of application design in China, the most widely used method of cache fusion is application isolation, that is, running different applications on different RAC nodes, this minimizes data sharing. Application Isolation is indeed a good RAC application optimization solution, which is simple to implement and does not require very professional RAC optimization technologies. However, although application isolation is relatively difficult to implement and widely used, application isolation is the lowest application level in RAC optimization. In many cases, an application is mainly concentrated on access to several tables, which makes it difficult to isolate the application. In this case, you need to perform more careful Optimization on the application. Just like the case of the old white processing, the two nodes need to run the same application. In this case, the optimization needs to begin from the bottom layer of the application. In general, reduce the cache
There are several ways to influence fusion performance.

1> table partitions are used to restrict a partition from being used by an instance. For RAC applications, Table Partitioning is a good way to reduce hot block contention. First, data is distributed to several segments through Table Partitioning, and its high-level advancing and data access are all dispersed. Therefore, when setting table partitions in the RAC environment, you must note that the gradual selection of table partitions can indeed play a role in data distribution. If we have a log table that partitions Based on the log date, although we have made a partition, it still cannot play a role in distributing data access, because we always insert and query data in the latest partition, other partitions are rarely accessed.

2> increase the hit rate of the database cache. In general OLTP systems, the hit rate of DB cache exceeds 90%, which is not too low. If the hit rate exceeds 95%, the hit rate of DB cache is relatively high. However, in the RAC environment, the lower the hit rate of DB cache, the greater the negative impact of cache fusion. Therefore, in the RAC environment, maintaining a high dB cache hit rate has a huge impact on overall system performance.

3> increase the hit rate of the Shared Pool. In the RAC environment, SQL analysis and other operations are much more expensive than the single-host environment, mainly because analysis and other operations involve coordination of a large number of global resources. It is also important to reduce hard analysis and soft Analysis in the RAC environment. Therefore, maintaining sufficient shared pool resources, good programming habits, and rational use of Bound variables can help improve system performance in the RAC environment.

4> increase the sequence cache and use the noorder option. In RAC, the SQ lock wait is often encountered because sequence also becomes global in the RAC environment. Different nodes need to generate serial numbers to compete for sequence resources. Currently, most sequence in most systems are used as primary key generators and are frequently used. In the RAC environment, a large sequence cache is required, otherwise, it will cause serious competition, thus affecting the business.

In addition, when the business permits, sequence tries its best to use the noorder option to reduce the burden on the sequence generator. For extremely serious sequence contention, even some users use UUID to replace sequence. UUID is a 37-bit string, and the global uniqueness is ensured through the generation mechanism. This feature can also be used as a primary key. In Oracle, you can use select sys_guid () from dual to generate a uuid. This UUID does not contain "-", so it is only 32 bits.

|
An example of enqueue
The average transaction response time is about 200 milliseconds, Io is basically idle, and the CPU usage of the two nodes is relatively low, basically stable at around 30%. Global cache Cr request, DB file sequential read, and enqueue are among the top three waiting events. Enqueue waits about 13% of the total waiting events. I immediately opened the enqueue chart and found that the top enqueue is sq. Run the script to check v $ enqueue_stat:

Select eq_type, cum_wait_time
From v $ enqueue_stat
Order by 2;
From the query results, the top three are SQ, US, and TX. The top priority of SQ is a serious conflict in sequence. I made a statspack report. From the statspack report, we can see that there are still a lot of waits related to row cache objects. From the perspective of row cache, the waiting and loss rate of dc_sequence are also relatively high. So I checked the sequence parameters. All sequence cache parameters are 20 by default. So I asked Qin to contact the developers as soon as possible to recreate all sequence and increase the sequence cache parameters to 1000.

As the system load increases, the CPU usage is also increased by more than 40%. The Global cache-related wait events and enqueue wait events are also a little more than just now. Suddenly the entire system encountered an exception, and the average transaction response time experienced a mutation, increasing more than doubled, as if there was a short hang phenomenon. I looked at the Health icons of the OEM and found that the logon metrics were not normally high. The average number of logons per second on the two nodes reached over 30. I asked Qin Gong if the system has a transient connection application. Qin said that the vast majority of their systems are in the C/S structure, and the average number of logons per second has always been quite high.

It seems that I have ignored another problem. In the RAC environment, if there are many logons, we need to increase the sequence cache. Just now, the development vendor has increased the sequence cache for all applications, however, sys. the cache of audseq $ has forgotten to be adjusted. Check whether sys. the cache of audseq $ is the default 20. in the RAC environment, this sequence must be adjusted if logon is frequent. I have even encountered the issue of hang in the entire database due to the high logon density. So I suggest you adjust the cache of SYS. audseq $ to 5000

Sequence Cache
1. Create Sequence
First, you must have the create sequence or create any sequence permission,
Create sequence emp_sequence
Increment by 1 -- add several
Start with 1 -- count from 1
Nomaxvalue -- do not set the maximum value
Nocycle -- always accumulate without repeating
Cache 10;
|

5> Use read-only tablespace. Read-Only tablespaces are very useful in RAC. In cache fusion, if a tablespace is read-only, data access in the tablespace only requires local operations, collaboration between RAC is not required. In fact, data in many databases will have a large amount of read-only data in a certain period of time. If the data is frequently accessed,
In RAC, it will have a serious impact on ges and GCS. In fact, if we fully consider the RAC issue when designing the application system, we can put the data that may be read-only or read-only access within a certain period of time in the same tablespace, the data can be set to read/write or read-only based on business needs. For example, laobai once optimized a customer's system for frequent access to historical data. laobai suggested that they store historical data in a separate tablespace and set it to read only at ordinary times, read Write is changed back only when data is archived. This simple Optimization Operation reduces a large amount of global access and improves RAC performance.

6> reduce full table scanning for large tables. Full table scans, especially full table scans for large tables, have a huge impact on DB cache. Because full table scans consume a large amount of DB cache, in this way, a lot of datablocks will be squeezed out from the database cache, and the full table scan of large tables will bring about a lot of physical reads. In the RAC environment, the overhead of physical reads is much larger than that in a single machine environment.

7> restrict concurrent queries within the instance range. do not perform parallel queries between RAC instances. In fact, the vast majority of parallel queries do not need to run across instances, and Oracle uses parallel queries across instances by default. Therefore, many customers have suffered this pain point, cross-instance parallel query mainly solves the problem of insufficient single-host CPU capacity, but cross-instance parallel query will cause the problem of GCS. At present, the single-host CPU capacity of domestic user systems is generally relatively strong, so cross-instance parallel query is often not worth the candle. Oracle has flexible configurations in parallel query under RAC. By setting instance_groups and parallel_instance_group parameters, You can implement flexible policies by configuring these two parameters, you can easily limit parallel queries to the local machine.

|
Case
In the monitoring process, we found a large number of PX wait events, because there are a large number of statistical modules in this system, these modules have done a lot of PX wait events, this is because this system
There are a large number of statistical modules. These modules perform a large number of full table scans. To improve the performance of full table scans, parallel prompts are added to many SQL statements. I checked the parallel query,
Multiple cross-instance parallel queries are found. To prevent unnecessary problems, we recommend that you restrict parallel queries to instances. This can be achieved through the instance_groups parameter:
Rac1.instance _ groups = Rac1, RAC
Rac1.parallel _ instance_group = Rac1
Rac2.parallel _ instance_group = rac2
Rac2.instance _ groups = rac2, RAC
With these parameter settings, in the default environment, parallel queries can only be performed within the instance. Because the default parallel_instance_group of each instance is the Instance name, only the instance
The instance_group parameter of contains the instance_group with the same name as the instance. If you want an SQL statement to perform parallel queries across instances, you can use the following method:
Alter session set parallel_instance_group = RAC;
Select...
Because both instances belong to RAC instance_group, you only need to set parallel_instance_group to "RAC.

In my RAC system, most CPU usage is not high,

8> use application isolation to run different applications on each instance. Running different applications on each node of RAC is a common method in domestic RAC systems. It is usually called application isolation. Application Isolation Room RAC application optimization is the most basic form, but it is also a more effective form. In this mode, multiple applications share non-critical data, and the correlation between core business data is small. By distributing different applications to different instances, ensure that each instance of RAC shares less data, and the access conflict is limited to a controllable range.

9> horizontal data isolation. The Application Isolation Room is a relatively preliminary optimization form. Although it is effective, there are many restrictions. For example, if an application is very large and cannot be undertaken by a single node, it cannot be optimized through application isolation. In this case, the horizontal isolation of data is very effective. Taking the telecom business as an example, if the core business data can be partitioned by the Internet, we can set some Internet applications to connect to instance 1, and some Internet applications to connect to instance 2. Table partitions are used to minimize conflicts between core business (p221 is wrong) data, so as to avoid problems caused by data block contention in RAC ..

|
Case
The first round of testing is coming to an end. I checked the enqueue and found a strange phenomenon. Many TX lock wait requests are 4, which is a typical ITL wait. In the RAC environment, ITL waits far more harmful than normal environments. After checking, we found that all the ITL waits are concentrated on 3-4 tables. These tables are not large in size, all of which are 100-200 MB. However, these tables are greatly modified concurrently. I checked that the inittrans parameters are all default values, which is no wonder. Therefore, Qin and I decided to check all the tables and increase the inittrans parameters of some tables with more concurrent modifications. This operation takes some time, therefore, it cannot be implemented before the second stress test.

What is the actual command?

Drop table Cx. CX cascade constraints;

Create Table Cx. CX
(
Name varchar2 (20 byte ),
Age number
)
Tablespace cxtest
Pctused 0
Pctfree 10
Initrans 5
Maxtrans 255
Storage (
Initial 64 K
Minextents 1
Maxextents Unlimited
Pctincrease 0
Buffer_pool default
)
Logging
Nocompress
Nocache
Noparallel
Monitoring;

Alter table Cx. CX initrans 8;

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Suggestions for RAC Optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Suggestions for RAC Optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support