Oracle Data Center Dual-Active Paths: ASM vs. VPLEX


A comparison of two dual-active solutions: ASM vs. VPLEX

Wang Wenjie

Principal Systems Analyst, Oracle Corporation

Oracle Advanced Services

Oracle data center disaster recovery has evolved over the years from cold backup, to hot backup, to storage replication, to Data Guard (DG), Active Data Guard (ADG), RAC One Node and RAC, and finally to today's hottest architecture, the dual-active dual data center: what we usually call Extended (stretched) RAC.

Pre-sales engineers can offer many so-called dual-active schemes, but in my view only a solution that drives RTO and RPO towards zero and keeps both data centers available for reads and writes deserves to be called true dual-active, dual-center. Replication software does not qualify, and neither does DG/ADG: the standby database is not writable and switchover takes time. The truly mature dual-active options are Oracle ASM host-based mirroring (volume management), the storage-virtualization solution EMC VPLEX (Oracle certified), the IBM SVC solution (Oracle certified), and comparable solutions from other vendors such as HDS, HP and Huawei.

(Image source: Oracle Extended RAC white paper)

Over the last three years the author has taken part in the implementation and operation of both VPLEX-based and ASM-based dual-active databases, and the discussion below is based on that hands-on experience with the two approaches.

First, the main difference between Extended RAC and ordinary RAC: distance. Distance is what protects against natural disasters, terrorist attacks and other events that take out a data center, so that the other center can keep providing service. But distance also adds network and I/O latency, which makes the network design of a dual-active deployment especially important. Oracle's HA best-practices documentation notes:

An ordinary high-quality network is adequate for distances of less than 10 km.

Distances of 10 km or more require Dense Wavelength Division Multiplexing (DWDM) equipment. Where DWDM or CWDM is used, it should be connected directly through dedicated switches.

PS: DWDM (Dense Wavelength Division Multiplexing) is the preferred technology in optical fibre applications today, but its high price makes many less well-off operators hesitate. Can wavelength division multiplexing be had at a lower cost? CWDM (Coarse, i.e. sparse, Wavelength Division Multiplexing) was born to answer that need. The names already show the first difference: dense versus sparse channel spacing. The second difference is that CWDM modulates an uncooled laser, while DWDM uses a cooled laser tuned by temperature; by avoiding the difficulty of temperature tuning, CWDM reduces cost significantly.

CWDM costs only about a third as much as DWDM, which is why it has become so popular.

Distances of 10 to 50 km require storage network (SAN) caching to offset the performance impact of the distance; otherwise the degradation is noticeable.

For distances over 50 km, any architecture needs rigorous performance testing to prove that its performance is acceptable. In fact, at any distance, a real dual-active deployment needs both performance tests and stress tests.
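
As a rough back-of-the-envelope check of my own (not from the white paper): light in fibre travels at roughly 200,000 km/s, so each kilometre of separation adds about 5 microseconds one way, or about 10 microseconds per round trip. A synchronously mirrored write pays at least one such round trip, so at 50 km:

    t_RTT ≈ 2 × d / v_fibre = (2 × 50 km) / (200,000 km/s) ≈ 0.5 ms

That is roughly half a millisecond added to every mirrored write before any switch, DWDM or storage latency is counted, which is one reason testing at distance matters.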

The most common dual-active scenarios in China in recent years have been EMC's VPLEX solution and Oracle ASM host-based mirroring.

Now to compare the two solutions, VPLEX and ASM.

Of the two, the author has been involved most with the EMC solution, typically across design review, installation, testing, go-live support, operations and tuning; involvement with the ASM solution has mostly been overall cross-checks, troubleshooting and tuning. Below is my personal take on each.

VPLEX's greatest advantage is that it greatly simplifies the DBA's work: no failure groups, preferred-read paths or third-party quorum disks to configure. The DBA does not have to manage, or even see, the mirrored storage; the VPLEX engineers handle everything, which reduces the risk of a storage-side mistake crashing the database. On performance, all VPLEX reads are served locally with no extra settings, and reads also benefit from the VPLEX cache. More importantly, all I/O distribution and replication is handled inside the VPLEX cluster, so that work is offloaded from the compute nodes down to the storage layer, relieving resource pressure on the database nodes.

As for VPLEX's shortcomings, two are beyond dispute. The first is vendor lock-in; in practice this concern has weakened considerably and is largely a matter of the CTO's preferences. The second is price: VPLEX is indeed not cheap. Many people then ask: can we get by without VPLEX? The answer is yes, provided you have good architects, storage administrators and DBAs who make no mistakes in the storage, network and database configuration, and who do a thorough job of architecture design, high-availability testing, performance benchmarking, stress (stability) testing and so on. Put simply, with ASM mirroring you face the following challenges:

1) Because ASM manages the distribution and replication of I/O, host-based mirroring consumes a certain amount of resources on the database nodes.

2) Storage administrators and DBAs must collaborate carefully to configure failure groups, preferred-read paths and a third-party quorum disk, and must stay vigilant over the long term; a faulty configuration can break disaster recovery or degrade performance (see the sketch after this list).

3) DBAs need to get involved in setting up the inter-site SAN links and perform complete network testing, focusing on the latency introduced by the remote link and the quality of the cluster interconnect (heartbeat). High-availability testing likewise requires complex designs covering a wide variety of fault scenarios.

4) DBAs need to take part in complex performance, stability and disaster testing to fully understand I/O behaviour under this architecture: for example, single-site read tests against the local and the remote failure group, dual-site tests with both failure groups online, and comparisons between them, concentrating on physical read/write profiles (sequential read, scattered read, direct path read, direct path write, parallel write, control file reads, log file writes, etc.) as well as RAC Cache Fusion tests: interconnect throughput at peak load, GC block transfer rates, and gc cr/current block latency. Many domestic customers choose ASM for budget reasons and often implement it on old hardware, frequently with RAC nodes of differing specification (server model, CPU core count and clock speed), which makes it all the more necessary to test the robustness of the design.

5) Day-to-day maintenance is more complex and demands more of the DBA. Later, if you need to scale out, you face the same complexity and need the same level of staffing.
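
As a minimal sketch of what item 2 involves (the disk paths, disk group and failure group names are hypothetical, and a small third-site disk, e.g. NFS- or iSCSI-based, is assumed for the quorum), a stretched normal-redundancy disk group could be created roughly like this:

-- Sketch only: one failure group per site plus a third-site quorum failure group
-- (all device paths and names below are hypothetical)
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP sitea DISK '/dev/mapper/sitea_lun01', '/dev/mapper/sitea_lun02'
  FAILGROUP siteb DISK '/dev/mapper/siteb_lun01', '/dev/mapper/siteb_lun02'
  QUORUM FAILGROUP sitec DISK '/voting_nfs/quorum_disk01'
  ATTRIBUTE 'compatible.asm' = '11.2.0.0.0';

Each site's disks go into their own failure group so that ASM mirrors every extent across the two sites, while the quorum failure group holds only a voting file and carries no data.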

If the ASM solution has one natural advantage, I think it is that ASM does not need to synchronize data over a SAN heartbeat network (third-party storage-cluster software such as Veritas CVM still needs a heartbeat, generally deployed on the database nodes); ASM I/O replication is initiated from the database node itself. VPLEX, by contrast, must synchronize data over the SAN heartbeat network, and if that heartbeat is lost one VPLEX site will shut down, the databases on it will crash, and the hosts on that side will have to be restarted. If only the storage array at one site fails outright, however, neither ASM nor VPLEX affects the database instances, and once the disks are repaired the data is resynchronized. ASM 11g introduced the Fast Mirror Resync feature, which makes that resynchronization fast.
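
A minimal sketch of Fast Mirror Resync (the disk group name and repair window are hypothetical): give ASM a repair window so a transiently failed disk is kept offline instead of being dropped, then bring it back online once the fault is fixed so that only the stale extents are resynchronized:

-- Sketch only: allow up to 4 hours before an offlined disk is dropped
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '4h';

-- After the path or array is repaired, resynchronize the stale extents
ALTER DISKGROUP data ONLINE ALL;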

I have seen one customer with an ASM solution that was rushed into production and then hit unexpectedly severe performance problems. Only after go-live did they even begin to consider setting asm_preferred_read_failure_groups, the most basic dual-active parameter. In the end they had to buy hardware again, and eventually replaced the original hardware altogether; the price paid was staggering. There are also customers running ASM dual-active systems whose load is actually very low, with the remote node shut down all year round.

Some concerns about VPLEX raised elsewhere on the internet are the following.

First, performance. In a dual-active deployment the two things to watch are I/O and the interconnect (heartbeat); if both I/O latency and heartbeat latency look good, the deployment is more than halfway to success. Some vendors claim that VPLEX Metro adds extra I/O latency and is therefore not recommended, reportedly on the strength of dd tests. I think that is far too hasty: the full life cycle of a database I/O in a dual-active architecture, from issue to completion, is not something dd can simulate. ASM's I/O distribution also carries extra overhead; the difference is that it sits on the compute node, whereas VPLEX offloads it to the storage layer.

A VPLEX Metro write works roughly as follows: the director first determines which block is being modified and notifies the other directors in the Metro-Plex, so that any local or remote director holding that block in cache updates its directory copy to mark its cached data as stale; the data is then written to cache and to disk, an ACK is returned from both sites to the director that issued the write, and finally the ACK is returned to the compute node.

A VPLEX Metro read works as follows: the host sends the read request to a director in the Metro-Plex; the local director checks whether the data is in its cache and, if so, returns it; otherwise the block is read from the local back-end storage.

EMC calls this feature EMC VPLEX distributed cache coherency.

ASM's I/O management is simpler than VPLEX's, because ASM is just volume-management software. Early ASM releases had no asm_preferred_read_failure_groups parameter to designate which failure group to read first, so read performance undoubtedly suffered; once the parameter is set (and set correctly), ASM reads are also served locally. An ASM write must complete in both the local and the remote failure group before the I/O is considered done. If the write to one failure group fails, Oracle retries against a newly allocated extent, and if that fails again it takes the disk offline.
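
A minimal sketch of that preferred-read setting (the ASM instance SIDs, disk group and failure group names below are hypothetical; each ASM instance is pointed at the failure group in its own site):

-- Sketch only: +ASM1 runs in site A, +ASM2 in site B
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITEA' SID = '+ASM1';
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITEB' SID = '+ASM2';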

Below are performance samples from some of the larger dual-active customers.

Customer A: VPLEX dual-active, sites 25 km apart, storage with flash disks; the recent I/O picture. The main wait event, db file sequential read, stays stable at around 1 ms; db file scattered read is around 1.2 ms and log file sync around 4.5 ms: excellent performance. On database node 1, on an ordinary day, the system averages about 1,500 transactions per second, about 10,000 SQL executions per second and about 2 million logical reads per second; at peak the transaction rate roughly triples. Interconnect traffic is about 97 MB per second, GC blocks lost is almost zero, and every GC indicator looks healthy. That interconnect volume is actually very high: in ordinary RAC systems we rarely see GC traffic above 50 MB/s, and this customer's GC waits account for roughly 40% of DB time, yet the business side still finds the current performance acceptable. Before go-live, the customer's performance and stress tests ran into a variety of problems, including performance differences between multipathing software and network instability that drove GC block exchange rates up; after about two months of gradual tuning the system reached a good state.

Customer B: VPLEX dual-active, 4 databases on 16 nodes, two switches in total, 8 dark fibres, sites 35 km apart, storage without flash disks. Most of the time the customer's I/O latency sits within Oracle's recommended range; the most important wait event, db file sequential read, stays around 5 ms, and GC traffic is currently modest. One interval of high I/O latency was caused by a backup kicking off. Logical reads run at about 1.2 million per second. Because the architecture shares fibre, the databases can affect one another: once the daily CRM backup starts, the I/O of all four databases degrades at the same time. During settlement periods the rate reaches 4,000 transactions (user commits) per second, which puts heavy pressure on the redo logs, yet log file sync still holds at a good level of under about 5 ms. Before go-live, the database expert testing tool otest was used for a complete performance benchmark and maximum-load stress test; it uncovered a database bug that crashed an instance and a network-instability problem affecting GC block exchange, and only after each was resolved did the system go live, ensuring the performance and stability of the database under the dual-active architecture.

This customer's interconnect traffic runs at around 40 MB per second, and some GC blocks lost do occur.
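
A minimal sketch of how such GC figures can be spot-checked from SQL (these are cumulative values since instance startup; the per-interval view quoted above comes from AWR):

-- Sketch only: cumulative global cache statistics for every instance
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc blocks lost',
                'gc cr blocks received',
                'gc current blocks received')
ORDER  BY inst_id, name;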

Customer C: VPLEX dual-active with flash disks, a 4-node RAC with the main workload on node 1; estimated interconnect traffic is only about 5 MB per second. The main I/O wait events, shown below, look good. Nodes 3 and 4 are in practice idle, so the hardware redundancy is high.

Event                         Waits        %Time-outs   Total Wait Time (s)   Avg wait (ms)   Waits/txn   % DB time
------------------------    -----------   ----------   -------------------   -------------   ---------   ---------
log file sync                 6,488,667             0                41,258               6         1.0        34.0
db file sequential read      11,293,347             0                15,307               1         1.7        12.6
direct path read                 65,267             0                 1,336              20         0.0         1.1

Event                         Waits        %Time-outs   Total Wait Time (s)   Avg wait (ms)   Waits/txn   % DB time
------------------------    -----------   ----------   -------------------   -------------   ---------   ---------
db file sequential read      14,988,301             0                15,358               1         6.1        22.3
log file sync                 1,660,630             0                 4,488               3         0.7         6.5
db file scattered read          437,281             0                   916               2         0.2         1.3

Customer D ran an ASM dual-active solution: a 2-node RAC database with two storage arrays in two machine rooms 10 km apart. To save cost, the first phase used old storage and old links; before go-live there were large numbers of disk timeouts, half of the remote site's failure-group disks were offlined and dropped, and performance testing showed high I/O latency. The problems found at that stage were resolved one by one before go-live. In production, database performance was acceptable, with the main wait event db file sequential read holding around 10 ms, but on settlement days or customer-campaign days I/O degraded to an unacceptable level, with db file sequential read falling to around 80 ms. Extensive tuning afterwards had little visible effect; the performance problem was only resolved, for the time being, after the customer replaced the storage outright with a full set of new arrays (with flash disks).
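
A minimal sketch of how the state of the disks in each failure group can be checked from the ASM instance (standard v$asm_disk columns; the output obviously depends on the system):

-- Sketch only: a MODE_STATUS of OFFLINE, or disks missing from one failure
-- group, would point to the kind of trouble described above
SELECT group_number, failgroup, name, mode_status, state
FROM   v$asm_disk
ORDER  BY group_number, failgroup, name;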

Another, earlier customer ran a 10g ASM Extended RAC with mediocre I/O performance; to reduce the latency caused by the cluster interconnect, the remote node was kept permanently shut down. The storage was dual-active, but the database cluster was in effect single-active.

In general, given the complexity of dual-active architectures, both the VPLEX and the ASM solutions have run into their share of problems.

The second concern is split-brain. A VPLEX cluster, like an Oracle cluster, evicts members after a failure of the cluster interconnect. The worry is what happens if the VPLEX interconnect and the Oracle interconnect are cut at the same time: the two evictions need to agree, or the whole cluster could become unavailable. Oracle's split-brain eviction algorithm, put simply, keeps the sub-cluster with the most nodes; if the sub-clusters are the same size, the one containing the lower node number survives. Note that starting with 12.1.0.2, Oracle Clusterware introduces the concept of node weights, and the sub-cluster with the higher weight survives.

A. The group with more cluster nodes survives.
B. If each group contains the same number of nodes, the group containing the lower node number survives.
C. Improvements have been made so that nodes with lower load survive when the eviction is caused by high system load.
D. In 12c, the node(s) with more weight survive; see MOS Note 1951726.1, "12c: Which Node Would Survive when Split Brain Takes Place".

The VPLEX cluster architecture is similar to RAC's: it also has a heartbeat network and an arbitration mechanism, the Witness.

The Witness plays the role of the RAC voting disk. Of the two clusters in a VPLEX Metro, one is designated preferred and the other non-preferred; if the inter-cluster heartbeat is completely lost, the Witness tells the preferred cluster to continue service while the non-preferred cluster stops serving until the link recovers. What I have gathered from EMC engineers is that EMC likewise keeps the side containing the lower-numbered member, which is essentially consistent with the behaviour of an 11.2.0.4 Oracle cluster. A VPLEX-based RAC today is usually 2 or 4 nodes, so if the database heartbeat and the storage heartbeat are cut at the same time, the two evictions remain consistent. In the network design phase it is best to give the SAN heartbeat and the database heartbeat separate switches and fibres, with redundancy for each.

In short, with VPLEX you need not fear the failure of one site's storage system: the database instances keep accessing the local cache and the remote storage system. Nor need you fear the loss of an entire site (that site's databases crash with it), because the other site's storage system and databases continue to serve.

The one scenario that does give some cause for concern is the storage heartbeat and the database heartbeat failing at the same time. The probability is small but theoretically real: with more cluster nodes, or on Oracle >= 12.1.0.2 where node weights apply, the sub-cluster that Oracle keeps after the interconnect failure could sit at the very site that VPLEX has just evicted, producing a split-brain-like mismatch.

Therefore, when building dual-active, the first consideration is the high availability of the storage heartbeat network; second, this kind of eviction scenario must be included in the high-availability test plan and verified repeatedly to further reduce the risk.

The third concern is the brief lock-up (freeze). The claim is that when one VPLEX site's storage fails, the other site experiences a short (around 5 seconds) freeze during which the business cannot be taken over. This is actually easy to understand: after one site of any cluster fails, there is a brief freeze while transactional integrity and database consistency are protected; Oracle clusters behave the same way during Instance Membership Recovery (IMR) and reconfiguration. This is unavoidable with today's cluster architectures, but most clusters resume service as soon as reconfiguration completes. The behaviour can be simulated in the test phase to make sure the hardware, middleware and applications can ride out a short freeze and then complete the takeover.

Many people like to ask what Oracle itself recommends, but there is no fixed answer. First, a vendor is unlikely to recommend a non-Oracle product for you in any detail. Second, the official documentation says very little about these solutions, only a few pages. Earlier versions of the Oracle HA documentation recommended storage-based mirroring, treating ASM mirroring as the option when storage mirroring is not possible; by 11.2, Oracle recommended host-based mirroring with ASM as the clustered logical volume manager; as of 12.2, no HA best-practices document has been published.

If you ask my own recommendation: both VPLEX and ASM are Oracle-certified solutions. A design on paper is dead; the people are the key to a successful project.

The technical process of a typical dual-active project should include the following.

Based on the customer's actual situation, I recommend that an Oracle ACS expert implementation team take part in delivering either the ASM-mirroring-based or the storage-mirroring-based dual-active solution. Compared with an ordinary data center build, both solutions need a more rigorous and complete process: database architecture design, installation, configuration, high-availability testing, benchmark performance testing, extreme stress testing and disaster testing (for the database testing, the Oracle testing, optimization and implementation best-practice tool otest is recommended: http://www.dbfine.net/otest). Clearly the ASM solution requires the more experienced implementation staff. The West End team is one of the most experienced dual-active Oracle implementation teams in the country.

References:

Oracle Extended RAC white paper

EMC VPLEX Overview and General Best Practices

Multi-site dual-active solution reference by senior expert Sun Jiujiang

Oracle HA best-practices documentation

Article Source: http://www.dbfine.net/archives/480

2017-07-06
