Database High Availability in Practice-------Architecture Optimization (Cool Summer)


When it comes to high availability, many readers will think of a long list of solutions. Some may have lived through the painful process of taking a system from a single machine to a highly available cluster; others may only ever have built a test setup on their own virtual machines. In this article I draw on real project experience to show that, no matter what, practice and testing are two very different things! You may think standing up a high-availability solution is easy, just a matter of configuration, but in a real, complex system it is not that simple!

  This article describes the process and the main ideas behind upgrading to, and building, an AlwaysOn high-availability deployment. It does not cover the cluster setup steps; please study those on your own.

--------------Blog Address---------------------------------------------------------------------------------------

Original address: http://www.cnblogs.com/double-K/

If reprinting, please keep the original address!

Enough preamble, let's get started-----------------------------------------------------------------------------------------.

Background

  The customer's existing setup is a read/write separation scheme built on publish/subscribe replication, and the overall system is well designed. It is also a very common architecture from before SQL Server 2012.

The architecture diagram is as follows:

[Figure: original architecture diagram]

Customer requirements: upgrade SQL Server 2008 R2 to SQL Server 2014 and replace the existing publish/subscribe architecture with AlwaysOn. Achieve local high availability, read/write separation, and offsite disaster recovery, and adopt some SQL Server 2014 features, such as memory-optimized tables, to improve system performance and concurrency.

Pre-survey data collection

An early understanding of the system is critical! So how do you gain a first, intuitive yet detailed picture of the system? Collect it with scripts? This is where professional tooling shows its value. As the saying goes, to do a good job, one must first sharpen one's tools!

[Figure: screenshots of the collected system information]

Determining the solution

Based on the preliminary requirements analysis and our initial understanding of the customer's system structure, we spent nearly a week weighing architectural complexity, ease of use, required program changes, performance, stability, and many other factors before finalizing the plan.

The architecture diagram is as follows:

Moving from the original complex architecture to such a clean one: AlwaysOn replaces the complex publish/subscribe setup, AlwaysOn read-only replicas provide read/write separation, and a geo-disaster-recovery node replaces the original offsite publication database. Excellent! This is also the architecture the customer preferred most, because of its low complexity, relative stability, and ease of maintenance. But be careful here! Everything has pros and cons, and there is a "but" coming.

However, the cost of the upgrade and of the program changes rises sharply!

Why? Let's keep looking!

Detailed research

For a system this complex, the detailed up-front research took a long time. Across the several systems involved, not only is the architectural design complex, but the functionality, applications, and interfaces are even more so! Here are the main threads we worked through:

Original system structure

First we built a thorough understanding of the original design. The customer has data centers in two locations, and the three systems make heavy use of each other's data. Publish/subscribe is therefore used to publish data on schedule into a database inside each consuming system, and synonyms point at the subscribed copies. This structure avoids the performance cost of accessing data across instances, or even across server rooms, through a linked server! Data is also published to multiple read-only nodes so that reports, interfaces, and other read workloads are separated from writes.
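As a sketch of this pattern (all object names here are hypothetical, not from the project): each consuming system subscribes to the publisher's table into a local database, and a synonym hides the physical location from the application:

```sql
-- Hypothetical illustration of the original design: the application reads
-- another system's data through a synonym that points at the locally
-- subscribed copy, so no linked server is needed at query time.
USE AppDb;
GO
CREATE SYNONYM dbo.ExternalOrders
FOR SubscriberDb.dbo.Orders;   -- local database populated by the subscription
GO
-- Application code queries the synonym as if the table were local:
SELECT TOP (10) * FROM dbo.ExternalOrders;
```

When the architecture later changes, only the synonym definitions need repointing, which is exactly why this design keeps application changes small.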

System Object Grooming

Because this is an upgrade and migration, taking inventory of objects is critical work: an overlooked business object can cause irreparable damage, and might even force a rollback of the entire upgrade and architecture deployment! The object list across the systems involved is huge: dozens of accounts, dozens of jobs, hundreds of synonyms, instance-level triggers, and more.....

Grouped by server:

    • Primary database objects
    • Objects in the individual read-only databases used for read/write separation
    • Configuration objects on servers that publish data to other business systems
    • Other application objects

Grouped by object type:

    • Database accounts
    • Linked servers
    • Instance-level triggers
    • SQL Agent jobs
    • System parameters
    • Maintenance plans
    • CDC
    • BI-related objects
    • Synonyms
    • Assemblies
    • Database mail
    • Operators
    • Read-only database indexes, views, and other objects
    • ...and so on
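A sketch of inventory queries that can help with this kind of object grooming. The system views are standard in SQL Server 2008 R2 and later, though which lists you actually need is project-specific:

```sql
-- Instance-level objects that are easy to miss in a side-by-side migration.
SELECT name, type_desc FROM sys.server_principals   -- logins / accounts
WHERE type IN ('S', 'U', 'G');
SELECT name FROM sys.servers WHERE is_linked = 1;   -- linked servers
SELECT name FROM sys.server_triggers;               -- instance-level triggers
SELECT name, enabled FROM msdb.dbo.sysjobs;         -- SQL Agent jobs

-- Per-database objects (run inside each business database).
SELECT name, base_object_name FROM sys.synonyms;    -- synonyms
SELECT name FROM sys.assemblies WHERE is_user_defined = 1;  -- CLR assemblies
```

Comparing the output of these queries between the old and new environments is one way to verify nothing was dropped during the move.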
Test process

Build test environment

For any upgrade or high-availability build, testing is essential. First comes validating the feasibility of the plan with the business side: as a third-party company we cannot know every application dependency and architectural detail of the user's systems, and even the customer's own engineers may not. Second is verifying whether any functionality misbehaves in the new environment. Third is checking that the collected system objects have been migrated completely. All of this reduces the probability of failure when the system goes live!

  A test environment is without doubt a necessary step for any upgrade or architecture change; only with full testing can you know what to expect, and thereby achieve a zero-failure launch.

On-line Walkthrough

An on-line walkthrough? What is that?

First of all, any database operation must be carried out inside an agreed time window! Guaranteeing that the work finishes within a fixed window is the greatest benefit of the walkthrough: on the new machines we had prepared, we fully simulated every go-live step, recording the time taken by each step, the possible risks, the latest acceptable completion time, and so on. Once the walkthrough build was finished, we could use that environment (which, when complete, has the formal production configuration) for stress testing.

  The on-line walkthrough is a necessary step, but how much of it you need depends on the actual situation, such as the upgrade method and the environment configuration. On this project we ran two full rounds!

Implementation process

Establish performance baselines

With a change this big, what are the database's performance indicators at each stage? Here we again used the Expert for SQL Server tool to compare performance before and after each implementation phase, so we could not only monitor the impact of the implementation but also clearly analyze the performance effect of each phase!

[Figure: performance baseline comparison charts]

We also ran a comparative analysis for each indicator; there are too many to list here, so see the optimization series article:

SQL Server Full Optimization-------Expert for SQL Server Diagnostic Series: Performance Optimization

In this first round of performance optimization we mainly targeted general system parameters and slow statements! Another important point was tuning the statements that might become slower after upgrading to 2014! Which statements might get slower? Well...

    • The system's key statements (the most frequently executed)
    • Complex statements
    • Everything else: just test broadly..... ha ha

  Why do this before the upgrade rather than after, when the system is live and you can analyze the slow statements? The reasoning is simple: if, after go-live, many functions turn out to be slow, or the slow ones are the frequently used ones, the result can be summed up in one word: "failure." Some readers will know that the problem can be worked around with a trace-flag hint (forcing the legacy cardinality estimator) or by lowering the compatibility level, but those are extreme measures for special scenarios, not a root-cause fix. So if you need to upgrade to 2014, this optimization must be done in advance!
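The stopgaps mentioned above can be sketched as follows. These are documented SQL Server mechanisms for reverting to the pre-2014 cardinality estimator; the table and database names are placeholders:

```sql
-- Option 1: revert a single statement to the legacy cardinality estimator.
SELECT o.OrderId, o.Amount
FROM dbo.Orders AS o                 -- hypothetical table
WHERE o.OrderDate >= '2014-01-01'
OPTION (QUERYTRACEON 9481);          -- trace flag 9481 = legacy estimator

-- Option 2: lower the database compatibility level, which keeps the whole
-- database on the old estimator (but also disables other 2014 behavior).
ALTER DATABASE [BusinessDb] SET COMPATIBILITY_LEVEL = 110;  -- SQL 2012 level
```

As the text says, both are blunt instruments: option 1 must be applied query by query, and option 2 forfeits the 2014 features the upgrade was meant to deliver.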

Upgrade to 2014

Upgrading a database could fill several blog posts, even a small book! Here is just a brief introduction, plus a few issues that deserve close attention!

Upgrade method

There are two ways to upgrade: in-place and side-by-side, and here we chose side-by-side! In plain terms, you prepare a new server, install the target version of the database, and then restore the data onto it. The advantage of side-by-side is that the upgrade does not touch the original environment; even if it fails, you can point the application back at the original environment!
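A minimal sketch of the side-by-side step, with placeholder paths and database names: the backup taken on the 2008 R2 instance is restored on the new 2014 instance, and the database is upgraded automatically during recovery:

```sql
-- On the new SQL Server 2014 instance: restore the 2008 R2 backup.
RESTORE DATABASE [BusinessDb]
FROM DISK = N'\\backupshare\BusinessDb.bak'   -- hypothetical backup path
WITH MOVE N'BusinessDb'     TO N'D:\Data\BusinessDb.mdf',
     MOVE N'BusinessDb_log' TO N'E:\Log\BusinessDb_log.ldf',
     RECOVERY, STATS = 5;

-- Only after testing passes: raise the compatibility level to 2014
-- so the new features (and the new cardinality estimator) take effect.
ALTER DATABASE [BusinessDb] SET COMPATIBILITY_LEVEL = 120;
```

Keeping the compatibility-level change as a separate, deliberate step is one way to stage the risk discussed in the next section.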

  

One of the biggest problems when upgrading to 2014

The 2014 new feature: the new cardinality estimator! This exciting yet troublesome feature causes many statements to slow down after upgrading to 2014. Because the earlier optimization phase had focused on exactly this, most of those problems were already eliminated! But one vicious case remained: batch processing against large partitioned tables (more than 200 partitions) still hit serious performance problems!

Cluster construction

There may not be much to say about the cluster build itself: create the failover cluster and the AlwaysOn availability group as usual. But it is full of details, for example: which quorum model? How are the virtual IPs configured for the geo nodes? How do the number of nodes and the business requirements fit together? And so on; none of this is trivial.
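For orientation only, a sketch of the availability-group definition behind an architecture like this one (all server names, endpoint URLs, and the database name are placeholders, not the project's actual configuration): two synchronous local replicas, one of them readable, plus an asynchronous geo replica:

```sql
CREATE AVAILABILITY GROUP [AG_Business]
FOR DATABASE [BusinessDb]
REPLICA ON
  N'NODE1' WITH (ENDPOINT_URL = N'TCP://node1.corp.local:5022',
      AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC),
  N'NODE2' WITH (ENDPOINT_URL = N'TCP://node2.corp.local:5022',
      AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC,
      SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)),   -- read/write split
  N'DRNODE' WITH (ENDPOINT_URL = N'TCP://drnode.corp.local:5022',
      AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL); -- geo DR
```

The asynchronous mode on the DR node is what keeps offsite latency from blocking local commits, but it is also why the log send queue discussed below needs watching.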

Program modification

A change of architecture like this inevitably forces program changes; this is also why the customer leaned toward the architecture mentioned earlier, because of its low complexity, even though the cost of the changes is greatly increased. Data that the original system kept local through publish/subscribe can no longer be accessed locally, and a poorly performing linked server is not an option. That leaves only one path: modify the application. Put simply, the program queries the relevant data in each database separately, then combines and processes it in application code.

Handling detail problems

The overall implementation steps are as described, but they are filled with countless details, and every detail can decide the feasibility of the plan and the success or failure of the upgrade and the architecture change. For reasons of space, here are just a few common problems!

    • CDC together with AlwaysOn: the official documentation says CDC can keep running under AlwaysOn without interruption, but in our tests the CDC jobs failed repeatedly after an AlwaysOn failover and did not restart automatically. CDC's LogReader is the same one used by publish/subscribe, yet the problem appears only with CDC jobs when no publications exist. Workaround: configure a management job that detects the failover and controls (stops and restarts) the CDC jobs accordingly.
    • Index rebuild operations: because of the geo node configuration, the log generated by rebuilds became a problem; in testing, a rebuild produced several times the log volume seen on a standalone machine, causing the offsite log queue to grow too long. Workaround: use a manual script to split the rebuilds into fine-grained batches, controlling the daily log volume based on queue size and transfer rate.
    • Slow statements under 2014: in detail, the slowdown produced by the combination of the 2014 cardinality estimator and 200+ partition tables has no answer to date. For now we simply use workarounds to avoid it! (If any reader has met this problem, please share some ideas, thank you.)
    • Writes on the read-only replica: some report operations use intermediate staging tables, and these are not #temp tables but real physical tables used as temporaries. Solution: change them to temp tables, or create a separate database (outside the availability group) and use synonyms pointing at the new database for the write operations.
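The index-rebuild workaround from the second bullet can be sketched like this. The DMV and its columns are standard (`log_send_queue_size` is reported in KB), but the index name and the queue threshold are placeholders:

```sql
-- Rebuild one index at a time, and only when the asynchronous (geo)
-- replica's log send queue is small enough to absorb the extra log.
DECLARE @send_queue_kb BIGINT;
SELECT @send_queue_kb = MAX(drs.log_send_queue_size)
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 0;               -- remote replicas only

IF @send_queue_kb < 102400            -- hypothetical threshold: ~100 MB
    ALTER INDEX [IX_Orders_OrderDate] ON dbo.Orders REBUILD;
ELSE
    PRINT 'DR send queue too long; postponing rebuild.';
```

In practice a script like this would loop over a worklist of indexes and run from a nightly SQL Agent job, spreading the rebuilds across days as the text describes.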

The problems encountered really are of every kind, which is why they say: once the conventional techniques are mastered, stepping into pits is how you grow!


-----------------------------------------------------------------------------------------------------

Summary: this article briefly shared a fairly complex upgrade from 2008 to 2014 together with the high-availability build; a real project is still very different from a test system you stand up yourself. The whole project lasted three months, so this article only sketches the ideas and steps, along with a few common big pits. The main steps in the project, which I personally consider necessary for any database high-availability build, were:

    1. System background survey
    2. Business research; produce the first-draft solution
    3. Detailed research; object inventory
    4. Test environment setup
    5. System testing; finalize the solution
    6. Go-live walkthrough; determine the time window
    7. Stress testing
    8. Official go-live
    9. Post-launch monitoring
    10. Resolve issues; develop a maintenance plan

This project followed the relevant management standards quite strictly. Over three months of implementation we upheld the principle that "stability outweighs efficiency": the work was refined down to every step, each step had detailed instructions, and in the end all three systems went live and ran with zero faults!

  

Download link for the Expert for SQL Server tool used in this article: http://zhuancloud.com/UserAccount/Install

----------------------------------------------------------------------------------------------------

Note: this article is original; reprints are welcome, but please place a link to this article in a prominent position on the page!
If you think this article is good, please click Recommend in the lower right corner, thank you very much!
