Write in front
Remember when you learn the database knowledge in particular like to see cases, because the optimization of the means is easy to grasp, but the overall optimization of the idea is difficult to learn. This is why I particularly like to see cases, today also share their own optimization cases.
Before sharing the OA system, his system, today we come to one of the most common erp,erp systems in all walks of life in use, different industries have different characteristics, Bo Master in the development of the time also wrote the ERP is more familiar with.
Whether this article shares the retail category, or shoe clothing stores, home, cars, real estate and so on, whether it is a friend, a disc, ERP has a common characteristics, document process, business complex, hot table obvious, large data volume, involving many system interfaces, a variety of Big data statistics report .... The traditional industry lacks the DBA's careful management.
Slow is universal!
Recently has been very busy, the blog output is poor, today collated a bit of their own optimization or a variety of solutions to customers have more than thousands, involving all walks of life, sharing the case today is more typical of these customers! Nothing is a common problem on the tall! In the previous blog has been mentioned, then this article we will combine the previous technical point to look at this case. Crossing of learning optimization tools can be found in my Optimization series: SQL Server fully optimized-------Expert for SQL Server Diagnostic series
--------------Blog Address---------------------------------------------------------------------------------------
Expert Diagnostic Optimization Series http://www.cnblogs.com/double-K/
No more nonsense, just open the whole-----------------------------------------------------------------------------------------.
User behavior
The system is slow! Save a document for several minutes, a lot of operations are timed out, especially to around 4 o'clock in the afternoon all kinds of timeouts, receipts or anything can not be received,
Check a report one hours, work has not been checked out, often because the system is slow and overtime,
The business has complained that this has been reported to the company's high-level IT department pressure is very big!
System Environment
First we look at the system configuration and the status quo, why say this customer classic? Look down and you know ...
Let's take a look at the system configuration:
The server configuration is: 8 x Core made Hyper-Threading 384 logical CPUs, memory 1T, disk full flash
SQL uses version 2012, patches are up-to-date, and server configuration all recognize
That's right. Pretty cool to configure!
The size of the database is 1.2 t
I see perhaps the data volume is too large, leading to performance problems! But another thought so powerful server is not so slow ah, is it a code problem? Do you need a sub-table?
Database metrics
So let's look at some of the database's appearances:
Number of requests per second:
Number of user connections:
Statement execution Status:
Wait Condition:
Wait Time:
CPU metrics:
Some indicators of memory:
Disk Queue:
-------------------Many indicators do not show the------------------
see these basic indicators, besides being slow what can you see? Where is the problem? How to solve quickly? Can an optimized step be presented in front of you?
Analysis
The system is really slow, the number of slow statements a lot of system congestion is also very serious, and the customer reflects the slow can match. Then why is it so slow? What causes it?
I conclude that general performance is slow and is related to 6 major factors:
- Business pressure
- Hardware
- Environment
- Code
- Operational factors within the database
- Architecture
A sketch.
System pressure: Access pressure (also we often say concurrency) is not large, the number of user connections is not as much as expected
Hardware: There is a real pressure on memory and disk IO
Environment: Server and database version of what is not a problem, specific configuration for a while to see.
Code: Most do not want to analyze the code, we stay to the last
Database internal operational factors: from various indicators to analyze, the system statement waiting time is too long, resulting in the statement completion slow, and wait for the main two parts:
- Hardware resources are really stressful
- The block before the statement is too serious, "lck_m_", and wait too long, unexpectedly averaged hundreds of seconds
Re-analysis ... Such a strong hardware, and little access to pressure, unexpectedly caused the bottleneck? The statement is bad? is the program implemented poorly? Missing index? The environment is not configured?
Now let's take a look at ....
Optimization phase one (general optimization)
Many times the system is slow to investigate the reason, is it so slow on the line? That's impossible, the vendors simply can't deliver! So the question is, when does it start to slow down? What adjustments have been made to the system?
A simple survey begins ...
Holy shit!!! Manufacturers do not cooperate, engineers on the system and its unfamiliar, ask Sanbuzhi, recently do what changes are not known, users do not know. Manufacturers to the conclusion: continue to add hardware .... Stronger IO .... Data separation reduces the amount of data!
Coordination of the Company fully coordinated, the basic No!
Since it is a database problem, let's start with the database! From a database practitioners, see such a system must first solve the large area waiting for the problem! Personal experience to see a lot of system large area waiting for the solution system will have a great upgrade and improvement!
With some conventional tuning means the stage begins, mainly to create large areas of the system to affect the high cost of the index, adjust the system parameters, optimize tempdb and so on .... Specifically not detailed, the previous series of articles are!
Expected:
General system above a round of optimization will have a noticeable improvement, I think this round after the system will be significantly faster, the statement operating environment appropriate, index what reasonable resource consumption of natural less, memory and IO pressure will also be reduced.
Results:
system memory, IO pressure tends to smooth, slow statement number decreased, but still many, blocking still exist, more than 2 minutes of the statement still a lot.
Before optimization
After optimization
Before optimization
After optimization
Optimization Phase II (for statements)
Analysis of the system to solve the large-area sentence block again, found that the current situation, mainly the following several:
- Memory is still volatile at some point, but overall IO memory is no longer a bottleneck.
- The program with sleeping in the system is blocked for a long time
- Some feature statements are still slow and consume a high level of resources.
Again to the system research:
- What business is the slow statement of execution, the business function? Or a report? Or an interface?
- Frequent and slow statements in the system.
- What action is blocked in the system.
After the investigation, I encountered the most common is the biggest problem: slow sentence due to the program! In his optimization case, it is because the program uses a lot of custom functions, we can not change, we skillfully bypass. So how do we get around this time?
One: report
In the analysis, it is the report that consumes the most resources in the program system.
A report is inserted into a physical temporary table through a series of complex queries, what is a physical temp table? It's a non-#temp, but a real one. Inserted into the table, run out of delete!
Insert in Delete, intermediate and business table associated operations, resulting in the report will also block the business!
What is the amount of data inserted for deletion? What do you think??
Tens don't ....
Two: interface
Frequent calls to business data in interface programs concurrency update frequently .... Cause business to be blocked ...
Three: Problem code
There are two main problems with the code:
1. The code is more complex and requires careful optimization.
2. There is a connection leak in the program, simple understanding of the program error after the transaction can not be effectively processed, causing the transaction did not commit the blocking system
For the first part of the report, the statement is more complex ... This thing is not a short-term optimization, consider to split out
For the second part of the interface, modify the interface view, including optimization, add index, call frequency, etc.
For the third part of the business statement for meticulous optimization, query hints, Planning wizards, re-compilation and so on means ...
Optimization Phase III (report separation)
After the first two phases of the optimization of the general system will be significantly improved, only the report has not been processed, and a portion of the high consumption of frequent interface queries, which we use report separation method to solve.
Here we encounter a problem, the report to write a physical table! AlwaysOn with 2012 comes with no way to implement (secondary node is readable only)
The customer is not satisfied with the publication subscription, which does not meet the requirements of data security and business continuity at the same time.
Do we think we can write a physical table into a #temp temp table? Software vendors have concluded that: Impossible ....
That's where we use third-party products Moebius clusters (it's really not advertising ....) )
How to achieve:
Multi-live cluster, several node data in real-time consistency, such a basic knowledge is not popular ... The introduction of the cluster is also exempt
First of all, only one connection string can not point to the report to the secondary server, we only through the Moebius cluster of the front-end scheduling engine, custom rules to use the report stored procedures point to the second server, solve the problem of the program can not be separated.
Second, the Moebius cluster can implement two nodes can be written to meet the needs of the Auxiliary node report query to write to the physical table.
The amount of time to write the temporary table is too large, tens data synchronization is also a problem, here is good fortunately, the program written in the physical temporary table is "TEMP_" beginning with the GUID type end. We set up here as long as such table write does not reverse the synchronization to the main node, so according to the rules control bidirectional synchronization to meet the requirements of the report, the final realization of the separation of the report.
The report is fast? Of course not, but the separation can not be fast, but the benefits of two:
- OLAP and OLTP separation transaction blocking is resolved
- Report servers and business servers can be individually personalized according to their business
- According to the requirements of the report, we configure the hardware of high speed IO
Expected:
Statement has been optimized, blocking situation is also resolved, CPU, memory, disk pressure is not, the system must be up quickly!
Results:
The system is getting up!
End-of-Business system node 24-hour Slow statement number: (although there are slow statements exist, after all, the terabytes of data volume, does not affect business operations customers are fully acceptable!) )
--------------Blog Address---------------------------------------------------------------------------------------
Expert Diagnostic Optimization Series http://www.cnblogs.com/double-K/
-----------------------------------------------------------------------------------------------------
Summary: The system is slow often we want to comprehensively analyze the dimensions provided in this article:
- Business pressure
- Hardware
- Environment
- Code
- Operational factors within the database
- Architecture
Often optimization is really not a simple tune statement, add a hardware, comprehensive analysis is the primary task of solving performance problems.
Of course not all the optimization can be completely solved, such as the improvement of the report in this paper is achieved through the separation of Read and write, many times in the ERP system report processing is so, if the report is carefully optimized, how long it takes! Maybe it's all rewritten.
The optimization process of this paper is: Comprehensive analysis of system problems--〉 macro-level solution (environment, database internal operational factors, hardware pressure)--〉 inefficient code adjustment--〉 Architecture Scheme Realization (stable, safe, efficient)--〉 final system smooth pressure-free
Of course, the customer's data in this case has reached the stage of data separation, partition table, but the reason to share this case is also, do not think that the TB of data must be divided into a variety of sub-database split, in the performance tuning of the simple pay can still reap greater gains, sincerely hope that crossing in the choice Sub-database sub-table pay a great price before you can find professional people comprehensive analysis, carefully evaluate your system what is the bottleneck!
Database optimization Case —————— A well-known retail enterprise ERP system