I have to say: DBAs really are impressive
3.3.3 Using the Profile: Limited Value
3.4 Diagnosing Intermittent Problems
If the system occasionally stalls or runs slow queries (so-called phantom problems), avoid trial-and-error fixes: the risk is high
3.4.1 Single-Query Problem or Server-Wide Problem
Using SHOW GLOBAL STATUS
Run the command at high frequency (for example once per second) to capture the data; when the problem occurs, it shows up as sudden changes in the counters
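As a rough illustration of reading the counters, the sketch below turns cumulative values sampled once per second into per-interval deltas. The sample numbers are invented and `counter_rates` is a hypothetical helper, not part of MySQL or any toolkit; in practice the snapshots would come from repeated SHOW GLOBAL STATUS runs.

```python
def counter_rates(snapshots, names):
    """Turn cumulative counter snapshots into per-interval deltas.

    snapshots: list of dicts, one per sample, mapping a status variable
    name to its value at that moment.
    names: the cumulative counters to difference (for example "Questions").
    """
    rates = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        rates.append({name: curr[name] - prev[name] for name in names})
    return rates


# Hypothetical samples taken one second apart (values are made up).
samples = [
    {"Questions": 1000, "Threads_running": 4},
    {"Questions": 1900, "Threads_running": 5},
    {"Questions": 1950, "Threads_running": 40},  # throughput dip, threads pile up
    {"Questions": 2900, "Threads_running": 6},
]

for delta, snap in zip(counter_rates(samples, ["Questions"]), samples[1:]):
    # Threads_running is a gauge, so it is printed as-is rather than differenced.
    print(delta["Questions"], snap["Threads_running"])
```

A dip in the Questions delta that coincides with a jump in Threads_running is the kind of pattern the notes describe.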
Using SHOW PROCESSLIST: shows which threads are running and what state each one is in
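A common way to read SHOW PROCESSLIST during a stall is to count how many threads are in each state. The following is a minimal sketch, assuming the vertical (\G) output has been saved as text; `count_states` and the sample rows are hypothetical.

```python
from collections import Counter


def count_states(processlist_text):
    """Count how often each thread state appears in vertical-format output."""
    states = Counter()
    for line in processlist_text.splitlines():
        line = line.strip()
        if line.startswith("State:"):
            # An empty state is reported as "(none)" so it is still counted.
            state = line[len("State:"):].strip() or "(none)"
            states[state] += 1
    return states.most_common()


# Abbreviated, made-up sample of vertical-format output.
SAMPLE = """\
*************************** 1. row ***************************
     Id: 1
  State: freeing items
*************************** 2. row ***************************
     Id: 2
  State: freeing items
*************************** 3. row ***************************
     Id: 3
  State:
"""

for state, count in count_states(SAMPLE):
    print(count, state)
```

Many threads stuck in the same unusual state is a hint about where to look next.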
Using the query log
Enable the slow query log and set long_query_time=0 globally, then make sure every connection uses the new setting (existing connections may need to be reset for it to take effect)
Look at the log for the period when throughput suddenly drops; note that a query is written to the slow query log only when it completes
Good tools get more done with less effort: tcpdump, pt-query-digest, Percona Server
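For the query-log approach, throughput over time can be estimated by counting log entries per timestamp. The sketch below assumes the log contains "# Time:" header lines followed by one "# User@Host" line per query; the exact header format differs between server versions, so treat the parsing as an assumption. `queries_per_second` and the sample lines are hypothetical.

```python
def queries_per_second(log_lines):
    """Count slow-log entries under each "# Time:" header.

    Fractional seconds, if present, are dropped so that entries
    completing within the same second are grouped together.
    """
    counts = {}
    current = None
    for line in log_lines:
        if line.startswith("# Time:"):
            current = line[len("# Time:"):].strip().split(".")[0]
            counts.setdefault(current, 0)
        elif line.startswith("# User@Host") and current is not None:
            counts[current] += 1
    return counts


# Abbreviated, made-up log excerpt.
log = [
    "# Time: 110105 15:44:31",
    "# User@Host: app[app] @ localhost []",
    "SELECT 1;",
    "# User@Host: app[app] @ localhost []",
    "SELECT 2;",
    "# Time: 110105 15:44:32",
    "# User@Host: app[app] @ localhost []",
    "SELECT 3;",
]

for stamp, count in queries_per_second(log).items():
    print(stamp, count)
```

Because queries are logged at completion, a stall tends to appear as a second with few entries followed by a second with many.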
Making sense of the findings
Visualize the data: gnuplot or R (plotting tools)
Gnuplot
Reading list: installation; basic commands; common tips; getting-started tutorial 2; gnuplot data visualization
Recommendation: prefer the first two methods (SHOW GLOBAL STATUS and SHOW PROCESSLIST); they have low overhead and let you collect data interactively with simple shell scripts or repeated queries
3.4.2 Capturing Diagnostic Data
For intermittent problems, collect as much data as possible (not only while the problem is occurring)
You need two things: 1. a reliable way to tell when the problem occurs (a trigger); 2. a tool for collecting the diagnostic data
Diagnostic triggers
False positive: a lot of diagnostic data is collected when nothing is wrong, which wastes time (read carefully: this does not contradict the previous point)
False negative: no data is captured when the problem occurs, and the opportunity is missed. Before you start collecting, confirm that the trigger really identifies the problem
A good trigger:
Find a metric that can be compared against a threshold for normal behavior
Choose an appropriate threshold: high enough that it does not fire during normal operation, but not so high that it misses the problem when it occurs
Recommended tool: pt-stalk. It watches for a trigger condition and records data when the condition is met; you can configure the variable to monitor, the threshold, and how often it is checked
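The trigger idea above can be sketched as a small function: fire only when the monitored value stays above the threshold for several consecutive checks, which filters out brief spikes. This is an illustration of the logic only, not pt-stalk's implementation; `first_trigger`, the threshold of 25, and the sample values are all hypothetical.

```python
def first_trigger(samples, threshold, cycles):
    """Return the index of the sample at which the trigger fires, or None.

    The trigger fires once the value has exceeded `threshold`
    for `cycles` consecutive samples.
    """
    streak = 0
    for i, value in enumerate(samples):
        streak = streak + 1 if value > threshold else 0
        if streak >= cycles:
            return i
    return None


# Made-up Threads_running readings, one per check.
threads_running = [4, 6, 30, 5, 28, 31, 35, 7]

# A single spike (30) is ignored; three readings in a row above 25 fire the trigger.
print(first_trigger(threads_running, threshold=25, cycles=3))
```

Raising `cycles` reduces false positives at the cost of starting collection later; lowering the threshold does the opposite.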
What data to collect
Execution time: time spent working and time spent waiting
Collect everything you can during the relevant time window
An unknown problem usually has one of two causes: 1. the server is doing a lot of work, which consumes a lot of CPU; 2. it is waiting for resources to be released
Each cause calls for a different way of collecting diagnostic data:
1. Profiling: confirms whether there is too much work. Tools: tcpdump to capture TCP traffic, or switching the slow query log on and off
2. Wait analysis: confirms whether there is a lot of waiting. Tools: GDB stack traces, SHOW PROCESSLIST, and SHOW ENGINE INNODB STATUS to observe thread and transaction states
Interpreting result data
Goals: 1. confirm that the problem really occurred; 2. check whether there is an obvious sudden change
Tools:
OProfile uses the performance counters provided by the CPU hardware and, through statistical sampling, helps find the "culprit" behind CPU usage at the process, function, and code levels.
The opreport command shows CPU usage at the process level and at the function level
Output columns: samples (number of samples) | % (share of the total number of samples) | image name
The opannotate command shows CPU usage statistics at the source-code level
GDB: the most commonly used debugger in Linux application development (it debugs executable files). It can set breakpoints, inspect variable values, step through program execution (data and source code), and examine memory and stack information. These features make it easier to find errors in a program that are not syntax errors.
3.4.3 A Diagnostic Case Study
An intermittent performance problem; diagnosing it draws on knowledge of MySQL, InnoDB, and GNU/Linux
First establish: 1. what the problem is, described clearly; 2. what has already been done to try to solve it
Then start: 1. understand how the server is behaving; 2. review the server's status, parameter configuration, and hardware and software environment (pt-summary, pt-mysql-summary)
Do not get sidetracked by tangents; write the open questions down as notes and cross each one off once it has been checked
Is it a cause or an effect?
Possible reasons a resource becomes inefficient:
1. the resource is overused and lacks capacity; 2. the resource is not configured properly; 3. the resource is damaged or malfunctioning
3.5 Other Profiling Tools
USER_STATISTICS: a set of tables for measuring and auditing database activity
strace: investigates system calls. It measures wall-clock time, is less predictable, and has higher overhead; oprofile measures CPU cycles
Summary:
The most effective way to define performance is response time
You cannot optimize effectively what you cannot measure; performance optimization must be based on high-quality, comprehensive, and complete response-time measurements
The best place to start measuring is the application; even if the problem lies in the underlying database, good measurements make it easier to find
Most systems cannot be measured completely, and measurements are sometimes wrong; find ways to work around the limitations, and stay aware of the flaws and uncertainty of your methods
Complete measurements produce a large amount of data to analyze, so a profiler (the best tool for the job) is needed
A profile is a summary: it hides and discards many details and does not tell you what is missing, so do not rely on it completely
Time is consumed in two ways: working or waiting. Most profilers can measure only the time spent working, so wait analysis is sometimes a useful supplement, especially when CPU utilization is low but the work never finishes
Optimizing and improving are two different things; when the cost of further improvement exceeds the benefit, stop optimizing
Pay attention to your intuition, but base your reasoning and decisions on data as far as possible
In a word: first clarify the problem, choose the right technique, use the tools well, be careful, keep your logic clear and stick with it, do not confuse cause and effect, and do not change the system before you have identified the problem.