A summary of experience in Web application performance optimization

Source: Internet
Author: User

Common performance optimization requests
In the performance optimization cases I have worked on, the request usually starts in one of these ways:

a) The front end is slow; please help analyze and optimize it.
b) Users are unhappy with performance, and the complaints need to be resolved.
c) The database load is heavy; please help analyze it.
d) Opening function XXX takes 1 minute; please help analyze it.

When I try the function myself, it may return within a few seconds. If I am lucky, the person who reported the problem can tell me which query conditions reproduce it. Another possibility, of course, is that the person reporting it is a user too and cannot tell me.

When I receive requests like these, I want the following information so that I can determine the type of problem. Usually, my work has to start without it:

a) Is it a system-wide problem? For example, does high CPU utilization, swap utilization, or I/O degrade overall performance?
b) Is it a functional problem? Overall performance is good, but an individual function has a very long delay.
c) Is it a new problem? Since when? What changed in the system (an upgrade, or a significant increase in managed resources)?
d) Is it an irregular problem? Sometimes fast, sometimes slow, with no specific pattern.

What are the performance metrics? How many seconds did it take before, how many does it take now, and what is the target?

Only when these questions are answered accurately can optimization begin.

The way to get the answers is measurement: reliable monitoring tools that accurately measure user access latency and the system's CPU, I/O, and swap, and that record the state of the system and of the database at the moment a bottleneck occurs. Optimization starts from this data.
Case 1: a customer in Tianjin complained that overall system performance repeatedly dropped until the system was almost inaccessible, at times that followed no pattern. I deployed the monitoring tools and analyzed the data after 3 weeks. It showed that before each drop a user had run a historical alarm query, after which the database performance curve fell sharply. After the developers optimized this function, the problem never came back.

Case 2: a host at a customer in Nanjing crashed often. According to the process list the monitoring tool logged before each crash, a program was hanging, its processes piled up, and the host ran out of resources and crashed. After the program was fixed, the failure did not recur.

I have spent this much ink to make one point: without quantitative measurement, performance optimization is a castle in the air and cannot be carried out. With monitoring tools deployed, the performance problems reported in the future will likely look like this:

a) Overall system load is normal, but /nms/res/devicelist_down.jsp currently has an access delay of 35046 milliseconds; please help troubleshoot.
b) System swap utilization often exceeds 50%, and the system then responds very slowly; killing getcgflux.pl returns it to normal; please analyze.
c) The database server's CPU utilization is currently high and has been for some time; please analyze.

If performance optimization works this way, the work becomes much easier and more fun.

Optimization analysis process

1. Performance data collection

This step is the basis of performance optimization. If no monitoring tools were deployed before the problem appeared, deploy them and collect data for a period of time before starting the analysis. There are exceptions: sometimes, fortunately, the problem is happening right now and is obvious, such as a program that stays hung for a long time, a process that eats the whole host's CPU, or a query that is always slow. Problems like these rarely need me; you can go straight to the developers. In many cases, though, the problem is not so obvious or so regular, and it may involve several functions of the system. Then we have to use tools to collect data.

2. Performance data analysis

Without collected data, analysis can look mysterious and depends entirely on an expert's personal experience. I once heard a story: a factory printer kept failing inexplicably at certain times. An expert was called in, sat on a chair near the printer for a few weeks, then had the workers repair a corner of the floor, and the problem never appeared again.

With previously collected data, you can look at the curves for CPU, I/O, application latency, network performance, and other indicators, and see the time at which the problem occurred and the function involved. Anyone working in IT should be able to identify problems from such data.

For example, syslog processing on a collection machine often lags. If that machine's network curve shows packet loss over the same periods, isn't the problem obvious? Another example: a patrol function regularly fails to store large amounts of data during one period. If the database CPU and connection curves for that period show the number of connections rising and CPU idle at 0, the problem is equally obvious.
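The kind of reading described above, finding the periods where the indicators go bad together, can be sketched in a few lines of Python. This is only an illustration: the `Sample` fields, the sample values, and the 200-connection threshold are all made up for the example and do not come from any real monitoring tool.

```python
from typing import Callable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    timestamp: str        # label of the sampling time, e.g. "02:10"
    cpu_idle_pct: float   # database host CPU idle percentage
    db_connections: int   # number of database connections


def find_incidents(samples: List[Sample],
                   is_bad: Callable[[Sample], bool]) -> List[Tuple[str, str]]:
    """Group consecutive bad samples into (start, end) time windows."""
    incidents = []
    start = prev = None
    for s in samples:
        if is_bad(s):
            if start is None:
                start = s.timestamp
            prev = s.timestamp
        elif start is not None:
            incidents.append((start, prev))
            start = None
    if start is not None:
        incidents.append((start, prev))
    return incidents


# Hypothetical samples, invented for this sketch.
SAMPLES = [
    Sample("02:00", 62.0, 40),
    Sample("02:05", 3.0, 180),
    Sample("02:10", 0.0, 240),
    Sample("02:15", 0.0, 250),
    Sample("02:20", 55.0, 45),
    Sample("02:25", 0.0, 230),
]

# "Bad" here means CPU idle at 0 while connections exceed an assumed threshold.
incidents = find_incidents(
    SAMPLES, lambda s: s.cpu_idle_pct == 0 and s.db_connections > 200
)
for start, end in incidents:
    print(f"incident from {start} to {end}")
```

The windows it returns are the time ranges to compare against the application's failure log.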
3. Implementing the optimization

This step takes measures against the problems identified earlier. The system maintainer might adjust the collection load to spread it more evenly, or adjust host or database parameters, but more likely the program itself needs to be optimized.

4. Evaluating the effect

Take the second example above: after the optimization, both the number of network connections and the database host load became very smooth, and the problem no longer appeared. If the problem persists, repeat the four steps above and continue optimizing.

My toolbox

These are the performance analysis tools I use regularly; they have helped me solve many performance problems.

a) Web access latency monitoring tool: assayfilter

Deployed on the main application. After deployment, resin/logs/assayfilter.log records the access time, the user, the URL accessed, the latency in milliseconds, and the source IP, so the user's experience can be quantified. Sample data:

    20150227094607 zengguojin /nms/res/devicelist_down.jsp 309356 219.159.77.116 ms
    20150209113913 zengguojin /nms/res/devicelist_down.jsp 383042 219.159.77.116 ms

These two records show that this user has run into the problem more than once when accessing this function. How satisfied can he be after waiting more than 300 seconds?
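Records like these are easy to summarize per URL. The Python sketch below assumes each record is whitespace-separated, in the order the text lists (time, user, URL, latency in milliseconds, source IP, then a unit suffix). The function names and the 6000 ms threshold are illustrative choices, not part of assayfilter.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple


class Access(NamedTuple):
    when: datetime
    user: str
    url: str
    latency_ms: int
    source_ip: str


def parse_line(line: str) -> Access:
    """Parse one record; assumed layout: time user url latency_ms source_ip [unit]."""
    fields = line.split()
    return Access(
        when=datetime.strptime(fields[0], "%Y%m%d%H%M%S"),
        user=fields[1],
        url=fields[2],
        latency_ms=int(fields[3]),
        source_ip=fields[4],
    )


def slow_urls(lines: Iterable[str],
              threshold_ms: int = 6000) -> Dict[str, Tuple[int, int]]:
    """Return {url: (slow_request_count, worst_latency_ms)} for slow requests."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        if not line.strip():
            continue
        rec = parse_line(line)
        if rec.latency_ms >= threshold_ms:
            entry = stats[rec.url]
            entry[0] += 1
            entry[1] = max(entry[1], rec.latency_ms)
    return {url: (count, worst) for url, (count, worst) in stats.items()}


SAMPLE = """\
20150227094607 zengguojin /nms/res/devicelist_down.jsp 309356 219.159.77.116 ms
20150209113913 zengguojin /nms/res/devicelist_down.jsp 383042 219.159.77.116 ms
"""

report = slow_urls(SAMPLE.splitlines())
for url, (count, worst) in report.items():
    print(f"{url}: {count} slow requests, worst {worst} ms")
```

A report like this turns "the front end is slow" into a ranked list of URLs to hand to the developers.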
One might ask whether the high latency this tool reports is really caused by network latency. Here is how it works. assayfilter is an interceptor: it measures the time from when the request enters Resin to when the response leaves Resin, so

    access duration = Resin processing time + network latency from the main application to the database + Oracle SQL execution time

The main application and the database are on the same switch, so the network latency between them is negligible.

[Figure 1]

The tool therefore keeps network latency between the user and the server out of its statistics and lets us focus entirely on the performance of the web application itself.

b) Host monitoring probe: wd_probe

Deployed on the host. Besides sending host alarm notifications, this probe is a performance analysis tool I rely on. It records CPU, swap, disk I/O, network performance, the number of processes, and the number of network connections at each point in time. When CPU utilization exceeds a preset threshold, it writes out a snapshot of the system's processes at that moment for post-mortem analysis.

Performance data is under data/perf in the probe's home directory; process snapshots are under data/tmp.

c) Long-running SQL collection tool

Deployed on the main application; it can be run from cron every minute. It continuously captures SQL that has been executing for more than 6 seconds and records it in /tmp/sql.csv.

By initiator, the captured SQL falls into two categories. SQL initiated by the application's JDBC programs is typically what runs when users access the front end, and when it runs for more than 6 seconds it is SQL we need to optimize. Sample results:
    [email protected] (TNS V1-V3)  2015/3/27 12:37  fuuk7dbrmsk47  2158  214293  select probeid, cfgfiledir, cfgfilename, to_char(lastgottime, 'yyyymmddhh24miss'), resid from cfgfilelist
    JDBC Thin [email protected]  2015/3/26 7:29  db6b71unjzah1  9  5  insert into rcheckresgroupres (planid, groupid, resid, ipaddressa, resname, nodecode, nodename, nodefullcode,
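To pick out the application's SQL from such a file, one could count captured rows per SQL ID for JDBC initiators only. The Python sketch below rests on assumptions: it treats the first CSV column as the initiating program and the third as the SQL ID, which matches the sample rows but is not confirmed by the tool's author, and the rows in `SAMPLE` are hypothetical stand-ins.

```python
import csv
import io
from collections import Counter


def jdbc_sql_counts(csv_text: str) -> Counter:
    """Count captured rows per SQL ID, keeping only rows started by a JDBC program.

    Assumed column layout: column 0 is the initiating program, column 2 the SQL ID.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        program, sql_id = row[0].strip(), row[2].strip()
        if program.upper().startswith("JDBC"):
            counts[sql_id] += 1
    return counts


# Hypothetical rows, shaped like the sample output but invented for this sketch.
SAMPLE = """\
sqlplus (TNS V1-V3),2015/3/27 12:37,fuuk7dbrmsk47,2158,214293,select probeid from cfgfilelist
JDBC Thin Client,2015/3/26 7:29,db6b71unjzah1,9,5,insert into rcheckresgroupres (planid) values (1)
JDBC Thin Client,2015/3/26 7:30,db6b71unjzah1,9,5,insert into rcheckresgroupres (planid) values (1)
"""

counts = jdbc_sql_counts(SAMPLE)
for sql_id, n in counts.most_common():
    print(f"{sql_id}: captured {n} times")
```

Because the collector runs every minute, a statement that shows up many times is one that stays slow, which makes it a good first candidate for optimization.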
Basic system checks

1. Basic host problems

Is free disk space 0?
Is swap utilization above 40%?
Has CPU utilization stayed above 85% for a long time?
Is the network continuously dropping packets?
Has I/O utilization of the working disk stayed at 100%?

Any of these usually means the system has a serious problem, and the cause needs to be traced further in the programs or the database.

2. Resin JVM checks

The web application's front-end JSPs and classes all run in the Resin JVM. The JVM (Java Virtual Machine) is comparable to the Oracle database, and JSPs and classes to SQL; the JVM can be regarded as system software in its own right. So checking only whether the Java process exists and whether the front end is reachable is not enough. Just as we could not maintain a database without sqlplus or PL/SQL tools, the JVM has its own maintenance tools, all under JAVA_HOME/bin.

a. View JVM memory usage

    jstat -gcutil <pid> 3s 5

This samples JVM memory utilization every 3 seconds, 5 times:

      S0     S1      E      O      P    YGC    YGCT    FGC    FGCT     GCT
    0.00  98.51  44.95  39.41  63.43      9   0.070      2   0.195   0.266

Here the permanent generation is 63.43% used, and the Eden and old generations are 44.95% and 39.41% used respectively.

b. View JVM thread stacks

    jstack <pid>

This prints the call stacks of all threads in the current JVM. It is especially useful for finding the source of a failure when the front end stops responding. In one captured dump, thread Thread-831 was blocked by thread 38f06e88, and the call stack pointed precisely to the code being executed, which made troubleshooting straightforward.

c. Check the error logs for out-of-memory errors

The logs are resin/logs/error.log or resin/log/jvm-default.log.

If java.lang.OutOfMemoryError: PermGen space appears, the JVM's permanent generation is too small. Depending on its utilization, it can be raised, for example to 256 MB:

    -XX:MaxPermSize=256m

If java.lang.OutOfMemoryError appears without the PermGen message, the JVM heap is too small. Increase the heap, for example to 2048 MB:

    -Xmx2048m -Xms2048m

If the error still occurs after Xmx is raised to 2 GB, it may be a memory leak, and the developers need to investigate.

3. Database checks

Oracle troubleshooting is complicated; I can only give a few simple examples.

a) System parameter tuning

1) The SGA should make full use of system memory; it can be set to about half of the host's memory. I have come across a host with 64 GB of memory where sga_target was set to 5 GB.
2) db_cache_size is best set to sga_target minus 3 GB. Many of our programs do not use bind variables, so if db_cache_size is not set, the shared pool keeps growing within the SGA, less data is cached, more reads go to disk, and overall performance falls.
3) Set shared_servers to 0 so the database runs in dedicated server mode rather than shared server mode.

System parameter tuning improves overall performance somewhat, but it can do little about poorly written SQL or programs, ineffective indexes, or stale statistics.

b) Application-level optimization

1) Locate the problem SQL

This query lists the SQL the database is currently executing:

    select distinct s.sid, s.serial#, s.blocking_session, p.spid pid,
           to_char(s.logon_time, 'yyyy-mm-dd hh24:mi:ss') logontime,
           substr(s.machine, 1, 15) smachine, substr(s.program, 1, 20) sprogram,
           q.sql_id, substr(q.sql_text, 1, 200) sql
      from v$sql q, v$session s, v$process p
     where q.hash_value = s.sql_hash_value
       and q.address = s.sql_address
       and p.addr = s.paddr;

Two criteria find the problem SQL quickly. First, a single process whose SQL consumes a lot of CPU: get per-process CPU utilization from the top command and match its process ID against the pid column. Second, a class of processes that together execute a lot of SQL: each one uses little CPU, but the total is high. For the SQL found this way, ask the support staff to help decide whether the developers need to optimize it.

2) Check whether a session is blocked by another session

Look at the blocking_session column in the result of the previous query. If a session is blocked by a lock that another session holds, kill the blocking session:

    alter system kill session 'sid,serial#';

Several times when the system was very slow, it turned out that a developer or maintainer had locked a table from a PL/SQL tool, blocking the related sessions.

3) Analyze the tables the SQL involves and update their statistics

Performance optimization is a deep subject, and I am still learning much of it myself. I can only summarize my experience with some frequently occurring problems to share with everyone. If you find any problems, I hope to hear your feedback. The tools are not being released for now; I am still choosing how to publish them.
