Because this performance tuning project involves many components, ideally everything in the environment would be monitored. To keep costs low, however, monitoring focuses on two kinds of servers, the application server and the database server: Nmon provides the baseline monitoring data, supplemented by JVM monitoring and by database alarm logs and snapshots.
Monitoring is only a means to an end: performance optimization depends on extracting regular, meaningful data from the mass of monitoring logs. Data has to be collected before it can be analyzed, so this article first introduces the data that needs to be collected, based on my experience from the project.
Basic Environment:
Two database servers, configured as a database cluster.
Application Server-JVM thread
The project mainly uses Tongweb (the legacy system runs a very old version). The monitoring output looks similar to the following:
Monitoring content
...
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnCreated","10",
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnAcquired","111292",
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnNotSuccessfullyMatched","0",
"2018-01-11T02:26:25.670+0800","com.tongtech.tongweb:type=jvm,category=monitor,server=server","UpTime","222520621",
"2018-01-11T02:26:25.670+0800","com.tongtech.tongweb:type=jvm,category=monitor,server=server","HeapSize","2143485952",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnUsed","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnSuccessfullyMatched","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","WaitQueueLength","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnDestroyed","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","ConnRequestWaitTime","4",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFailedValidation","0",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnReleased","111292",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFree","10",
...
Focus on content
The Tongweb monitoring data provides connection pool status information. Our approach is to use an Excel macro to convert the log into readable data and then chart it for analysis. The details will be described separately.
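As an alternative to the Excel macro, the same conversion could be sketched in Python. This is only a rough sketch: it assumes the log is a flat sequence of quoted fields that repeat in groups of four (timestamp, MBean name, attribute, value), as the excerpt suggests, and the function name parse_monitor_log is my own, not part of Tongweb.

```python
import re
from collections import defaultdict

# Every field in the excerpt is wrapped in double quotes.
FIELD = re.compile(r'"([^"]*)"')

def parse_monitor_log(text):
    """Return {attribute: [(timestamp, value), ...]} from a flat quoted-field log."""
    fields = [f.strip() for f in FIELD.findall(text)]
    series = defaultdict(list)
    # Assumption: fields repeat as (timestamp, MBean name, attribute, value).
    for i in range(0, len(fields) - 3, 4):
        timestamp, _mbean, attribute, value = fields[i:i + 4]
        series[attribute].append((timestamp, value))
    return dict(series)

sample = (
    '"2018-01-11T02:26:25.671+0800",'
    '"com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server",'
    '"WaitQueueLength","0",'
    '"2018-01-11T02:26:25.672+0800",'
    '"com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server",'
    '"NumConnFree","10"'
)
series = parse_monitor_log(sample)
print(series["NumConnFree"])
```

Grouping by attribute gives one time series per indicator, which is the shape a charting tool needs.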
JVM Thread Monitoring Instructions
Monitoring significance
Monitoring the Tongweb JVM makes it possible to identify the times of peak load, check whether the connection pool is full, and then judge whether peak-time connection handling in the application is a performance bottleneck. This matters for later performance analysis because it lets the main performance problems be classified and avoids unnecessary work.
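One way to spot the moments when the pool is full can be sketched as follows. Treating "no free connections" or "requests waiting in the queue" as saturation is my own assumption, not a rule documented by Tongweb, and the sample values are made up for illustration.

```python
def saturated_samples(samples):
    """Return timestamps where the connection pool looks exhausted.

    Each sample is (timestamp, num_conn_free, wait_queue_length).
    Assumption: zero free connections, or any queued request, means saturation.
    """
    return [ts for ts, free, waiting in samples if free == 0 or waiting > 0]

# Hypothetical values for illustration only; not taken from the project log.
samples = [
    ("02:25:55", 10, 0),
    ("09:30:25", 0, 3),
    ("09:30:55", 2, 0),
]
peaks = saturated_samples(samples)
print(peaks)
```

If the list stays empty through the busiest period, the connection pool is probably not the bottleneck and attention can move elsewhere.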
Application Server-netstat
In the Internet RFC standards, netstat is defined as a program that accesses network connection state and related information in the kernel; it reports on TCP connections, TCP and UDP listening sockets, and process memory management.
Monitoring content
The following is a log excerpt captured in the project:
...
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:2049            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:427             0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:427           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:58862           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:2544            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:21              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:631             0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:445             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:669             0.0.0.0:*               LISTEN
...
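A netstat capture becomes easier to compare across time points once the sockets are counted by state. The following is a minimal sketch, assuming the usual six-column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name count_tcp_states is my own.

```python
from collections import Counter

def count_tcp_states(netstat_output):
    """Count TCP sockets per state from netstat-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # TCP rows: proto, recv-q, send-q, local address, foreign address, state.
        if len(parts) >= 6 and parts[0].lower().startswith("tcp"):
            states[parts[5].upper()] += 1
    return states

sample = """\
Proto Recv-Q Send-Q Local Address   Foreign Address State
tcp        0      0 0.0.0.0:2049    0.0.0.0:*       LISTEN
tcp        0      0 0.0.0.0:139     0.0.0.0:*       LISTEN
tcp        0      0 127.0.0.1:25    0.0.0.0:*       LISTEN
"""
states = count_tcp_states(sample)
print(states["LISTEN"])
```

Running it on captures from quiet and busy periods shows how the number of connections in each state changes under load.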
Application Server-Nmon
Nmon is the main analysis tool in this performance optimization work and plays a particularly important role. The following description comes from its wiki entry and is worth reading when you have time:
Nmon collects the following operating system statistics:
CPU and CPU Threads utilisation
CPU frequency for servers or virtual machines that can alter their clock rate
GPU stats including utilisation, MHz and temperatures
Physical and Virtual Memory use
Disk Read & Write and transfers
Disk Groups decided by the user
Swap and Paging
Network Read & Write and transfers
Local File-systems
Network File-system (NFS)
Top Processes by CPU use, Memory size and I/O rates
Kernel stats including Run Queue, Context-switch, fork, Load Average & Uptime
Large and Huge Memory pages
Virtual machine stats (depending on the hardware), useful for Linux running KVM to host virtual machines
Resources in the Server and virtual machine
In effect, Nmon provides snapshots of system resource usage; combined with the Nmon analysis tool, it gives a clear picture of the system's performance indicators.
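For a quick look without the analysis tool, an Nmon capture file can also be read directly. The sketch below assumes the common capture layout: comma-separated lines, ZZZZ lines that map a snapshot tag (such as T0001) to a time, and CPU_ALL data lines with User% and Sys% right after the tag. Column order can differ between versions, so check the CPU_ALL header line in your own file. The sample lines are hypothetical.

```python
import csv
import io

def cpu_busy_by_time(nmon_text):
    """Return [(time, user% + sys%)] from the CPU_ALL lines of an nmon capture."""
    times = {}
    busy = []
    for row in csv.reader(io.StringIO(nmon_text)):
        if len(row) >= 3 and row[0] == "ZZZZ":
            times[row[1]] = row[2]  # snapshot tag -> time of day
        elif len(row) >= 4 and row[0] == "CPU_ALL" and row[1] in times:
            # The header line is skipped because its second field is not a tag.
            user, system = float(row[2]), float(row[3])
            busy.append((times[row[1]], user + system))
    return busy

# Hypothetical capture lines for illustration only.
sample = """\
CPU_ALL,CPU Total appserver,User%,Sys%,Wait%,Idle%
ZZZZ,T0001,02:25:55,11-JAN-2018
CPU_ALL,T0001,12.5,3.5,1.0,83.0
ZZZZ,T0002,02:26:25,11-JAN-2018
CPU_ALL,T0002,40.0,10.0,5.0,45.0
"""
busy = cpu_busy_by_time(sample)
print(busy)
```

The resulting time series can be lined up against the connection pool data to see whether CPU load and pool usage peak together.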
Download analysis Tools
Database Server-Alarms
Understanding the database's alarm logs is also a key part of mastering current performance.
A sample log entry follows. When errors like this appear, they can be analyzed and resolved case by case.
2018-01-11-00.36.36.090562+480 I13363168A459        LEVEL: Error
PID     : 2228842              TID  : 142490      PROC : db2sysc
INSTANCE: db2                  NODE : 000         DB   : TRADE
EDUID   : 142490               EDUNAME: db2agent (**) 0
FUNCTION: DB2 UDB, Query Gateway, sqlqg_fedstp_hook, probe:40
MESSAGE : Unexpected error returned from outer RC=
DATA #1 : Hexdump, 4 bytes
0x07000007053F28D0 : 8126 0012                                  .&..
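When the alarm log is large, it helps to pull out only the entries at the levels of interest. The following is a rough sketch, assuming the usual multi-line layout in which each entry starts with a timestamp at the beginning of a line and carries a LEVEL: field; the function name entries_at_level is my own, and the second sample entry is hypothetical.

```python
import re

# Each entry starts with a timestamp such as 2018-01-11-00.36.36.090562+480.
ENTRY_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+", re.MULTILINE
)
LEVEL = re.compile(r"LEVEL:\s*(\w+)")

def entries_at_level(log_text, wanted=("Error", "Severe")):
    """Return the log entries whose LEVEL field is one of `wanted`."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(log_text)]):
        entry = log_text[begin:end].strip()
        match = LEVEL.search(entry)
        if match and match.group(1) in wanted:
            entries.append(entry)
    return entries

sample = """\
2018-01-11-00.36.36.090562+480 I13363168A459        LEVEL: Error
PID     : 2228842              TID  : 142490      PROC : db2sysc
FUNCTION: DB2 UDB, Query Gateway, sqlqg_fedstp_hook, probe:40
MESSAGE : Unexpected error returned from outer RC=

2018-01-11-00.40.00.000000+480 I13363628A300        LEVEL: Info
MESSAGE : hypothetical entry, included only to show filtering
"""
errors = entries_at_level(sample)
print(len(errors))
```

Grouping the filtered entries by their FUNCTION line is a natural next step for seeing which component reports errors most often.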
Database Server-Snapshot
Database snapshots serve as the primary input for analysis; for example, where database time is spent can be analyzed from a snapshot such as the following:
...
Number of automatic storage paths          = 1
Automatic storage path                     = /db2data
    Node number                            = 0
    State                                  = In Use
    File system ID                         = 9223372079804448776
    Storage path free space (bytes)        = 69730709504
    File system used space (bytes)         = 139648946176
    File system total space (bytes)        = 209379655680
...
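Snapshot output is a list of "label = value" pairs, so individual figures are easy to extract. As a small sketch, the storage path usage in the excerpt can be computed like this; the function name storage_usage is my own, and the labels are copied from the excerpt.

```python
import re

def storage_usage(snapshot_text):
    """Return used/total for the file system reported in a snapshot excerpt."""
    def field(label):
        match = re.search(re.escape(label) + r"\s*=\s*(\d+)", snapshot_text)
        if match is None:
            raise ValueError("field not found: " + label)
        return int(match.group(1))

    used = field("File system used space (bytes)")
    total = field("File system total space (bytes)")
    return used / total

snapshot = """\
Storage path free space (bytes)  = 69730709504
File system used space (bytes)   = 139648946176
File system total space (bytes)  = 209379655680
"""
ratio = storage_usage(snapshot)
print(f"{ratio:.1%} of the file system is in use")
```

The same pattern works for any other numeric field in the snapshot by changing the label.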
This article only lists the analysis methods; I will summarize the specific operations gradually as time allows.
Using the tools well is important, but performance tuning is not achieved overnight: the only way is to advance step by step and prepare for a long campaign.
DB2 Tuning (ii) resource monitoring