First, topas
1. Subcommands:
a ----> return to the original main screen
c ----> toggle the CPU area (cpu)
d ----> toggle the disk area (disk)
n ----> toggle the network area (net)
p ----> toggle the process area (process)
P ----> show process status full-screen (uppercase P)
q ----> exit (quit)
2. Monitoring screen analysis
a) Parameters of the CPU area
Kernel: percentage of CPU time spent executing in kernel mode
User: percentage of CPU time spent executing in user mode
Wait: percentage of CPU time spent waiting for I/O
Idle: percentage of CPU time spent idle
b) Parameters of the EVENTS/QUEUES area (each field shows the value for the current monitoring interval; after a refresh, the value for the next interval is shown)
Cswitch: context switches per second
Syscall: system calls executed per second
Reads: read calls executed per second
Writes: write calls executed per second
Forks: fork calls executed per second
Execs: exec calls executed per second
Runqueue: average number of threads ready to run
Waitqueue: average number of threads waiting for paging to complete
c) Parameters of the FILE/TTY area (each field shows the value for the current monitoring interval; after a refresh, the value for the next interval is shown)
Readch: bytes read per second through the read call
Writech: bytes written per second through the write call
Rawin: raw bytes read from TTYs per second
Ttyout: bytes written to TTYs per second
Igets: calls per second to the i-node lookup routine
Namei: calls per second to the path lookup routine
Dirblk: directory blocks scanned per second by the directory search routine
d) Parameters of the PAGING area (each field shows the value for the current monitoring interval; after a refresh, the value for the next interval is shown)
Faults: total page faults per second
Steals: physical memory frames stolen per second by the virtual memory manager
PgspIn: 4KB pages read from paging space per second
PgspOut: 4KB pages written to paging space per second
PageIn: 4KB pages read per second, including pages read in from the file system
PageOut: 4KB pages written per second, including pages written out to the file system
Sios: I/O requests issued per second by the virtual memory manager
e) Parameters of the MEMORY area
Real,MB: size of real memory
Comp: percentage of real memory allocated to computational page frames
Noncomp: percentage of real memory allocated to non-computational page frames
Client: percentage of real memory allocated to caching remotely mounted files
f) Parameters of the PAGING SPACE area
Size,MB: total paging space size
Used: percentage of the system's paging space in use
Free: percentage of the system's paging space that is free
g) Parameters of the network interface area (each field shows the value for the current monitoring interval; after a refresh, the value for the next interval is shown)
Interf: network interface name
KBPS: throughput in KB per second
I-pack: packets received per second
O-pack: packets sent per second
Kb-in: KB received per second
Kb-out: KB sent per second
h) Parameters of the physical disk area
Disk: physical disk name
Busy%: percentage of time the physical disk spends on read and write operations
KBPS: total KB read and written per second
TPS: transfers per second to the physical disk; one transfer is one disk I/O, and several logical requests can be combined into a single disk I/O
Kb-read: KB read from the physical disk per second
Kb-write: KB written to the physical disk per second
i) Parameters of the process area
Name: name of the program the process is running
PID: process ID
CPU%: percentage of CPU the process occupies
PGSP: amount of paging space allocated to the process
Class: the Workload Manager (WLM) class the process belongs to
Second, sar
sar collects, displays and saves system activity information, including CPU utilization, memory usage, system calls, file reads and writes, process activity, IPC activity and so on. The sar command essentially drives the sadc command: when sar runs, a /usr/lib/sa/sadc process runs in the background, and sar converts the data produced by sadc into text, which it then displays or saves to a file.
There are two ways to run sar. The first is to give it interval and count arguments, which produces statistics in real time. The second is to run sar with no arguments, in which case it reads and analyzes the file /var/adm/sa/sadd by default, where dd is the day of the month; on the 19th, for example, the file is /var/adm/sa/sa19. This file does not exist automatically: the line in /etc/rc that generates it is commented out with # by default, and must be uncommented if you want the file produced.
With that line active, the day's data file is created at system startup; the /etc/rc script runs at boot, invoked from /etc/inittab.
When sar is run with no arguments it looks first for /var/adm/sa/sadd, which is produced by /usr/lib/sa/sa1. The sa1 script is run by cron, so if cron is not executing /usr/lib/sa/sa1 every day, sar will complain when you run it.
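For reference, the sa1/sa2 collectors are typically driven from the adm user's crontab, where the entries ship commented out. The lines below are an illustrative sketch of such entries (exact paths, schedules and flags vary by AIX release, so check your own adm crontab rather than copying these verbatim):

```shell
# Sample every 20 minutes during working hours, Mon-Fri:
# 0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
# Hourly samples on weekends:
# 0 * * * 0,6 /usr/lib/sa/sa1 &
# Summarize the day's data into a report after hours, Mon-Fri:
# 5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 3600 -A &
```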
1. Analyzing CPU activity
The four parameters have the same meaning as in topas.
%usr + %sys + %wio + %idle = 100
When %usr + %sys approaches 100%, the CPU is at its limit and adding CPUs can be considered. If %usr is much larger than %sys, user applications are consuming most of the CPU and may be worth optimizing. A high %wio means the CPU spends too long waiting for disk I/O; insufficient memory causes paging to become too frequent, which in turn generates heavy disk I/O.
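These rules of thumb can be sketched as a small classifier. The exact thresholds below (90% for CPU-bound, a 2:1 usr:sys ratio, 25% for I/O wait) are illustrative assumptions, not values defined by sar itself:

```python
def diagnose_cpu(usr: float, sys: float, wio: float, idle: float) -> list[str]:
    """Apply the rule-of-thumb checks above to one sar CPU sample."""
    hints = []
    if usr + sys >= 90:                # CPU near its limit
        hints.append("CPU-bound: consider adding CPUs")
    if sys > 0 and usr > 2 * sys:      # user code dominates kernel time
        hints.append("user applications dominate: consider optimizing them")
    if wio >= 25:                      # long waits for disk I/O
        hints.append("high I/O wait: check disks and memory/paging")
    return hints

print(diagnose_cpu(usr=50, sys=45, wio=3, idle=2))
```

A sample with usr=50, sys=45 trips only the CPU-bound check; one with wio=40 trips only the I/O-wait check.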
2. File read/write statistics (file access routines)
iget/s: calls per second to the i-node lookup routine
lookuppn/s: calls per second to the directory lookup routine
dirblk/s: directory blocks read per second by the directory search routine
3. System call statistics (calls)
scall/s: total system calls per second
sread/s: total read calls per second
swrit/s: total write calls per second
fork/s: total fork calls per second
exec/s: total exec calls per second
rchar/s: bytes transferred per second by read() calls
wchar/s: bytes transferred per second by write() calls
4. Block device activity statistics (device)
device: block device name
%busy: percentage of time the device was busy servicing transfer requests
avque: average number of requests outstanding while the device was servicing transfers
r+w/s: read/write transfers per second to the device
blks/s: blocks transferred per second
avwait: average time a transfer request spends waiting idle in the queue
avserv: average time needed to complete a transfer request
If %busy > 50% or avwait > avserv, there may be a disk I/O bottleneck.
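The bottleneck rule above reduces to a one-line predicate. The parameter names mirror the sar device columns; the 50% threshold is the rule of thumb quoted in the text:

```python
def disk_bottleneck(busy_pct: float, avwait: float, avserv: float) -> bool:
    """True if a sar device sample suggests a possible disk I/O bottleneck."""
    return busy_pct > 50 or avwait > avserv

# A disk 70% busy is suspect even when its queue wait is short:
print(disk_bottleneck(busy_pct=70, avwait=1.0, avserv=5.0))
```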
5. Queue activity statistics (queue statistics)
runq-sz: average number of kernel threads in the run queue
%runocc: percentage of time the run queue was occupied
swpq-sz: average number of kernel threads waiting to be paged in, i.e. the swap queue size
%swpocc: percentage of time the swap queue was occupied
A runq-sz below 4 and a swpq-sz below 5 indicate a healthy situation.
6. Paging statistics (paging statistics)
slots: number of free pages in paging space
cycle/s: page replacement cycles per second
fault/s: page faults per second
odio/s: non-paging disk I/Os per second
7. System table usage statistics
# sar -v 2 6
8. TTY device activity (tty device)
9. Buffer usage statistics (buffer)
10. Kernel process activity
# sar -k 2 6
11. Message and semaphore activity
# sar -m 2 6
12. Swapping activity
# sar -w 2 6
Third, vmstat
vmstat mainly reports virtual memory activity, but it also gives statistics on kernel threads, physical disks, traps and CPU activity.
1. Memory
avm: number of active virtual memory pages, i.e. the total number of virtual pages allocated in paging space; a high value does not by itself mean the system is performing badly.
fre: number of free memory pages in RAM. The system maintains a buffer of memory pages called the free list; when the Virtual Memory Manager (VMM) needs space, it allocates pages from the free list.
Notes:
1) Dividing avm by 256 gives the system-wide amount of allocated paging space in megabytes.
2) # lsps -a displays information about each paging space.
3) It is recommended to allocate enough paging space that its utilization never reaches 100%.
4) When fewer than 128 unallocated pages remain in paging space, the system starts killing processes to free paging space.
5) The minimum number of pages the VMM keeps on the free list is set by the minfree parameter, which can be changed with the vmtune command; this requires the bos.adt.samples fileset to be installed. (vmtune is an old command; nowadays use vmo.)
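The avm-to-megabytes conversion in note 1 follows directly from the 4KB page size: 256 pages of 4KB each make 1MB. A minimal sketch:

```python
PAGE_KB = 4  # AIX virtual memory page size in KB

def avm_to_mb(avm_pages: int) -> float:
    """Convert vmstat's avm (a count of 4KB pages) to megabytes: avm / 256."""
    return avm_pages * PAGE_KB / 1024

print(avm_to_mb(131072))  # 131072 pages -> 512.0 MB
```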
2. Page
re: no longer used (deactivated parameter)
pi: pages paged in from paging space per second
po: pages paged out to paging space per second
fr: pages freed per second by the page-replacement algorithm
sr: pages scanned per second by the page-replacement algorithm
cy: clock cycles per second of the page-replacement algorithm
Notes:
1) If pi and po are not consistently 0, paging activity is too frequent, which sharply reduces system performance; the underlying cause is usually a memory bottleneck.
2) If the pi:po ratio is greater than or equal to 1, then for every page paged out at least one page is paged back in, so paging activity is very frequent and the paging rate is high.
3) If sr is high relative to fr, memory is over-committed; an fr:sr ratio of 1:4 means that for every page freed, 4 pages had to be scanned.
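The checks in notes 1-3 can be sketched as follows (the phrasing of the hints is mine; a zero denominator simply means no paging happened in the interval):

```python
def paging_hints(pi: float, po: float, fr: float, sr: float) -> list[str]:
    """Apply the pi/po and fr:sr rules of thumb to one vmstat sample."""
    hints = []
    if pi > 0 or po > 0:
        hints.append("paging to/from paging space: possible memory bottleneck")
    if po > 0 and pi / po >= 1:
        hints.append("pi:po >= 1: pages written out are read straight back in")
    if fr > 0 and sr / fr >= 4:
        hints.append("fr:sr at 1:4 or worse: many pages scanned per page freed")
    return hints

print(paging_hints(pi=12, po=10, fr=50, sr=400))
```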
3. Faults
in: device interrupts per second
sy: system calls per second
cs: kernel thread context switches per second
4. CPU
r: number of threads placed on the run queue per second
b: number of threads placed on the wait queue per second (waiting for resources or I/O)
us: percentage of time the CPU spends in user mode
sy: percentage of time the CPU spends in system mode
id: percentage of time the CPU is idle
wa: percentage of time the CPU is idle while waiting for disk I/O
Notes:
1) If id is always 0, the CPU is continuously busy.
2) As the run queue grows, user response time increases; if r is consistently non-zero, the CPU has more work queued than it can finish.
3) If us + sy is close to 100%, the CPU may be at its limit.
4) If wa exceeds 40%, the disk subsystem is unbalanced, possibly because of a disk-intensive workload.
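To act on these columns in a script, a vmstat data line can be mapped onto names. The column layout below (kthr, memory, page, faults, cpu) is an assumption of a typical AIX vmstat; the sample numbers are made up, and the cpu-section "sy" is renamed "sys" to avoid clashing with the faults-section "sy":

```python
# Typical AIX vmstat columns: r b | avm fre | re pi po fr sr cy | in sy cs | us sy id wa
FIELDS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
          "in", "sy", "cs", "us", "sys", "id", "wa"]

def parse_vmstat_line(line: str) -> dict[str, int]:
    """Map one whitespace-separated vmstat data line onto the column names above."""
    values = [int(v) for v in line.split()]
    return dict(zip(FIELDS, values))

sample = "1 1 124532 5231 0 0 0 0 0 0 120 4321 345 10 5 80 5"
stats = parse_vmstat_line(sample)
print(stats["id"], stats["wa"])
```

Adjust FIELDS to match the header your AIX release actually prints before relying on the positions.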
Fourth, iostat
iostat reports statistics on CPU, terminal I/O and disk I/O, which help pin down the I/O load on a single component such as a hard disk.
1. TTY and CPU statistics (with the -t flag, only the TTY and CPU statistics are reported)
tin: characters received from terminals per second
tout: characters sent to terminals per second
%user: percentage of CPU time spent executing user programs
%sys: percentage of CPU time spent executing kernel code
%idle: percentage of time the CPU is idle
%iowait: percentage of time the CPU is idle while waiting for disk I/O
A relatively high %iowait indicates a disk I/O bottleneck.
There are a few common ways to relieve such bottlenecks:
1) Do not place multiple active logical volumes and file systems on the same hard disk; spread the I/O load evenly across multiple physical disks.
2) Spread a logical volume across multiple physical disks to allow concurrent access.
3) Create multiple JFS logs in a volume group and assign them to specific file systems; this helps when creating, deleting or modifying large numbers of files, especially temporary files. (Whether you create JFS or JFS2 logs, remember to format them with logform.)
4) Reduce fragmentation by backing up and restoring file systems; fragmentation increases response time because the drive must reposition its heads frequently.
5) Add drives to spread the load across the existing I/O subsystem.
2. Disk I/O statistics
Disks: physical volume name
%tm_act: percentage of time the physical disk was active. A drive is active only while data is being transferred or commands are being processed.
Kbps: kilobytes of data the drive transfers per second
tps: transfers per second issued to the physical disk
Kb_read: kilobytes read from the physical volume during the reporting interval
Kb_wrtn: kilobytes written to the physical volume during the reporting interval
Notes:
1) Disk utilization (%tm_act) is proportional to resource contention and inversely related to I/O performance: as utilization rises, I/O performance falls and response time grows.
2) The more drives the I/O is spread across, the better the I/O performance.
3) Find the busy drives (compared with the idle ones) and move data from busy disks to less busy ones; this can relieve disk bottlenecks.
4) Check paging activity, since paging in and out also adds I/O load; consider moving paging space from a busy disk to a less busy one.
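Note 3's advice (find the busy drives) amounts to ranking disks by %tm_act. A sketch over already-parsed iostat values (the disk names and numbers here are made up for illustration):

```python
def busiest_disks(tm_act: dict[str, float], top: int = 2) -> list[str]:
    """Return the `top` disk names with the highest %tm_act values."""
    return sorted(tm_act, key=tm_act.get, reverse=True)[:top]

sample = {"hdisk0": 85.2, "hdisk1": 3.1, "hdisk2": 40.7}
print(busiest_disks(sample))  # hdisk0 and hdisk2 are the candidates to offload
```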
Fifth, vmo
See the separate note on vmo.
Sixth, network performance analysis
Notes:
When an application wants to send data to an application on a remote host, it first writes the data to a local socket; that is, the data is copied from the application's buffer into the local socket's send buffer.
Once in the socket send buffer, the data is organized into a chain of mbufs or clusters and handed to TCP or UDP for sending. The data may well exceed the MTU limit. For data sent by TCP, it is divided into segments at the transport layer (the TCP/UDP layer); the segment is TCP's unit of data. For UDP, splitting the data is left to the network layer (the IP layer).
Likewise, on the receiving side there is a receive queue that works in the opposite direction. Note that the send and receive buffer sizes are limited by udp_sendspace and tcp_sendspace (and by udp_recvspace and tcp_recvspace, respectively).
After data arrives at a host, it passes from the device driver layer through the interface layer to the IP input queue at the IP layer. The size of the IP input queue is controlled by the ipqmaxlen parameter, which can be changed with the no command. If a packet is lost or corrupted in transit, the time the IP layer waits for the missing fragment is controlled by the ipfragttl parameter.
AIX allocates virtual memory for its various TCP/IP networking tasks. The memory management facility used by the network subsystem is called the mbuf; mbufs are mainly used to hold data received from or being sent to the network. The mbuf pool can be configured while AIX is running. The following system parameters can be tuned:
1) thewall: kernel variable; sets the upper limit on the amount of RAM that can be allocated to the mbuf pool
2) tcp_sendspace: sets the default socket send buffer size (default 16384)
3) udp_sendspace: sets the maximum send buffer size a single UDP socket can use for outgoing data (default 9216)
4) udp_recvspace: sets the maximum receive buffer size for any single UDP socket (default 41920)
5) rfc1323: when non-zero, allows the TCP window size to be expressed with 32 bits instead of 16, so tcp_sendspace and tcp_recvspace can both be set beyond 64KB
6) sb_max: sets the upper limit on the size of any socket buffer
7) ipqmaxlen: kernel variable; controls the length of the IP input queue, 100 packets by default. This is sufficient for a single network device; if the IP input queue is too short, packets are dropped.
All of these parameters can be modified with the no command.
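One consistency check worth automating when tuning these values: since sb_max caps socket buffer sizes, none of the send/receive space tunables should be set above it. A hedged sketch over values you might collect from `no -a` (the tcp_sendspace, udp_sendspace and udp_recvspace numbers are the defaults quoted above; the tcp_recvspace and sb_max values are illustrative assumptions):

```python
def buffers_over_sb_max(tunables: dict[str, int]) -> list[str]:
    """Return the send/receive space tunables whose value exceeds sb_max."""
    limit = tunables["sb_max"]
    watched = ["tcp_sendspace", "tcp_recvspace", "udp_sendspace", "udp_recvspace"]
    return [name for name in watched if tunables.get(name, 0) > limit]

settings = {"tcp_sendspace": 16384, "tcp_recvspace": 16384,
            "udp_sendspace": 9216, "udp_recvspace": 41920,
            "sb_max": 1048576}
print(buffers_over_sb_max(settings))  # empty list: everything is within the limit
```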
Chapter 20: System Performance Tuning