Server monitoring with Python
On a Linux server everything is a file, so information about the running server can be read from files. A quick search on Baidu will tell you that Linux has a virtual file system mounted at /proc:
Linux provides an excellent way for administrators to change kernel settings while the system is running, without rebooting: the /proc virtual file system. /proc is a mechanism that the kernel and kernel modules use to send information to processes (hence the name "/proc"). This pseudo-file system lets you interact with the kernel's internal data structures to obtain useful information about processes and to change settings on the fly by modifying kernel parameters. Unlike other file systems, /proc exists in memory rather than on the hard disk. The information provided by the proc file system is as follows:
- Process information: every process in the system has a subdirectory in /proc named after its process ID, where you can find cmdline, mem, root, stat, statm, and status. Some of this information, such as the process root directory, is visible only to the superuser. Each process directory also contains some special links, and the separate link /proc/self always points to the information of the process that is reading it, which is useful for getting a process's own command-line information.
- System information: system-wide statistics can be obtained from /proc/stat, including CPU usage, disk activity, memory paging and swapping, interrupts, and so on.
- CPU information: the /proc/cpuinfo file gives accurate, current information about the CPU.
- Load information: the /proc/loadavg file contains the system load information.
- System memory information: the /proc/meminfo file contains details of the system memory, such as the amount of physical memory, the amount of available swap space, and the amount of free memory.
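As a rough illustration of the per-process entries described above, the sketch below parses the "Key: value" lines of a process status file into a dictionary. The function name parse_status and the sample text are assumptions made for this example; on a real Linux system the text would come from /proc/self/status or /proc/<pid>/status.

```python
def parse_status(text):
    """Parse 'Key:<whitespace>value' lines, as found in /proc/<pid>/status."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # skip lines that contain no colon
            info[key.strip()] = value.strip()
    return info

# Hypothetical excerpt; a real status file contains many more fields.
sample = "Name:\tpython\nState:\tS (sleeping)\nPid:\t9424\nVmRSS:\t    8232 kB\n"
status = parse_status(sample)
print(status['Name'], status['Pid'], status['VmRSS'])

# On Linux the same function can be applied to the live file:
# with open('/proc/self/status') as f:
#     print(parse_status(f.read())['State'])
```

Taking the text as an argument, rather than opening the file inside the function, keeps the parser usable on machines that have no /proc.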
Description of the main files in the /proc directory
File or directory | Description
apm | Advanced Power Management information
cmdline | The command line passed to the kernel at boot
cpuinfo | CPU information
devices | Available devices (block devices and character devices)
dma | DMA channels currently in use
filesystems | File systems supported by the kernel
ioports | I/O ports currently in use
interrupts | Interrupts in use, one per line
kcore | Image of the system's physical memory
kmsg | Kernel messages, which are also sent to the log
mdstat | Information about RAID devices controlled by the md device driver
loadavg | System load averages
meminfo | Memory usage, including physical memory and swap
modules | The loaded kernel modules; the lsmod program uses this file to display each module's name, size, and use count
net | Network protocol status information
partitions | Partition table as recognized by the system
pci | PCI device information
scsi | SCSI device information
self | Symbolic link to the /proc directory of the process that is accessing it
stat | CPU utilization, disk, memory paging, swap, and other statistics
swaps | Usage of the swap partitions
uptime | Seconds since the system booted, and how many of those seconds were idle
version | A single line describing the running kernel version

All of these files can be parsed with ordinary programming techniques to obtain the required system information.
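To make the idea of parsing these files concrete, here is a minimal sketch for /proc/uptime, whose two numbers are the seconds since boot and the idle seconds. The function names and the sample contents are assumptions made for this example, not part of any library.

```python
def parse_uptime(text):
    """Return (uptime_seconds, idle_seconds) from the text of /proc/uptime."""
    fields = text.split()
    return float(fields[0]), float(fields[1])

def format_uptime(seconds):
    """Format a number of seconds as 'Nd HH:MM:SS'."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return "%dd %02d:%02d:%02d" % (days, hours, minutes, secs)

sample = "350735.47 234388.90\n"  # hypothetical file contents
up, idle = parse_uptime(sample)
print(format_uptime(up))

# On Linux, read the live value with:
# with open('/proc/uptime') as f:
#     print(format_uptime(parse_uptime(f.read())[0]))
```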
That is a long list and may look overwhelming, but don't panic: server monitoring only needs a small part of it.
Server monitoring using the /proc file system
Now that we know where the server information comes from, we can write a script that reads the relevant files and extracts the server's runtime data. Here are some of the metrics we most often need to monitor:
Read /proc/meminfo to get memory information
The contents of the file are as follows:
MemTotal: 1017544 kB
MemFree: 583304 kB
MemAvailable: 756636 kB
Buffers: 42996 kB
Cached: 238820 kB
SwapCached: 0 kB
Active: 116092 kB
Inactive: 252004 kB
Active(anon): 11956 kB
Inactive(anon): 85136 kB
Active(file): 104136 kB
Inactive(file): 166868 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 1044476 kB
SwapFree: 1044272 kB
Dirty: 64 kB
Writeback: 0 kB
AnonPages: 86304 kB
Mapped: 48832 kB
Shmem: 10812 kB
Slab: 40648 kB
SReclaimable: 29904 kB
SUnreclaim: 10744 kB
KernelStack: 2048 kB
PageTables: 8232 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1553248 kB
Committed_AS: 681428 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 5796 kB
VmallocChunk: 34359727572 kB
HardwareCorrupted: 0 kB
AnonHugePages: 32768 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 34752 kB
DirectMap2M: 1013760 kB
You can look up the meaning of each field yourself (on Baidu, for example). Here is the monitoring code:
def memory_stat():
    """Memory monitor."""
    mem = {}
    with open('/proc/meminfo', 'r') as f:
        lines = f.readlines()
    for line in lines:
        if len(line) < 2:
            continue
        name = line.split(':')[0]
        var = line.split(':')[1].split()[0]
        mem[name] = float(var)
    # Memory in use: total minus free memory, buffers, and page cache
    mem['MemUsed'] = mem['MemTotal'] - mem['MemFree'] - mem['Buffers'] - mem['Cached']
    res = {}
    res['percent'] = int(round(mem['MemUsed'] / mem['MemTotal'] * 100))
    # Values in /proc/meminfo are in kB, so dividing by 1024 * 1024 gives GB
    res['used'] = round(mem['MemUsed'] / (1024 * 1024), 2)
    res['MemTotal'] = round(mem['MemTotal'] / (1024 * 1024), 2)
    res['Buffers'] = round(mem['Buffers'] / (1024 * 1024), 2)
    return res
Read /proc/loadavg to get CPU load information
The contents of the file are as follows:
0.00 0.01 0.05 1/128 9424
Briefly, the first three fields are the average CPU load over the last 1, 5, and 15 minutes; the fourth field is the number of currently running processes over the total number of processes; and the last field is the ID of the most recently created process.
The following Python code monitors the CPU load:
def load_stat():
    """CPU load monitoring."""
    loadavg = {}
    with open("/proc/loadavg") as f:
        con = f.read().split()
    loadavg['lavg_1'] = con[0]
    loadavg['lavg_5'] = con[1]
    loadavg['lavg_15'] = con[2]
    loadavg['nr'] = con[3]
    # The fourth field has the form running/total
    process_list = loadavg['nr'].split('/')
    loadavg['running_prosess'] = process_list[0]
    loadavg['total_prosess'] = process_list[1]
    loadavg['last_pid'] = con[4]
    return loadavg
Get hard disk information using Python's os module
import os

def disk_stat():
    """Disk space monitoring for the root file system."""
    hd = {}
    disk = os.statvfs('/')
    # Block counts are expressed in units of f_frsize
    hd['available'] = float(disk.f_frsize * disk.f_bavail)
    hd['capacity'] = float(disk.f_frsize * disk.f_blocks)
    hd['used'] = float((disk.f_blocks - disk.f_bfree) * disk.f_frsize)
    res = {}
    # Convert bytes to GB
    res['used'] = round(hd['used'] / (1024 * 1024 * 1024), 2)
    res['capacity'] = round(hd['capacity'] / (1024 * 1024 * 1024), 2)
    res['available'] = res['capacity'] - res['used']
    res['percent'] = int(round(float(res['used']) / res['capacity'] * 100))
    return res
Get the server's IP address
A server may have more than one network card, so you need to pass in the name of the interface whose address you want. You can use the ifconfig command to see which interfaces exist.
def get_ip(ifname):
    """Get the IP address of the given network interface on this server."""
    import socket
    import fcntl
    import struct
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # 0x8915 is SIOCGIFADDR; the interface name must be passed as bytes
        packed = struct.pack('256s', ifname[:15].encode('utf-8'))
        return socket.inet_ntoa(fcntl.ioctl(s.fileno(), 0x8915, packed)[20:24])
    finally:
        s.close()
Read /proc/net/dev to get network card traffic information
This file lists the system's network interfaces together with the amount of data each one has sent and received since the system last booted. If you look at the contents of the file, you will notice that the first two lines are header information. In the remaining lines, the first column is the network interface name, followed by the receive counters and then the transmit counters (total bytes, number of packets, errors, and so on). What interests us here is the total data sent and received by each network device. The following code shows how to extract this information from the /proc/net/dev file, whose contents look like this:
Inter-| Receive | Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo:13092608592182 4315193859 0 0 0 0 0 0 13092608592182 4315193859 0 0 0 0 0 0
eth0:6081251983019 4697841969 0 0 0 0 0 0 196939978179 2079619999 0 0 0 0 0 0
eth1:5718927608592 9484371630 0 0 0 0 0 0 142737118022 2007173284 0 0 0 0 0 0
The incoming and outgoing traffic information for each NIC is obtained below:
#!/usr/bin/env python
from __future__ import print_function

def net_stat():
    net = {}
    with open("/proc/net/dev") as f:
        lines = f.readlines()
    # Skip the two header lines
    for line in lines[2:]:
        line = line.split(":")
        eth_name = line[0].strip()
        if eth_name != 'lo':
            net_io = {}
            # Field 0 is received bytes, field 8 is transmitted bytes; convert to MB
            net_io['receive'] = round(float(line[1].split()[0]) / (1024.0 * 1024.0), 2)
            net_io['transmit'] = round(float(line[1].split()[8]) / (1024.0 * 1024.0), 2)
            net[eth_name] = net_io
    return net

if __name__ == '__main__':
    netdevs = net_stat()
    print(netdevs)
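Because the counters in /proc/net/dev are cumulative since boot, a monitoring graph usually wants a rate instead, which takes two samples and the time between them. The sketch below shows one way to compute it. The function names and the small sample texts are assumptions made for this example; in practice both texts would be read from /proc/net/dev a few seconds apart.

```python
def read_counters(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, data = line.partition(':')
        fields = data.split()
        if sep and len(fields) >= 9:
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def traffic_rate(before, after, seconds):
    """Bytes per second received and sent between two samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / float(seconds),
                           (tx1 - tx0) / float(seconds))
    return rates

# Hypothetical samples taken 2 seconds apart
header = ("Inter-| Receive | Transmit\n"
          " face |bytes packets errs drop fifo frame compressed multicast"
          "|bytes packets errs drop fifo colls carrier compressed\n")
first = header + "eth0: 1000 10 0 0 0 0 0 0 500 5 0 0 0 0 0 0\n"
second = header + "eth0: 3000 30 0 0 0 0 0 0 1500 15 0 0 0 0 0 0\n"
rates = traffic_rate(read_counters(first), read_counters(second), 2)
print(rates)
```

A negative rate would indicate that a counter was reset or wrapped between the two samples, so real monitoring code may want to discard such intervals.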
Finally, here is a monitoring script for an Apache service:
#!/usr/bin/env python
import os, sys, time

while True:
    time.sleep(4)
    try:
        # ps prints a header line, so fewer than 2 lines means no matching process.
        # The name given to -C must match the server's process name on your system.
        ret = os.popen('ps -C apache -o pid,cmd').readlines()
        if len(ret) < 2:
            print("Apache process exited abnormally, restarting in 3 seconds")
            time.sleep(3)
            os.system("service apache2 restart")
    except Exception:
        print("Error: %s" % sys.exc_info()[1])
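The check in the script above relies on ps printing a header line before any matching processes, so fewer than two lines means that nothing matched. The sketch below isolates that decision in a small function so that it can be tried on captured output. The function name process_found and the sample ps output are hypothetical.

```python
def process_found(ps_lines):
    """Return True if ps output lists at least one process after the header."""
    rows = [line for line in ps_lines if line.strip()]  # ignore blank lines
    return len(rows) >= 2

# Hypothetical output of: ps -C apache2 -o pid,cmd
running = ["  PID CMD\n", " 1234 /usr/sbin/apache2 -k start\n"]
stopped = ["  PID CMD\n"]
print(process_found(running), process_found(stopped))
```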