Server monitoring with Python

Source: Internet
Author: User

On a Linux server, everything is a file, which means that information about the running server can be read from files. A quick search will tell you that Linux provides the /proc virtual file system for this:

The /proc virtual file system gives administrators an excellent way to change kernel settings while the system is running, without rebooting. It is a mechanism that the kernel and kernel modules use to expose information to processes (hence the name "/proc"): a pseudo-file system that lets you interact with the kernel's internal data structures, obtain useful information about processes, and change settings on the fly by modifying kernel parameters. Unlike other file systems, /proc exists in memory rather than on disk. It provides the following information:

    1. Process information: every process in the system has a subdirectory of /proc named after its process ID, containing entries such as cmdline, mem, root, stat, statm, and status. Some information, such as the process root directory, is visible only to the superuser. In addition, the /proc/self link always points to the information of the process that is accessing it, which is useful for getting that process's own command line.
    2. System information: system-wide statistics can be read from /proc/stat, including CPU usage and interrupts (older kernels also report disk and swap activity there).
    3. CPU information: the /proc/cpuinfo file describes the CPUs currently in the system.
    4. Load information: the /proc/loadavg file contains the system load averages.
    5. System memory information: the /proc/meminfo file contains details of system memory, including the amount of physical memory, the amount of available swap space, and the amount of free memory.
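As an illustration of item 2, CPU utilization can be derived from two readings of the aggregate "cpu" line in /proc/stat. The sketch below is not part of the original scripts; the function names parse_cpu_times and cpu_percent are hypothetical, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def parse_cpu_times(stat_text):
    """Return the aggregate CPU time counters from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled "cpu"; per-core lines are "cpu0", "cpu1", ...
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percent(before, after):
    """Percentage of non-idle CPU time between two samples."""
    # Only the first 8 counters are summed: guest time (fields 9 and 10)
    # is already included in user and nice.
    total = sum(after[:8]) - sum(before[:8])
    # Counters 3 and 4 (0-based) are idle and iowait.
    idle = sum(after[3:5]) - sum(before[3:5])
    if total <= 0:
        return 0.0
    return round(100.0 * (total - idle) / total, 1)
```

On a Linux host you would read /proc/stat twice, for example one second apart, pass each reading through parse_cpu_times, and hand the two results to cpu_percent.
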
Description of the main files in the /proc directory

File or directory   Description
apm                 Advanced Power Management information
cmdline             The command line passed to the kernel at boot
cpuinfo             CPU information
devices             Available devices (block devices and character devices)
dma                 DMA channels currently in use
filesystems         File systems supported by the kernel
ioports             I/O ports currently in use
interrupts          Each line of this file describes one reserved interrupt
kcore               Image of the system's physical memory
kmsg                Kernel messages, which are also sent to the log
mdstat              RAID device information from the md device driver
loadavg             System load averages
meminfo             Memory usage, including physical memory and swap
modules             Loaded kernel modules; the lsmod program uses this file to display each module's name, size, and use count
net                 Network protocol status information
partitions          Partitions recognized by the system
pci                 PCI device information
scsi                SCSI device information
self                Symbolic link to the /proc directory of the process accessing it
stat                CPU utilization, disk, memory paging, swap, and interrupt statistics
swaps               Usage of the swap partitions
uptime              Seconds since the system booted, and seconds spent idle
version             A single line describing the running kernel version

All of these files can be parsed with ordinary programming techniques to obtain the required system information.
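As a small instance of parsing one of these files, the sketch below handles the two numbers in /proc/uptime. The function names parse_uptime and format_uptime are hypothetical helpers, not part of the original scripts. Note that the idle figure is summed over all CPU cores, so on a multi-core machine it can be larger than the uptime.

```python
def parse_uptime(text):
    """Split /proc/uptime content into (uptime_seconds, idle_seconds)."""
    up, idle = text.split()[:2]
    return float(up), float(idle)


def format_uptime(seconds):
    """Render a number of seconds as 'Nd HH:MM:SS'."""
    seconds = int(seconds)
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return "%dd %02d:%02d:%02d" % (days, hours, minutes, secs)
```

On a Linux host, the text argument would be the result of open('/proc/uptime').read().
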

That is a long list and may look overwhelming, but don't panic: for server monitoring we only need a small number of these files.

Server monitoring using the /proc file system

Now that we know where the server information comes from, we can write scripts that read the relevant files and extract the server's runtime data. Below are some of the metrics we most often need to monitor.

Read /proc/meminfo to get memory information

The contents of the file look like this:

    MemTotal:          1017544 kB
    MemFree:            583304 kB
    MemAvailable:       756636 kB
    Buffers:             42996 kB
    Cached:             238820 kB
    SwapCached:              0 kB
    Active:             116092 kB
    Inactive:           252004 kB
    Active(anon):        11956 kB
    Inactive(anon):      85136 kB
    Active(file):       104136 kB
    Inactive(file):     166868 kB
    Unevictable:             0 kB
    Mlocked:                 0 kB
    SwapTotal:         1044476 kB
    SwapFree:          1044272 kB
    Dirty:                  64 kB
    Writeback:               0 kB
    AnonPages:           86304 kB
    Mapped:              48832 kB
    Shmem:               10812 kB
    Slab:                40648 kB
    SReclaimable:        29904 kB
    SUnreclaim:          10744 kB
    KernelStack:          2048 kB
    PageTables:           8232 kB
    NFS_Unstable:            0 kB
    Bounce:                  0 kB
    WritebackTmp:            0 kB
    CommitLimit:       1553248 kB
    Committed_AS:       681428 kB
    VmallocTotal:  34359738367 kB
    VmallocUsed:          5796 kB
    VmallocChunk:  34359727572 kB
    HardwareCorrupted:       0 kB
    AnonHugePages:       32768 kB
    HugePages_Total:         0
    HugePages_Free:          0
    HugePages_Rsvd:          0
    HugePages_Surp:          0
    Hugepagesize:         2048 kB
    DirectMap4k:         34752 kB
    DirectMap2M:       1013760 kB

You can look up the exact meaning of each field yourself; here is the monitoring code:

    def memory_stat():
        """Memory monitoring"""
        mem = {}
        with open('/proc/meminfo', 'r') as f:
            lines = f.readlines()
        for line in lines:
            if len(line) < 2:
                continue
            name = line.split(':')[0]
            var = line.split(':')[1].split()[0]
            mem[name] = float(var)
        # Used memory excludes free memory, buffers, and the page cache
        mem['MemUsed'] = mem['MemTotal'] - mem['MemFree'] - mem['Buffers'] - mem['Cached']
        # Report the usage percentage, plus used, total, and buffer sizes in GB
        # (the values in /proc/meminfo are in kB)
        res = {}
        res['percent'] = int(round(mem['MemUsed'] / mem['MemTotal'] * 100))
        res['used'] = round(mem['MemUsed'] / (1024 * 1024), 2)
        res['MemTotal'] = round(mem['MemTotal'] / (1024 * 1024), 2)
        res['Buffers'] = round(mem['Buffers'] / (1024 * 1024), 2)
        return res

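The sample file above also contains a MemAvailable field, which kernels since Linux 3.14 provide as their own estimate of how much memory can be used without swapping. The following is a sketch of a variant that prefers that field and falls back to the free + buffers + cached arithmetic when it is missing. The names parse_meminfo and memory_percent are hypothetical, and this is an alternative calculation rather than the method used in the script above.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most fields)."""
    mem = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            mem[name.strip()] = int(parts[0])
    return mem


def memory_percent(mem):
    """Used-memory percentage, preferring MemAvailable when it is reported."""
    total = mem["MemTotal"]
    if "MemAvailable" in mem:
        available = mem["MemAvailable"]
    else:
        # Fallback for older kernels: free memory plus buffers and page cache
        available = mem["MemFree"] + mem.get("Buffers", 0) + mem.get("Cached", 0)
    return int(round((total - available) * 100.0 / total))
```

The two calculations can give noticeably different percentages on the same machine, because MemAvailable accounts for cache that cannot actually be reclaimed.
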
Read /proc/loadavg to get CPU load information

The contents of the file look like this:

    0.00 0.01 0.05 1/128 9424

Briefly, the first three fields are the average CPU load over the last 1, 5, and 15 minutes; the fourth field is the number of currently running processes over the total number of processes; and the last field is the ID of the most recently created process.

The following Python code monitors the CPU load:

    def load_stat():
        """CPU load monitoring"""
        loadavg = {}
        with open("/proc/loadavg") as f:
            con = f.read().split()
        loadavg['lavg_1'] = con[0]
        loadavg['lavg_5'] = con[1]
        loadavg['lavg_15'] = con[2]
        loadavg['nr'] = con[3]
        # The fourth field has the form "running/total"
        prosess_list = loadavg['nr'].split('/')
        loadavg['running_prosess'] = prosess_list[0]
        loadavg['total_prosess'] = prosess_list[1]
        loadavg['last_pid'] = con[4]
        return loadavg

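The function above returns every value as a string. For alerting it is usually more convenient to have numbers and to compare the load against the number of CPU cores, since a load of 4.0 means something very different on 1 core than on 16. The sketch below shows one way to do that; parse_loadavg, is_overloaded, and the factor parameter are hypothetical names introduced for illustration.

```python
def parse_loadavg(text):
    """Parse /proc/loadavg content into typed values."""
    one, five, fifteen, procs, last_pid = text.split()[:5]
    running, total = procs.split("/")
    return {
        "lavg_1": float(one),
        "lavg_5": float(five),
        "lavg_15": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


def is_overloaded(load, cpu_count, factor=1.0):
    """True when the 1-minute load average exceeds factor * cpu_count."""
    return load["lavg_1"] > factor * cpu_count
```

On Python 3.4 and later, os.cpu_count() is one way to obtain the cpu_count argument.
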
Get hard disk information using Python's os module

    def disk_stat():
        """Disk space monitoring"""
        import os
        hd = {}
        disk = os.statvfs('/')
        # f_blocks, f_bfree and f_bavail are counted in units of f_frsize
        hd['available'] = float(disk.f_frsize * disk.f_bavail)
        hd['capacity'] = float(disk.f_frsize * disk.f_blocks)
        hd['used'] = float((disk.f_blocks - disk.f_bfree) * disk.f_frsize)
        # Convert bytes to GB
        res = {}
        res['used'] = round(hd['used'] / (1024 * 1024 * 1024), 2)
        res['capacity'] = round(hd['capacity'] / (1024 * 1024 * 1024), 2)
        res['available'] = res['capacity'] - res['used']
        res['percent'] = int(round(float(res['used']) / res['capacity'] * 100))
        return res

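On Python 3.3 and later, the standard library's shutil.disk_usage returns the same total, used, and free figures in bytes without any manual block arithmetic. The sketch below is an alternative to the function above, not the original author's method; the name disk_usage_stat is hypothetical.

```python
import shutil


def disk_usage_stat(path="/"):
    """Disk usage of the file system containing path, in GiB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gib = 1024.0 ** 3
    percent = int(round(usage.used * 100.0 / usage.total)) if usage.total else 0
    return {
        "capacity": round(usage.total / gib, 2),
        "used": round(usage.used / gib, 2),
        "available": round(usage.free / gib, 2),
        "percent": percent,
    }
```

Passing a different path, such as a data mount point, reports on the file system that contains that path.
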
Get the server's IP address

A server may have more than one network card, so you need to pass in the name of the network card whose address you want. You can list the available network cards with the ifconfig command.

    def get_ip(ifname):
        """Get the IP address of the given network interface"""
        import socket
        import fcntl
        import struct
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            # 0x8915 is the SIOCGIFADDR ioctl; the interface name must be bytes
            packed = struct.pack('256s', ifname[:15].encode('utf-8'))
            # Bytes 20 to 24 of the result hold the IPv4 address
            return socket.inet_ntoa(fcntl.ioctl(s.fileno(), 0x8915, packed)[20:24])
        finally:
            s.close()

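The interface names can also be listed from Python instead of reading them off ifconfig output. A minimal sketch, assuming Python 3.3 or later on a Unix system, where socket.if_nameindex() is available; the function name interface_names is hypothetical.

```python
import socket


def interface_names():
    """Names of the network interfaces known to the operating system."""
    # socket.if_nameindex() returns a list of (index, name) pairs
    return [name for _index, name in socket.if_nameindex()]
```

Each returned name can then be passed to a lookup function that takes an interface name, such as the one above.
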
Read /proc/net/dev to get network card traffic information

This file lists the system's network interfaces together with the amount of data each has sent and received since the system booted. If you look at the contents of /proc/net/dev, you will notice that the first two lines are headers. The first column is the network interface name; the Receive and Transmit column groups that follow show statistics for received and sent data (total bytes, number of packets, errors, and so on). What we are interested in here is the total number of bytes received and sent by each network device. The contents of the file look like this:

    Inter-|   Receive                                                |  Transmit
     face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
        lo: 13092608592182 4315193859 0 0 0 0 0 0 13092608592182 4315193859 0 0 0 0 0 0
      eth0: 6081251983019 4697841969 0 0 0 0 0 0 196939978179 2079619999 0 0 0 0 0 0
      eth1: 5718927608592 9484371630 0 0 0 0 0 0 142737118022 2007173284 0 0 0 0 0 0

The following code extracts the incoming and outgoing traffic of each network card, in MB:

    #!/usr/bin/env python
    from __future__ import print_function

    def net_stat():
        net = {}
        with open("/proc/net/dev") as f:
            lines = f.readlines()
        # Skip the two header lines
        for line in lines[2:]:
            line = line.split(":")
            eth_name = line[0].strip()
            if eth_name != 'lo':
                net_io = {}
                # Field 0 is received bytes, field 8 is transmitted bytes
                net_io['receive'] = round(float(line[1].split()[0]) / (1024.0 * 1024.0), 2)
                net_io['transmit'] = round(float(line[1].split()[8]) / (1024.0 * 1024.0), 2)
                net[eth_name] = net_io
        return net

    if __name__ == '__main__':
        netdevs = net_stat()
        print(netdevs)
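The counters in /proc/net/dev are cumulative totals since boot, so a monitoring system usually wants the rate of change between two readings instead. The sketch below shows that calculation; parse_net_dev and traffic_rate are hypothetical names, and the sampling interval is supplied by the caller.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, data = line.partition(":")
        fields = data.split()
        if sep and len(fields) >= 9:
            # Field 0 is received bytes, field 8 is transmitted bytes
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def traffic_rate(before, after, interval):
    """Bytes per second as (rx, tx) for each interface present in both samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / float(interval),
                           (tx1 - tx0) / float(interval))
    return rates
```

On a Linux host you would read /proc/net/dev, wait for the chosen interval, read it again, and pass both parsed samples to traffic_rate.
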

Finally, here is a monitoring script for the Apache service:

    #!/usr/bin/env python
    import os, sys, time

    while True:
        time.sleep(4)
        try:
            # ps -C matches the exact command name, so use apache2 to match
            # the service that is restarted below
            ret = os.popen('ps -C apache2 -o pid,cmd').readlines()
            # Only the header line means no Apache process is running
            if len(ret) < 2:
                print("Apache process exited abnormally, restarting in 3 seconds")
                time.sleep(3)
                os.system("service apache2 restart")
        except Exception:
            print("Error: %s" % sys.exc_info()[1])
