Comprehensive monitoring of Linux server running status

Source: Internet
Author: User
Tags mrtg
With Linux in ever wider use, a large number of network servers now run the Linux operating system. To assess how a network is running, you need detailed and accurate measurements of its state, and the SNMP protocol provides strong support for such measurement.

Monitoring the hardware status of the computer system is essential to keeping the whole system stable. Whether the operating system is Linux or Windows, a hardware failure seriously threatens the entire system. This article focuses on monitoring the operating status of the CPU, hard disk, memory, network interfaces, motherboard, and other hardware of a Linux server.

I. Features of the /proc file system
Linux gives administrators an excellent way to change kernel settings while the system is running, without rebooting. This is done through the /proc virtual file system, a mechanism that the kernel and kernel modules use to pass information to processes (hence the name /proc). This pseudo file system lets you interact with the kernel's internal data structures, obtain useful information about processes, and change settings on the fly by modifying kernel parameters. Unlike other file systems, /proc lives in memory rather than on the hard disk. You can view system information without rebooting to look at the CMOS setup, which is one of the highlights of /proc. The main files and directories under /proc are shown in table-1:

File or directory   Description
apm                 Advanced Power Management information
cmdline             Kernel command line
cpuinfo             Information about the central processor
devices             Available devices (block devices/character devices)
dma                 DMA channels currently in use
filesystems         File systems configured into the kernel
ioports             I/O ports currently in use
interrupts          Interrupts currently in use
kcore               Image of the system's physical memory
kmsg                Kernel messages, which are also sent to syslog
ksyms               Kernel symbol table
loadavg             System load averages
meminfo             Memory usage information, including physical memory and swap
modules             Kernel modules currently loaded
net                 Network protocol status information
partitions          Partition table recognized by the system
pci                 PCI device information
scsi                SCSI device information
self                Symbolic link to the process directory of the program that is reading /proc
stat                Comprehensive statistics table
swaps               Swap partition information
uptime              Time since system startup
version             Kernel version number
Table-1 main files and directories under /proc

The content of the /proc virtual file system differs from one Linux system to another, depending on hardware and software. /proc has three important directories: net, scsi, and sys. The sys directory is writable and can be used to read or modify kernel parameters, while net and scsi depend on the kernel configuration. For example, if the system does not support SCSI, the scsi directory does not exist. The net directory contains several network pseudo files in ASCII format that describe parts of the network layer; commands such as arp, netstat, and route query these files. In addition, there are directories with numeric names, which are process directories. Every process currently running on the system has a corresponding directory under /proc, named after its PID, and these directories are the interface for reading process information. The self directory is a link that gives a process access to information about itself. This is where the proc file system gets its name.
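As a rough illustration of the numeric process directories, the following Python sketch picks the PID entries out of a directory listing. The function name pid_entries and the sample listing are hypothetical, introduced only for this example; on a real Linux host the same function can be fed os.listdir("/proc").

```python
import os

def pid_entries(names):
    """Return the numeric (process) entries among /proc directory names, sorted by PID."""
    return sorted(int(name) for name in names if name.isdecimal())

# Hypothetical directory listing, used so the example runs anywhere.
SAMPLE_NAMES = ["1", "self", "net", "4242", "sys", "17", "cpuinfo"]

if __name__ == "__main__":
    print(pid_entries(SAMPLE_NAMES))
    if os.path.isdir("/proc"):
        # On a Linux host, this counts the processes that are currently running.
        print(len(pid_entries(os.listdir("/proc"))))
```

Keeping the filter separate from the directory read makes it easy to check the logic without a live /proc.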

II. Five main functions of the /proc file system:
1. Process information: for every process in the system there is a subdirectory of /proc named after its process ID. There you can find entries such as cmdline, mem, root, stat, statm, and status. Some of this information, such as the process root directory, is visible only to the superuser. Each process directory also contains links to further information about that process, and the separate self link always points to the information of the process that is reading it. The cmdline entry, for example, gives the command line of a process.

2. System information: overall system information is available from /proc/stat. It includes CPU usage, disk I/O, memory paging, swapping, total interrupts, context switches, and the time of the last boot.

3. CPU information: the /proc/cpuinfo file gives accurate, current information about the central processor.

4. Load information: the /proc/loadavg file contains the system load information.

5. System memory information: the /proc/meminfo file contains detailed information about system memory, including the amount of physical memory, the amount of available swap space, the amount of free memory, and so on.
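To make the memory item more concrete, here is a minimal Python sketch that parses text in the "Name: value kB" layout used by /proc/meminfo. The function name parse_meminfo and the sample figures are assumptions made for illustration; the exact set of fields differs between kernel versions.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; values are kB where the file says so."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> ..." lines; anything else is skipped.
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

# Hypothetical sample in the layout of /proc/meminfo.
SAMPLE = """MemTotal:  1024000 kB
MemFree:    256000 kB
SwapTotal:  512000 kB
SwapFree:   512000 kB
"""

if __name__ == "__main__":
    mem = parse_meminfo(SAMPLE)
    used_kb = mem["MemTotal"] - mem["MemFree"]
    print(f"used: {used_kb} kB of {mem['MemTotal']} kB")
```

On a live system the same function could be given the contents of /proc/meminfo instead of the sample string.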

1. Monitor the overall statistics of servers
To monitor the overall statistical status, run the following command:

# cat /proc/stat





Comprehensive System statistics


The numbers in the figure represent:
the number of context switches; the total number of interrupts; the total number of pages paged in; the total number of pages paged out; the total number of processes;
the total number of swaps; total CPU idle time; total CPU nice time; total CPU system time;
total CPU user time.
For each CPU:
idle time; nice time; system time; user time.
And for each disk drive:
blocks read; blocks written; total I/O operations; read I/O operations; write I/O operations.
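The CPU counters in /proc/stat are cumulative, so a usage figure comes from the difference between two readings. The sketch below assumes the usual layout in which each "cpu" line starts with the user, nice, system, and idle counters; the function names and the two sample snapshots are hypothetical.

```python
def parse_cpu_times(stat_text):
    """Return {cpu_name: (user, nice, system, idle)} from /proc/stat-style text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and len(fields) >= 5:
            times[fields[0]] = tuple(int(v) for v in fields[1:5])
    return times

def busy_fraction(before, after):
    """Fraction of time spent outside idle between two (user, nice, system, idle) samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 0.0 if total <= 0 else (total - deltas[3]) / total

# Two hypothetical snapshots taken some time apart.
FIRST = "cpu  100 0 50 850\ncpu0 100 0 50 850\nctxt 12345\n"
SECOND = "cpu  160 0 90 950\ncpu0 160 0 90 950\nctxt 13000\n"

if __name__ == "__main__":
    a = parse_cpu_times(FIRST)["cpu"]
    b = parse_cpu_times(SECOND)["cpu"]
    print(f"busy: {busy_fraction(a, b):.0%}")
```

Newer kernels append further counters after idle; this sketch ignores them, which slightly overstates the busy share on such systems.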

2. Monitor network traffic
For more information about network traffic, run the following command.
# cat /proc/net/dev




Network Interface packet traffic


For each interface, the numbers listed above are, for received traffic: bytes, packets, errors, dropped packets, FIFO errors, framing errors, compressed packets, and multicast packets; and for transmitted traffic: bytes, packets, errors, dropped packets, FIFO errors, collisions, carrier errors, and compressed packets.
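The per-interface counters can be read programmatically as well. This Python sketch assumes the common /proc/net/dev layout of two header lines followed by one "name: 16 counters" line per interface; the function name parse_net_dev and the sample values are hypothetical.

```python
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]

def parse_net_dev(text):
    """Return {interface: {"rx": {...}, "tx": {...}}} from /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines()[2:]:          # the first two lines are column headers
        name, sep, data = line.partition(":")
        values = data.split()
        if not sep or len(values) < 16:
            continue
        numbers = [int(v) for v in values[:16]]
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, numbers[:8])),
            "tx": dict(zip(TX_FIELDS, numbers[8:])),
        }
    return stats

# Hypothetical sample in the layout of /proc/net/dev.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  123456     789    1    2    0     0          0         3   654321     456    0    0    0     4       0          0
"""

if __name__ == "__main__":
    for iface, counters in parse_net_dev(SAMPLE).items():
        print(iface, counters["rx"]["bytes"], counters["tx"]["bytes"])
```

Splitting on the colon rather than on whitespace also copes with output in which the first counter follows the interface name without a space.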

3. Use the uptime command
You can use the uptime command to view the system load. The load average is the average number of processes in the run queue over a given interval. A process is in the run queue if it is not waiting for the result of an I/O operation, has not voluntarily entered a wait state (that is, it has not called wait), and has not been stopped.

# uptime
Pm up 3 days, 4 users, load average: 6.02, 5.90, 3.94
The output shows a load average of 6.02 over the last minute, 5.90 over the last five minutes, and 3.94 over the last 15 minutes, with four users logged in. As a rule of thumb, performance is good as long as the number of active processes per CPU does not exceed 3; more than 5 per CPU means the machine has a serious performance problem. The system in this example has two CPUs, so the current figure per CPU is 6.02/2 = 3.01, which indicates that the server's performance is acceptable.
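The per-CPU arithmetic above can be written as a short Python sketch. It takes text in the layout of /proc/loadavg (three load averages followed by other fields); the function names, the sample line, and the label for the band between 3 and 5 are assumptions made for this example.

```python
def per_cpu_load(loadavg_text, cpu_count):
    """Return the 1-, 5- and 15-minute load averages divided by the number of CPUs."""
    one, five, fifteen = (float(v) for v in loadavg_text.split()[:3])
    return one / cpu_count, five / cpu_count, fifteen / cpu_count

def rate_load(load_per_cpu):
    """Apply the rule of thumb from the text; the middle band's label is an assumption."""
    if load_per_cpu <= 3:
        return "good"
    if load_per_cpu > 5:
        return "serious problem"
    return "acceptable"

# Hypothetical /proc/loadavg content matching the uptime output above.
SAMPLE = "6.02 5.90 3.94 2/345 6789"

if __name__ == "__main__":
    one, five, fifteen = per_cpu_load(SAMPLE, cpu_count=2)
    print(f"{one:.2f} per CPU over the last minute: {rate_load(one)}")
```

On a live host, os.cpu_count() could supply the number of CPUs instead of the hard-coded 2.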

4. Display System Load Using xload graphics
If a graphical environment such as KDE is installed, you can use xload to display a histogram of the system load average that is updated at regular intervals. xload is a Linux system command. Usage:
# xload [-options ...]
Main xload options:

Option            Description
-fn font          Font used in the display.
-scale number     Minimum number of scale lines shown; each line represents a load average of 1.
-update seconds   Interval between updates, in seconds.
-bg color         Background color of the display.
-fg color         Foreground color of the display.
-hl color         Color of the scale lines.
-remote host      Name of the remote host.
# xload -scale 1 -update 1 -fg blue -H300 tan
The preceding command uses xload to view the system load, updating once per second. The size is 300, the foreground color is white, and the background color is blue. See figure-3.




Figure-3



III. Use phpsysinfo
Because the /proc file system is large and its contents change constantly, inspecting it with Linux commands is tedious. A convenient alternative is phpsysinfo, a PHP script that runs on a PHP-capable web server and reports information about the host. It extracts information from the /proc file system and displays it graphically. phpsysinfo also supports more than 20 languages, including Chinese, and many style templates.

1. System requirements:
In addition to a LAMP (Linux + Apache + MySQL + PHP) environment, the requirements are as follows. Software: kernel 2.2 or later, KDE 2.0, and a desktop color depth of at least 16-bit (high color). Hardware: a Pentium II 450 or faster CPU, 64 MB of memory, and 60 MB of hard disk space. Before installing, note that phpsysinfo is written in PHP and uses gdk, gtk, and glib.

2. Software Download:
# wget http://jaist.dl.sourceforge.net/... psysinfo-2.3.tar.gz
# wget http://secure.netroedge.com/~lm78/archive/i2c-2.8.8.tar.gz
# wget http://secure.netroedge.com/~lm78/archive/lm_sensors-2.8.8.tar.gz

3. Install the software: copy the downloaded software to the /var/www/html/ directory and run:
# mv phpsysinfo-2.3.tar.gz /var/www/html/sysinfo
# tar -zxvf phpsysinfo-2.3.tar.gz
# cd sysinfo
# cp config.php.new config.php

4. Run the software:
Start the Apache service:
# /usr/local/apache2/bin/apachectl start
Then open http://localhost/sysinfo in a browser to test it (see the figure below).




Phpsysinfo working interface

phpsysinfo reports on five areas:

(1) Host system resources: host name, IP address, kernel version, time since boot, number of logged-in users, and system load.
(2) Hardware information: CPU model, operating frequency, cache size, logical operands, PCI interface, IDE interface, and SCSI interface.
(3) Network load: packets received, packets transmitted, and errors/drops.
(4) Memory resources: physical memory and virtual memory.
(5) Mounted partitions: partition name and percentage used.
You can also use it to check the hardware and network quality of a virtual host that you rent. phpsysinfo also works on FreeBSD, OpenBSD, NetBSD, Darwin/OSX, and other Unix systems.

IV. Server hard disk monitoring
Server hard disk monitoring mainly covers bad-sector detection and disk space monitoring.
1. Hard disk bad-sector detection
Of all the hardware faults a Linux server can suffer, physical bad sectors on the hard disk are the most troublesome. They make the machine crash frequently and can render all of your data useless. Hard drives produced since 1993 generally support SMART (Self-Monitoring, Analysis and Reporting Technology). SMART monitors the disk heads, the drive motor, the internal circuitry, and the media on the platter surface. When it detects a likely problem, it alerts the user promptly so that data loss can be avoided. SMART works only if the motherboard supports it, and it cannot predict every possible hard disk failure. SMART (SFF-8035i) is an industry standard established by hard drive manufacturers. It stores on the drive a table of attributes such as performance, reliability, and read error rate. Every attribute has a 1-byte normalized value (range 1-253) and a 1-byte threshold value; if an attribute's value falls to or below its threshold, the hard disk is no longer working properly.

Smartmontools is a hard disk checking tool for Linux. Home page: http://smartmontools.sourceforge.net; latest version at the time of writing: 5.33-1.
Download and install the software:
# wget http://jaist.dl.sourceforge.net/... ols-5.33-1.i386.rpm
# rpm -ivh smartmontools-5.33-1.i386.rpm
After installation, the smartctl program is available under the /usr/local/ directory. First, check whether the hard disk and motherboard support SMART (see figure-5):

# smartctl -i /dev/hda7





Figure-5 check whether the hard drive supports SMART

Figure-5 shows that the author's hard drive supports SMART. Model: ST320414A (Seagate Barracuda ATA III, 7200 RPM, 2 MB cache).
Full hard disk check:
# smartctl -A /dev/hda7





Figure-6 physical logic status of a hard disk


The information shown in figure-6 varies with the hard disk manufacturer. Each row represents a different physical attribute of the hard disk, and the columns describe its logical status.

FLAG is the attribute flag. The normalized VALUE should stay above the threshold (THRESH). WHEN_FAILED reports failures: the WHEN_FAILED column in figure-6 is empty, which indicates that the hard disk is not faulty. If WHEN_FAILED shows an entry, the disk may have a fairly large area of bad sectors. smartctl has more than a dozen options. For details, run:
# smartctl --help
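The VALUE-versus-THRESH comparison can also be automated. The Python sketch below works on text in the column layout that smartctl -A prints (ID, name, flag, value, worst, threshold, and so on). The function name failing_attributes and the sample table are hypothetical, and real output may carry extra header lines, which the sketch simply skips.

```python
def failing_attributes(attr_text):
    """Return names of attributes whose normalized VALUE is at or below THRESH."""
    failing = []
    for line in attr_text.splitlines():
        cols = line.split()
        # Attribute rows start with a numeric ID; header and blank lines are skipped.
        if len(cols) >= 6 and cols[0].isdigit() and cols[3].isdigit() and cols[5].isdigit():
            name, value, thresh = cols[1], int(cols[3]), int(cols[5])
            if value <= thresh:
                failing.append(name)
    return failing

# Hypothetical attribute table in the layout of "smartctl -A" output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 1200
194 Temperature_Celsius     0x0022   045   052   000    Old_age   Always       -       45
"""

if __name__ == "__main__":
    print(failing_attributes(SAMPLE))
```

A check like this could be run from cron against saved smartctl output, with an alert sent whenever the returned list is not empty.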

2. Disk Space Monitoring
A disk management command that Linux administrators use frequently is df (disk free). It reports file system statistics, including the space used and the space still available.
Format: df [OPTION]... [FILE]...
The main options of the df command are shown in table-2:

Option   Description
-a       Displays the disk usage of all file systems, including file systems with 0 blocks such as /proc.
-k       Displays sizes in kilobytes.
-i       Displays inode information instead of disk blocks.
-t       Displays the disk space usage of file systems of the specified type.
-x       Lists the disk space usage of file systems that are not of the specified type (the opposite of the -t option).
-T       Displays the file system type.
Table-2 main options of the df command
The df command can also display inode usage for all file systems. This requires the -i option; see figure-7.





Figure-7 use the df command to display the inode usage of all file systems
Figure-7 shows the number of inodes available in each file system and the proportion in use, which a system administrator needs to keep an eye on. Sometimes you may find that the reported usage of a file system exceeds 100%. This is because the file system reserves part of its space for the superuser (5% by default on ext2/ext3), so the superuser can still write data after ordinary users see the disk as full. This arrangement helps system administration: when the disk is close to 100% full, the system administrator can still work normally. The df tool is widely used to produce statistics on file system usage. It displays information about all file systems on the system, including their total capacity, available free space, and current mount points.
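The effect of the superuser reserve can be seen from the block counts that the statvfs system call returns. The following Python sketch is an illustration under stated assumptions: the function name df_style_usage is hypothetical, and rounding the percentage up is an assumption about how df presents it rather than something taken from the text.

```python
import os

def df_style_usage(blocks, bfree, bavail):
    """Return (use_percent, reserved_percent) from statvfs-style block counts.

    blocks: total blocks; bfree: free blocks including the superuser reserve;
    bavail: free blocks available to ordinary users.
    """
    used = blocks - bfree
    visible = used + bavail          # space an ordinary user can ever see
    use_percent = 0 if visible == 0 else -(-used * 100 // visible)   # rounded up
    reserved_percent = 0.0 if blocks == 0 else (bfree - bavail) * 100 / blocks
    return use_percent, reserved_percent

if __name__ == "__main__":
    # Hypothetical file system: 1000 blocks, 100 free, 50 of them reserved for root.
    print(df_style_usage(1000, 100, 50))
    if hasattr(os, "statvfs"):
        st = os.statvfs("/")
        print(df_style_usage(st.f_blocks, st.f_bfree, st.f_bavail))
```

In the hypothetical case, ordinary users see the file system as fuller than the raw block counts suggest, because the reserved blocks are left out of the space available to them.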

V. Monitoring the working status of the server motherboard
Whether the motherboard and CPU are running at a normal temperature is central to server stability. No CPU cooling system is immune to failure, and a "core" that loses its cooling can stop "beating" for good within seconds. Fortunately, engineers have developed effective processor temperature monitoring and protection technologies that track CPU temperature changes continuously and take protective measures to keep the CPU from heat damage. lm_sensors can effectively monitor key data such as the operating voltage, fan speed, and temperature of the motherboard and CPU. Software installation:

# mv lm_sensors-2.8.8.tar.gz /usr/local/src/
# cd /usr/local/src/
# tar zxvf lm_sensors-2.8.8.tar.gz
# cd /usr/local/src/lm_sensors-2.8.8
# tar xzf i2c-2.8.8.tar.gz
# make clean; make dep; make all; make install
# /sbin/depmod -a
Edit the configuration file /etc/ld.so.conf and add the line /usr/local/lib, then run:
# ldconfig
# sensors-detect    # scan all chips on the motherboard; press Enter to accept the default options #
Load the modules. Note that the modules needed differ from one motherboard to another.
# modprobe i2c-isa
# modprobe lm78
# modprobe sis5595
Start detection, as shown in figure-8:
# sensors





Figure 8 lm_sensors working interface


The output clearly shows the motherboard temperature, CPU temperature, voltages, fan speeds, and other information.
Advanced use: checking the motherboard status periodically.
You can combine this with another Linux command:
# watch --interval=450 "sensors"
This runs the sensors command every 450 seconds so that you can keep track of the motherboard's status.

VI. P2P communication monitoring
Peer-to-peer (P2P) is a new technology for file exchange that lets users build distributed, dynamic, and anonymous logical networks over the Internet. P2P, also called peer-to-peer networking, is used for file sharing and exchange, deep search, distributed computing, and other fields, and it allows individual PCs to share files over the Internet. As P2P file exchange has become popular, ISPs face new challenges and opportunities in maintaining and increasing the returns on their broadband networks. According to statistics, more than 70% of existing network bandwidth is taken up by P2P traffic. P2P traffic can cause abnormal traffic peaks and unexpected consumption of network resources; the resulting congestion and performance degradation affect normal applications such as the WWW and email, and slow web browsing and mail transfer leave ordinary users dissatisfied.

Identify P2P communication
To control P2P traffic you must first identify it reliably. However, P2P applications use many different techniques and protocols, which makes them very hard to identify with traditional methods. For example, many P2P protocols use dynamic rather than fixed ports, including the ports of well-known services. KaZaA can communicate over port 80 (normally used for HTTP/web), and so passes through traditional firewalls and packet filters based on IP address and port. It is therefore difficult to identify, track, or control such traffic by classifying on IP header fields alone (IP addresses and port numbers). In the past, some people identified BT (BitTorrent) by monitoring ports 6881-6889, but this method stopped working long ago: BT no longer uses the fixed ports 6881-6889 and chooses its ports dynamically. As P2P applications grow, more and more protocols come into use, so the technique for identifying and classifying P2P traffic must be fast and simple enough to keep up. The current approach is to analyze packets at the application layer, look for an application protocol signature, and determine the traffic type from it. The basic idea is this: if the application-layer header of a packet contains the signature string "220 ftp server ready", the traffic comes from an FTP program; if it contains the signature string "HTTP/1.1 200 OK", the data is being transferred over HTTP.

When it comes to network traffic monitoring, most people are familiar with MRTG. However, MRTG has many disadvantages:

1. It uses a text-based database, and the data cannot be reused;
2. Data can only be viewed by day, week, month, or year;
3. Only two data sources (DS) can be drawn, one as a line and one as an area;
4. No management function;
5. No log system;
6. It cannot show the detailed composition of the traffic.
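The application-layer signature check described under "Identify P2P communication" can be sketched in a few lines of Python. The two FTP and HTTP signature strings are the ones quoted in that section; the BitTorrent handshake prefix is an added example, and the function name classify_payload and the 64-byte window are assumptions made for illustration. A production classifier would need far more signatures and would have to reassemble streams first.

```python
# (signature bytes, label); the first two strings are the ones quoted in this article,
# the BitTorrent handshake prefix is an added example.
SIGNATURES = [
    (b"220 ftp server ready", "ftp"),
    (b"HTTP/1.1 200 OK", "http"),
    (b"\x13BitTorrent protocol", "bittorrent"),
]

def classify_payload(payload, window=64):
    """Label a payload by the first signature found in its leading bytes."""
    head = payload[:window]
    for signature, label in SIGNATURES:
        if signature in head:
            return label
    return "unknown"

if __name__ == "__main__":
    print(classify_payload(b"HTTP/1.1 200 OK\r\nServer: example\r\n\r\n"))
    print(classify_payload(b"\x13BitTorrent protocol" + b"\x00" * 8))
```

Because the decision depends on payload content rather than on port numbers, traffic that hides behind port 80 is still labeled by what it actually carries.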

A better tool for this job is ntop, which shows network usage and the bandwidth consumed by each host in far more detail and more intuitively. ntop is a network sniffer, and sniffers play an irreplaceable role in monitoring network data transfer and troubleshooting. By analyzing network traffic you can pinpoint problems such as bottlenecks or performance degradation, and you can also determine whether an attacker is targeting the network. If an attack is suspected, the packets captured by the sniffer reveal what type of packets are being used and where they come from, so that you can respond in time or adjust the network to keep it efficient and secure. With ntop, a network administrator can also easily determine which traffic belongs to a particular protocol, which hosts generate most of the traffic, which hosts are the targets of each conversation, when packets are sent, and the interval between packets exchanged between hosts. This information is valuable for diagnosing network problems and optimizing network performance.

ntop provides the following features:
1. Automatically identifies useful information on the network;
2. Converts captured packets into an easily readable format;
3. Analyzes communication failures in the network environment;
4. Detects communication bottlenecks in the network environment;
5. Records the time and course of network communication.

Compared with MRTG, ntop is simpler to install and configure and does not need the Apache server; it can also be used together with MRTG. The managed switches and routers currently on the market support SNMP, and because ntop supports SNMP as well, it can monitor their network traffic. ntop can monitor almost every protocol on the network: TCP/UDP/ICMP, (R)ARP, IPX, DLC, Decnet, DHCP-BOOTP, AppleTalk, Netbios, FTP, HTTP, DNS, Telnet, SMTP/POP/IMAP, SNMP, NNTP, NFS, X11, SSH, and the P2P protocols eDonkey, Overnet, Bittorrent, Gnutella (Bearshare, Limewire, etc.), and Kazaa, iMesh, and Grokster. You can download the latest source code from http://www.ntop.org.

Software download:
Official website: http://www.ntop.org/ntop.html. Download the latest version of the source code (as of August) and the related libraries:
# wget http://www.mirrors.wiretapped.ne... g/ntop/ntop-3.2.tgz
# wget ftp://ftp.rediris.es/sites/ftp.r... p-0.6.2-12.i386.rpm
Software installation (pay attention to the installation order):
# rpm -ivh libpcap-0.6.2-12.i386.rpm
# tar zxvf ntop-3.2.tgz
# cd ntop/gdchart0.94c
# ./configure
......
Do not forget to build: # the system prompts you to compile the gd and zlib modules first #
1. gd-1.8.3/libpng-1.2.1
2. zlib-1.1.4/
# cd gd-1.8.3/libpng-1.2.1/
# cp scripts/makefile.linux Makefile
# make
# cd ../../zlib-1.1.4
# ./configure
# make
# cd ..
# make
Compile the program in the ntop directory:
# tar zxvf ntop-2.2.tgz
# cd ../ntop/
# ./configure
# make; make install
Create a log directory:
# mkdir /var/log/ntop/
Start ntop:
# ntop -P /var/log/ntop/ -u nobody &

The first time ntop runs, it asks you to enter the administrator password; the default password is admin. You do not need to enter the password on later runs. Enter http://IP:3000 in a browser to open the management interface. To view overall network traffic, click "Stats" and then select the "Traffic" option. Network traffic is displayed as a bar chart and a detailed table. To view the traffic of a particular user's computer, click "IP Traffic" and then "Host". To see what data that computer is transferring, double-click the host name ("CAO" in this example) to analyze the protocol types and bandwidth used by the various network protocols. See figure-9. Port usage is shown in figure-10.





Figure-9 view the network protocol type and occupied bandwidth





Figure-10 port usage list
The data ntop can monitor includes network traffic, protocols in use, system load, port status, packet sending time, and packet TTL. With it, virtually no inbound or outbound traffic stays hidden. Whether for routine network monitoring or for reporting, it is an excellent tool for making your network traffic transparent. It works like a passive sonar, silently receiving all kinds of information from the network, and by analyzing that data the network administrator gains a deeper understanding of the network's current state. However, because ntop is essentially a sniffer, it is a double-edged sword: making sure this information is available only to authorized people becomes especially important.

Summary:
Monitoring a Linux server is a very important task, since a running server should deliver the best possible performance. Network administrators must pay attention to the total data traffic of a network server (the total data transferred by its network cards), its CPU usage, and the packet rate (or traffic) of particular services: when the host's CPU usage is too high the system may become unstable, and when traffic changes abnormally an attacker may be trying to steal information. For network management, you need to know the traffic status of your Linux servers and limit or increase bandwidth accordingly. This article has covered monitoring system performance with everything from Linux commands to simple but useful tools. With the data these tools provide, you can build up your own experience of how the system performs. Once you have a reliable performance baseline, you can use the flexibility of the Linux operating system to tune it and make it more efficient. ntop and phpsysinfo, the tools described in this article, are both open source software, but I think they are in no way inferior to commercial management software.