VPS virtual machine: df -h shows the root partition 100% full

Source: Internet
Author: User
Tags: vps, iptables, server, port

This morning I received a request for help from a user on the Internet: the root partition of his server was full, but no specific large file could be found. Because the fault was quite strange, he sent me the login credentials of the failed server.

Environment of the failed server:

System: CentOS 6.5

SELinux: disabled

iptables: running, but the default policy for every chain is ACCEPT

Environment of our auxiliary server:

System: CentOS 6.5

SELinux: permissive

iptables: running, but the default policy for every chain is ACCEPT

First, let's get a quick picture of the problem on the faulty server:

[root@abc ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/simfs       20G  970M     0 100% /
none            512M  4.0K  512M   1% /dev
none            512M     0  512M   0% /dev/shm

[root@abc ~]# df -ih
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/simfs        10M   27K   10M    1% /
none             128K   156  128K    1% /dev
none             128K     1  128K    1% /dev/shm

[root@abc ~]# cd /
[root@abc /]# du -sh *
6.7M    bin
12K     boot
4.0K    dev
7.6M    etc
4.0K    home
11M     lib
39M     lib64
4.0K    lost+found
4.0K    media
4.0K    mnt
16K     nonexistent
4.0K    opt
du: cannot access 'proc/626/task/2558': No such file or directory
du: cannot access 'proc/2551/task/2551/fd/4': No such file or directory
du: cannot access 'proc/2551/task/2551/fdinfo/4': No such file or directory
du: cannot access 'proc/2551/fd/4': No such file or directory
du: cannot access 'proc/2551/fdinfo/4': No such file or directory
0       proc
478M    root
13M     sbin
4.0K    selinux
4.0K    srv
0       sys
152K    tmp
960M    usr
197M    var

With du we can see how much disk space is actually in use: everything adds up to less than 2 GB, nowhere near the 20 GB that df reports as 100% full.
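This kind of df-versus-du mismatch can be checked mechanically rather than by adding figures up by hand. A generic sketch (not from the original session): compare the byte counts the two tools report for the root filesystem, keeping du on a single filesystem so it lines up with what df measures.

```shell
# Bytes used according to df vs. according to du, for the root filesystem.
# -x keeps du on one filesystem, -s prints only the grand total;
# a large gap between the two numbers is the symptom described above.
df_used=$(df -B1 --output=used / | tail -n 1)
du_used=$(du -sxB1 / 2>/dev/null | cut -f1)
echo "df reports: ${df_used} bytes used"
echo "du reports: ${du_used} bytes used"
```

A gap usually means deleted-but-still-open files; here, as we will see, it had a different cause.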

Next we look at the memory:

[root@abc ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          1024         76        947          0          0         28
-/+ buffers/cache:          47        976
Swap:            0          0          0

The memory is also sufficient, and the swap partition is not turned on.

Then look at the disk mount situation:

[root@abc ~]# mount
/dev/simfs on / type simfs (rw,relatime)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
none on /dev type devtmpfs (rw,relatime,mode=755)
none on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
none on /dev/shm type tmpfs (rw,relatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

[root@abc ~]# fdisk /dev/simfs

Unable to open /dev/simfs

The filesystem is simfs, one I had never seen before. Presumably it is something used by the VPS vendor; since fdisk fails on it, I made a note of it for later.

Next, the top command, to check for any abnormal processes:

top found no abnormal processes, and the server load was not high.
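top's interactive screen is awkward to paste into a writeup, but the same check can be captured non-interactively. A generic sketch (not from the original session):

```shell
# One batch snapshot of top (if available), then the heaviest processes
# by CPU and by memory, all captured non-interactively.
if command -v top >/dev/null 2>&1; then
    top -b -n 1 | head -n 15      # -b: batch mode, -n 1: a single iteration
fi
ps aux --sort=-%cpu | head -n 5   # top CPU consumers
ps aux --sort=-%mem | head -n 5   # top memory consumers
```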

Next, look at the ports opened by the server:

[root@abc tmp]# ss -tnlp
State  Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN 0      0                 *:22                 *:*      users:(("sshd",487,3))
LISTEN 0      0               :::1999              :::*      users:(("python",740,4))
LISTEN 0      0               :::2000              :::*      users:(("python",740,9))
LISTEN 0      0               :::2001              :::*      users:(("python",740,11))
LISTEN 0      0               :::2002              :::*      users:(("python",740,7))
LISTEN 0      0               :::2003              :::*      users:(("python",740,6))
LISTEN 0      0               :::2004              :::*      users:(("python",740,17))
LISTEN 0      0               :::2005              :::*      users:(("python",740,19))
LISTEN 0      0               :::2006              :::*      users:(("python",740,13))
LISTEN 0      0               :::22                :::*      users:(("sshd",487,4))
LISTEN 0      0               :::2007              :::*      users:(("python",740,15))
LISTEN 0      0               :::2008              :::*      users:(("python",740,21))
LISTEN 0      0               :::2009              :::*      users:(("python",740,23))
LISTEN 0      0               :::2010              :::*      users:(("python",740,25))

You can see that the server has many open ports: 22 belongs to sshd, and the rest were opened by a Python program.

Because this is a cloud server, it is possible that a port is open on the server itself but blocked by the cloud vendor's web firewall, so let's scan this server from our auxiliary server:


Starting Nmap 5.51 ( http://nmap.org ) at 2018-09-07 10:44 CST
Nmap scan report for abc.com (1.1.1.1)
Host is up (0.20s latency).
Not shown: 982 closed ports
PORT     STATE    SERVICE
22/tcp   open     ssh
135/tcp  filtered msrpc
139/tcp  filtered netbios-ssn
445/tcp  filtered microsoft-ds
593/tcp  filtered http-rpc-epmap
1999/tcp open     tcp-id-port
2000/tcp open     cisco-sccp
2001/tcp open     dc
2002/tcp open     globe
2003/tcp open     finger
2004/tcp open     mailbox
2005/tcp open     deslogin
2006/tcp open     invokator
2007/tcp open     dectalk
2008/tcp open     conf
2009/tcp open     news
2010/tcp open     search
4444/tcp filtered krb524

The nmap scan shows that all of these ports really are reachable from the outside. The owner of this server is a bold one; please don't follow his example.

Next, let's look at the running applications with ps:

Very few processes were running, and none of them looked unusual. (The redaction in the screenshot hides a process that is supposed to be there; please ignore it.)

The w command showed two users logged in to this server: one was the server's owner, the other was me.

At this point the simple (see, not complicated, right?) health check of the server was done. Based on what we had seen, I suspected that a large file had been deleted manually while still being held open by a process such as Python or haproxy, which would keep the disk space from being released.

Let's check with lsof:

No large deleted file turned up; the deleted sshd entry looked a bit strange, but it should not be the cause, because the server's owner said he had already rebooted after the problem appeared, and the problem persisted.
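The "deleted but still open" pattern suspected here is easy to reproduce, which also shows what lsof would have revealed. A self-contained sketch (the file name is just an example): a process keeps a descriptor on an unlinked file, so du stops counting the blocks while the kernel still holds them.

```shell
# Create a file, hold it open, then unlink it.
tmp=$(mktemp /tmp/held.XXXXXX)
head -c 1048576 /dev/zero > "$tmp"   # write 1 MiB of data
exec 3<"$tmp"                        # keep fd 3 open on the file
rm -f "$tmp"                         # unlink: du no longer sees it
# The kernel still holds the inode; /proc marks the fd as "(deleted)",
# which is the same marker lsof prints for such files.
link=$(readlink "/proc/$$/fd/3")
echo "$link"                         # e.g. /tmp/held.AbC123 (deleted)
# Only closing the descriptor (or killing the process) frees the blocks:
exec 3<&-
```

This explains why a reboot rules the pattern out: rebooting closes every descriptor, so any space held this way would have been released.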

In that case, I wondered whether the server had been compromised and its commands replaced; if so, what we see is not necessarily the truth.

Let's check the size of the df binary and see whether it matches the one on my auxiliary server:

Failed server:

[root@abc ~]# which df
/bin/df

[root@abc ~]# ll /bin/df
-rwxr-xr-x 1 root root 90544 Jun 2014 /bin/df

Auxiliary server:

[root@cte2-nginx-tomcat ~]# ll /bin/df
-rwxr-xr-x. 1 root root 95880 Nov 2013 /bin/df

The two servers run the same system version, so in principle the binaries should be the same size. However, the df on the failed server is smaller than the one on the auxiliary server.
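Comparing raw byte sizes across two hosts is a weak test, since package builds differ. On an RPM-based system such as CentOS, the package database itself can verify a binary, and checksums are directly comparable between hosts running the same package version. A generic sketch (not part of the original session):

```shell
# Verify suspect binaries against the RPM database where available.
if command -v rpm >/dev/null 2>&1; then
    # No output from rpm -V means the file matches what the package shipped.
    rpm -Vf /bin/df || echo "df differs from its package (or is unowned)"
    rpm -qf /bin/df || true          # which package owns df
fi
# Checksums are comparable across hosts with the same package version:
for cmd in /bin/df /bin/ps; do
    if [ -e "$cmd" ]; then
        sha256sum "$cmd"
    fi
done
```

Of course, if the rpm binary itself were trojaned this check could lie too, which is why uploading known-good binaries (as done next) is still worthwhile.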

Let's check the ps command as well.

Failed server:

[root@abc tmp]# which ps
/bin/ps
[root@abc tmp]# ll /bin/ps
-rwxr-xr-x 1 root root 82024 Nov 2012 /bin/ps

Auxiliary server:

[root@cte2-nginx-tomcat tmp]# which ps
/bin/ps
[root@cte2-nginx-tomcat tmp]# ll /bin/ps
-rwxr-xr-x. 1 root root 89480 Jul 2017 /bin/ps

Same story here: the ps on the failed server is smaller than the one on the auxiliary server.

This seemed to support our suspicion that the commands had been tampered with. To check, let's upload the known-good commands from the auxiliary server into the /tmp directory of the failed server and compare the results:

[root@cte2-nginx-tomcat tmp]# scp /bin/ps 1.1.1.1:/tmp/
Address 1.1.1.1 maps to abc.com, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
root@1.1.1.1's password:
ps                              100%   87KB  87.4KB/s   00:00

The ps command from the auxiliary server has now been uploaded to the failed server; let's see whether its output differs:

Failed server:

Result of running the auxiliary server's ps on the failed server:

Well, this is embarrassing: there seems to be no problem, and the two commands produce identical output. The server has no abnormal processes and the load has not risen, so we can basically rule out an intrusion. Besides, a trojaned command is generally much larger than the normal one, whereas here it is slightly smaller.

Unfortunately, I forgot to check the package versions of the commands at the time; the size difference was most likely just a version difference.

The troubleshooting was now stuck; this kind of problem really is rare. We must have missed some relevant clue somewhere. Then I suddenly remembered that strange filesystem: simfs.

Since nothing else turned out to be wrong, could the simfs filesystem itself be the cause?

I then searched for related information and found the explanation: "You have no own filesystem. /dev/simfs is just a fake device name OpenVZ uses to create its fake file system. Your real files (as well as the files of all other containers) reside on the host node's /vz filesystem. That filesystem (/vz on the host node) was full. There could be many reasons for this, but in very many cases this is caused by heavy overselling of the disk space."

To put it bluntly: the VPS vendor oversold its virtual machines, and the host node itself ran out of disk space, which left the disks of the VPS guests unusable as well. The problem was finally found; the next step was for the user to contact the VPS supplier and have the VPS replaced.
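In hindsight, the container situation could have been recognized much earlier by checking the root filesystem type and looking for container markers. A generic sketch (the /proc/user_beancounters marker is OpenVZ-specific; simfs is the filesystem name quoted above):

```shell
# Filesystem type of the root mount; "simfs" would point straight at OpenVZ.
stat -f -c 'root filesystem type: %T' /
# OpenVZ containers expose their resource accounting here:
if [ -f /proc/user_beancounters ]; then
    echo "OpenVZ container detected (see /proc/user_beancounters)"
else
    echo "no OpenVZ beancounters file found"
fi
```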

Summary: never let go of any clue. Sometimes a fault is not caused by the machine itself, so always consider the specific environment it runs in.

Reference: http://www.webhostingtalk.com/showthread.php?t=1083184
