Ceph deadlock failure under high IO
On a high-performance x86 server, Ceph is used as the storage for VM images. During stress testing, all virtual machines on the server became inaccessible.
Symptoms:
1. A web service runs on the virtual machines, with Redis as its cache server. Under heavy load (about 8,000 requests per second), all the VMs on the host machine became inaccessible.
2. When the fault occurred, some virtual machines could not be pinged; others responded to ping but could not be logged into over SSH.
At first we suspected a bridge fault. The virtio NIC used by KVM has a well-known problem: when attached to a bridge, a memory overflow can occur and bring the bridge down. The workaround documented for Xen is to disable TSO support on the bridge.
(Run: ethtool --offload <network device> tso off)
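A minimal sketch of checking and disabling offloads with ethtool (br0 is a placeholder for the actual bridge or tap device name):

    ethtool -k br0 | grep -i segmentation   # show current TSO/GSO offload settings
    ethtool -K br0 tso off                  # -K is the short form of --offload; turn TSO off
    ethtool -K br0 gso off gro off          # GSO/GRO are sometimes disabled alongside TSO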
However, the fault did not go away after the network service was restarted with TSO disabled, so the bridge was ruled out.
After reproducing the failure several times, one virtual machine's SSH session happened to stay connected: the cd command still worked, but ls failed with an input/output error, which is a file system fault.
So I began to suspect that there was a problem with the file system.
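A rough sketch of the kind of checks that confirm an IO-level problem rather than a network one (the grep patterns assume an ext4/xfs guest; adjust to your environment):

    # Inside an affected VM that still has a shell:
    dmesg | grep -iE 'i/o error|blk_update_request|ext4|xfs' | tail
    # On the host: qemu processes stuck in uninterruptible IO wait (state D) point to blocked storage
    ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /D/'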
The file system here is backed by Ceph. Checking the Ceph log showed a large number of warnings reported when the fault occurred:
16:36:28.493424 osd.0 172.23123123:6800/96711 9195 : cluster [WRN] 6 slow requests, 6 included below; oldest blocked for > 30.934796 secs

and

18:46:45.192215 osd.2 172.132131231:6800/68644 5936 : cluster [WRN] slow request 240.415451 seconds old, received at 18:42:44.776646: osd_op(13213213500 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2269184~524288] 0.5652b278 ack+ondisk+write+known_if_redirected e48545) currently waiting for rw locks
The requests are blocked waiting for rw locks: a deadlock.
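A rough sketch of how such stuck requests can be inspected (osd.0 is only an example ID; the daemon commands have to be run on the host where that OSD lives):

    ceph health detail                     # which OSDs currently have blocked/slow requests
    ceph daemon osd.0 dump_ops_in_flight   # each in-flight op and the stage it is waiting on
    ceph daemon osd.0 dump_historic_ops    # recently completed slow ops with per-stage timings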
Checking the disk IO records showed that the Redis server was issuing a large number of disk writes when the fault occurred: under the high request rate, Redis's RDB persistence (snapshotting) was triggered very frequently, generating heavy disk IO. That IO left too little write time for other disk operations, and the backed-up requests ended in a Ceph deadlock on the OSD.
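For reference, the kind of standard Linux tooling used to pin the writes on a particular process (assuming the sysstat and iotop packages are installed):

    iostat -x 1        # per-device utilization and await times
    iotop -o -b -n 3   # only processes actually doing IO, three samples
    pidstat -d 1       # per-process read/write rates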
The immediate fix was to disable Redis's RDB persistence. The longer-term solution is to stop Redis from persisting data onto the Ceph-backed partition at all, and, more broadly, not to run high-IO reads or writes against VM images stored on Ceph (not exactly reassuring...).
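A minimal sketch of turning RDB snapshotting off, assuming a stock Redis install (the config path is a placeholder):

    redis-cli config set save ""   # disable RDB snapshots at runtime
    # To make it permanent, comment out every "save <seconds> <changes>" line in
    # /etc/redis/redis.conf (or set:  save "" ), then restart Redis.

With RDB off and AOF not enabled, Redis keeps its data in memory only, which is acceptable here because it is used purely as a cache.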
Lessons learned:
1. Ceph can deadlock under high IO, and it provides no mechanism to break the lock. The official advice was simply not to place virtual machine images on Ceph... speechless...
2. The storage network and the business network should be isolated from each other when the system is designed. A production service can be split across five networks: public (Internet), business, storage, heartbeat, and management; one Ceph-side piece of that separation is sketched below.
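Ceph itself distinguishes client-facing traffic from replication traffic in ceph.conf; a hypothetical fragment (the subnets are made-up placeholders):

    # ceph.conf, [global] section
    public network  = 192.168.10.0/24   # client/business traffic to MONs and OSDs
    cluster network = 192.168.20.0/24   # OSD-to-OSD replication and heartbeat traffic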