Xen virtual machine hangs and the host appears frozen: the complete troubleshooting approach

Last Update: 2016-11-14 Source: Internet

Author: User

Tags: syslog


The working environment is a XenServer 6.5 cluster. One day we suddenly found that a VM could no longer be reached. Assuming it had hung, we tried to restart it from XenServer, but even a forced restart failed. So we went to the host and checked its disk space:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              20G   20G     0 100% /
none                  7.8G  2.0M  7.8G   1% /dev/shm

The host's root filesystem was full. Fine, let's free up some space. But running du over the root directory gave this:

# cd /
# du -sh *
5.7M    bin
24M     boot
2.1M    cli-rt
3.3M    dev
7.4M    etc
28K     EULA
4.0K    home
118M    lib
20M     lib64
16K     lost+found
4.0K    media
4.0K    mnt
554M    opt
du: cannot read directory 'proc/7020': No such file or directory
du: cannot read directory 'proc/7021': No such file or directory
0       proc
12K     read_me_first.html
102M    root
24M     sbin
4.0K    selinux
4.0K    srv
0       sys
1.6M    tftpboot
68K     tmp
542M    usr
2.6G    var
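The mismatch between df (20G used) and the du listing (roughly 4G in total) is the key clue. A minimal sketch that puts a number on it, assuming GNU coreutils df and du; the function name df_du_gap_kib is made up for illustration. Pass a mount point. Some gap is always expected from filesystem metadata such as the journal, so only a difference of many gigabytes is suspicious.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how much space (KiB) does df report as used on the
# filesystem holding $1 that du cannot reach through directory entries?
df_du_gap_kib() {
    local mnt=${1:-/} df_used du_used
    # df -P prints one POSIX-format line per filesystem; column 3 is "Used".
    df_used=$(df -Pk "$mnt" | awk 'NR == 2 { print $3 }')
    # du -x stays on this filesystem, so /proc, /dev/shm etc. are not counted.
    du_used=$(du -skx "$mnt" 2>/dev/null | awk '{ print $1 }')
    echo $(( df_used - du_used ))
}

# Usage: df_du_gap_kib /
# On the host above this would be expected to show a gap of around 16 GB.
```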

So the files on disk add up to only about 4 GB, far less than the 20 GB that df reports as used. Where did the rest go? The likely explanation is files that have been deleted but are still held open by a running process: their space is not released until the last file descriptor referring to them is closed. The following command lists deleted files that are still in use:

# ls -l /proc/[0-9]*/fd/* | grep delete
ls: /proc/29018/fd/255: No such file or directory
ls: /proc/29018/fd/3: No such file or directory
l-wx------ 1 root root 64 Nov 14 13:14 /proc/22020/fd/2 -> /tmp/stunnelbd3855.log (deleted)
l-wx------ 1 root root 64 Nov 14 13:27 /proc/24758/fd/2 -> /tmp/stunnel1bc930.log (deleted)
lrwx------ 1 root root 64 Nov 14 11:03 /proc/4555/fd/6 -> /tmp/tmpflfgwgg (deleted)
lrwx------ 1 root root 64 Nov 14 11:03 /proc/4556/fd/6 -> /tmp/tmpflfgwgg (deleted)
l-wx------ 1 root root 64 Nov 14 11:03 /proc/4587/fd/5 -> /var/run/openvswitch/ovs-xapi-sync.pid.tmp4587 (deleted)
l-wx------ 1 root root 64 Nov 14 11:03 /proc/4587/fd/12 -> /var/log/blktap/tapdisk.2345.log (deleted)

Going through the list, the most likely culprit is the last entry, /var/log/blktap/tapdisk.2345.log (deleted).
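Rather than judging from file names, the deleted-but-open files can be ranked by how much space each still occupies. A minimal sketch, assuming a Linux /proc and GNU find and stat; list_deleted_open is a hypothetical name. Run it as root so that every process's descriptors are visible.

```shell
#!/usr/bin/env bash
# Print "size<TAB>pid<TAB>fd-link<TAB>target" for every file that has been
# deleted but is still open somewhere, largest first.
list_deleted_open() {
    local fd target size pid
    # /proc/<pid>/fd/<n> are symlinks; the kernel appends " (deleted)" to the
    # target once the file's last directory entry has been removed.
    find /proc/[0-9]*/fd -mindepth 1 -maxdepth 1 -lname '*(deleted)' 2>/dev/null |
    while read -r fd; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        # stat -L follows the link to the still-open inode and reports its size.
        size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
        pid=${fd#/proc/}
        pid=${pid%%/*}
        printf '%s\t%s\t%s\t%s\n' "$size" "$pid" "$fd" "$target"
    done | sort -rn
}

# Usage: list_deleted_open | head
```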

As its name indicates, tapdisk.2345.log is the log of the tapdisk process with PID 2345. It mainly contains tapdisk's monitoring messages for the disk image it serves, for example:

Aug 21 17:55:06: [17:55:06.597] tapdisk_vbd_check_progress: vhd:/dev/vg_xenstorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/vhd-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
Aug 21 17:55:06: [17:55:06.597] tapdisk_vbd_check_progress: vhd:/dev/vg_xenstorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/vhd-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
Aug 21 17:55:06: [17:55:06.921] tapdisk_vbd_check_progress: vhd:/dev/vg_xenstorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/vhd-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
Aug 21 17:55:06: [17:55:06.925] tapdisk_vbd_check_progress: vhd:/dev/vg_xenstorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/vhd-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
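When a VM is stuck, this message repeats over and over for the same image, which is what fills the disk. A small sketch to count the watchdog messages per image, assuming the line format shown above (count_watchdog_timeouts is a hypothetical name). A deleted log can still be read through its /proc/<pid>/fd/<n> link for as long as the process keeps it open.

```shell
#!/usr/bin/env bash
# Count "watchdog timeout" messages per disk image in a tapdisk log.
# Assumed format: ... tapdisk_vbd_check_progress: <type>:<path>: watchdog timeout: ...
count_watchdog_timeouts() {
    local log=$1
    grep 'watchdog timeout' "$log" |
    # Keep only the "<type>:<path>" field between the function name and the message.
    sed -n 's/.*tapdisk_vbd_check_progress: \([^ ]*\): watchdog timeout.*/\1/p' |
    sort | uniq -c | sort -rn
}

# Usage: count_watchdog_timeouts /var/log/blktap/tapdisk.2345.log
#    or: count_watchdog_timeouts /proc/<pid>/fd/<n>   # if the file is already deleted
```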

So why does a hung Xen virtual machine lead to all of this: the VM cannot be restarted, the host's disk fills up, and log files get deleted?

The answer: once the VM hangs, its tapdisk process on the host keeps writing entries like the ones above to its log until the disk is full, and with the host's disk full the VM cannot be restarted. Meanwhile, whenever the log grows past the size that triggers rotation, it is rotated into a backup copy, and once the number of backups exceeds the configured retention limit, the file is deleted. tapdisk, however, still has the deleted file open and keeps writing to it, so the space is never released. The log rotation setup on the host:

# rpm -Vv elasticsyslog
........  c /etc/cron.d/logrotate.cron
........  c /etc/logrotate-xenserver.conf
........    /etc/sysconfig/syslog.elastic
........    /etc/sysconfig/syslog.patch
........    /opt/xensource/bin/delete_old_logs_by_space
........    /opt/xensource/bin/elasticsyslog
........    /opt/xensource/bin/logrotate-xenserver
........    /opt/xensource/bin/rotate_logs_by_size
# cat /etc/logrotate.conf
# see "man logrotate" for details
# rotate log files weekly
weekly

# keep 4 weeks worth of backlogs
rotate 4

# create new (empty) log files after rotating old ones
create

# uncomment this if you want your log files compressed
#compress

# RPM packages drop log rotation information into this directory
include /etc/logrotate.d

# no packages own wtmp -- we'll rotate them here
/var/log/wtmp {
    monthly
    minsize 1M
    create 0664 root utmp
    rotate 1
}

/var/log/btmp {
    missingok
    monthly
    minsize 1M
    create 0600 root utmp
    rotate 1
}

# system-specific logs may be also be configured here.
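The failure mode in a nutshell: a writer that keeps its file descriptor open is unaffected when its file is renamed and deleted, and the blocks stay allocated. Below is a self-contained toy reproduction using throw-away temp files. The background loop is only a stand-in for tapdisk, and the mv/rm pair stands in for rotation plus cleanup; it does not reproduce what the XenServer scripts listed above actually do internally.

```shell
#!/usr/bin/env bash
# Toy reproduction with throw-away files; nothing here touches XenServer.
dir=$(mktemp -d)
log="$dir/tapdisk.log"

# Stand-in for tapdisk: a background process that opens the log once (fd 1)
# and keeps appending to it.
( exec >>"$log"; while :; do echo "watchdog timeout: pending requests idle"; sleep 0.1; done ) &
writer=$!
sleep 0.5

# Stand-in for rotation plus cleanup: rename to a backup, then delete the
# backup because it exceeds the retention limit.
mv "$log" "$log.1"
rm -f "$log.1"
sleep 0.5

# The writer's fd 1 now points at an unlinked inode that is still growing.
target=$(readlink "/proc/$writer/fd/1")
size=$(stat -L -c %s "/proc/$writer/fd/1")
echo "fd 1 -> $target, still allocated: $size bytes"

# Only when the last descriptor is closed (here: the process exits) does the
# filesystem get the space back.
kill "$writer"
wait "$writer" 2>/dev/null
rm -rf "$dir"
```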

With all that said, the fix is simple: make the process release the deleted file. From the name /var/log/blktap/tapdisk.2345.log (deleted) above, the owning process is PID 2345, so kill it:

# ps -ef | grep 2345
root     18165 15432  0 14:22 pts/37   00:00:00 grep 21611
root      2345     1  0 Jun01 ?        03:10:55 tapdisk
# kill 2345
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              20G  4.1G   15G  22% /
none                  7.8G  2.0M  7.8G   1% /dev/shm
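Here the PID was taken from the log file's name. A more direct check, which does not rely on that naming convention, is to ask /proc which processes actually have the deleted file open, and to confirm what they are with ps before killing anything. A sketch assuming a Linux /proc; find_holders is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print "<pid> <fd>" for every process that holds the deleted file $1 open.
find_holders() {
    local path=$1 fd pid
    for fd in /proc/[0-9]*/fd/*; do
        # The kernel appends " (deleted)" to the link target of an unlinked file.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$path (deleted)" ]; then
            pid=${fd#/proc/}
            echo "${pid%%/*} ${fd##*/}"
        fi
    done
}

# Usage on the host (as root), then inspect the process before killing it:
#   find_holders /var/log/blktap/tapdisk.2345.log
#   ps -o pid,etime,args -p <pid>
```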

The space has been freed. At this point the host is back to normal, since it has disk space again, and the VM that had hung is now shut down.

Next, start the virtual machine. If it belongs to a pool (cluster), the simplest option is to start it on another host. If it is a standalone VM, or you want it to run on the original host, you first need to restart its tapdisk backend, and for that you need a number. Before killing the process, it is best to record it. Lacking a better way, run the following command and save its output; after the kill, run it again, and by comparing the two you can identify the tapdisk worker process that belongs to the VM and needs to be started:

# List all tapdisk processes:
ps -ef | grep tap
# Start the VM's own tapdisk process. The number passed here was found by comparing
# the output of "ps -ef | grep tap" before and after the kill; it is not a fixed value.
tapback -d -x 18
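The before/after comparison can be scripted so that nothing depends on memory. A sketch assuming the backend processes can be recognised by "tapdisk" or "tapback" in their command line; the function names are hypothetical. Command lines are compared rather than PIDs, because PIDs change whenever a process is restarted.

```shell
#!/usr/bin/env bash
# Snapshot the command lines of tapdisk/tapback processes, sorted for comm.
# The "[t]ap" pattern stops grep from matching its own command line.
snapshot_tap() {
    ps -eo args= | grep -E '[t]ap(disk|back)' | sort
}

# Print lines present in the "before" snapshot but missing from the "after" one.
missing_tap_processes() {
    comm -23 "$1" "$2"
}

# Usage:
#   snapshot_tap > /root/tap.before    # before killing anything
#   snapshot_tap > /root/tap.after     # after the kill
#   missing_tap_processes /root/tap.before /root/tap.after
```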

Once the VM's tapdisk process is running again, the virtual machine can be started normally.

The following is supplementary material explaining what tapdisk is, for anyone who needs it. My English is only good enough for reading comprehension, so rather than force a clumsy translation, I have left it in the original:

URL: https://wiki.xen.org/wiki/blktap

Tapdisk: each tapdisk process in userspace is backed by one or several image files.

When xend is started, the userspace daemon blktapctrl is started too. When the guest VM boots, XenBus is initialized as described in XenSplitDrivers. The request for a new virtual disk is propagated to blktapctrl, which creates a new character device and two named pipes for communication with a newly forked tapdisk process.

After opening the character device, the shared memory is mapped to the fe_ring using the mmap system call. The tapdisk process opens the image file and sends information about the image's size back to blktapctrl, which stores it. After this initialization, tapdisk executes a select system call on the named pipes. On an event, it checks whether the tap fd is set and, if so, tries to read a request from the frontend ring.

The XenBus connection between DomU and Dom0 is used by XenStore to negotiate the backend/frontend connection. After both backend and frontend are set up, a shared ring page and an event channel are negotiated. These are used for all further communication between backend and frontend. I/O requests issued in the guest VM are handled by the guest OS and forwarded over these two communication channels.

There is a trade-off between latency and throughput, controlled by the number of requests that accumulate before the blktap driver is notified.

Depending on the event type, the blktap driver notifies blktapctrl or the appropriate tapdisk process, by returning from poll or by waking up the tapdisk process, respectively. The shared frontend ring works as described in ring.h.
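The ring indices and the latency/throughput trade-off can be illustrated with a toy model. This is not blktap code and not how ring.h decides when to notify: it is a simplified single-producer/single-consumer ring in which both sides keep free-running counters and map them to slots with index & (RING_SIZE - 1), so RING_SIZE must be a power of two. The batching rule (notify after every N requests, or earlier when the ring is full) is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Toy model only. Frontend = producer (req_prod), backend = consumer (req_cons).
RING_SIZE=8                    # slots; must be a power of two

drain() {                      # backend: handle everything published so far
    local slot
    while (( req_cons != req_prod )); do
        slot=$(( req_cons & (RING_SIZE - 1) ))
        : "${ring[slot]}"      # a real backend would perform the I/O here
        req_cons=$(( req_cons + 1 ))
        served=$(( served + 1 ))
    done
}

notify() { notifications=$(( notifications + 1 )); drain; }

simulate() {                   # usage: simulate <requests> <notify_every>
    local total=$1 notify_every=$2 id slot
    ring=(); req_prod=0; req_cons=0; served=0; notifications=0
    for (( id = 1; id <= total; id++ )); do
        (( req_prod - req_cons == RING_SIZE )) && notify   # ring full: must kick
        slot=$(( req_prod & (RING_SIZE - 1) ))
        ring[slot]="req-$id"
        req_prod=$(( req_prod + 1 ))
        (( req_prod % notify_every == 0 )) && notify
    done
    (( req_cons != req_prod )) && notify                  # flush the tail
    echo "served=$served notifications=$notifications"
}

simulate 20 1     # served=20 notifications=20
simulate 20 4     # served=20 notifications=5
simulate 20 16    # served=20 notifications=3 (batching capped by the 8-slot ring)
```

Fewer notifications mean less event overhead per request, but a request may sit in the ring until its batch is complete, which is the trade-off described above.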

tapdisk reads the request from the frontend ring. For synchronous I/O, it performs the read and immediately returns the request; for asynchronous I/O, a batch of requests is submitted to the Linux AIO subsystem. Both mechanisms read from the image file. In the asynchronous case, tapdisk uses the non-blocking io_getevents system call to check whether the I/O requests have completed.

Information about completed requests is propagated through the frontend ring, and the tapdisk process notifies the blktap driver via the ioctl system call.
Using the same split-driver mechanism, the data is returned to the frontend in the guest VM.

[Figure: blktap diagram, https://wiki.xen.org/images/0/06/Blktap%24blktap_diagram_differentSymbols.png]

This article is from the "Nano Dragon" blog. Please keep this source attribution: http://arlen.blog.51cto.com/7175583/1872634


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
