When I arrived at the office this morning, I found that the machines in the R&D environment were unreachable.
The company's development environment is deployed simply: the physical machine runs VMware ESXi 6, and the VMs run on top of ESXi.
Checking showed that ESXi did not respond to ping, the physical machine's remote management card did not respond to ping, and the IPMI management client could not connect.
What we did: the machine is five years old, and when the remote management card is unreachable it is usually a server hardware problem. We did not bother with it and simply found another machine to build a new R&D environment on. The new environment uses the same number of machines; only the IP addresses of four machines changed. See:
No sooner said than done. We set up the machine and deployed the services. While deploying and debugging, we found that some machines were extremely laggy: after connecting over ssh, commands would hang, and it generally took more than 10 seconds for them to recover.
Investigation process:
1. Checked the performance of the ESXi physical machine: nothing abnormal.
2. Checked the performance of each virtual machine: nothing abnormal.
3. Because the new R&D environment was built by two people, checked both people's command history and configuration files: nothing abnormal.
4. Searched Baidu for ESXi virtual machine packet loss: nothing useful.
5. Checked the original virtual machines (the physical machine already hosted 8 virtual machines before the new R&D environment was deployed): they showed no packet loss.
6. Wrote a script to ping the IPs of the new R&D environment in a loop: the newly assigned IPs (the green part) did not lose a single packet.
7. As a control test, created two new VMs, 10.12.30.61 and 10.12.30.62, and ran the ping test: no packet loss.
8. Changed the IPs of the two new VMs to the previously used 10.12.30.7 and 10.12.30.8 and tested again: packet loss appeared.
9. Thought: an IP conflict? The old physical machine is dead and its VMs are unreachable, so they cannot be fighting over the IPs!!!
10. To validate the idea in step 9: when my ping loop script reported a ping failure for 10.12.30.12, I opened a new SSH session and quickly ran arp -an several times. It really was an IP conflict!!!! For the same IP address, two consecutive runs showed different MAC addresses. Had the old machine recovered by itself?
11. Checked the old machine again: its remote management card, the physical machine's operating system, and the virtual machines' operating systems were all still unreachable. But the problem was definitely on the old machine.
12. To verify the idea in step 11: since the remote management card was unreachable and I was not in the machine room, the only option was to shut down the old machine's interfaces on the switch. After shutting down those switch ports, the ping test was completely normal, with not a single packet lost.
13. So the idea in step 11 was right, and nothing was actually haunted. The machine had crashed and many of its services were unusable, but because it had never been powered off, some basic services were still running in memory. After the crash, neither the physical machine nor the virtual machines responded to ping, yet they could still send ARP replies. Quite tenacious.
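The check in step 10 (run arp -an several times and see whether the same IP shows up with different MAC addresses) can also be automated. Below is a minimal sketch in Python, not the script used during the incident. It assumes the common Linux arp -an line format "? (IP) at MAC [ether] on IFACE"; the function names and the MAC addresses in the example are made up for illustration.

```python
import re
import subprocess
import time
from collections import defaultdict

# Assumed line format (Linux net-tools):
#   ? (10.12.30.12) at 00:50:56:aa:bb:01 [ether] on eth0
# Entries such as "at <incomplete>" do not match and are skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9A-Fa-f:]{17})"
)


def parse_arp(output):
    """Return {ip: mac} for one snapshot of `arp -an` output."""
    return {m["ip"]: m["mac"].lower() for m in ARP_LINE.finditer(output)}


def find_conflicts(snapshots):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC."""
    seen = defaultdict(set)
    for snapshot in snapshots:
        for ip, mac in parse_arp(snapshot).items():
            seen[ip].add(mac)
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


def capture_snapshots(count=5, delay=0.2):
    """Run `arp -an` several times in quick succession (hypothetical helper)."""
    snapshots = []
    for _ in range(count):
        result = subprocess.run(["arp", "-an"], capture_output=True, text=True)
        snapshots.append(result.stdout)
        time.sleep(delay)
    return snapshots


if __name__ == "__main__":
    # Two hand-written snapshots with made-up MAC addresses.
    first = "? (10.12.30.12) at 00:50:56:aa:bb:01 [ether] on eth0\n"
    second = "? (10.12.30.12) at 00:0c:29:cc:dd:03 [ether] on eth0\n"
    print(find_conflicts([first, second]))
```

On a live host, find_conflicts(capture_snapshots()) would do the same comparison against the real ARP cache. A MAC that flips between snapshots for one IP is the symptom seen in step 10, although a legitimate failover or a replaced NIC can produce the same pattern, so a hit is a lead to follow up on rather than proof.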
Lessons learned: if a physical machine is judged to have a hardware failure and can no longer be used, it must be powered off, for the sake of the safety and stability of the other servers in the machine room.
A crashed machine is not a powered-off machine: the haunted VMs