Customer report: virtual machines cannot be created because time is not synchronized. Why do the nova and cinder services keep flapping between up and down?
The customer reported that they could not create virtual machines (the OpenStack version is Juno). Logging on to the controller node showed that the nova and cinder services were down. The nova and cinder logs on the nodes marked down contained no errors, and they showed both nova and cinder reporting their status updates normally. For VM creation requests, nova-scheduler did not perform any scheduling, and the status of each newly created virtual machine went straight to error.
Checking the nova and cinder services several times showed that the service status on many nodes kept flapping between down and up:
1. All the nova services on node-1 are down, and the nova services on other nodes are basically normal.
2. Run the command again about 10 seconds later and find that all the nova services on node-1 are up, but all the nova services on the other nodes are down.
3. The cinder services on node-1 are all up, and those on the other nodes are down.
4. About 10 seconds later, all the cinder services on node-1 are down, and the cinder services on the other nodes are normal.
5. Check whether the problem is caused by a RabbitMQ split brain. RabbitMQ turns out to be normal, and no messages are backed up.
6. Check the time service and find that the clocks of the nodes are not synchronized, and the time difference is relatively large. (After deployment is complete, the ntp.conf file on each node is configured to synchronize time to the deployment node. If the deployment node is shut down and this configuration is not changed, the time difference between nodes becomes very large after a while.)
7. Modify the ntp configuration so that the nodes synchronize time to node-1. After that, the services return to normal and virtual machines are created successfully.
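One plausible explanation for the flapping in steps 1 to 4 is that a service counts as up only while its last heartbeat timestamp is close to the clock of the node answering the status query. The following is a minimal Python sketch of that idea, not the actual nova or cinder code: the 60-second threshold, the clock offsets, and all function names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Assumed threshold: a heartbeat further than this from "now" means "down".
SERVICE_DOWN_TIME = 60  # seconds (illustrative value)

# Hypothetical clock offsets from true time, in seconds.
CLOCK_OFFSETS = {"node-1": 0, "node-2": 150, "node-3": 150}


def local_clock(node, true_time):
    """Return what `node` believes the time is at `true_time`."""
    return true_time + timedelta(seconds=CLOCK_OFFSETS[node])


def is_up(last_heartbeat, checker_now, down_time=SERVICE_DOWN_TIME):
    """A service is up if its heartbeat is within `down_time` of the checker's clock."""
    elapsed = (checker_now - last_heartbeat).total_seconds()
    # abs() means a heartbeat stamped "in the future" also counts as stale.
    return abs(elapsed) <= down_time


def status_seen_from(checker, heartbeat_time, check_time):
    """Status of every node's service as judged by `checker`'s clock."""
    checker_now = local_clock(checker, check_time)
    return {
        node: "up" if is_up(local_clock(node, heartbeat_time), checker_now) else "down"
        for node in CLOCK_OFFSETS
    }


heartbeat_time = datetime(2015, 1, 1, 12, 0, 0)     # every service reports at this moment
check_time = heartbeat_time + timedelta(seconds=5)  # status is queried 5 seconds later

for checker in ("node-1", "node-2"):
    print(checker, status_seen_from(checker, heartbeat_time, check_time))
```

With these assumed offsets, a query answered by node-1 sees only node-1 as up, while a query answered by node-2 sees node-1 as down and the other nodes as up, which resembles the alternating results observed above.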
Conclusion: This incident came from a pitfall that was left in place after deployment and never cleaned up. Strict time synchronization is required in O&M and production environments; otherwise, the system is very likely to fail one day.
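To catch this kind of drift before it breaks services, each node's clock offset can be compared against the rest of the cluster. The sketch below assumes the offsets (in seconds) have already been collected by some other means; the function names, the sample values, and the 1-second tolerance are hypothetical.

```python
from statistics import median


def clock_spread(offsets):
    """Largest difference between any two node clocks, in seconds."""
    return max(offsets.values()) - min(offsets.values())


def nodes_out_of_sync(offsets, tolerance=1.0):
    """Names of nodes whose offset differs from the cluster median by more than `tolerance`."""
    reference = median(offsets.values())
    return sorted(
        node for node, offset in offsets.items()
        if abs(offset - reference) > tolerance
    )


# Hypothetical measurements: node-2 has drifted by about two and a half minutes.
measured = {"node-1": 0.02, "node-2": 151.3, "node-3": -0.01}

print(nodes_out_of_sync(measured))
print(clock_spread(measured))
```

Using the median as the reference keeps a single badly drifted node from dragging the baseline, so only the outlier is reported.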