Today a colleague ran into a problem while configuring an OpenStack compute node (deployed on CentOS 7): running nova service-list on the controller node showed only 4 rows and no nova-compute service, so I helped him troubleshoot.
Spoiler on the cause: in the /etc/nova/nova.conf configuration file this colleague had written verbose = True as verbose = ture. I checked the file for quite a while (read it through 3 times) without spotting it, which is a little embarrassing!
In my experience, the problems encountered while deploying OpenStack mostly come down to configuration file errors, skipped configuration steps, and the like, because the computer usually does not make mistakes; people do.
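A typo like ture is hard to see by eye but easy to find mechanically. The following is a minimal sketch, not part of Nova: find_bad_booleans is a hypothetical helper, the caller has to supply the names of the options that are supposed to be boolean, and the list of accepted spellings (true/false, yes/no, on/off, 1/0, case-insensitive) is my assumption about what the configuration parser accepts.

```shell
#!/bin/sh
# find_bad_booleans CONF OPTION...
# Hypothetical helper: prints "lineno:line" for every listed option whose
# value is not a recognizable boolean. Commented-out lines are ignored.
# The accepted spellings below are an assumption, not taken from Nova.
find_bad_booleans() {
    conf=$1
    shift
    for opt in "$@"; do
        # First grep: active "opt = value" lines, with line numbers.
        # Second grep: drop the lines whose value is a known boolean.
        grep -n -E "^[[:space:]]*${opt}[[:space:]]*=" "$conf" |
            grep -v -i -E "=[[:space:]]*(true|false|yes|no|on|off|1|0)[[:space:]]*$" || true
    done
}

# Example call (the option names are only examples):
#   find_bad_booleans /etc/nova/nova.conf verbose debug
```

Any line it prints is worth a second look; an empty result only means the listed options look fine, not that the whole file is correct.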
Faults usually fall into the following categories:
Time synchronization problems: the clocks of two (or more) nodes are out of sync
Database problems: permission issues, a missing database, missing tables (the table structure was created incorrectly), a wrong user name or password, etc.
Packages not installed correctly: for example, an unstable connection to overseas repositories, a faulty repository site, or a repository that temporarily changed its path can cause serious package installation errors that the person installing does not notice
Configuration file errors: these are the most common and are often hard to notice, as described in the second paragraph of this article; other examples are writing 0 as O, 1 as l, service as server, and so on
Wrong network interface address: for example, during this troubleshooting I also found via endpoint-list on the controller node that the public URL had the wrong address, such as the local loopback address instead of the management interface address
Missing service users: generally caused by a software bug or an incorrect installation; for example, in an earlier case the rabbitmq user that RabbitMQ needs did not exist, so the RabbitMQ guest password could not be changed, and so on.
File permission problems: for example, the ownership changes after a configuration file is replaced; if a file originally owned by root:nova ends up as root:root, the service will not work properly.
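The ownership problem in the last item can be checked with a few lines of shell. This is only a sketch: check_owner is a hypothetical helper name, it relies on the GNU stat option -c (which I assume is available, as it is on CentOS 7), and the expected owner has to be supplied by the caller.

```shell
#!/bin/sh
# check_owner FILE EXPECTED
# Hypothetical helper: compares a file's "user:group" with EXPECTED.
# Returns 0 on a match, 1 on a mismatch, 2 if stat itself fails.
check_owner() {
    file=$1
    expected=$2
    actual=$(stat -c '%U:%G' "$file") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK $file $actual"
    else
        echo "MISMATCH $file is $actual, expected $expected"
        return 1
    fi
}

# Example call, using the ownership mentioned above:
#   check_owner /etc/nova/nova.conf root:nova
```

Running it right after replacing a configuration file shows immediately whether the copy changed the owner or group.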
The usual troubleshooting methods are summarized as follows:
Check the software log and the system log. This is the first thing to do. If the software produces no log of its own, look at the system log: you can empty the /var/log/messages file, run the relevant command, and then read what has appeared in the file. This method handles the case where running a command generates no software log (it was the key to today's troubleshooting).
Remove packages with rpm -e package instead of yum erase package, to avoid deleting dependent packages that are still needed
Keep (back up) the configuration file, then reinstall the package: yum reinstall package
After running a command whose result is unclear, use echo $? to check its exit status: 0 means no error, and any nonzero value means the command failed. For example, when running su -s /bin/sh -c "nova-manage db sync" nova, if you hit one of the database permission problems mentioned earlier, the command fails, yet it finishes without showing an obvious error.
Compare the configuration file carefully: strip all comment lines and blank lines first, then compare. grep -v \# /filepath/filename | grep -v ^$ removes comment lines and blank lines
Hold on to one belief: the computer does not make mistakes; if others can succeed, the fault must be your own!
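Two of the methods above, comparing stripped configuration files and checking the exit status, can be wrapped in small shell functions. This is a sketch with hypothetical names (strip_conf, diff_conf, run_checked). Note one deliberate difference from the grep command above: strip_conf only drops lines that start with #, so a line that merely contains a # is kept.

```shell
#!/bin/sh
# strip_conf FILE: print FILE without comment lines and blank lines.
strip_conf() {
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$'
}

# diff_conf A B: unified diff of two config files after stripping.
# Returns diff's status: 0 if identical, 1 if they differ.
diff_conf() {
    _a=$(mktemp) && _b=$(mktemp) || return 2
    strip_conf "$1" > "$_a" || true   # an empty result is not an error here
    strip_conf "$2" > "$_b" || true
    _rc=0
    diff -u "$_a" "$_b" || _rc=$?
    rm -f "$_a" "$_b"
    return "$_rc"
}

# run_checked COMMAND...: run a command and always report its exit status,
# so a silent failure is not overlooked.
run_checked() {
    _rc=0
    "$@" || _rc=$?
    echo "exit status: $_rc" >&2
    return "$_rc"
}

# Example calls (the reference file path is hypothetical):
#   diff_conf /root/nova.conf.known-good /etc/nova/nova.conf
#   run_checked su -s /bin/sh -c "nova-manage db sync" nova
```

Diffing against a known-good file from a working node points straight at the differing lines instead of relying on reading the file by eye.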
How this fault was tracked down:
First, running nova service-list on the controller node showed only 4 rows and no nova-compute service, which suggested that the compute service was not running on the compute node
Running systemctl status openstack-nova-compute.service -l to see the details showed that the service was not running, with the messages: openstack-nova-compute.service start request repeated too quickly, refusing to start. Unit systemd-journald.socket entered failed state.
When I tried to view nova-compute's log file, /var/log/nova/nova-compute.log, the file did not exist
At this point the configuration file (/etc/nova/nova.conf) should be the immediate suspect, but even after comparing it 2-3 times I could not pin down the problem
After that I checked the system time, the database, and file permissions, and reinstalled the package after backing up the configuration file; everything turned out to be normal. Finally I thought of the system log method
The system log can show why the service failed to start, so I emptied the /var/log/messages file (true > /var/log/messages) and then restarted openstack-nova-compute.service
Looking at the contents of /var/log/messages again revealed that the Boolean value ture in the Nova configuration file is invalid
So I located ture in the configuration file (/etc/nova/nova.conf), changed it to true, and restarted the service
Running the verification command nova service-list on the controller node showed everything OK with no other problems. Success!
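The system log step can also be done without emptying /var/log/messages. The sketch below is a variation on the method used here, not the exact procedure: new_log_lines is a hypothetical helper that remembers how long the log was, runs a command, and prints only the lines added afterwards.

```shell
#!/bin/sh
# new_log_lines LOGFILE COMMAND...
# Hypothetical helper: run COMMAND and print only the lines appended to
# LOGFILE while it ran. Unlike "true > LOGFILE", old entries are kept.
new_log_lines() {
    log=$1
    shift
    before=$(wc -l < "$log") || return 2
    # A failing command is reported but does not stop the log output.
    "$@" || echo "command exited with status $?" >&2
    tail -n "+$((before + 1))" "$log"
}

# Example call (needs permission to read the system log):
#   new_log_lines /var/log/messages systemctl restart openstack-nova-compute.service
```

A service may write its messages slightly after the command returns, so if nothing is printed it can help to wait a second or two and look at the end of the log again.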
End
This article is from "Communication, My Favorites" blog, please make sure to keep this source http://dgd2010.blog.51cto.com/1539422/1587808
OpenStack Service Start Troubleshooting Experience