1. Problem
We often find that a Cinder service is reported as down. In the example below, cinder-scheduler on the controller node and cinder-volume on the block1 node are both in the down state.
$ cinder service-list
+------------------+---------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                      | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+---------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller                | nova | enabled | up    | 2015-03-30T00:53:32.000000 | None            |
| cinder-scheduler | controller                | nova | enabled | down  | 2015-03-30T00:51:53.000000 | None            |
| cinder-volume    | block1                    | nova | enabled | down  | 2015-03-30T00:54:43.000000 | None            |
| cinder-volume    | [Email protected]         | AZ1  | enabled | up    | 2015-03-30T00:54:14.000000 | None            |
| cinder-volume    | [Email protected]         | AZ1  | enabled | up    | 2015-03-30T00:54:13.000000 | None            |
| cinder-volume    | [Email protected]         | nova | enabled | up    | 2015-03-30T00:54:08.000000 | None            |
+------------------+---------------------------+------+---------+-------+----------------------------+-----------------+
Let's look at the code that implements cinder service-list:
    class ServiceController(wsgi.Controller):
        @wsgi.serializers(xml=ServicesIndexTemplate)
        def index(self, req):
            """Return a list of all running services.

            Filter by host & service name.
            """
            context = req.environ['cinder.context']
            authorize(context)
            detailed = self.ext_mgr.is_loaded('os-extended-services')
            now = timeutils.utcnow()  # current time on the controller
            services = db.service_get_all(context)  # all Cinder services from the DB
            ...
            svcs = []
            for svc in services:  # loop over each service
                # Use updated_at, or created_at if it is not set, and compute
                # the difference from the current time.
                delta = now - (svc['updated_at'] or svc['created_at'])
                # Take the absolute value of the difference and check whether
                # it is within service_down_time, which defaults to 60 seconds.
                alive = abs(utils.total_seconds(delta)) <= CONF.service_down_time
                # Within 60 seconds the service is up; otherwise it is down.
                art = (alive and "up") or "down"
                active = 'enabled'
                ...
                svcs.append(ret_fields)
            return {'services': svcs}
So whether a service shows as up or down depends on whether the gap between the updated_at value in that service's row of the service table and the current time on the controller node is within the configured range (service_down_time).
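The comparison above can be tried outside Cinder. The following is a minimal sketch, not Cinder's code: is_service_up is a hypothetical helper, the three rows are copied from the table in section 1, and the controller time of 00:53:35 is an assumption (the article does not say when the command was run), chosen only because it is consistent with the states shown in the table.

```python
from datetime import datetime

SERVICE_DOWN_TIME = 60  # seconds; the default described above


def is_service_up(updated_at, created_at, now, service_down_time=SERVICE_DOWN_TIME):
    """Hypothetical helper that mirrors the comparison in index()."""
    last_seen = updated_at or created_at
    delta = now - last_seen
    # abs() means a heartbeat stamped in the "future" (node clock ahead of
    # the controller) counts against the service as well.
    return abs(delta.total_seconds()) <= service_down_time


# Assumed controller time; not stated in the article.
now = datetime(2015, 3, 30, 0, 53, 35)

# (binary, host, updated_at) taken from the service list above.
rows = [
    ("cinder-backup", "controller", datetime(2015, 3, 30, 0, 53, 32)),
    ("cinder-scheduler", "controller", datetime(2015, 3, 30, 0, 51, 53)),
    ("cinder-volume", "block1", datetime(2015, 3, 30, 0, 54, 43)),
]

states = {}
for binary, host, updated_at in rows:
    state = "up" if is_service_up(updated_at, None, now) else "down"
    states[(binary, host)] = state
    print(binary, host, state)
```

Under that assumed controller time, the cinder-scheduler heartbeat is 102 seconds old, and the block1 heartbeat is 68 seconds in the future. Because of abs(), both exceed the 60-second limit, which matches the two down rows.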
2. How a Cinder service updates updated_at
The Cinder services in the list above, such as cinder-scheduler, cinder-volume, and cinder-backup, are instances of the Service class (service.Service) in cinder/service.py. The start method of this class is as follows:
    def start(self):
        version_string = version.version_string()
        LOG.info(_('Starting %(topic)s node (version %(version_string)s)'),
                 {'topic': self.topic, 'version_string': version_string})
        ...
        if self.report_interval:
            # If the report_interval option is set, the service starts a
            # looping call that runs report_state every report_interval
            # seconds (default: 10 seconds).
            pulse = loopingcall.FixedIntervalLoopingCall(self.report_state)
            pulse.start(interval=self.report_interval,
                        initial_delay=self.report_interval)
            self.timers.append(pulse)
The report_state method updates the service's row in the DB. The resulting updated_at value is the time, on the service's own node, at which the method last ran.
    def report_state(self):
        """Update the state of this service in the datastore."""
        ctxt = context.get_admin_context()
        zone = CONF.storage_availability_zone
        state_catalog = {}
        try:
            ...
            service_ref = db.service_get(ctxt, self.service_id)  # fetch the service record
            ...
            db.service_update(ctxt, self.service_id, state_catalog)  # update the service
            ...
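To make the heartbeat timing concrete, here is a small simulation, offered as a sketch only. FakeService and run_heartbeat are hypothetical names, nothing here touches a real database or the looping-call library, and the report_count counter is an illustrative addition. The loop uses a fake clock instead of sleeping, with the first report delayed by one interval as in the start method above.

```python
from datetime import datetime, timedelta


class FakeService:
    """Hypothetical stand-in for a service row plus its heartbeat."""

    def __init__(self, report_interval=10):
        self.report_interval = report_interval
        self.updated_at = None
        self.report_count = 0  # illustrative counter, not taken from the article

    def report_state(self, now):
        # Stand-in for the DB update: stamp the row with the time at which
        # the heartbeat ran on this node.
        self.updated_at = now
        self.report_count += 1


def run_heartbeat(service, start, duration_seconds):
    """Simulate a fixed-interval loop over duration_seconds without sleeping."""
    elapsed = service.report_interval  # initial delay of one interval
    while elapsed <= duration_seconds:
        service.report_state(start + timedelta(seconds=elapsed))
        elapsed += service.report_interval


svc = FakeService(report_interval=10)
start = datetime(2015, 4, 11, 15, 26, 0)
run_heartbeat(svc, start, duration_seconds=35)
print(svc.report_count, svc.updated_at)
```

In 35 simulated seconds the heartbeat fires at 10, 20, and 30 seconds, so updated_at never lags the node's clock by more than one interval while the loop is healthy. If the loop stops, updated_at stops advancing, and the check in section 1 eventually reports the service as down.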
3. Troubleshooting steps
(1) Check the value of the report_interval option in cinder.conf. If it exceeds service_down_time (60 seconds by default), the service will be reported as 'down' for at least part of every reporting cycle.
(2) Check the clock on the service node. Its offset from the controller node's clock must stay within service_down_time - report_interval; with the default configuration, that means within 50 seconds.
(3) Check the service's log file to confirm that report_state is being called on schedule. If the log does not show this, add a debug log statement to the code. For example:
2015-04-11 15:26:24.210 8517 DEBUG cinder.service [-] Enter report_state: report_state /usr/lib/python2.7/dist-packages/cinder/service.py:283
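The three checks above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Cinder tool: max_clock_skew, clock_skew_ok, and heartbeat_gaps are hypothetical helpers, the "Enter report_state" marker is the debug message added by hand in step (3), and the factor of 2 used to flag a late heartbeat is an arbitrary threshold. Only the first sample log line comes from the article; the other two are made up to show a gap.

```python
import re
from datetime import datetime

# Leading timestamp of a log line in the format shown above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})")


def max_clock_skew(report_interval=10, service_down_time=60):
    """Steps (1) and (2): validate the options, return the tolerable skew."""
    if report_interval > service_down_time:
        raise ValueError("report_interval exceeds service_down_time")
    return service_down_time - report_interval


def clock_skew_ok(node_time, controller_time, report_interval=10, service_down_time=60):
    """Step (2): is the node's clock close enough to the controller's?"""
    skew = abs((node_time - controller_time).total_seconds())
    return skew <= max_clock_skew(report_interval, service_down_time)


def heartbeat_gaps(log_lines, marker="Enter report_state"):
    """Step (3): seconds between consecutive heartbeat log lines."""
    stamps = []
    for line in log_lines:
        match = TIMESTAMP.match(line)
        if match and marker in line:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


TAIL = (" 8517 DEBUG cinder.service [-] Enter report_state: report_state "
        "/usr/lib/python2.7/dist-packages/cinder/service.py:283")
sample_log = [
    "2015-04-11 15:26:24.210" + TAIL,  # from the article
    "2015-04-11 15:26:34.210" + TAIL,  # made up for illustration
    "2015-04-11 15:27:14.210" + TAIL,  # made up for illustration
]

gaps = heartbeat_gaps(sample_log)
late = [gap for gap in gaps if gap > 2 * 10]  # arbitrary threshold: twice the interval
print(max_clock_skew(), gaps, late)
```

With the defaults, max_clock_skew returns 50, matching step (2). In the sample lines the second gap is 40 seconds, four times the default interval, which is the kind of irregularity step (3) is meant to surface.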
4. Resolving the problem
(1) Check the clock on block1
The clocks on block1 and the controller turned out to be out of sync. After synchronizing them, the state of cinder-volume on block1 changed to up.
(2) Check the updated_at of the cinder-scheduler service
The updated_at of cinder-scheduler was 2015-03-30 01:32:26, while the controller's current time was 2015-04-11 02:26:20. A gap that large rules out clock skew, so the service itself must have stopped reporting. The cinder-scheduler log showed that the service had indeed gone down because of a bug. After the bug was fixed and the service restarted, its status changed to up.