The Heartbeat_check in oslo_messaging


Recently, during a high-availability test of the OpenStack control plane (three control nodes), nova service-list showed every Nova service as down after one of the control nodes was switched off. The nova-compute log contains a large number of error messages like these:

2016-11-08 03:46:23.887 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-11-08 03:46:27.275 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-11-08 03:46:27.276 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-11-08 03:46:27.276 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-11-08 03:46:27.277 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-11-08 03:46:27.277 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-11-08 03:46:27.278 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-11-08 03:46:27.278 127895 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe


The exception being logged above is handled in oslo_messaging/_drivers/impl_rabbit.py:

    def _heartbeat_thread_job(self):
        """Thread that maintains inactive connections
        """
        while not self._heartbeat_exit_event.is_set():
            with self._connection_lock.for_heartbeat():

                recoverable_errors = (
                    self.connection.recoverable_channel_errors +
                    self.connection.recoverable_connection_errors)

                try:
                    try:
                        self._heartbeat_check()
                        # NOTE(sileht): We need to drain event to receive
                        # heartbeat from the broker but don't hold the
                        # connection too much times. In amqpdriver a connection
                        # is used exclusively for read or for write, so we have
                        # to do this for connection used for write drain_events
                        # already do that for other connection
                        try:
                            self.connection.drain_events(timeout=0.001)
                        except socket.timeout:
                            pass
                    except recoverable_errors as exc:
                        LOG.info(_LI("A recoverable connection/channel error "
                                     "occurred, trying to reconnect: %s"), exc)
                        self.ensure_connection()
                except Exception:
                    LOG.warning(_LW("Unexpected error during heartbeart "
                                    "thread processing, retrying..."))
                    LOG.debug('Exception', exc_info=True)

            self._heartbeat_exit_event.wait(
                timeout=self._heartbeat_wait_timeout)
        self._heartbeat_exit_event.clear()
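How often this loop wakes up is governed by self._heartbeat_wait_timeout, which the rabbit driver derives from the [oslo_messaging_rabbit] options heartbeat_timeout_threshold and heartbeat_rate. A minimal sketch of that derivation (not the oslo.messaging source itself), assuming the usual defaults of 60 seconds and 2:

    # Sketch only: how the sleep interval between heartbeat checks is
    # derived from the rabbit driver's configuration options.
    heartbeat_timeout_threshold = 60  # seconds before an unanswered heartbeat is treated as fatal (default)
    heartbeat_rate = 2                # number of checks to run per threshold window (default)

    # The driver waits roughly this long between iterations of the loop above,
    # i.e. 15 seconds with the default values.
    heartbeat_wait_timeout = float(heartbeat_timeout_threshold) / float(heartbeat_rate) / 2.0
    print(heartbeat_wait_timeout)     # 15.0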

The heartbeat check exists to detect whether the connection between a component service and the RabbitMQ server is still alive; the heartbeat_check task in oslo_messaging runs in the background from the moment the service starts. When a control node is shut down, one RabbitMQ server node is effectively shut down with it. The heartbeat thread then keeps spinning in this loop, repeatedly raising and catching the exceptions covered by recoverable_errors, and it only leaves the while loop once self._heartbeat_exit_event.is_set() returns True. Arguably an overall timeout should be added so the thread does not loop indefinitely and the services can recover after a few minutes.
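As a rough illustration of that suggestion, the reconnect attempts could be bounded by an overall deadline instead of being retried forever. The sketch below is not a patch against oslo.messaging; try_reconnect is a hypothetical stand-in for whatever re-establishes the AMQP connection (for example the driver's ensure_connection()), and the deadline and retry interval are arbitrary assumptions:

    import time

    RECONNECT_DEADLINE = 300  # seconds before giving up -- assumption, tune as needed
    RETRY_INTERVAL = 5        # seconds between attempts -- assumption

    def reconnect_with_deadline(try_reconnect, deadline=RECONNECT_DEADLINE):
        """Return True if the connection came back before the deadline, else False."""
        start = time.monotonic()
        while time.monotonic() - start < deadline:
            try:
                try_reconnect()          # assumed to raise if the broker is still unreachable
                return True
            except Exception:
                time.sleep(RETRY_INTERVAL)
        return False

With something like this, a heartbeat thread that cannot reach its RabbitMQ node would stop retrying after the deadline instead of logging the same recoverable error indefinitely.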










This article is from the "The-way-to-cloud" blog; please keep this source when reposting: http://iceyao.blog.51cto.com/9426658/1870593
