Operation: Several nodes were added to the Ceph cluster.
Anomaly: While the Ceph cluster is synchronizing data, the OSD processes keep going down abnormally (each time after data has been synchronizing for a while).
Ceph Version: 9.2.1
Log:
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 0> 2017-07-25 09:25:57.471502 7f46fe478700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_ch
Jul 25 09:25:57 ceph6 ceph-osd[26051]: common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
Jul 25 09:25:57 ceph6 ceph-osd[26051]: ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f47038330b5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f47037728e9]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f4703773126]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f47037738ec]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 5: (CephContextServiceThread::entry()+0x15b) [0x7f470384f2bb]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 6: (()+0x7dc5) [0x7f47018a6dc5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 7: (clone()+0x6d) [0x7f470014f76d]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: NOTE: a copy of the executable, or 'objdump -rdS <executable>' is needed to interpret this.
Jul 25 09:25:57 ceph6 ceph-osd[26051]: terminate called after throwing an instance of 'ceph::FailedAssertion'
Jul 25 09:25:57 ceph6 ceph-osd[26051]: *** Caught signal (Aborted) **
Jul 25 09:25:57 ceph6 ceph-osd[26051]: in thread 7f46fe478700
Jul 25 09:25:57 ceph6 ceph-osd[26051]: ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 1: (()+0x7e6fe2) [0x7f470373dfe2]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 2: (()+0xf370) [0x7f47018ae370]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 3: (gsignal()+0x37) [0x7f470008d1d7]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 4: (abort()+0x148) [0x7f470008e8c8]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f47009919d5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 6: (()+0x5e946) [0x7f470098f946]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 7: (()+0x5e973) [0x7f470098f973]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 8: (()+0x5eb9f) [0x7f470098fb9f]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0x7f47038332aa]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f47037728e9]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f4703773126]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 12: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f47037738ec]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 13: (CephContextServiceThread::entry()+0x15b) [0x7f470384f2bb]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 14: (()+0x7dc5) [0x7f47018a6dc5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 15: (clone()+0x6d) [0x7f470014f76d]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 2017-07-25 09:25:57.525027 7f46fe478700 -1 *** Caught signal (Aborted) **
Jul 25 09:25:57 ceph6 ceph-osd[26051]: in thread 7f46fe478700
Jul 25 09:25:57 ceph6 ceph-osd[26051]: ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 1: (()+0x7e6fe2) [0x7f470373dfe2]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 2: (()+0xf370) [0x7f47018ae370]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 3: (gsignal()+0x37) [0x7f470008d1d7]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 4: (abort()+0x148) [0x7f470008e8c8]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f47009919d5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 6: (()+0x5e946) [0x7f470098f946]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 7: (()+0x5e973) [0x7f470098f973]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 8: (()+0x5eb9f) [0x7f470098fb9f]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0x7f47038332aa]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f47037728e9]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f4703773126]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 12: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f47037738ec]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 13: (CephContextServiceThread::entry()+0x15b) [0x7f470384f2bb]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 14: (()+0x7dc5) [0x7f47018a6dc5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 15: (clone()+0x6d) [0x7f470014f76d]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: NOTE: a copy of the executable, or 'objdump -rdS <executable>' is needed to interpret this.
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 0> 2017-07-25 09:25:57.525027 7f46fe478700 -1 *** Caught signal (Aborted) **
Jul 25 09:25:57 ceph6 ceph-osd[26051]: in thread 7f46fe478700
Jul 25 09:25:57 ceph6 ceph-osd[26051]: ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 1: (()+0x7e6fe2) [0x7f470373dfe2]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 2: (()+0xf370) [0x7f47018ae370]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 3: (gsignal()+0x37) [0x7f470008d1d7]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 4: (abort()+0x148) [0x7f470008e8c8]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f47009919d5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 6: (()+0x5e946) [0x7f470098f946]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 7: (()+0x5e973) [0x7f470098f973]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 8: (()+0x5eb9f) [0x7f470098fb9f]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0x7f47038332aa]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f47037728e9]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f4703773126]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 12: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f47037738ec]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 13: (CephContextServiceThread::entry()+0x15b) [0x7f470384f2bb]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 14: (()+0x7dc5) [0x7f47018a6dc5]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: 15: (clone()+0x6d) [0x7f470014f76d]
Jul 25 09:25:57 ceph6 ceph-osd[26051]: NOTE: a copy of the executable, or 'objdump -rdS <executable>' is needed to interpret this.
Searching for the keyword 'FAILED assert(0 == "hit suicide timeout")' turns up many reports whose exception begins with "common/HeartbeatMap.cc:".
Almost none of them, however, begin with "common/HeartbeatMap.cc: 81".
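To compare a trace like the one in the log above against search results, it helps to pull out the source file and line number that precede the failed assertion. The following is a minimal sketch, assuming journal lines shaped like the ones in this post; the function name find_failed_asserts and the sample lines are made up for illustration.

```python
import re

# Matches e.g.: common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
# Case-insensitive and tolerant of extra spaces, since pasted logs are often mangled.
ASSERT_RE = re.compile(
    r'(?P<file>[\w./-]+):\s*(?P<line>\d+):\s*FAILED assert\s*\((?P<expr>.*)\)\s*$',
    re.IGNORECASE,
)


def find_failed_asserts(lines):
    """Return (file, line_number, expression) for every failed-assert log line."""
    hits = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("expr"))
            )
    return hits


# Hypothetical sample: one assert line and one unrelated line.
sample = [
    'Jul 25 09:25:57 ceph6 ceph-osd[26051]: common/HeartbeatMap.cc: 81: '
    'FAILED assert(0 == "hit suicide timeout")',
    'Jul 25 09:25:57 ceph6 ceph-osd[26051]: ceph version 9.2.1',
]

for source_file, line_no, expr in find_failed_asserts(sample):
    print(source_file, line_no, expr)
```

With the file and line number in hand, a search can be narrowed to reports that fail at the same place rather than anywhere in the same source file.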
I tried many different things while troubleshooting. In the end, it turned out that the problem occurs when the kernel version on the new nodes differs from the kernel version on the old nodes. After the update, everything returned to normal.
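Since the cause was a kernel version mismatch between new and old nodes, a quick consistency check before expanding the cluster may catch it early. The following is a minimal sketch, assuming the output of uname -r has already been collected from each node by some other means (SSH, configuration management, and so on); the host names, the version strings, and the function names group_by_kernel and kernels_consistent are made up for illustration.

```python
from collections import defaultdict


def group_by_kernel(kernels):
    """Group host names by kernel release string (as printed by `uname -r`)."""
    groups = defaultdict(list)
    for host, release in kernels.items():
        # Strip the trailing newline that command output usually carries.
        groups[release.strip()].append(host)
    return {release: sorted(hosts) for release, hosts in groups.items()}


def kernels_consistent(kernels):
    """True when every host reports the same kernel release."""
    return len(group_by_kernel(kernels)) <= 1


# Hypothetical data: two old nodes and one newly added node.
collected = {
    "ceph1": "3.10.0-327.el7.x86_64",
    "ceph2": "3.10.0-327.el7.x86_64\n",
    "ceph6": "3.10.0-514.el7.x86_64\n",
}

groups = group_by_kernel(collected)
if not kernels_consistent(collected):
    for release, hosts in sorted(groups.items()):
        print(release, hosts)
```

Any release that appears with only the newly added hosts is the one to bring in line with the rest before data synchronization starts.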
This article is from the "Tofu Blog" blog; please keep this source link when reposting: http://407711169.blog.51cto.com/6616996/1959914
Record of OSD processes exiting abnormally during Ceph data synchronization