This morning a colleague reported that the database was unavailable, and we could not log in to the host normally. After several attempts we finally got onto the host, checked the system log, and found the following error:
BUG: soft lockup - CPU#5 stuck for 17163091988s!
It seems to be an operating system bug.
Here's more information:
# uname -ra
Linux test-db01 2.6.32-200.13.1.el5uek #1 SMP Wed Jul 21:02:33 EDT x86_64 x86_64 x86_64 GNU/Linux
mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.24-log |
+------------+
1 row in set (0.00 sec)
Dec 22:55:44 test-db01 kernel: Call Trace:
Dec 22:55:44 test-db01 kernel: BUG: soft lockup - CPU#5 stuck for 17163091988s! [mysqld:27243]
Dec 22:55:44 test-db01 kernel:modules linked in:autofs4 (u) i2c_dev (u) i2c_core (u) HIDP (u) rfcomm (u) l2cap H (U) rfkill (u) lockd (u) sunrpc (u) Nf_conntrack_netbios_ns (u) ipt_reject (u) nf_conntrack_ipv4 (u) nf_defrag_ipv4 State (U) nf_conntrack (u) xt_tcpudp (u) ip6_tables (u) x_tables (u) be2iscsi (u) rdma_cm (u) ib_cm (u) iw_cm (u) ib_core (u) ib_addr (u) iscsi_tcp (u) bnx2i (u) cnic (u) UiO (u) IPv6 (U) cxgb3i (u) libcxgbi (u) cxgb3 scsi (U) SCSI_TRANSPORT_ISCSI (u) video (u) output (u) SBS (U) SBSHC (u) parport_pc (u) LP (U) parport (u) joydev (u) ses (u) enclos The ure (u) bnx2 (u) Dcdbas (u) serio_raw (u) snd_seq_dummy (u) snd_seq_oss (u) snd_seq_midi_event (u) snd_seq (u) snd_seq_device ( u) snd_pcm_oss (u) snd_mixer_oss (u) snd_pcm (u) snd_timer (u) snd (u) soundcore (u) snd_page_alloc (u) itco_wdt Vendor_support (U) PCSPKR (u) usb_storage (u) shpchp (u) megaraid_sas (u) [Last Unloaded:ip_tables]
Dec 22:55:44 test-db01 kernel: CPU 5:
Dec 22:55:44 test-db01 kernel:modules linked in:autofs4 (u) i2c_dev (u) i2c_core (u) HIDP (u) rfcomm (u) l2cap H (U) rfkill (u) lockd (u) sunrpc (u) Nf_conntrack_netbios_ns (u) ipt_reject (u) nf_conntrack_ipv4 (u) nf_defrag_ipv4 State (U) nf_conntrack (u) xt_tcpudp (u) ip6_tables (u) x_tables (u) be2iscsi (u) rdma_cm (u) ib_cm (u) iw_cm (u) ib_core (u) ib_addr (u) iscsi_tcp (u) bnx2i (u) cnic (u) UiO (u) IPv6 (U) cxgb3i (u) libcxgbi (u) cxgb3 scsi (U) SCSI_TRANSPORT_ISCSI (u) video (u) output (u) SBS (U) SBSHC (u) parport_pc (u) LP (U) parport (u) joydev (u) ses (u) enclos The ure (u) bnx2 (u) Dcdbas (u) serio_raw (u) snd_seq_dummy (u) snd_seq_oss (u) snd_seq_midi_event (u) snd_seq (u) snd_seq_device ( u) snd_pcm_oss (u) snd_mixer_oss (u) snd_pcm (u) snd_timer (u) snd (u) soundcore (u) snd_page_alloc (u) itco_wdt Vendor_support (U) PCSPKR (u) usb_storage (u) shpchp (u) megaraid_sas (u) [Last Unloaded:ip_tables]
Dec 22:55:44 test-db01 kernel: Pid: 27243, comm: mysqld Not tainted 2.6.32-200.13.1.el5uek #1 PowerEdge R710
Dec 22:55:44 test-db01 kernel: RIP: 0033:[<00000000008f95a3>] [<00000000008f95a3>] 0x8f95a3
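The reported "stuck for" duration is worth a second look. As a rough sketch (the function name and the regular expression are my own, not taken from any tool), the following pulls the CPU number, duration, and process out of a soft lockup line and converts the duration to years. The value works out to several centuries, far longer than any machine has been running, which suggests a broken clock value rather than a real stall of that length.

```python
import re

# Hypothetical helper for illustration. Whitespace is optional and matching is
# case-insensitive because copies of the message differ in spacing and case.
SOFT_LOCKUP = re.compile(
    r"soft\s*lockup\s*-\s*cpu#(?P<cpu>\d+)\s+stuck\s+for\s+(?P<secs>\d+)s!"
    r"(?:\s*\[(?P<comm>[^:\]]+):(?P<pid>\d+)\])?",
    re.IGNORECASE,
)

SECONDS_PER_YEAR = 365.25 * 86_400


def parse_soft_lockup(line):
    """Return cpu, seconds, comm and pid from a soft lockup line, or None."""
    m = SOFT_LOCKUP.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "seconds": int(m.group("secs")),
        "comm": m.group("comm"),
        "pid": int(m.group("pid")) if m.group("pid") else None,
    }


line = ("Dec 22:55:44 test-db01 kernel: BUG: soft lockup - CPU#5 "
        "stuck for 17163091988s! [mysqld:27243]")
info = parse_soft_lockup(line)
years = info["seconds"] / SECONDS_PER_YEAR
print(info)
print(f"reported stall: about {years:.0f} years")
```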
The MOS note listed in the reference documentation below describes this problem as encountered on Exadata X2-8. However, the operating system kernel and the error symptoms it describes are basically consistent with those in our test environment.
Exadata X2-8 database servers running Unbreakable Enterprise Kernel for Oracle Linux 2.6.32-100.23.1 that have been continuously up for more than 208 days are susceptible to this problem. Unbreakable Enterprise Kernel for Oracle Linux 2.6.32-100.23.1 is the Linux kernel provided with Exadata releases 11.2.2.2.0 through 11.2.2.4.2, inclusive. Uptime may be determined by the uptime(1) command.
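The 208-day figure can be checked with simple arithmetic. The sketch below assumes the scheduler clock counts nanoseconds and wraps once the value no longer fits in 54 bits. That assumption is mine and is not stated in the note, but it reproduces the number quoted there.

```python
# Assumption (not from the MOS note): the scheduler clock is a nanosecond
# counter that overflows at 2**54.
WRAP_NS = 2 ** 54
NS_PER_DAY = 86_400 * 10 ** 9


def days_until_wrap(wrap_ns=WRAP_NS):
    """Convert the assumed wrap point from nanoseconds to days."""
    return wrap_ns / NS_PER_DAY


print(f"{days_until_wrap():.1f} days")  # prints "208.5 days"
```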
There are two solutions:
1. Upgrade to a new version
Upgrade to Exadata 11.2.3.1.0 or later (Recommended).
2. Restart the operating system before it has been up for 208 days
Reboot database servers before uptime reaches 208 days
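For the reboot workaround, it helps to know how much time is left. This is a minimal sketch, assuming a Linux host where `/proc/uptime` is readable. The function names and the 14-day warning window are hypothetical choices for illustration; only the 208-day limit comes from the note.

```python
LIMIT_DAYS = 208   # limit quoted in the MOS note
WARN_DAYS = 14     # hypothetical warning window, adjust to taste


def read_uptime_seconds(path="/proc/uptime"):
    """Read the first field of /proc/uptime (seconds since boot)."""
    with open(path) as f:
        return float(f.read().split()[0])


def days_left(uptime_seconds, limit_days=LIMIT_DAYS):
    """Days remaining before uptime reaches the limit (negative if past it)."""
    return limit_days - uptime_seconds / 86_400


def status(uptime_seconds, warn_days=WARN_DAYS):
    """Classify the host as OK, WARN (reboot soon) or OVERDUE."""
    left = days_left(uptime_seconds)
    if left <= 0:
        return "OVERDUE"
    if left <= warn_days:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    try:
        up = read_uptime_seconds()
        print(f"{status(up)}: {days_left(up):.1f} days before the 208-day limit")
    except OSError as exc:
        print(f"could not read uptime: {exc}")
```

Run from cron, a check like this could give advance notice to schedule the reboot in a maintenance window.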
For now, we can only use the second option.
One more issue:
The machine is a Dell PowerEdge R710, and rumor has it that this model tends to go down after about half a year.
What a coincidence that both problems show up on the same machine.
Reference Documentation:
"MOS" ALERT: Exadata X2-8 systems affected by Linux bug 14258279: scheduling clock overflows in 208 days [ID 1473825.1]
"MOS" Bug 14258279: [EXADATA] soft lockup - CPU#0 stuck for 17163091968s!
Linux bug 14258279: scheduling clock overflows in 208 days