How to fix kernel errors when running Docker on CentOS 7.0
We run Docker 1.5 on CentOS 7.0. Recently one server kept freezing at irregular intervals, and the logs showed the cause was a kernel bug. The error output was:
May 11 03:43:08 ip-10-10-29-201 kernel: BUG: soft lockup - CPU#4 stuck for 22s! [handler20:1542]
May 11 03:43:08 ip-10-10-29-201 kernel: Modules linked in: iptable_nat nf_nat_ipv4 iptable_filter ip_tables binfmt_misc ipmi_si vfat fat usb_storage mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu tcp_diag inet_diag veth bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop dm_mod openvswitch vxlan ip_tunnel gre libcrc32c xt_nat ipt_MASQUERADE xt_addrtype nf_nat xt_limit ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack sg nf_conntrack ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ses enclosure ipmi_msghandler tg3 wmi acpi_power_meter ptp pps_core mei_me mei ntb lpc_ich mperf mfd_core shpchp ext4
May 11 03:43:08 ip-10-10-29-201 kernel: mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm ahci drm libahci libata i2c_core megaraid_sas [last unloaded: ip_tables]
May 11 03:43:08 ip-10-10-29-201 kernel: CPU: 4 PID: 1542 Comm: handler20 Tainted: G W -------------- 3.10.0-123.el7.x86_64 #1
May 11 03:43:08 ip-10-10-29-201 kernel: Hardware name: Dell Inc. PowerEdge R720/0X6FFV, BIOS 1.6.0 03/07/2013
May 11 03:43:08 ip-10-10-29-201 kernel: task: ffff880418adf1c0 ti: ffff8800c8d08000 task.ti: ffff8800c8d08000
May 11 03:43:08 ip-10-10-29-201 kernel: RIP: 0010:[<ffffffff815e90e7>] [<ffffffff815e90e7>] _raw_spin_lock+0x37/0x50
May 11 03:43:08 ip-10-10-29-201 kernel: RSP: 0018:ffff88041fc43ac8 EFLAGS: 00000206
May 11 03:43:08 ip-10-10-29-201 kernel: RAX: 000000000000108b RBX: 0000000000000000 RCX: 0000000000000000
May 11 03:43:08 ip-10-10-29-201 kernel: RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff88081609c318
May 11 03:43:08 ip-10-10-29-201 kernel: RBP: ffff88041fc43ac8 R08: ffff8801049856d8 R09: ffff88041fc43a00
May 11 03:43:08 ip-10-10-29-201 kernel: R10: 0000000000000000 R11: 00000000e1bec8f9 R12: ffff88041fc43a38
May 11 03:43:08 ip-10-10-29-201 kernel: R13: ffffffff815f2d9d R14: ffff88041fc43ac8 R15: ffff88081609c300
May 11 03:43:08 ip-10-10-29-201 kernel: FS: 00007fb082b8b700(0000) GS:ffff88041fc40000(0000) knlGS:0000000000000000
May 11 03:43:08 ip-10-10-29-201 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 11 03:43:08 ip-10-10-29-201 kernel: CR2: 00007f2a743e6000 CR3: 00000008183c9000 CR4: 00000000000407e0
May 11 03:43:08 ip-10-10-29-201 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 11 03:43:08 ip-10-10-29-201 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 11 03:43:08 ip-10-10-29-201 kernel: Stack:
May 11 03:43:08 ip-10-10-29-201 kernel: ffff88041fc43af8 ffffffffa042429f ffff88003714be00 ffffe8fbefc41540
May 11 03:43:08 ip-10-10-29-201 kernel: ffff880419070e80 ffff88041fc43b30 ffff88041fc43be0 ffffffffa04239a4
May 11 03:43:08 ip-10-10-29-201 kernel: 00000001b9ec8070 ffff88003714be00 ffff88041fc43b28 0000000000000246
May 11 03:43:08 ip-10-10-29-201 kernel: Call Trace:
May 11 03:43:08 ip-10-10-29-201 kernel: <IRQ>
May 11 03:43:08 ip-10-10-29-201 kernel:
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa042429f>] ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa04239a4>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa042a01a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa042b4cd>] vxlan_rcv+0x6d/0x90 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa037b228>] vxlan_udp_encap_recv+0xb8/0x130 [vxlan]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81538bc2>] udp_queue_rcv_skb+0x162/0x3d0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815394bd>] __udp4_lib_rcv+0x19d/0x690
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815094d0>] ? ip_rcv_finish+0x350/0x350
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815399ca>] udp_rcv+0x1a/0x20
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81509584>] ip_local_deliver_finish+0xb4/0x1f0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81509858>] ip_local_deliver+0x48/0x80
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815091fd>] ip_rcv_finish+0x7d/0x350
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81509ac4>] ip_rcv+0x234/0x380
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814cfdb6>] __netif_receive_skb_core+0x676/0x870
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814cffc8>] __netif_receive_skb+0x18/0x60
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814d0b7e>] process_backlog+0xae/0x180
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814d041a>] net_rx_action+0x15a/0x250
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815f3a5c>] call_softirq+0x1c/0x30
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815f4358>] do_IRQ+0x58/0xf0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
May 11 03:43:08 ip-10-10-29-201 kernel: <EOI>
May 11 03:43:08 ip-10-10-29-201 kernel:
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0424465>] ? ovs_flow_stats_get+0x145/0x180 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0424453>] ? ovs_flow_stats_get+0x133/0x180 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa04217b7>] ovs_flow_cmd_fill_info+0x1c7/0x320 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0421c5c>] ovs_flow_cmd_build_info.constprop.25+0x6c/0xa0 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0422155>] ovs_flow_cmd_new_or_set+0x4c5/0x520 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff8108ec58>] ? __wake_up_common+0x58/0x90
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814ffcd8>] genl_family_rcv_msg+0x258/0x3d0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814ffe50>] ? genl_family_rcv_msg+0x3d0/0x3d0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814ffee1>] genl_rcv_msg+0x91/0xd0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fdf99>] netlink_rcv_skb+0xa9/0xc0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fe4c8>] genl_rcv+0x28/0x40
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fd5bd>] netlink_unicast+0xed/0x1b0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fd9a7>] netlink_sendmsg+0x327/0x760
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fa874>] ? netlink_rcv_wake+0x44/0x60
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fb92b>] ? netlink_recvmsg+0x1cb/0x3e0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814b79b0>] sock_sendmsg+0xb0/0xf0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814b807f>] ? sock_recvmsg+0xbf/0x100
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff8109b23e>] ? task_scan_min+0x3e/0x60
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815e908b>] ? _raw_spin_unlock_bh+0x1b/0x40
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814b7de9>] ___sys_sendmsg+0x3a9/0x3c0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff811f7fa9>] ? ep_scan_ready_list.isra.9+0x1b9/0x1f0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff811f8123>] ? ep_poll+0x123/0x370
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81079af3>] ? getrusage+0x43/0x70
May 11 03:43:09 ip-10-10-29-201 kernel: [<ffffffff814b8cd1>] __sys_sendmsg+0x51/0x90
May 11 03:43:09 ip-10-10-29-201 kernel: [<ffffffff814b8d22>] SyS_sendmsg+0x12/0x20
May 11 03:43:09 ip-10-10-29-201 kernel: [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
May 11 03:43:09 ip-10-10-29-201 kernel: Code: 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 eb 0c 0f 1f 44 00 00 f3 90 83 e8 01 74 0a <0f> b7 0f 66 39 ca 75 f1 5d c3 66 66 66 90 66 66 90 eb da 66 0f
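The trace shows CPU 4 spinning in _raw_spin_lock, called from ovs_flow_stats_update in the openvswitch module while a VXLAN packet was being received in softirq context. The task it interrupted, handler20, was itself inside an openvswitch netlink request (ovs_flow_cmd_new_or_set, with ovs_flow_stats_get also on the stack), so the lockup appears to involve the Open vSwitch flow-statistics lock. Searching Stack Overflow confirmed that this is a kernel bug and that the fix is to upgrade the kernel.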
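To check whether other hosts are hitting the same lockup, the kernel log can be scanned for soft-lockup reports and for openvswitch frames in their call traces. This is a minimal sketch, assuming kernel messages land in a syslog-style file such as /var/log/messages (the path is passed in as an argument); count_soft_lockups and ovs_frames are hypothetical helper names, not part of any existing tool.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, log location is an assumption.

# count_soft_lockups FILE: print how many "BUG: soft lockup" lines FILE contains.
# grep -c exits 1 when the count is 0, so "|| true" keeps callers quiet.
count_soft_lockups() {
  grep -c 'BUG: soft lockup' "$1" || true
}

# ovs_frames FILE: print the unique function names that appear in call-trace
# frames tagged "[openvswitch]", e.g. "ovs_flow_stats_update+0x4f/0xd0 [openvswitch]".
ovs_frames() {
  grep -o '[A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]* \[openvswitch\]' "$1" \
    | sed 's/+0x.*//' \
    | sort -u
}
```

Run as root on a suspect host, for example count_soft_lockups /var/log/messages; a non-zero count together with ovs_flow_stats_update in the ovs_frames output suggests the same failure as above.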
Below is the procedure for upgrading CentOS 7.0's default 3.10 kernel to 4.0.2.
1. Import the GPG key for the yum repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
2. Install the yum repository
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
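3. Install the new kernel
The elrepo-kernel channel of the ELRepo repository provides the mainline kernel (kernel-ml), which was version 4.0.2 at the time:
[root@ip-10-10-29-201 ~]# yum --enablerepo=elrepo-kernel install kernel-ml-devel kernel-ml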
Loaded plugins: fastestmirror
MooseFS                                   |  951 B  00:00:00
base                                      |  3.6 kB  00:00:00
elrepo                                    |  2.9 kB  00:00:00
elrepo-kernel                             |  2.9 kB  00:00:00
extras                                    |  3.4 kB  00:00:00
updates                                   |  3.4 kB  00:00:00
(1/2): elrepo/primary_db                  |  233 kB  00:00:02
(2/2): elrepo-kernel/primary_db           |  782 kB  00:00:04
MooseFS/primary                           |  4.2 kB  00:00:00
Loading mirror speeds from cached hostfile
 * base: mirrors.yun-idc.com
 * elrepo: repos.lax-noc.com
 * elrepo-kernel: repos.lax-noc.com
 * extras: mirror.bit.edu.cn
 * updates: mirror.bit.edu.cn
MooseFS                                               30/30
Resolving Dependencies
--> Running transaction check
---> Package kernel-ml.x86_64 0:4.0.2-1.el7.elrepo will be installed
---> Package kernel-ml-devel.x86_64 0:4.0.2-1.el7.elrepo will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==========================================================================
 Package            Arch      Version               Repository       Size
==========================================================================
Installing:
 kernel-ml          x86_64    4.0.2-1.el7.elrepo    elrepo-kernel    36 M
 kernel-ml-devel    x86_64    4.0.2-1.el7.elrepo    elrepo-kernel   9.5 M

Transaction Summary
==========================================================================
Install  2 Packages

Total download size: 45 M
Installed size: 199 M
Is this ok [y/d/N]: y
Downloading packages:
(1/2): kernel-ml-4.0.2-1.el7.elrepo.x86_64.rpm         |  36 MB  00:00:11
(2/2): kernel-ml-devel-4.0.2-1.el7.elrepo.x86_64.rpm   | 9.5 MB  00:00:31
--------------------------------------------------------------------------
Total                                         1.5 MB/s |  45 MB  00:00:31
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : kernel-ml-devel-4.0.2-1.el7.elrepo.x86_64          1/2
  Installing : kernel-ml-4.0.2-1.el7.elrepo.x86_64                2/2
  Verifying  : kernel-ml-4.0.2-1.el7.elrepo.x86_64                1/2
  Verifying  : kernel-ml-devel-4.0.2-1.el7.elrepo.x86_64          2/2

Installed:
  kernel-ml.x86_64 0:4.0.2-1.el7.elrepo    kernel-ml-devel.x86_64 0:4.0.2-1.el7.elrepo

Complete!
4. Check the current kernel version
[root@ip-10-10-29-201 ~]# uname -r
3.10.0-123.el7.x86_64
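Important: the system is still running the default kernel. If you reboot right after this step, the machine will come back up on the default 3.10 kernel, not the new 4.0.2. To change the boot order, carry out the next step.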
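Because it is easy to reboot into the old kernel by mistake, the mismatch between the running kernel and the newest installed kernel-ml can be checked in a script. This is a minimal sketch, assuming GNU sort with -V (as shipped with CentOS 7) and ELRepo's kernel-ml package, whose VERSION-RELEASE.ARCH matches the uname -r string (as the final uname -r output in this post suggests); newest_release, reboot_pending and installed_kernel_ml are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# newest_release: read kernel release strings (one per line) on stdin and
# print the highest one, using GNU sort's version-aware ordering.
newest_release() {
  sort -V | tail -n 1
}

# reboot_pending RUNNING NEWEST: succeed (exit 0) when the running kernel is
# not the newest installed one, i.e. a reboot with the right default is still needed.
reboot_pending() {
  [ "$1" != "$2" ]
}

# installed_kernel_ml: list installed kernel-ml packages in `uname -r` format.
installed_kernel_ml() {
  rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-ml
}
```

A possible usage: reboot_pending "$(uname -r)" "$(installed_kernel_ml | newest_release)" && echo "new kernel installed but not running yet".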
Check the default boot order:
[root@ip-10-10-29-201 ~]# awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg
CentOS Linux (4.0.2-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux, with Linux 3.10.0-123.el7.x86_64
CentOS Linux, with Linux 0-rescue-18b184aa09434ecf9739a70c6b63638a
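Boot entries are numbered from 0. The new kernel was inserted at the top of the list (4.0.2 is now entry 0 and the old 3.10 kernel is entry 1), so to boot the new kernel, make entry 0 the default: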
[root@ip-10-10-29-201 ~]# grub2-set-default 0
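Rather than counting entries by eye, the awk command used above can be extended to print each entry's index, which is the number grub2-set-default expects. This is a minimal sketch, assuming a CentOS 7 style grub.cfg where top-level entries start with "menuentry '" at the beginning of a line (entries nested in a submenu are not handled); list_grub_entries and grub_index_for are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; submenus are not handled.

# list_grub_entries FILE: print "INDEX<TAB>TITLE" for each top-level
# menuentry, numbered from 0 as grub2-set-default counts them.
list_grub_entries() {
  awk -F\' '$1=="menuentry " { print (n++) "\t" $2 }' "$1"
}

# grub_index_for RELEASE FILE: print the index of the first entry whose
# title contains RELEASE (e.g. the string reported by uname -r).
grub_index_for() {
  list_grub_entries "$2" \
    | awk -F'\t' -v rel="$1" 'index($2, rel) { print $1; exit }'
}
```

On the machine above, grub2-set-default "$(grub_index_for 4.0.2-1.el7.elrepo.x86_64 /etc/grub2.cfg)" would select the new kernel; grub2-editenv list can then be used to confirm the saved_entry value.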
5. Reboot
reboot
6. Check the kernel after rebooting
[root@ip-10-10-29-201 conf]# uname -r
4.0.2-1.el7.elrepo.x86_64
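The post-reboot check can also be automated, for example in a monitoring job, so that a host that comes back on the old kernel is noticed. This is a minimal sketch; kernel_at_least is a hypothetical helper, it relies on GNU sort -V, and the 4.0 threshold in the usage below simply mirrors the version installed here rather than a documented fix version.

```shell
#!/usr/bin/env bash
# Sketch only: helper name is hypothetical; threshold is chosen by the caller.

# kernel_at_least RELEASE MIN: succeed when the upstream part of RELEASE
# (everything before the first "-", e.g. "4.0.2" from "4.0.2-1.el7.elrepo.x86_64")
# is version MIN or newer. If MIN sorts first (or equal), RELEASE >= MIN.
kernel_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "${1%%-*}" | sort -V | head -n 1)" = "$2" ]
}
```

For example: kernel_at_least "$(uname -r)" 4.0 || echo "still running the old kernel".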
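In the 20 days since the upgrade the problem has not recurred, so we concluded that the failure was caused by the kernel bug and was resolved by upgrading the kernel.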