Handling the tcp_mark_head_lost warning on Linux
Problem description
A host recently logged the following kernel messages:
Jul 8 10:47:42 cztest kernel: ------------[ cut here ]------------
Jul 8 10:47:42 cztest kernel: WARNING: at net/ipv4/tcp_input.c:2269 tcp_mark_head_lost+0x113/0x290()
Jul 8 10:47:42 cztest kernel: Modules linked in: iptable_filter ip_tables binfmt_misc cdc_ether usbnet mii xt_multiport dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif ipmi_devintf ipmi_si mei_me pcspkr iTCO_wdt mxm_wmi iTCO_vendor_support dcdbas mei sg sb_edac edac_core ipmi_msghandler shpchp lpc_ich wmi acpi_power_meter xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper crct10dif_pclmul crct10dif_common syscopyarea crc32c_intel sysfillrect sysimgblt fb_sys_fops igb ttm ptp drm ahci pps_core libahci dca i2c_algo_bit libata megaraid_sas i2c_core fjes [last unloaded: ip_tables]
Jul 8 10:47:42 cztest kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G W ------------ 3.10.0-514.16.1.el7.x86_64 #1
Jul 8 10:47:42 cztest kernel: Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.3.4 11/08/2016
Jul 8 10:47:42 cztest kernel: 0000000000000000 dd79fe633eacd853 ffff88103e743880 ffffffff81686ac3
Jul 8 10:47:42 cztest kernel: ffff88103e7438b8 ffffffff81085cb0 ffff8806d5c57800 ffff88010a4e6c80
Jul 8 10:47:42 cztest kernel: 0000000000000001 00000000f90e778c 0000000000000001 ffff88103e7438c8
Jul 8 10:47:42 cztest kernel: Call Trace:
Jul 8 10:47:42 cztest kernel: [ ] dump_stack+0x19/0x1b
Jul 8 10:47:42 cztest kernel: [ ] warn_slowpath_common+0x70/0xb0
Jul 8 10:47:42 cztest kernel: [ ] warn_slowpath_null+0x1a/0x20
Jul 8 10:47:42 cztest kernel: [ ] tcp_mark_head_lost+0x113/0x290
Jul 8 10:47:42 cztest kernel: [ ] tcp_update_scoreboard+0x67/0x80
Jul 8 10:47:42 cztest kernel: [ ] tcp_fastretrans_alert+0x6dd/0xb50
Jul 8 10:47:42 cztest kernel: [ ] tcp_ack+0x8dd/0x12e0
Jul 8 10:47:42 cztest kernel: [ ] tcp_rcv_established+0x118/0x760
Jul 8 10:47:42 cztest kernel: [ ] tcp_v4_do_rcv+0x10a/0x340
Jul 8 10:47:42 cztest kernel: [ ] ? security_sock_rcv_skb+0x16/0x20
Jul 8 10:47:42 cztest kernel: [ ] tcp_v4_rcv+0x799/0x9a0
Jul 8 10:47:42 cztest kernel: [ ] ? iptable_filter_hook+0x36/0x80 [iptable_filter]
Jul 8 10:47:42 cztest kernel: [ ] ip_local_deliver_finish+0xb4/0x1f0
Jul 8 10:47:42 cztest kernel: [ ] ip_local_deliver+0x59/0xd0
Jul 8 10:47:42 cztest kernel: [ ] ? ip_rcv_finish+0x350/0x350
Jul 8 10:47:42 cztest kernel: [ ] ip_rcv_finish+0x8a/0x350
Jul 8 10:47:42 cztest kernel: [ ] ip_rcv+0x2b6/0x410
Jul 8 10:47:42 cztest kernel: [ ] __netif_receive_skb_core+0x582/0x800
Jul 8 10:47:42 cztest kernel: [ ] ? tcp4_gro_receive+0x134/0x1b0
Jul 8 10:47:42 cztest kernel: [ ] ? __slab_free+0x81/0x2f0
Jul 8 10:47:42 cztest kernel: [ ] __netif_receive_skb+0x18/0x60
Jul 8 10:47:42 cztest kernel: [ ] netif_receive_skb_internal+0x40/0xc0
Jul 8 10:47:42 cztest kernel: [ ] napi_gro_receive+0xd8/0x130
Jul 8 10:47:42 cztest kernel: [ ] igb_clean_rx_irq+0x387/0x700 [igb]
Jul 8 10:47:42 cztest kernel: [ ] ? skb_release_data+0xf2/0x140
Jul 8 10:47:42 cztest kernel: [ ] igb_poll+0x383/0x770 [igb]
Jul 8 10:47:42 cztest kernel: [ ] ? tcp_write_timer_handler+0x200/0x200
Jul 8 10:47:42 cztest kernel: [ ] net_rx_action+0x170/0x380
Jul 8 10:47:42 cztest kernel: [ ] __do_softirq+0xef/0x280
Jul 8 10:47:42 cztest kernel: [ ] call_softirq+0x1c/0x30
Jul 8 10:47:42 cztest kernel: [ ] do_softirq+0x65/0xa0
Jul 8 10:47:42 cztest kernel: [ ] irq_exit+0x115/0x120
Jul 8 10:47:42 cztest kernel: [ ] do_IRQ+0x58/0xf0
Jul 8 10:47:42 cztest kernel: [ ] common_interrupt+0x6d/0x6d
Jul 8 10:47:42 cztest kernel: [ ] ? cpuidle_enter_state+0x52/0xc0
Jul 8 10:47:42 cztest kernel: [ ] cpuidle_idle_call+0xd9/0x210
Jul 8 10:47:42 cztest kernel: [ ] arch_cpu_idle+0xe/0x30
Jul 8 10:47:42 cztest kernel: [ ] cpu_startup_entry+0x245/0x290
Jul 8 10:47:42 cztest kernel: [ ] start_secondary+0x1ba/0x230
Jul 8 10:47:42 cztest kernel: ---[ end trace 6bc65b0c591c1794 ]---
The host environment is as follows:
System | Dell Inc.; PowerEdge R620;
Platform | Linux
Kernel | CentOS 3.10.0-514.16.1.el7.x86_64
Total Memory | 64G
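Analysis
The stack trace is printed in much the same way as in the xfs warning case. Roughly: once SACK and FACK are enabled in the kernel, the fast retransmit and selective retransmit needed during network transfers are handled by the tcp_mark_head_lost function in tcp_input.c, whose main job is to mark how many segments were lost in transit. As the code below shows, the kernel stack trace reported by the system is triggered by the tcp_verify_left_out call inside tcp_mark_head_lost: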
// source/include/net/tcp.h
#define tcp_verify_left_out(tp) WARN_ON(tcp_left_out(tp) > tp->packets_out)

static inline unsigned int tcp_left_out(const struct tcp_sock *tp)
{
    return tp->sacked_out + tp->lost_out;
}

// source/include/asm-generic/bug.h
#define __WARN() warn_slowpath_null(__FILE__, __LINE__)

/* Abridged here: the full macro evaluates the condition and calls
 * __WARN() only when it is true. */
#ifndef WARN_ON
#define WARN_ON(condition) ({ \
    __WARN(); \
})
#endif

// source/net/ipv4/tcp_input.c
/* Detect loss in event "A" above by marking head of queue up as lost.
 * For FACK or non-SACK(Reno) senders, the first "packets" number of segments
 * are considered lost. For RFC3517 SACK, a segment is considered lost if it
 * has at least tp->reordering SACKed seqments above it; "packets" refers to
 * the maximum SACKed segments to pass before reaching this limit.
 */
static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
{
    struct tcp_sock *tp = tcp_sk(sk);
    ....
    tcp_verify_left_out(tp); // triggers dump_stack
}
...
static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (tcp_is_reno(tp)) {
        tcp_mark_head_lost(sk, 1, 1);
    } else if (tcp_is_fack(tp)) {
        int lost = tp->fackets_out - tp->reordering;
        if (lost <= 0)
            lost = 1;
        tcp_mark_head_lost(sk, lost, 0);
    } else {
        int sacked_upto = tp->sacked_out - tp->reordering;
        if (sacked_upto >= 0)
            tcp_mark_head_lost(sk, sacked_upto, 0);
        else if (fast_rexmit)
            tcp_mark_head_lost(sk, 1, 1);
    }
}
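To make the condition behind the warning concrete, here is a minimal user-space sketch of the comparison that tcp_verify_left_out performs. It is only a model: the function name check_left_out and the sample counter values are made up for illustration, and nothing here touches real kernel state.

```shell
# Hypothetical model of the check behind tcp_verify_left_out().
# Arguments: sacked_out lost_out packets_out (non-negative integers).
# Prints "WARN" when sacked_out + lost_out > packets_out, otherwise "ok".
check_left_out() {
    sacked_out=$1
    lost_out=$2
    packets_out=$3
    left_out=$((sacked_out + lost_out))   # mirrors tcp_left_out()
    if [ "$left_out" -gt "$packets_out" ]; then
        echo "WARN"
    else
        echo "ok"
    fi
}

check_left_out 3 2 10   # counters are consistent
check_left_out 6 5 10   # more segments accounted as SACKed or lost than are in flight
```

The second call corresponds to the inconsistent state the kernel reports: the per-socket counters claim more segments are SACKed or lost than were actually sent and not yet acknowledged.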
According to redhat-536483, this kind of message is usually caused by a TCP bug and can be triggered when the kernel uses a TCP socket buffer list that has already been freed:
Root Cause
A use after free issue related to the TCP kernel socket buffer linked list. Thus it is a bug in the TCP kernel code. Although the bug is in TCP kernel code, but it could get triggered in multiple ways. It could get triggered due to NFS, or due to even an application (say java process).
Resolution
Upgrade the kernel
As shown below, Red Hat may have fixed a use-after-free bug in the tcp_* functions in version 3.10.0-520, so upgrading is worth trying:
CentOS 7.x changelog
* Thu Nov 03 2016 Rafael Aquini [3.10.0-520.el7]
- [net] tcp: fix use after free in tcp_xmit_retransmit_queue() (Mateusz Guzik) [1379531] {CVE-2016-6828}
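Before scheduling an upgrade it can help to check whether a host is already at or past the build named in the changelog. The sketch below assumes the release string follows the 3.10.0-BUILD... pattern seen on el7 kernels; the function names el7_build and has_changelog_fix are illustrative, and a build number at or above 520 only indicates that the changelog entry applies, not that this particular warning is gone.

```shell
# Extract the build number (e.g. 514) from an el7 release string such as
# 3.10.0-514.16.1.el7.x86_64. Returns 1 for any other format.
el7_build() {
    case "$1" in
        3.10.0-[0-9]*) ;;
        *) return 1 ;;
    esac
    rest=${1#3.10.0-}     # drop the base version prefix
    build=${rest%%.*}     # keep everything before the first dot
    case "$build" in
        ''|*[!0-9]*) return 1 ;;   # reject non-numeric builds
    esac
    echo "$build"
}

# Succeeds when the release is at or above build 520 (taken from the changelog).
has_changelog_fix() {
    b=$(el7_build "$1") || return 1
    [ "$b" -ge 520 ]
}

release=${1:-$(uname -r)}   # default to the running kernel
if has_changelog_fix "$release"; then
    echo "$release: at or above 3.10.0-520"
else
    echo "$release: below 3.10.0-520 or not an el7 3.10.0 kernel"
fi
```

Run against the kernel from the log above (3.10.0-514.16.1.el7.x86_64), the check reports that the host is below the 520 build.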
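Disable fack/sack
According to the Red Hat knowledge base article, tcp_mark_head_lost mainly marks the number of segments lost during fast retransmit and selective acknowledgment, so temporarily disabling the fack/sack parameters may avoid the problem: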
sysctl -w net.ipv4.tcp_fack=0
sysctl -w net.ipv4.tcp_sack=0
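Values set with sysctl -w are lost at the next reboot. If the workaround needs to persist, one option is a fragment under /etc/sysctl.d. The sketch below is an assumption about how one might do that: the file name 90-tcp-sack.conf and both function names are hypothetical, and the write step needs root.

```shell
# Write both settings to a sysctl fragment so they survive a reboot.
# The default path is a hypothetical choice; pass another path as $1 to override.
write_tcp_sack_conf() {
    conf=${1:-/etc/sysctl.d/90-tcp-sack.conf}
    printf '%s\n' \
        'net.ipv4.tcp_fack = 0' \
        'net.ipv4.tcp_sack = 0' > "$conf"
}

# Print the current value of both keys from /proc, or "n/a" if a key is absent.
show_tcp_sack_state() {
    for key in tcp_fack tcp_sack; do
        f=/proc/sys/net/ipv4/$key
        if [ -r "$f" ]; then
            echo "net.ipv4.$key = $(cat "$f")"
        else
            echo "net.ipv4.$key = n/a"
        fi
    done
}

show_tcp_sack_state
# As root, to persist and reload: write_tcp_sack_conf && sysctl --system
```

Calling show_tcp_sack_state before and after the change is a quick way to confirm that the running values match what was intended.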
Try the second approach first; if the problem persists, consider upgrading the kernel.
References
redhat-536483
bug-1367091
cve-2016-6828
kernel-commit