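How to handle the xfs_vm_releasepage warning on Linux systems
Problem description
Several machines recently logged the following warning at different times on the same day: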
    Mar 26 20:55:03 host1 kernel: WARNING: at fs/xfs/xfs_aops.c:1045 xfs_vm_releasepage+0xcb/0x100 [xfs]()
    Mar 26 20:55:03 host1 kernel: Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables ebtable_filter ebtables ip6table_filter ip6_tables devlink bridge stp llc xt_multiport sunrpc dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf ipmi_si sg pcspkr ipmi_msghandler shpchp i2c_i801 lpc_ich nfit libnvdimm acpi_power_meter kgwttm(OE) xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 drm_kms_helper igb syscopyarea sysfillrect sysimgblt ptp fb_sys_fops ttm pps_core dca ahci drm i2c_algo_bit libahci megaraid_sas i2c_core libata
    Mar 26 20:55:03 host1 kernel: fjes [last unloaded: nf_defrag_ipv4]
    Mar 26 20:55:03 host1 kernel: CPU: 10 PID: 224 Comm: kswapd0 Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1
    Mar 26 20:55:03 host1 kernel: Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.3.7 02/08/2018
    Mar 26 20:55:03 host1 kernel: 0000000000000000 00000000e02a0d05 ffff88103c7ebaa0 ffffffff81687073
    Mar 26 20:55:03 host1 kernel: ffff88103c7ebad8 ffffffff81085cb0 ffffea0000687620 ffffea0000687600
    Mar 26 20:55:03 host1 kernel: ffff88004a71daf8 ffff88103c7ebda0 ffffea0000687600 ffff88103c7ebae8
    Mar 26 20:55:03 host1 kernel: Call Trace:
    Mar 26 20:55:03 host1 kernel: [] dump_stack+0x19/0x1b
    Mar 26 20:55:03 host1 kernel: [ ] warn_slowpath_common+0x70/0xb0
    Mar 26 20:55:03 host1 kernel: [ ] warn_slowpath_null+0x1a/0x20
    Mar 26 20:55:03 host1 kernel: [ ] xfs_vm_releasepage+0xcb/0x100 [xfs]
    Mar 26 20:55:03 host1 kernel: [ ] try_to_release_page+0x32/0x50
    Mar 26 20:55:03 host1 kernel: [ ] shrink_active_list+0x3d6/0x3e0
    Mar 26 20:55:03 host1 kernel: [ ] shrink_lruvec+0x3f1/0x770
    Mar 26 20:55:03 host1 kernel: [ ] shrink_zone+0x76/0x1a0
    Mar 26 20:55:03 host1 kernel: [ ] balance_pgdat+0x48c/0x5e0
    Mar 26 20:55:03 host1 kernel: [ ] kswapd+0x173/0x450
    Mar 26 20:55:03 host1 kernel: [ ] ? wake_up_atomic_t+0x30/0x30
    Mar 26 20:55:03 host1 kernel: [ ] ? balance_pgdat+0x5e0/0x5e0
    Mar 26 20:55:03 host1 kernel: [ ] kthread+0xcf/0xe0
    Mar 26 20:55:03 host1 kernel: [ ] ? kthread_create_on_node+0x140/0x140
    Mar 26 20:55:03 host1 kernel: [ ] ret_from_fork+0x58/0x90
    Mar 26 20:55:03 host1 kernel: [ ] ? kthread_create_on_node+0x140/0x140
    Mar 26 20:55:03 host1 kernel: ---[ end trace 24823c5c7a1ea2be ]---
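When the same warning appears on several hosts, it can help to pull the warning site and the call chain out of the log mechanically. The following is a minimal Python sketch rather than part of the original investigation: the name parse_warning and the SAMPLE lines are illustrative assumptions, and it expects ordinary syslog formatting with the spaces intact.

```python
import re

# "WARNING: at <file>:<line> <symbol>+0x..." as it appears in the kernel log.
WARNING_RE = re.compile(r"WARNING: at (\S+):(\d+) (\w+)\+0x")
# A call-trace entry: bracketed address (possibly elided), optional "?",
# then symbol+offset/size.
FRAME_RE = re.compile(r"\[[^\]]*\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_warning(lines):
    """Return (source_file, line_no, symbol, frames) for the first warning, or None."""
    site = None
    frames = []
    in_trace = False
    for text in lines:
        if site is None:
            m = WARNING_RE.search(text)
            if m:
                site = (m.group(1), int(m.group(2)), m.group(3))
                continue
        if "Call Trace:" in text:
            in_trace = True
            continue
        if in_trace:
            f = FRAME_RE.search(text)
            # Entries marked "?" are unverified stack guesses, so skip them.
            if f and not f.group(1):
                frames.append(f.group(2))
    if site is None:
        return None
    return site + (frames,)


# Hypothetical excerpt in the same shape as the log above.
SAMPLE = [
    "Mar 26 20:55:03 host1 kernel: WARNING: at fs/xfs/xfs_aops.c:1045 xfs_vm_releasepage+0xcb/0x100 [xfs]()",
    "Mar 26 20:55:03 host1 kernel: Call Trace:",
    "Mar 26 20:55:03 host1 kernel: [ ] xfs_vm_releasepage+0xcb/0x100 [xfs]",
    "Mar 26 20:55:03 host1 kernel: [ ] try_to_release_page+0x32/0x50",
    "Mar 26 20:55:03 host1 kernel: [ ] kswapd+0x173/0x450",
    "Mar 26 20:55:03 host1 kernel: [ ] ? wake_up_atomic_t+0x30/0x30",
]

if __name__ == "__main__":
    print(parse_warning(SAMPLE))
```

Run against a saved copy of the messages file, a helper like this makes it easy to check that every host reports the same source line and that kswapd is on the call path each time.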
On these machines, crash information for the kernel and applications is collected by the abrtd service, and a summary can be viewed with abrt-cli:
    # abrt-cli list --since 1547518209
    id 2181dce8f72761585cb6a904dbff1806c1315c27
    reason:         WARNING: at fs/xfs/xfs_aops.c:1045 xfs_vm_releasepage+0xcb/0x100 [xfs]()
    time:           Sat 23 Mar 2019 08:30:45 PM CST
    cmdline:        BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.16.1.el7.x86_64 root=/dev/sda1 ro crashkernel=auto net.ifnames=0 biosdevname=0
    package:        kernel
    uid:            0 (root)
    count:          1
    Directory:      /var/spool/abrt/oops-2019-03-23-20:30:45-163925-0
The kernel version is as follows:
    CentOS 7
    Linux host1 3.10.0-514.21.2.el7.x86_64
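Analysis and handling
Red Hat Knowledgebase
According to the Red Hat Knowledgebase article, this kind of XFS warning is printed when the XFS module traverses a particular code path, and it does not affect use of the host. Upgrading the kernel to kernel-3.10.0-693.el7 avoids the warning. For details see: redhat-access-2893711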
Root Cause:
The messages were informational and they do not affect the system in a negative manner. They are seen because the XFS module is traversing through XFS code path.
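To check a fleet of hosts against the kernel-3.10.0-693.el7 threshold mentioned above, the release strings can be compared numerically. This is only a sketch under stated assumptions: release_key and needs_upgrade are hypothetical helper names, and the comparison is a plain numeric one that suits el7-style kernel release strings, not a reimplementation of rpm's version ordering.

```python
import re

FIXED_IN = "3.10.0-693.el7"  # version named in the Knowledgebase reference above


def release_key(release):
    """Turn a release like '3.10.0-514.21.2.el7.x86_64' into a tuple of ints."""
    m = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(p) for p in (m.group(1) + "." + m.group(2)).split("."))


def needs_upgrade(release, fixed_in=FIXED_IN):
    """True if `release` sorts before the version that avoids the warning."""
    return release_key(release) < release_key(fixed_in)


print(needs_upgrade("3.10.0-514.21.2.el7.x86_64"))
```

On a live host the argument could come from platform.release() in the standard library, which returns the same string as uname -r.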
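Code analysis
The Red Hat Knowledgebase article says nothing about memory reclaim, but judging from the stack trace, the warning appears to have been triggered while the kernel was reclaiming memory. Memory usage around that time was as follows: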
    04:30:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
    ......
    08:40:01 PM    513940 130976220     99.61       876 104616380  28610584     21.76  92439660  34840920       524
    08:50:01 PM    479896 131010264     99.64       876 104666496  28557292     21.72  92513872  34804240       400
    09:00:01 PM    455948 131034212     99.65       876 104675712  28588852     21.74  92418724  34926132       572
    09:10:01 PM    556980 130933180     99.58       876 104610352  28552656     21.71  94287212  32983892       900

    # sysctl vm.min_free_kbytes
    vm.min_free_kbytes = 90112
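The trend in the kbmemfree column is easier to read as a change between consecutive samples. The sketch below is illustrative only: free_deltas is a hypothetical helper, and the values are the kbmemfree figures from the sar output above.

```python
# (sample time, kbmemfree in kB) taken from the sar output above.
SAMPLES = [
    ("08:40:01 PM", 513940),
    ("08:50:01 PM", 479896),
    ("09:00:01 PM", 455948),
    ("09:10:01 PM", 556980),
]


def free_deltas(samples):
    """Return (time, change in kbmemfree since the previous sample) pairs."""
    return [
        (cur_time, cur_free - prev_free)
        for (_, prev_free), (cur_time, cur_free) in zip(samples, samples[1:])
    ]


for when, delta in free_deltas(SAMPLES):
    print(f"{when}: {delta:+d} kB")
```

Free memory is still falling in the interval that contains the 20:55:03 warning and only rises in the following interval.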
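Free memory did not increase between 20:50 and 21:00, which suggests that the system may not actually have reclaimed any memory in that interval. Following the stack trace in the kernel log, the call chain looks like this: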
    shrink_active_list -> try_to_release_page -> xfs_vm_releasepage

    // source/mm/filemap.c
    3225 int try_to_release_page(struct page *page, gfp_t gfp_mask)
    3226 {
    3227         struct address_space * const mapping = page->mapping;
    ......
    3233         if (mapping && mapping->a_ops->releasepage)
    3234                 return mapping->a_ops->releasepage(page, gfp_mask);  // xfs_vm_releasepage
    3235         return try_to_free_buffers(page);
    3236 }

    // source/fs/xfs/xfs_aops.c
    1034 STATIC int
    1035 xfs_vm_releasepage(
    1036         struct page     *page,
    1037         gfp_t           gfp_mask)
    1038 {
    1039         int             delalloc, unwritten;
    1040
    1041         trace_xfs_releasepage(page->mapping->host, page, 0, 0);
    1042
    1043         xfs_count_page_state(page, &delalloc, &unwritten);
    1044
    1045         if (WARN_ON_ONCE(delalloc))
    1046                 return 0;
    1047         if (WARN_ON_ONCE(unwritten))
    1048                 return 0;
    1049
    1050         return try_to_free_buffers(page);
    1051 }
    ......
    1827 const struct address_space_operations xfs_address_space_operations = {
    ......
    1833         .releasepage            = xfs_vm_releasepage,
The kernel log entry "kernel: WARNING: at fs/xfs/xfs_aops.c:1045" shows that the stack trace was printed from line 1045 of source/fs/xfs/xfs_aops.c, and that the function returned there without ever calling try_to_free_buffers:
    1045         if (WARN_ON_ONCE(delalloc))
    1046                 return 0;
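The early return can be illustrated with a small user-space model. This is a sketch of the control flow only, not kernel code: xfs_vm_releasepage_model and fake_free are hypothetical names, the two integer arguments stand in for the counts filled in by xfs_count_page_state, and the callable stands in for try_to_free_buffers.

```python
def xfs_vm_releasepage_model(delalloc, unwritten, try_to_free_buffers):
    """User-space model of the control flow at lines 1043-1050 (not kernel code)."""
    if delalloc:       # line 1045: warn, then refuse to release the page
        return 0
    if unwritten:      # line 1047: same treatment for unwritten extents
        return 0
    return try_to_free_buffers()


calls = []


def fake_free():
    """Stand-in for try_to_free_buffers that records each call."""
    calls.append("freed")
    return 1


print(xfs_vm_releasepage_model(1, 0, fake_free), calls)  # page with delalloc buffers
print(xfs_vm_releasepage_model(0, 0, fake_free), calls)  # clean page
```

In the model, a page with delalloc buffers returns 0 and the free routine is never reached, which is the situation the warning reports.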
WARN_ON_ONCE itself is fairly simple and can be found in source/include/asm-generic/bug.h:
     73 #define __WARN()                warn_slowpath_null(__FILE__, __LINE__)
     85 #define WARN_ON(condition) ({                           \
    ...
     88                 __WARN();                               \
    136 #define WARN_ON_ONCE(condition) ({                      \
    ....
    140         if (unlikely(__ret_warn_once))                  \
    141                 if (WARN_ON(!__warned))                 \
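The "once" behaviour of the macro can be modelled in a few lines. The excerpt above elides most of the macro body, so this sketch rests on two assumptions about the elided lines: that the __warned flag is set after the first report, and that the macro evaluates to the truth value of its condition. WarnOnceSite is a hypothetical name used only for illustration.

```python
class WarnOnceSite:
    """Models one WARN_ON_ONCE call site (not kernel code).

    `warned` plays the role of the macro's __warned flag, and `reports`
    counts how often __WARN() would have dumped the stack.
    """

    def __init__(self):
        self.warned = False
        self.reports = 0

    def __call__(self, condition):
        hit = bool(condition)
        if hit and not self.warned:
            self.reports += 1   # stands in for __WARN()
            self.warned = True
        # Assumed: the macro yields the condition whether or not it reported.
        return hit


site = WarnOnceSite()
results = [site(d) for d in (0, 3, 1, 0, 2)]
print(results, site.reports)
```

Under these assumptions, a call site reports only the first time its condition is true, while every later hit still takes the same early-return branch silently.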
The __WARN macro calls the warn_slowpath_null function seen in the stack trace, which in turn calls warn_slowpath_common to print the stack trace:
    // source/kernel/panic.c
    517 void warn_slowpath_null(const char *file, int line)
    518 {
    519         warn_slowpath_common(file, line, __builtin_return_address(0),
    520                              TAINT_WARN, NULL);
    521 }

    463 static void warn_slowpath_common(const char *file, int line, void *caller,
    464                                  unsigned taint, struct slowpath_args *args)
    465 {
    466         disable_trace_on_warning();
    467
    468         printk(KERN_WARNING "------------[ cut here ]------------\n");
    469         printk(KERN_WARNING "WARNING: at %s:%d %pS()\n", file, line, caller);
    470
    471         if (args)
    472                 vprintk(args->fmt, args->args);
    ......
    485         print_modules();
    486         dump_stack();
    487         print_oops_end_marker();
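From this we can see that the stack trace is only a warning, which matches the Red Hat Knowledgebase description, and that it does not affect use of the host.
Summary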
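Judging from the source above, whenever kswapd calls xfs_vm_releasepage during memory reclaim on a page that still has delalloc or unwritten buffers, the function returns 0 without calling try_to_free_buffers, which is consistent with free memory not increasing in the memory statistics. Because the check uses WARN_ON_ONCE, the stack trace itself is printed only the first time the check fires. The disable_trace_on_warning call at the top of warn_slowpath_common is tied to the kernel.traceoff_on_warning kernel parameter, but that parameter only stops ftrace tracing when a warning fires. The printk, print_modules and dump_stack calls that follow are not conditional on it, so enabling it does not suppress the stack trace. From this point of view, upgrading the kernel is the only way to avoid the message.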
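To see how the parameters mentioned in this article are set on a given host, their values can be read from /proc/sys. This is a minimal sketch: read_sysctl is a hypothetical helper, and it assumes the usual mapping of a dotted sysctl name onto a path under /proc/sys, which does not hold for names whose components themselves contain dots.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the value of a sysctl such as 'kernel.traceoff_on_warning'.

    Returns None when the entry does not exist or cannot be read.
    """
    path = Path(root).joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


print("kernel.traceoff_on_warning =", read_sysctl("kernel.traceoff_on_warning"))
print("vm.min_free_kbytes =", read_sysctl("vm.min_free_kbytes"))
```

A result of None simply means the running kernel does not expose that entry.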