[gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests

Ryan Novosielski novosirj at rutgers.edu
Wed Jan 15 21:10:59 GMT 2020


Hi there,

I know some of the Spectrum Scale developers look at this list. I’m having a little trouble with support on this problem. 

We are seeing crashes with GPFS 5.0.4-1 Data Access Edition on KVM guests with a portability layer that has been installed via gpfs.gplbin RPMs that we built at our site and have used to install GPFS all over our environment. We’ve not seen this problem so far on any physical hosts, but have now experienced it on guests running on a number of our KVM hypervisors, across vendors, firmware versions, etc. At one time I thought it was all happening on systems using Mellanox virtual functions for InfiniBand, but we’ve now seen it on VMs without VFs. There may be an SELinux interaction, but some of our hosts have it disabled outright, some are Permissive, and some were working successfully with GPFS 5.0.2.x.

What I’ve been instructed to try is running “mmbuildgpl”, and it has solved the problem. I don’t consider running “mmbuildgpl” a real solution, however. If RPMs are a supported means of installation, they should work. Support told me that they’d seen this solve the problem at another site as well.
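
One sanity check that fits the RPM-versus-mmbuildgpl difference is to compare the kernel release the installed portability layer was built for with the kernel the guest is running. The sketch below is only that, a sketch: the function names are hypothetical, it assumes the module is named mmfslinux (as in the trace below) and that `modinfo` is on the PATH, and whether a mismatch has anything to do with this crash is an assumption, not something support has confirmed.

```python
import platform
import subprocess


def vermagic_matches(vermagic, running_release):
    """Return True if the module's vermagic names the running kernel release.

    The first whitespace-separated token of a vermagic string is the kernel
    release the module was built against.
    """
    tokens = vermagic.split()
    return bool(tokens) and tokens[0] == running_release


def installed_vermagic(module="mmfslinux"):
    """Ask modinfo for the vermagic of an installed module (hypothetical helper)."""
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def portability_layer_matches_kernel(module="mmfslinux"):
    """Compare the installed module's build kernel with the running kernel."""
    return vermagic_matches(installed_vermagic(module), platform.release())
```

On an affected guest, `portability_layer_matches_kernel()` returning False would at least show that the RPM-installed module was built for a different kernel than the one that is running.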

Does anyone have any more information about this problem, such as whether there’s a fix in the pipeline, or whether something on our side could be causing it that we could remedy? Is there an easy place to see a list of eFixes to check whether this has come up? I know it’s very similar to a problem that appeared, I believe, after 5.0.2.2 and Linux 3.10.0-957.19.1, but that was already fixed in 5.0.3.x.

Below is a sample of the crash output:

[  156.733477] kernel BUG at mm/slub.c:3772!
[  156.734212] invalid opcode: 0000 [#1] SMP
[  156.735017] Modules linked in: ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables mmfs26(OE) mmfslinux(OE) tracedev(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_multiport xt_conntrack nf_conntrack iptable_filter iptable_security nfit libnvdimm ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper sg joydev pcspkr cryptd parport_pc parport i2c_piix4 virtio_balloon knem(OE) binfmt_misc ip_tables xfs libcrc32c mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) sr_mod cdrom ata_generic pata_acpi virtio_console virtio_net virtio_blk crct10dif_pclmul crct10dif_common mlx5_core(OE) mlxfw(OE) crc32c_intel ptp pps_core devlink ata_piix serio_raw mlx_compat(OE) libata virtio_pci floppy virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[  156.754814] CPU: 3 PID: 11826 Comm: request_handle* Tainted: G           OE  ------------   3.10.0-1062.9.1.el7.x86_64 #1
[  156.756782] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
[  156.757978] task: ffff8aeca5bf8000 ti: ffff8ae9f7a24000 task.ti: ffff8ae9f7a24000
[  156.759326] RIP: 0010:[<ffffffffbbe23dec>]  [<ffffffffbbe23dec>] kfree+0x13c/0x140
[  156.760749] RSP: 0018:ffff8ae9f7a27278  EFLAGS: 00010246
[  156.761717] RAX: 001fffff00000400 RBX: ffffffffbc6974bf RCX: ffffa74dc1bcfb60
[  156.763030] RDX: 001fffff00000000 RSI: ffff8aed90fc6500 RDI: ffffffffbc6974bf
[  156.764321] RBP: ffff8ae9f7a27290 R08: 0000000000000014 R09: 0000000000000003
[  156.765612] R10: 0000000000000048 R11: ffffdb5a82d125c0 R12: ffffa74dc4fd36c0
[  156.766938] R13: ffffffffc0a1c562 R14: ffff8ae9f7a272f8 R15: ffff8ae9f7a27938
[  156.768229] FS:  00007f8ffff05700(0000) GS:ffff8aedbfd80000(0000) knlGS:0000000000000000
[  156.769708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  156.770754] CR2: 000055963330e2b0 CR3: 0000000325ad2000 CR4: 00000000003606e0
[  156.772076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  156.773367] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  156.774663] Call Trace:
[  156.775154]  [<ffffffffc0a1c562>] cxiInitInodeSecurityCleanup+0x12/0x20 [mmfslinux]
[  156.776568]  [<ffffffffc0b50562>] _Z17newInodeInitLinuxP15KernelOperationP13gpfsVfsData_tPP8OpenFilePPvPP10gpfsNode_tP7FileUIDS6_N5LkObj12LockModeEnumE+0x152/0x290 [mmfs26]
[  156.779378]  [<ffffffffc0b5cdfa>] _Z9gpfsMkdirP13gpfsVfsData_tP15KernelOperationP9cxiNode_tPPvPS4_PyS5_PcjjjP10ext_cred_t+0x46a/0x7e0 [mmfs26]
[  156.781689]  [<ffffffffc0bdb928>] ? _ZN14BaseMutexClass15releaseLockHeldEP16KernelSynchState+0x18/0x130 [mmfs26]
[  156.783565]  [<ffffffffc0c3db2d>] _ZL21pcacheHandleCacheMissP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcPyP12pCacheResp_tPS5_PS4_PjSA_j+0x4bd/0x760 [mmfs26]
[  156.786228]  [<ffffffffc0c40675>] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1ff5/0x21a0 [mmfs26]
[  156.788681]  [<ffffffffc0c023ef>] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26]
[  156.790448]  [<ffffffffc0b6d59c>] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26]
[  156.793032]  [<ffffffffc0b8b022>] ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26]
[  156.794588]  [<ffffffffc0a36d96>] gpfs_i_lookup+0x2e6/0x5a0 [mmfslinux]
[  156.795838]  [<ffffffffc0b6cf40>] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6c0/0x6c0 [mmfs26]
[  156.797753]  [<ffffffffbbe65d52>] ? __d_alloc+0x122/0x180
[  156.798763]  [<ffffffffbbe65e10>] ? d_alloc+0x60/0x70
[  156.799700]  [<ffffffffbbe556d3>] lookup_real+0x23/0x60
[  156.800651]  [<ffffffffbbe560f2>] __lookup_hash+0x42/0x60
[  156.801675]  [<ffffffffbc377874>] lookup_slow+0x42/0xa7
[  156.802634]  [<ffffffffbbe5ac3f>] link_path_walk+0x80f/0x8b0
[  156.803666]  [<ffffffffbbe5ae4a>] path_lookupat+0x7a/0x8b0
[  156.804690]  [<ffffffffbbdcd2fe>] ? lru_cache_add+0xe/0x10
[  156.805690]  [<ffffffffbbe24ef5>] ? kmem_cache_alloc+0x35/0x1f0
[  156.806766]  [<ffffffffbbe5c45f>] ? getname_flags+0x4f/0x1a0
[  156.807817]  [<ffffffffbbe5b6ab>] filename_lookup+0x2b/0xc0
[  156.808834]  [<ffffffffbbe5d5f7>] user_path_at_empty+0x67/0xc0
[  156.809923]  [<ffffffffbbdf3ecd>] ? handle_mm_fault+0x39d/0x9b0
[  156.811017]  [<ffffffffbbe5d661>] user_path_at+0x11/0x20
[  156.811983]  [<ffffffffbbe50343>] vfs_fstatat+0x63/0xc0
[  156.812951]  [<ffffffffbbe506fe>] SYSC_newstat+0x2e/0x60
[  156.813931]  [<ffffffffbc388a26>] ? trace_do_page_fault+0x56/0x150
[  156.815050]  [<ffffffffbbe50bbe>] SyS_newstat+0xe/0x10
[  156.816010]  [<ffffffffbc38dede>] system_call_fastpath+0x25/0x2a
[  156.817104] Code: 49 8b 03 31 f6 f6 c4 40 74 04 41 8b 73 68 4c 89 df e8 89 2f fa ff eb 84 4c 8b 58 30 48 8b 10 80 e6 80 4c 0f 44 d8 e9 28 ff ff ff <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
[  156.822192] RIP  [<ffffffffbbe23dec>] kfree+0x13c/0x140
[  156.823180]  RSP <ffff8ae9f7a27278>
[  156.823872] ---[ end trace 142960be4a4feed8 ]---
[  156.824806] Kernel panic - not syncing: Fatal exception
[  156.826475] Kernel Offset: 0x3ac00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
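
For anyone comparing traces from several guests, a minimal sketch for pulling out only the frames that belong to the GPFS modules may help. It assumes the el7-style frame layout shown above (`[<address>] symbol+0xoff/0xlen [module]`, with `?` marking frames the kernel considers unreliable); the function name and the default module list are hypothetical choices for illustration.

```python
import re

# Matches frames such as:
#   [  156.794588]  [<ffffffffc0a36d96>] gpfs_i_lookup+0x2e6/0x5a0 [mmfslinux]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>\S+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def module_frames(oops_text, modules=("mmfslinux", "mmfs26")):
    """Return (module, symbol, unreliable) for each frame in the given modules.

    Frames without a module suffix (core kernel symbols) are skipped.
    """
    frames = []
    for line in oops_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group("module") in modules:
            frames.append((
                match.group("module"),
                match.group("symbol"),
                bool(match.group("unreliable")),
            ))
    return frames
```

Feeding the trace above to `module_frames()` puts `cxiInitInodeSecurityCleanup` in mmfslinux first, directly beneath the `kfree` that hit the BUG. The mangled mmfs26 symbols can be made readable with `c++filt`.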

--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'


