[gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel
Lukas Hejtmanek
xhejtman at ics.muni.cz
Wed Apr 15 16:36:48 BST 2020
Hello,
I noticed this bug; it took about 10 minutes to crash.
However, I'm seeing a similar NULL pointer dereference even with older kernels.
The dereference does not always happen in GPFS code; sometimes it is outside, in NFS
or elsewhere, but it looks familiar. I have many crash dumps of this.
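
For anyone who wants to check a node against the advisory quoted below, here is
a minimal sketch that compares the running kernel release with the one named
there. The names AFFECTED_RELEASES, strip_arch and is_affected are hypothetical
and are not part of Spectrum Scale. The list of architecture suffixes is an
assumption about what `uname -r` appends. RHEL 7.8 kernels are not listed
because the advisory gives no version string for them.

```python
import platform

# Kernel release named as affected in the quoted advisory.
# RHEL 7.8 is also said to be impacted, but no release string is given there,
# so it is deliberately not listed here.
AFFECTED_RELEASES = {"3.10.0-1062.18.1.el7"}

# Assumed architecture suffixes that `uname -r` may append on RHEL.
ARCH_SUFFIXES = (".x86_64", ".ppc64le", ".ppc64", ".s390x", ".aarch64")


def strip_arch(release: str) -> str:
    """Return the release string without a trailing architecture suffix."""
    for suffix in ARCH_SUFFIXES:
        if release.endswith(suffix):
            return release[: -len(suffix)]
    return release


def is_affected(release: str) -> bool:
    """True if the release matches a kernel listed in AFFECTED_RELEASES."""
    return strip_arch(release) in AFFECTED_RELEASES


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    if is_affected(running):
        print(f"{running}: listed as affected in the advisory")
    else:
        print(f"{running}: not in the affected list")
```

This only reports an exact match on the release string. It says nothing about
whether other kernels are safe, as the crashes on older kernels mentioned above
suggest.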
On Wed, Apr 15, 2020 at 03:29:53PM +0000, Felipe Knop wrote:
> All,
>
> A problem has been identified with Spectrum Scale when running on RHEL 7.7
> and kernel 3.10.0-1062.18.1.el7. While a fix is currently being
> developed, customers should not move up to this kernel level.
>
> The new kernel was issued on March 17 via the following errata:
> [1]https://access.redhat.com/errata/RHSA-2020:0834
>
> When this kernel is used with Scale, system crashes have been observed.
> The following are a couple of examples of kernel stack traces for the
> crash:
>
>
> [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> [ 2915.633770] IP: [<ffffffffc0e2cf90>] cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux]
> [ 2915.914097] [<ffffffffc0e3d28c>] gpfs_i_rmdir+0x29c/0x310 [mmfslinux]
> [ 2915.921381] [<ffffffffb9663130>] ? take_dentry_name_snapshot+0xf0/0xf0
> [ 2915.928760] [<ffffffffb9664f60>] ? shrink_dcache_parent+0x60/0x90
> [ 2915.935656] [<ffffffffb96577cc>] vfs_rmdir+0xdc/0x150
> [ 2915.941388] [<ffffffffb965cca1>] do_rmdir+0x1f1/0x220
> [ 2915.947119] [<ffffffffb964ce66>] ? __fput+0x186/0x260
> [ 2915.952849] [<ffffffffb964d02e>] ? ____fput+0xe/0x10
> [ 2915.958484] [<ffffffffb94c2e60>] ? task_work_run+0xc0/0xe0
> [ 2915.964701] [<ffffffffb965df05>] SyS_unlinkat+0x25/0x40
>
> [1224278.495993] [<ffffffff88e63918>] __dentry_kill+0x128/0x190
> [1224278.496678] [<ffffffff88e63a36>] dput+0xb6/0x1a0
> [1224278.497378] [<ffffffff88e64116>] d_prune_aliases+0xb6/0xf0
> [1224278.498083] [<ffffffffc0c2c0ea>] cxiPruneDCacheEntry+0x13a/0x1c0 [mmfslinux]
> [1224278.498798] [<ffffffffc0eba608>] _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26]
>
>
> RHEL 7.8 is also impacted by the same problem, but validation of Scale
> with 7.8 is still under way.
>
>
> Felipe
>
> ----
> Felipe Knop knop at us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314 T/L 293-9314
>
>
> References
>
> Visible links
> 1. https://access.redhat.com/errata/RHSA-2020:0834
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
--
Lukáš Hejtmánek
Linux Administrator only because
Full Time Multitasking Ninja
is not an official job title