[gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel

Lukas Hejtmanek xhejtman at ics.muni.cz
Wed Apr 15 16:36:48 BST 2020


Hello,

I have seen this bug as well; it took about 10 minutes to crash.
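
To keep nodes from silently coming up on the affected level, a check along these lines could be run at boot or before starting GPFS. This is only a minimal sketch: the function name is_affected is made up for illustration, and it matches only the one kernel release named in the advisory quoted below (RHEL 7.8 kernels are mentioned there as impacted, but no release string is given, so they are not covered).

```python
import platform

# Kernel release named in the quoted advisory. Assumption: this is the only
# level checked here; the advisory gives no release string for RHEL 7.8.
AFFECTED_RELEASE = "3.10.0-1062.18.1.el7"


def is_affected(release: str) -> bool:
    """Return True if a `uname -r` style string matches the affected kernel."""
    # `uname -r` usually appends the architecture,
    # e.g. "3.10.0-1062.18.1.el7.x86_64", so accept that suffix as well.
    return release == AFFECTED_RELEASE or release.startswith(AFFECTED_RELEASE + ".")


if __name__ == "__main__":
    running = platform.release()
    status = "AFFECTED" if is_affected(running) else "not listed as affected"
    print(f"{running}: {status}")
```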

However, I am seeing a similar NULL pointer dereference even with older kernels.
The dereference does not always happen in GPFS code; sometimes it occurs in NFS
or elsewhere, but the pattern looks familiar. I have many crash dumps of this.
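
For sorting a pile of crash dumps by where the frames sit (GPFS modules versus NFS or the core kernel), something like the following could help. It is a rough sketch, not a tested tool: the name modules_in_trace is hypothetical, it assumes the usual "[<addr>] symbol+0xoff/0xlen [module]" frame layout seen in the traces quoted below, and it will miss the module tag when a mail client has wrapped it onto the next line.

```python
import re
from collections import Counter

# Matches one stack frame, for example:
#   [<ffffffffc0e3d28c>] gpfs_i_rmdir+0x29c/0x310 [mmfslinux]
# The trailing "[module]" is absent for symbols built into the kernel.
# A leading "?" marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<sym>\S+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<mod>\w+)\])?"
)


def modules_in_trace(text: str) -> Counter:
    """Count verified stack frames per kernel module in an oops trace."""
    counts = Counter()
    for match in FRAME_RE.finditer(text):
        if match.group("guess"):
            continue  # skip "?" frames, they are not reliable
        counts[match.group("mod") or "kernel"] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[ 2915.914097]  [<ffffffffc0e3d28c>] gpfs_i_rmdir+0x29c/0x310 [mmfslinux]\n"
        "[ 2915.921381]  [<ffffffffb9663130>] ? take_dentry_name_snapshot+0xf0/0xf0\n"
        "[ 2915.935656]  [<ffffffffb96577cc>] vfs_rmdir+0xdc/0x150\n"
    )
    print(modules_in_trace(sample))
```

Frames without a module tag are counted under "kernel", so a trace that never touches mmfslinux or mmfs26 stands out quickly.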

On Wed, Apr 15, 2020 at 03:29:53PM +0000, Felipe Knop wrote:
>    All,
>     
>    A problem has been identified with Spectrum Scale when running on RHEL 7.7
>    and kernel 3.10.0-1062.18.1.el7.  While a fix is currently being
>    developed, customers should not move up to this kernel level.
>     
>    The new kernel was issued on March 17 via the following errata: 
>    [1]https://access.redhat.com/errata/RHSA-2020:0834
>     
>    When this kernel is used with Scale, system crashes have been observed.
>    The following are a couple of examples of kernel stack traces for the
>    crash:
>     
>     
>    [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
>    [ 2915.633770] IP: [<ffffffffc0e2cf90>] cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux]
>    [ 2915.914097]  [<ffffffffc0e3d28c>] gpfs_i_rmdir+0x29c/0x310 [mmfslinux]
>    [ 2915.921381]  [<ffffffffb9663130>] ? take_dentry_name_snapshot+0xf0/0xf0
>    [ 2915.928760]  [<ffffffffb9664f60>] ? shrink_dcache_parent+0x60/0x90
>    [ 2915.935656]  [<ffffffffb96577cc>] vfs_rmdir+0xdc/0x150
>    [ 2915.941388]  [<ffffffffb965cca1>] do_rmdir+0x1f1/0x220
>    [ 2915.947119]  [<ffffffffb964ce66>] ? __fput+0x186/0x260
>    [ 2915.952849]  [<ffffffffb964d02e>] ? ____fput+0xe/0x10
>    [ 2915.958484]  [<ffffffffb94c2e60>] ? task_work_run+0xc0/0xe0
>    [ 2915.964701]  [<ffffffffb965df05>] SyS_unlinkat+0x25/0x40
>     
>    [1224278.495993] [<ffffffff88e63918>] __dentry_kill+0x128/0x190
>    [1224278.496678] [<ffffffff88e63a36>] dput+0xb6/0x1a0
>    [1224278.497378] [<ffffffff88e64116>] d_prune_aliases+0xb6/0xf0
>    [1224278.498083] [<ffffffffc0c2c0ea>] cxiPruneDCacheEntry+0x13a/0x1c0 [mmfslinux]
>    [1224278.498798] [<ffffffffc0eba608>] _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26]
>     
>     
>    RHEL 7.8 is also impacted by the same problem, but validation of Scale
>    with 7.8 is still under way.
>     
>     
>      Felipe
>     
>    ----
>    Felipe Knop knop at us.ibm.com
>    GPFS Development and Security
>    IBM Systems
>    IBM Building 008
>    2455 South Rd, Poughkeepsie, NY 12601
>    (845) 433-9314 T/L 293-9314
>     
> 
> References
> 
>    Visible links
>    1. https://access.redhat.com/errata/RHSA-2020:0834



-- 
Lukáš Hejtmánek

Linux Administrator only because
  Full Time Multitasking Ninja 
  is not an official job title


