[gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel

Lukas Hejtmanek xhejtman at ics.muni.cz
Wed Apr 15 18:06:57 BST 2020


Should I report them, then, or just wait for the 18.1 problem to be fixed and see
whether the crashes on older kernels are gone as well?

On Wed, Apr 15, 2020 at 04:51:02PM +0000, Felipe Knop wrote:
>    Lukas,
>     
>    There was one particular kernel change introduced in 3.10.0-1062.18.1 that
>    has triggered a given set of crashes. It's possible, though, that there is
>    a lingering problem affecting older levels of 3.10.0-1062. I believe that
>    crashes occurring on older kernels should be treated as separate problems.
>     
>      Felipe
>     
>    ----
>    Felipe Knop knop at us.ibm.com
>    GPFS Development and Security
>    IBM Systems
>    IBM Building 008
>    2455 South Rd, Poughkeepsie, NY 12601
>    (845) 433-9314 T/L 293-9314
>     
>     
>     
> 
>      ----- Original message -----
>      From: Lukas Hejtmanek <xhejtman at ics.muni.cz>
>      Sent by: gpfsug-discuss-bounces at spectrumscale.org
>      To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>      Cc:
>      Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel crashes with Spectrum
>      Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel
>      Date: Wed, Apr 15, 2020 12:35 PM
>       
>      And are you sure it is present only in the -1062.18.1.el7 kernel? I think it is
>      present in all -1062.* kernels.
> 
>      On Wed, Apr 15, 2020 at 04:25:41PM +0000, Felipe Knop wrote:
>      >    Laurence,
>      >     
>      >    The problem affects all the Scale releases / PTFs.
>      >     
>      >      Felipe
>      >     
>      >
>      >      ----- Original message -----
>      >      From: "Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]"
>      >      <laurence.schuler at nasa.gov>
>      >      Sent by: gpfsug-discuss-bounces at spectrumscale.org
>      >      To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>      >      Cc:
>      >      Subject: Re: [gpfsug-discuss] [EXTERNAL] Kernel crashes with Spectrum
>      >      Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel
>      >      Date: Wed, Apr 15, 2020 12:10 PM
>      >       
>      >
>      >      Will this impact *any* version of Spectrum Scale?
>      >
>      >       
>      >
>      >      -Laurence
>      >
>      >       
>      >
>      >      From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of
>      >      Felipe Knop <knop at us.ibm.com>
>      >      Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>      >      Date: Wednesday, April 15, 2020 at 11:30 AM
>      >      To: "gpfsug-discuss at spectrumscale.org"
>      >      <gpfsug-discuss at spectrumscale.org>
>      >      Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum
>      >      Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel
>      >
>      >       
>      >
>      >      All,
>      >
>      >       
>      >
>      >      A problem has been identified with Spectrum Scale when running on RHEL
>      >      7.7 with kernel 3.10.0-1062.18.1.el7. While a fix is currently being
>      >      developed, customers should not move up to this kernel level.
>      >
>      >       
>      >
>      >      The new kernel was issued on March 17 via the following errata:
>      >      https://access.redhat.com/errata/RHSA-2020:0834
>      >
>      >       
>      >
>      >      When this kernel is used with Scale, system crashes have been observed.
>      >      The following are a couple of examples of kernel stack traces for the
>      >      crash:
>      >
>      >      [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
>      >      [ 2915.633770] IP: [<ffffffffc0e2cf90>] cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux]
>      >
>      >      [ 2915.914097]  [<ffffffffc0e3d28c>] gpfs_i_rmdir+0x29c/0x310 [mmfslinux]
>      >      [ 2915.921381]  [<ffffffffb9663130>] ? take_dentry_name_snapshot+0xf0/0xf0
>      >      [ 2915.928760]  [<ffffffffb9664f60>] ? shrink_dcache_parent+0x60/0x90
>      >      [ 2915.935656]  [<ffffffffb96577cc>] vfs_rmdir+0xdc/0x150
>      >      [ 2915.941388]  [<ffffffffb965cca1>] do_rmdir+0x1f1/0x220
>      >      [ 2915.947119]  [<ffffffffb964ce66>] ? __fput+0x186/0x260
>      >      [ 2915.952849]  [<ffffffffb964d02e>] ? ____fput+0xe/0x10
>      >      [ 2915.958484]  [<ffffffffb94c2e60>] ? task_work_run+0xc0/0xe0
>      >      [ 2915.964701]  [<ffffffffb965df05>] SyS_unlinkat+0x25/0x40
>      >
>      >       
>      >
>      >      [1224278.495993] [<ffffffff88e63918>] __dentry_kill+0x128/0x190
>      >      [1224278.496678] [<ffffffff88e63a36>] dput+0xb6/0x1a0
>      >      [1224278.497378] [<ffffffff88e64116>] d_prune_aliases+0xb6/0xf0
>      >      [1224278.498083] [<ffffffffc0c2c0ea>] cxiPruneDCacheEntry+0x13a/0x1c0 [mmfslinux]
>      >      [1224278.498798] [<ffffffffc0eba608>] _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26]
>      >
>      >       
>      >
>      >       
>      >
>      >      RHEL 7.8 is also impacted by the same problem, but validation of Scale
>      >      with 7.8 is still under way.
>      >
>      >       
>      >
>      >       
>      >
>      >        Felipe
>      >
>      >       
>      >
>      >
>      >       
>      >      _______________________________________________
>      >      gpfsug-discuss mailing list
>      >      gpfsug-discuss at spectrumscale.org
>      >      http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>      >
> 
> 
>      --
>      Lukáš Hejtmánek
> 
>      Linux Administrator only because
>        Full Time Multitasking Ninja
>        is not an official job title
>       
> 





