From abeattie at au1.ibm.com Sat Jun 1 11:11:42 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sat, 1 Jun 2019 10:11:42 +0000 Subject: [gpfsug-discuss] Gateway role on a NSD server In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Mon Jun 3 09:51:53 2019 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Mon, 3 Jun 2019 08:51:53 +0000 Subject: [gpfsug-discuss] About new Lenovo DSS Software Release Message-ID: <0081EB235765E14395278B9AE1DF34180FE897CC@MBX214.d.ethz.ch> Dear all, this question mostly targets Lenovo Engineers and customers. Is there any update about the release date for the new software for Lenovo DSS G-Series? Also, I would like to know which version of GPFS will come with this software. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Jun 5 09:42:15 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 5 Jun 2019 10:42:15 +0200 Subject: [gpfsug-discuss] Agenda - User Meeting along ISC Frankfurt In-Reply-To: References: Message-ID: The agenda is now published: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-isc/ Please use the registration link to attend. Looking forward to meet many of you there. 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: "gpfsug main discussion list" Date: 22/05/2019 10:55 Subject: [EXTERNAL] [gpfsug-discuss] Save the date - User Meeting along ISC Frankfurt Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings: IBM will host a joint "IBM Spectrum Scale and IBM Spectrum LSF User Meeting" at ISC. As with other user group meetings, the agenda will include user stories, updates on IBM Spectrum Scale & IBM Spectrum LSF, and access to IBM experts and your peers. We are still looking for customers to talk about their experience with Spectrum Scale and/or Spectrum LSF. Please send me a personal mail, if you are interested to talk. The meeting is planned for: Monday June 17th, 2019 - 1pm-5.30pm ISC Frankfurt, Germany I will send more details later. Best, Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=uUqyuk8-P-Ra6X6T7ReoLj3kWy-VUg53oU2RZpf8bbg&s=XCJDxns17Ixdyviy_nuN0pCJsTkAN6dxCU994sl33qo&e= -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From knop at us.ibm.com Fri Jun 7 22:45:31 2019 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 7 Jun 2019 17:45:31 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Message-ID: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zmance at ucar.edu Fri Jun 7 22:51:13 2019 From: zmance at ucar.edu (Zachary Mance) Date: Fri, 7 Jun 2019 15:51:13 -0600 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: > All, > > There have been reported issues (including kernel crashes) on Spectrum > Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider > delaying upgrades to this kernel until further information is provided. 
> > Thanks, > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jun 7 23:07:49 2019 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 7 Jun 2019 18:07:49 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided.
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=ZcS98SBJVzdDsVcuu7KjSr64rfzEBaFDD86UkLkp8Vw&s=mjERh67H5DB6dfP0I1KES4-9Ku25AVoQxHoB5gArxR4&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Sat Jun 8 18:22:12 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sat, 8 Jun 2019 17:22:12 +0000 Subject: [gpfsug-discuss] Forcing an internal mount to complete Message-ID: I have a few file systems that are showing "internal mount" on my NSD servers, even though they are not mounted. I'd like to force them, without having to restart GPFS on those nodes - any options? Not mounted on any other (local cluster) nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sun Jun 9 02:16:08 2019 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 8 Jun 2019 21:16:08 -0400 Subject: [gpfsug-discuss] Forcing an internal mount to complete In-Reply-To: References: Message-ID: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Bob, I wonder if something like an "mmdf" or an "mmchmgr" would trigger the internal mounts to release.
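[Editor's note: the suggestion above can be sketched as a short admin sequence. This is a hedged sketch, not a verified fix: "fs1" and "nsd01" are placeholder device and node names, and whether mmdf or mmchmgr actually releases the internal mounts is exactly what is being speculated in this thread.]

```shell
# List every node that currently has fs1 mounted, including internal
# mounts (run on a node in the owning cluster; "fs1" is a placeholder).
mmlsmount fs1 -L

# Touch the file system lightly so the daemon revisits its mount state;
# mmdf only reads free-space/metadata information, it changes nothing.
mmdf fs1

# Alternatively, move the file system manager, forcing a manager
# takeover ("nsd01" is a hypothetical target node).
mmchmgr fs1 nsd01

# Re-check whether the internal mounts have been released.
mmlsmount fs1 -L
```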
Sent from my iPhone > On Jun 8, 2019, at 13:22, Oesterlin, Robert wrote: > > I have a few file systems that are showing "internal mount" on my NSD servers, even though they are not mounted. I'd like to force them, without having to restart GPFS on those nodes - any options? > > Not mounted on any other (local cluster) nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Sun Jun 9 04:24:47 2019 From: salut4tions at gmail.com (Jordan Robertson) Date: Sat, 8 Jun 2019 23:24:47 -0400 Subject: [gpfsug-discuss] Forcing an internal mount to complete In-Reply-To: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> References: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Message-ID: Hey Bob, Ditto on what Aaron said, it sounds as if the last fs manager might need a nudge. Things can get weird when a filesystem isn't mounted anywhere but a manager is needed for an operation though, so I would keep an eye on the ras logs of the cluster manager during the kick just to make sure the management duty isn't bouncing (which in turn can cause waiters). -Jordan On Sat, Jun 8, 2019 at 9:16 PM Aaron Knister wrote: > Bob, I wonder if something like an "mmdf" or an "mmchmgr" would trigger > the internal mounts to release. > > Sent from my iPhone > > On Jun 8, 2019, at 13:22, Oesterlin, Robert > wrote: > > I have a few file systems that are showing "internal mount" on my NSD > servers, even though they are not mounted. I'd like to force them, without > having to restart GPFS on those nodes - any options? > > > > Not mounted on any other (local cluster) nodes.
> > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Jun 9 13:18:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 9 Jun 2019 12:18:39 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Forcing an internal mount to complete In-Reply-To: References: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Message-ID: Thanks for the suggestions - as it turns out, it was the *remote* mounts causing the issues - which surprises me. I wanted to do a "mmchfs" on the local cluster, to change the default mount point. Why would GPFS care if it's remote mounted? Oh - well... Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Sun Jun 9 14:20:28 2019 From: salut4tions at gmail.com (Jordan Robertson) Date: Sun, 9 Jun 2019 09:20:28 -0400 Subject: [gpfsug-discuss] [EXTERNAL] Re: Forcing an internal mount to complete In-Reply-To: References: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Message-ID: If there's any I/O going to the filesystem at all, GPFS has to keep it internally mounted on at least a few nodes such as the token managers and fs manager. I *believe* that holds true even for remote clusters, in that they still need to reach back to the "local" cluster when operating on the filesystem. I could be wrong about that though.
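[Editor's note: Jordan's point about remote clusters can be checked directly, since mmlsmount can report mounts in remote clusters as well as the owning cluster. A sketch; "fs1" is a placeholder device name.]

```shell
# Show every mount of fs1 across the owning cluster and any remote
# clusters that have it remote-mounted ("fs1" is a placeholder).
mmlsmount fs1 -L -C all

# Limit the report to remote clusters only.
mmlsmount fs1 -L -C all_remote
```

If either command shows the file system still mounted somewhere, operations that require it to be unmounted everywhere - such as the mmchfs change of the default mount point that Bob attempted - will be refused, which would be consistent with the behavior he reports.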
On Sun, Jun 9, 2019, 09:06 Oesterlin, Robert wrote: > Thanks for the suggestions - as it turns out, it was the *remote* > mounts causing the issues - which surprises me. I wanted to do a "mmchfs" > on the local cluster, to change the default mount point. Why would GPFS > care if it's remote mounted? > > Oh - well... > > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Sun Jun 9 14:38:29 2019 From: spectrumscale at kiranghag.com (KG) Date: Sun, 9 Jun 2019 19:08:29 +0530 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: One of my customers already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: > Zach, > > This appears to be affecting all Scale versions, including 5.0.2 -- but > only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not > impacted) > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for Zachary Mance ---06/07/2019 05:51:37 > PM---Which versions of Spectrum Scale versions are you referring]Zachary > Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions > are you referring to? 5.0.2-3?
--------------------------- > > From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? > > --------------------------------------------------------------------------------------------------------------- > Zach Mance *zmance at ucar.edu* (303) 497-1883 > > HPC Data Infrastructure Group / CISL / NCAR > > --------------------------------------------------------------------------------------------------------------- > > > On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop <*knop at us.ibm.com* > > wrote: > > All, > > There have been reported issues (including kernel crashes) on Spectrum > Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider > delaying upgrades to this kernel until further information is provided. > > Thanks, > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scottg at emailhosting.com Sun Jun 9 18:32:24 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Sun, 09 Jun 2019 18:32:24 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 10 05:29:14 2019 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 10 Jun 2019 00:29:14 -0400 Subject: [gpfsug-discuss] =?utf-8?q?Spectrum_Scale_with_RHEL7=2E6=09kernel?= =?utf-8?b?CTMuMTAuMC05NTcuMjEuMg==?= In-Reply-To: References: Message-ID: Scott, Currently, we are only aware of the problem with 3.10.0-957.21.2 . We are not yet aware of the same problems also affecting 3.10.0-957.12.1, but hope to find out more shortly. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Scott Goldman To: gpfsug main discussion list Date: 06/09/2019 01:50 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org And to be clear.. There is a .12 version: 3.10.0-957.12.1.el7.x86_64 Did you mean the .12 version or the .21? Conveniently, the kernel numbers are easily proposed! Sent from my BlackBerry - the most secure mobile device From: spectrumscale at kiranghag.com Sent: June 9, 2019 2:38 PM To: gpfsug-discuss at spectrumscale.org Reply-to: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. 
(3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=fQfU5Pw8BtsrqD8JCFskfMdm8ZIGWtDY-gMtk_iljwU&s=vVEdtvFYxwXzh3n52YWo4_XJIh4IvWzRl3NaAkmA-9E&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From knop at us.ibm.com Mon Jun 10 05:41:29 2019 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 10 Jun 2019 00:41:29 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable.
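[Editor's note: until a fixed level is identified, one pragmatic guard is to refuse to proceed when a node runs the affected kernel. The snippet below is an unofficial sketch, not an IBM tool: it only string-matches the kernel release against the one level reported bad in this thread; the 3.10.0-957 base and 3.10.0-957.12.1 kernels are treated as acceptable because the thread has not flagged them.]

```shell
#!/bin/sh
# Kernel release reported in this thread to crash with Spectrum Scale.
BAD_PREFIX="3.10.0-957.21.2"

check_kernel() {
    # $1 is a kernel release string as printed by `uname -r`.
    case "$1" in
        "$BAD_PREFIX"*) echo "blocked" ;;
        *)              echo "ok" ;;
    esac
}

check_kernel "3.10.0-957.21.2.el7.x86_64"   # the affected update kernel
check_kernel "3.10.0-957.el7.x86_64"        # base RHEL 7.6 kernel
check_kernel "3.10.0-957.12.1.el7.x86_64"   # earlier update, not flagged
```

On a real node this would be called as `check_kernel "$(uname -r)"` from a health-check or init script; pinning the kernel package (for example with the yum versionlock plugin) is another way to hold nodes at a tested level until IBM publishes more details.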
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another?week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are?you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance ?zmance at ucar.edu??(303) 497-1883 HPC Data Infrastructure Group?/ CISL / NCAR ---------------------------------------------------------------------------------------------------------------? On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=7I4gXXVtdbnAsgAcK0NWr4-5d-a1bRr4578aC1wKRMo&s=jFJmGOvjWTjDfI_vI2pHOOvqzPw5rWbtLvrZdTEDtCg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scottg at emailhosting.com Mon Jun 10 06:02:19 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Mon, 10 Jun 2019 06:02:19 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: Message-ID: <3uok4eacuqj53g26epedg19j.1560142939257@emailhosting.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 10 13:24:52 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 10 Jun 2019 12:24:52 +0000 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Hallo Felipe, here is the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely.
(BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. 
(BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised]KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised?
They will be running from DR From: KG > To: gpfsug main discussion list > Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop > wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop > wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 10 13:43:02 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 10 Jun 2019 12:43:02 +0000 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: Hallo Felipe, here is the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system.
This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a 
reboot. (BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised?
They will be running from DR From: KG > To: gpfsug main discussion list > Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop > wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop > wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From kraemerf at de.ibm.com Mon Jun 10 13:47:46 2019 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 10 Jun 2019 14:47:46 +0200 Subject: [gpfsug-discuss] *NEWS* - IBM Spectrum Scale Erasure Code Edition v5.0.3 Message-ID: FYI - What is IBM Spectrum Scale Erasure Code Edition, and why should I consider it? IBM Spectrum Scale Erasure Code Edition provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale on the customer's own choice of commodity hardware with the added benefit of network-dispersed IBM Spectrum Scale RAID, and all of its features providing data protection, storage efficiency, and the ability to manage storage in hyperscale environments. SAS, NL-SAS, and NVMe drives are supported right now.
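A quick way to see the protection/capacity trade-off behind these options: an N+KP erasure code keeps N data strips out of every N+K strips written, so its usable-capacity fraction is N/(N+K). The Python sketch below is illustrative only; the helper and the code table are my own, not part of ECE or this announcement.

```python
# Illustrative helper (not part of ECE): the usable-capacity fraction of an
# N+KP erasure code is data strips / total strips = N / (N + K).
def usable_fraction(data_strips: int, parity_strips: int) -> float:
    return data_strips / (data_strips + parity_strips)

# Codes named in the announcement: 4+2P, 4+3P, 8+2P, 8+3P, plus replication.
codes = {
    "4+2P": (4, 2), "4+3P": (4, 3), "8+2P": (8, 2), "8+3P": (8, 3),
    "3-way replication": (1, 2), "4-way replication": (1, 3),
}
for name, (n, k) in codes.items():
    print(f"{name}: {usable_fraction(n, k):.0%} of raw capacity is usable")
```

This also shows why the wider 8+2P and 8+3P codes need more storage nodes per recovery group but return more usable capacity than 4+2P, 4+3P, or replication.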
IBM Spectrum Scale Erasure Code Edition supports 4 different erasure codes: 4+2P, 4+3P, 8+2P, and 8+3P, in addition to 3- and 4-way replication. Choosing an erasure code involves considering several factors. For more details on IBM Spectrum Scale Erasure Code Edition, see section 18 in the Scale FAQ on the web: https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Each IBM Spectrum Scale Erasure Code Edition recovery group can have 4 to 32 storage nodes, and there can be up to 128 storage nodes in an IBM Spectrum Scale cluster using IBM Spectrum Scale Erasure Code Edition. For more information, see Planning for erasure code selection in the IBM Spectrum Scale Erasure Code Edition Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_ECE_5.0.3/ibmspectrumscaleece503_welcome.html For the minimum requirements for IBM Spectrum Scale Erasure Code Edition, see: https://www.ibm.com/support/knowledgecenter/STXKQY_ECE_5.0.3/com.ibm.spectrum.scale.ece.v5r03.doc/b1lece_min_hwrequirements.htm The hardware and network precheck tools can be downloaded from the following links: Hardware precheck: https://github.com/IBM/SpectrumScale_ECE_OS_READINESS Network precheck: https://github.com/IBM/SpectrumScale_NETWORK_READINESS The network can be either Ethernet or InfiniBand, and must provide at least 25 Gbps bandwidth, with an average latency of 1.0 msec or less between any two storage nodes. -frank- -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 10 14:43:10 2019 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 10 Jun 2019 09:43:10 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: Renar, Thanks.
Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/10/2019 08:43 AM Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, here are the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. 
(BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. 
(BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar
Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org One of my customer already upgraded their DR site. Is rollback advised?
They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 11 13:27:46 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 11 Jun 2019 12:27:46 +0000 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Hallo Felipe, can you explain whether this is a generic problem in RHEL or only Scale-related?
Are there any further details available already? We asked Red Hat, but have no indication that this issue is known to them. Regards Renar ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 15:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Renar, Thanks. Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted.
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Grunenberg, Renar" ---06/10/2019 08:43:27 AM---Hallo Felipe, here are the change list:]"Grunenberg, Renar" ---06/10/2019 08:43:27 AM---Hallo Felipe, here are the change list: From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 06/10/2019 08:43 AM Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hallo Felipe, here are the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. 
(BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. 
(BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar
________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised?
They will be running from DR From: KG > To: gpfsug main discussion list > Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop > wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop > wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From knop at us.ibm.com Tue Jun 11 16:54:03 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 11 Jun 2019 11:54:03 -0400 Subject: [gpfsug-discuss] =?utf-8?q?WG=3A_Spectrum_Scale_with=09RHEL7=2E6?= =?utf-8?b?CWtlcm5lbAkzLjEwLjAtOTU3LjIxLjI=?= In-Reply-To: <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: Renar, With the change below, which is a retrofit of a change deployed in newer kernels, an inconsistency has taken place between the GPFS kernel portability layer and the kernel proper. A known result of that inconsistency is a kernel crash. 
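[List admin note: until the official notification is out, the "stay off the affected kernel" advice above can be automated. The following is a hypothetical pre-flight sketch, not an IBM-supplied check; it flags any running kernel that sorts at or above the first build reported as impacted later in this thread (3.10.0-957.19.1), using GNU `sort -V` for version comparison. Adjust AFFECTED_MIN as new information arrives.]

```shell
#!/bin/sh
# Hypothetical pre-flight guard (not from IBM): warn before starting GPFS
# if the running kernel is in the range this thread reports as impacted.
# The cutoff 3.10.0-957.19.1 comes from a later message in the thread.

AFFECTED_MIN="3.10.0-957.19.1"

kernel_is_affected() {
    # True when $1 sorts at or above AFFECTED_MIN under version ordering.
    candidate=$1
    lowest=$(printf '%s\n%s\n' "$AFFECTED_MIN" "$candidate" | sort -V | head -n 1)
    [ "$lowest" = "$AFFECTED_MIN" ]
}

# Strip the ".el7.x86_64" tail so only the version-release part is compared.
running=$(uname -r | sed 's/\.el7.*$//')
if kernel_is_affected "$running"; then
    echo "WARNING: kernel $(uname -r) is in the reported-affected range; consider rolling back before mounting GPFS" >&2
fi
```

[On RHEL, pairing a check like this with the yum versionlock plugin (if installed) is one way to keep nodes pinned to the last kernel you have actually tested with the portability layer.]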
One known sequence leading to the crash involves the mkdir() call. We are working on an official notification on the issue. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Grunenberg, Renar" To: gpfsug main discussion list Date: 06/11/2019 08:28 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, can you explain is this a generic Problem in rhel or only a scale related. Are there any cicumstance already available? We ask redhat but have no points that this are know to them? Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. 
Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 15:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Renar, Thanks. Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Inactive hide details for "Grunenberg, Renar" ---06/10/2019 08:43:27 AM---Hallo Felipe, here are the change list:"Grunenberg, Renar" ---06/10/2019 08:43:27 AM---Hallo Felipe, here are the change list: From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/10/2019 08:43 AM Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, here are the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. 
This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a 
reboot. (BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec +0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Inactive hide details for KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advisedKG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org One of my customer already upgraded their DR site. Is rollback advised? 
They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=NrtWuqEKU3u4gccYHay_zERd91aEy7i2xuokUigK6fU&s=ctyTZhprfx7BRmt6V2wvvXV5p6iROrbSnRZf9WlfaXs&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jun 11 18:55:36 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 11 Jun 2019 17:55:36 +0000 Subject: [gpfsug-discuss] About new Lenovo DSS Software Release In-Reply-To: <0081EB235765E14395278B9AE1DF34180FE897CC@MBX214.d.ethz.ch> References: <0081EB235765E14395278B9AE1DF34180FE897CC@MBX214.d.ethz.ch> Message-ID: Hi Mark, I case you didn't see, Lenovo released DSS-G 2.3a today. From the release notes: - IBM Spectrum Scale RAID * updated release 5.0 to 5.0.2-PTF3-efix0.1 (5.0.2-3.0.1) * updated release 4.2 to 4.2.3-PTF14 (4.2.3-14) Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of marc.caubet at psi.ch [marc.caubet at psi.ch] Sent: 03 June 2019 09:51 To: gpfsug main discussion list Subject: [gpfsug-discuss] About new Lenovo DSS Software Release Dear all, this question mostly targets Lenovo Engineers and customers. Is there any update about the release date for the new software for Lenovo DSS G-Series? Also, I would like to know which version of GPFS will come with this software. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Tue Jun 11 20:32:41 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:32:41 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> Message-ID: <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This is not a change I like much either, though can obviously adapt to it. We have used "mmfsadm test verbs status" to confirm that RDMA is working by NHC (https://github.com/mej/nhc) on our compute nodes, and just for a quick check on the command line. Yes, there are the usual caveats, and yes the information is available another way, but a) it's the removal of a convenience that I'm quite sure that -- caveats aside - -- is not dangerous (it runs every 5 minutes on our system) b) it doesn't match the usage printed out on the command line and c) any other methods are quite a bit more information that then has to be parsed (perhaps also not as light a touch, but I don't know the code), and d) there doesn't seem to be a way now that works on both GPFS V4 and V5 (I confirmed that mmfsadm saferdump verbs | grep verbsRdmaStarted does not on V4). You'd also mentioned we really shouldn't be using mmfsadm regularly. Is there a way to get this information out of mmdiag if that is the supported command? Is there a way to do this that works for both V4 and V5? Philosophy of using mmfsadm aside though, we aren't supposed to rely on syntax for these commands remaining the same, but aren't we supposed to be able to rely on commands not falsely reporting syntax in their own usage message? I'd think at the very least, that's a bug in the "usage" text. On 12/19/18 5:35 AM, Tomer Perry wrote: > Hi, > > So, with all the usual disclaimers... mmfsadm saferdump verbs is > not enough? 
or even mmfsadm saferdump verbs | grep > VerbsRdmaStarted > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 12:22 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > I'd like just one line that says "RDMA ON" or "RMDA OFF" (as was > reported more or less by mmfsadm). > > I can get info about RMDA using mmdiag, but is much more output to > parse (e.g. by a nagios script or just a human eye). Ok, never > mind, I understand your explanation and it is not definitely a big > issue... it was, above all, a curiosity to understand if the > command was modified to get the same behavior as before, but in a > different way. > > Cheers, > > Alvise > > ---------------------------------------------------------------------- - -- > > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer > Perry [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 11:05 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Changed means it provides some functions/information in a different > way. So, I guess the question is what information do you need? ( > and "officially" why isn't mmdiag good enough - what is missing. As > you probably know, mmfsadm might cause crashes and deadlock from > time to time, this is why we're trying to provide "safe ways" to > get the required information). 
> > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 11:53 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hi Tomer, "changed" makes me suppose that it is still possible, but > in a different way... am I correct ? if yes, what it is ? > > thanks, > > Alvise > > ---------------------------------------------------------------------- - -- > > * > From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer > Perry [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 10:47 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Hi, > > Yes, as part of the RDMA enhancements in 5.0.X much of the hidden > test commands were changed. And since mmfsadm is not externalized > none of them is documented ( and the help messages are not > consistent as well). > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: Simon Thompson To: > gpfsug main discussion list > Date: 19/12/2018 11:29 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hmm interesting ? 
> > # mmfsadm test verbs usage: {udapl | verbs} { status | skipio | > noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut } > > # mmfsadm test verbs status usage: {udapl | verbs} { status | > skipio | noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut > | config | conn | conndetails | stats | resetstats | ibcntreset | > ibcntr | ia | pz | psp | evd | lmr | break | qps | inject op cnt > err | breakqperr | qperridx idx | breakidx idx} > > mmfsadm test verbs config still works though (which includes > RdmaStarted flag) > > Simon* > > From: * on behalf of > "alvise.dorigo at psi.ch" * Reply-To: > *"gpfsug-discuss at spectrumscale.org" > * Date: *Wednesday, 19 December > 2018 at 08:51* To: *"gpfsug-discuss at spectrumscale.org" > * Subject: *[gpfsug-discuss] > verbs status not working in 5.0.2 > > Hi, in GPFS 5.0.2 I cannot run anymore "mmfsadm test verbs > status": > > [root at sf-dss-1 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "4.2.3.7 ". Built on > Feb 15 2018 at 11:38:38 Running 62 days 11 hours 24 minutes 35 > secs, pid 7510 VERBS RDMA status: started > > [root at sf-export-2 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "5.0.2.1 ". Built on > Oct 24 2018 at 21:23:46 Running 10 minutes 24 secs, pid 3570 usage: > {udapl | verbs} { status | skipio | noskipio | dump | maxRpcsOut | > maxReplysOut | maxRdmasOut | config | conn | conndetails | stats | > resetstats | ibcntreset | ibcntr | ia | pz | psp | evd | lmr | > break | qps | inject op cnt err | breakqperr | qperridx idx | > breakidx idx} > > > Is it a known problem or am I doing something wrong ? 
> > Thanks, > > Alvise_______________________________________________ > gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAB1AAKCRCZv6Bp0Ryx vhPDAKCZFKcsFcbNk8MBZvfr6Oz8C3+C5wCgvwXwHwX0S6SKI7NoRTszLPR2n/E= =Qxja -----END PGP SIGNATURE----- From bbanister at jumptrading.com Tue Jun 11 20:37:52 2019 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jun 2019 19:37:52 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: This has been brocket for a long time... we too were checking that `mmfsadm test verbs status` reported that RDMA is working. We don't want nodes that are not using RDMA running in the cluster. 
We have decided to just look for the log entry like this: test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" /var/adm/ras/mmfs.log.latest)" == "1" ]] } Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Tuesday, June 11, 2019 2:33 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] verbs status not working in 5.0.2 [EXTERNAL EMAIL] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This is not a change I like much either, though can obviously adapt to it. We have used "mmfsadm test verbs status" to confirm that RDMA is working by NHC (https://github.com/mej/nhc) on our compute nodes, and just for a quick check on the command line. Yes, there are the usual caveats, and yes the information is available another way, but a) it's the removal of a convenience that I'm quite sure that -- caveats aside - -- is not dangerous (it runs every 5 minutes on our system) b) it doesn't match the usage printed out on the command line and c) any other methods are quite a bit more information that then has to be parsed (perhaps also not as light a touch, but I don't know the code), and d) there doesn't seem to be a way now that works on both GPFS V4 and V5 (I confirmed that mmfsadm saferdump verbs | grep verbsRdmaStarted does not on V4). You'd also mentioned we really shouldn't be using mmfsadm regularly. Is there a way to get this information out of mmdiag if that is the supported command? Is there a way to do this that works for both V4 and V5? Philosophy of using mmfsadm aside though, we aren't supposed to rely on syntax for these commands remaining the same, but aren't we supposed to be able to rely on commands not falsely reporting syntax in their own usage message? I'd think at the very least, that's a bug in the "usage" text. On 12/19/18 5:35 AM, Tomer Perry wrote: > Hi, > > So, with all the usual disclaimers... mmfsadm saferdump verbs is not > enough? 
or even mmfsadm saferdump verbs | grep VerbsRdmaStarted > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 12:22 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > I'd like just one line that says "RDMA ON" or "RMDA OFF" (as was > reported more or less by mmfsadm). > > I can get info about RMDA using mmdiag, but is much more output to > parse (e.g. by a nagios script or just a human eye). Ok, never mind, I > understand your explanation and it is not definitely a big issue... it > was, above all, a curiosity to understand if the command was modified > to get the same behavior as before, but in a different way. > > Cheers, > > Alvise > > ---------------------------------------------------------------------- - -- > > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer Perry > [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 11:05 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Changed means it provides some functions/information in a different > way. So, I guess the question is what information do you need? ( and > "officially" why isn't mmdiag good enough - what is missing. As you > probably know, mmfsadm might cause crashes and deadlock from time to > time, this is why we're trying to provide "safe ways" to get the > required information). 
> > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 11:53 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hi Tomer, "changed" makes me suppose that it is still possible, but in > a different way... am I correct ? if yes, what it is ? > > thanks, > > Alvise > > ---------------------------------------------------------------------- - -- > > * > From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer Perry > [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 10:47 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Hi, > > Yes, as part of the RDMA enhancements in 5.0.X much of the hidden test > commands were changed. And since mmfsadm is not externalized none of > them is documented ( and the help messages are not consistent as > well). > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: Simon Thompson To: > gpfsug main discussion list > Date: 19/12/2018 11:29 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hmm interesting ? 
> > # mmfsadm test verbs usage: {udapl | verbs} { status | skipio | > noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut } > > # mmfsadm test verbs status usage: {udapl | verbs} { status | skipio | > noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut > | config | conn | conndetails | stats | resetstats | ibcntreset | > ibcntr | ia | pz | psp | evd | lmr | break | qps | inject op cnt err | > breakqperr | qperridx idx | breakidx idx} > > mmfsadm test verbs config still works though (which includes > RdmaStarted flag) > > Simon* > > From: * on behalf of > "alvise.dorigo at psi.ch" * Reply-To: > *"gpfsug-discuss at spectrumscale.org" > * Date: *Wednesday, 19 December > 2018 at 08:51* To: *"gpfsug-discuss at spectrumscale.org" > * Subject: *[gpfsug-discuss] verbs > status not working in 5.0.2 > > Hi, in GPFS 5.0.2 I cannot run anymore "mmfsadm test verbs > status": > > [root at sf-dss-1 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "4.2.3.7 ". Built on Feb > 15 2018 at 11:38:38 Running 62 days 11 hours 24 minutes 35 secs, pid > 7510 VERBS RDMA status: started > > [root at sf-export-2 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "5.0.2.1 ". Built on Oct > 24 2018 at 21:23:46 Running 10 minutes 24 secs, pid 3570 usage: > {udapl | verbs} { status | skipio | noskipio | dump | maxRpcsOut | > maxReplysOut | maxRdmasOut | config | conn | conndetails | stats | > resetstats | ibcntreset | ibcntr | ia | pz | psp | evd | lmr | break | > qps | inject op cnt err | breakqperr | qperridx idx | breakidx idx} > > > Is it a known problem or am I doing something wrong ? 
> > Thanks, > > Alvise_______________________________________________ > gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss mailing > list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss mailing > list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ gpfsug-discuss mailing > list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAB1AAKCRCZv6Bp0Ryx vhPDAKCZFKcsFcbNk8MBZvfr6Oz8C3+C5wCgvwXwHwX0S6SKI7NoRTszLPR2n/E= =Qxja -----END PGP SIGNATURE----- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Tue Jun 11 20:45:40 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:45:40 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thanks -- this was originally how Lenovo told us to check this, and I came across `mmfsadm test verbs status` on my own. 
I'm thinking, though, isn't there some risk that if RDMA went down somehow, that wouldn't be caught by your script? I can't say that I normally see that as the failure mode (it's most often booting up without), nor do I know what happens to `mmfsadm test verbs status` if you pull a cable or something. On 6/11/19 3:37 PM, Bryan Banister wrote: > This has been broken for a long time... we too were checking that > `mmfsadm test verbs status` reported that RDMA is working. We > don't want nodes that are not using RDMA running in the cluster. > > We have decided to just look for the log entry like this: > test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" > /var/adm/ras/mmfs.log.latest)" == "1" ]] } > > Hope that helps, -Bryan - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAE3gAKCRCZv6Bp0Ryx vpvpAJ9KnVX79aXNu3oclxM6swYfZ5wKjQCeJF3s94tS7+2JtTlkc5OXV/E8LnI= =kBtE -----END PGP SIGNATURE----- From kums at us.ibm.com Tue Jun 11 20:49:12 2019 From: kums at us.ibm.com (Kumaran Rajaram) Date: Tue, 11 Jun 2019 15:49:12 -0400 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk><83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch><83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch><812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: Hi, This issue is resolved in the latest 5.0.3.1 release. # mmfsadm dump version | grep Build Build branch "5.0.3.1 ".
# mmfsadm test verbs status VERBS RDMA status: started Regards, -Kums From: Ryan Novosielski To: "gpfsug-discuss at spectrumscale.org" Date: 06/11/2019 03:46 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] verbs status not working in 5.0.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thanks -- this was originally how Lenovo told us to check this, and I came across `mmfsadm test verbs status` on my own. I'm thinking, though, isn't there some risk that if RDMA went down somehow, that wouldn't be caught by your script? I can't say that I normally see that as the failure mode (it's most often booting up without), nor do I know what happens to `mmfsadm test verbs status` if you pull a cable or something. On 6/11/19 3:37 PM, Bryan Banister wrote: > This has been brocket for a long time... we too were checking that > `mmfsadm test verbs status` reported that RDMA is working. We > don't want nodes that are not using RDMA running in the cluster. > > We have decided to just look for the log entry like this: > test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" > /var/adm/ras/mmfs.log.latest)" == "1" ]] } > > Hope that helps, -Bryan - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAE3gAKCRCZv6Bp0Ryx vpvpAJ9KnVX79aXNu3oclxM6swYfZ5wKjQCeJF3s94tS7+2JtTlkc5OXV/E8LnI= =kBtE -----END PGP SIGNATURE----- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Tue Jun 11 20:50:49 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:50:49 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thank you, that's great news. Now we just have to wait for that to make it to the DSS-G release. :-| On 6/11/19 3:49 PM, Kumaran Rajaram wrote: > Hi, > > This issue is resolved in the latest 5.0.3.1 release. > > /# mmfsadm dump version | grep Build/ */Build/*/branch "5.0.3.1 > "./ > > /# mmfsadm test verbs status/ /VERBS RDMA status: started/ > > Regards, -Kums > > > > Inactive hide details for Ryan Novosielski ---06/11/2019 03:46:54 > PM--------BEGIN PGP SIGNED MESSAGE----- Hash: SHA1Ryan Novosielski > ---06/11/2019 03:46:54 PM--------BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > From: Ryan Novosielski To: > "gpfsug-discuss at spectrumscale.org" > Date: 06/11/2019 03:46 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] verbs status not working > in 5.0.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ---------------------------------------------------------------------- - -- > > > > > -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > > Thanks -- this was originally how Lenovo told us to check this, and > I came across `mmfsadm test verbs status` on my own. > > I'm thinking, though, isn't there some risk that if RDMA went down > somehow, that wouldn't be caught by your script? I can't say that > I normally see that as the failure mode (it's most often booting > up without), nor do I know what happens to `mmfsadm test verbs > status` if you pull a cable or something. 
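A middle ground between the two checks being quoted back and forth here -- grepping the startup line out of the log versus poking mmfsadm -- is to look at the most recent "VERBS RDMA" event in the log rather than merely whether a start line exists. A rough sketch, with an important caveat: it assumes a later failure would write some "VERBS RDMA ..." line to mmfs.log.latest at all, which is exactly the open question in this thread.

```shell
# Return success only if the newest "VERBS RDMA" line in the given log
# file is the startup message. Assumes (unverified) that post-startup
# state changes are also logged with a "VERBS RDMA" prefix.
rdma_last_event_is_start() {
  last=$(grep 'VERBS RDMA' "$1" | tail -n 1)
  case "$last" in
    *'VERBS RDMA started'*) return 0 ;;
    *) return 1 ;;
  esac
}
# e.g. rdma_last_event_is_start /var/adm/ras/mmfs.log.latest
```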
> > On 6/11/19 3:37 PM, Bryan Banister wrote: >> This has been brocket for a long time... we too were checking >> that `mmfsadm test verbs status` reported that RDMA is working. >> We don't want nodes that are not using RDMA running in the >> cluster. >> >> We have decided to just look for the log entry like this: >> test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" >> /var/adm/ras/mmfs.log.latest)" == "1" ]] } >> >> Hope that helps, -Bryan > > - -- ____ || \\UTGERS, > |----------------------*O*------------------------ ||_// the State > | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. > Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | > Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP > SIGNATURE----- > > iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAE3gAKCRCZv6Bp0Ryx > vpvpAJ9KnVX79aXNu3oclxM6swYfZ5wKjQCeJF3s94tS7+2JtTlkc5OXV/E8LnI= > =kBtE -----END PGP SIGNATURE----- > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAGFAAKCRCZv6Bp0Ryx vhGoAKDHtV4vNboVxdfrp7DLLBKp6+m60QCfQJRvJ+xEoXgpDO2VBbSBu0bMDwM= =aOrz -----END PGP SIGNATURE----- From p.childs at qmul.ac.uk Wed Jun 12 09:50:29 2019 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 12 Jun 2019 08:50:29 +0000 Subject: [gpfsug-discuss] Odd behavior using sudo for mmchconfig Message-ID: Yesterday, I updated some gpfs config using sudo /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=200000,maxStatCache=800000 which looked to have worked fine; however, later other machines started reporting issues with permissions while running mmlsquota as a user: cannot open file `/var/mmfs/gen/mmfs.cfg.ls' for reading (Permission denied) cannot open file `/var/mmfs/gen/mmfs.cfg' for reading (Permission denied) This was corrected by re-running the command from the same machine within a root session: sudo -s /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=20000,maxStatCache=80000 /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=200000,maxStatCache=800000 exit I suspect an environment issue within sudo caused the gpfs config file's permissions to change, but I've done similar before with no bad effects, so I'm a little confused. We're looking at tightening up our security to reduce the need for root-based passwordless access from non-admin nodes, but I've never understood the exact requirements to set this up correctly, and I periodically see issues with our root known_hosts files when we update our admin hosts, and hence I often end up going around with 'mmdsh -N all echo ""' to clear the old entries, but I always find this less than ideal and would prefer a better solution. Thanks for any ideas to get this right and avoid future issues. I'm more than happy to open an IBM ticket on this issue, but I feel community feedback might get me further to start with.
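One hypothesis that fits these symptoms: the mode of a freshly (re)written config file depends on the invoking environment, and a sudo session can carry a different umask than a login root shell. Whether the mm commands actually honour the caller's umask when regenerating /var/mmfs/gen/mmfs.cfg is an assumption that would need verifying; the sketch below only illustrates the generic mechanism, with no GPFS commands involved.

```shell
# Illustrative only: show how the umask in effect when a file is
# (re)created determines whether ordinary users can read it later.
mode_with_umask() {
  # $1 = umask to apply; prints the resulting permission bits
  dir=$(mktemp -d)
  ( umask "$1"; touch "$dir/mmfs.cfg" )   # create under the given umask
  stat -c '%a' "$dir/mmfs.cfg"            # GNU stat: octal mode
  rm -rf "$dir"
}
# mode_with_umask 022  ->  644 (world-readable, mmlsquota can read it)
# mode_with_umask 077  ->  600 (root-only: "Permission denied" for users)
```

If that mechanism is in play, comparing `sudo sh -c umask` with `umask` in a root login shell on the affected node would confirm or rule it out quickly.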
Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London From spectrumscale at kiranghag.com Thu Jun 13 17:55:07 2019 From: spectrumscale at kiranghag.com (KG) Date: Thu, 13 Jun 2019 22:25:07 +0530 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: Hi As per the flash - https://www-01.ibm.com/support/docview.wss?uid=ibm10887213&myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E this bug doesn't appear if SELinux is disabled. If the customer is willing to disable SELinux, will it be OK to upgrade (or stay on the upgraded level and avoid a downgrade)? On Tue, Jun 11, 2019 at 9:24 PM Felipe Knop wrote: > Renar, > > With the change below, which is a retrofit of a change deployed in newer > kernels, an inconsistency has taken place between the GPFS kernel > portability layer and the kernel proper. A known result of that > inconsistency is a kernel crash. One known sequence leading to the crash > involves the mkdir() call. > > We are working on an official notification on the issue. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314
Are there a > > From: "Grunenberg, Renar" > To: gpfsug main discussion list > Date: 06/11/2019 08:28 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hallo Felipe, > can you explain is this a generic Problem in rhel or only a scale related. > Are there any cicumstance already available? We ask redhat but have no > points that this are know to them? > > Regards Renar > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. > ------------------------------ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. 
> ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Im Auftrag von *Felipe Knop > *Gesendet:* Montag, 10. Juni 2019 15:43 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel > 3.10.0-957.21.2 > > Renar, > > Thanks. Of the changes below, it appears that > > * security: double-free attempted in security_inode_init_security() > (BZ#1702286) > > was the one that ended up triggering the problem. Our investigations now > show that RHEL kernels >= 3.10.0-957.19.1 are impacted. > > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for "Grunenberg, Renar" ---06/10/2019 > 08:43:27 AM---Hallo Felipe, here are the change list:]"Grunenberg, Renar" > ---06/10/2019 08:43:27 AM---Hallo Felipe, here are the change list: > > From: "Grunenberg, Renar" <*Renar.Grunenberg at huk-coburg.de* > > > To: "'gpfsug-discuss at spectrumscale.org'" < > *gpfsug-discuss at spectrumscale.org* > > Date: 06/10/2019 08:43 AM > Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > > Hallo Felipe, > > here are the change list: > RHBA-2019:1337 kernel bug fix update > > > Summary: > > Updated kernel packages that fix various bugs are now available for Red > Hat Enterprise Linux 7. > > The kernel packages contain the Linux kernel, the core of any Linux > operating system. 
> > This update fixes the following bugs: > > * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) > > * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with > SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server > should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked > delegations (BZ#1689811) > > * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx > mtip_init_cmd_header routine (BZ#1689929) > > * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) > > * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal > cards (Regression from 1584963) - Need to flush fb writes when rewinding > push buffer (BZ#1690761) > > * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel > client issue (BZ#1692266) > > * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan > trunk and header rewrite (BZ#1693110) > > * aio O_DIRECT writes to non-page-aligned file locations on ext4 can > result in the overlapped portion of the page containing zeros (BZ#1693561) > > * [HP WS 7.6 bug] Audio driver does not recognize multi function audio > jack microphone input (BZ#1693562) > > * XFS returns ENOSPC when using extent size hint with space still > available (BZ#1693796) > > * OVN requires IPv6 to be enabled (BZ#1694981) > > * breaks DMA API for non-GPL drivers (BZ#1695511) > > * ovl_create can return positive retval and crash the host (BZ#1696292) > > * ceph: append mode is broken for sync/direct write (BZ#1696595) > > * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL > (BZ#1697241) > > * Failed to load kpatch module after install the rpm package occasionally > on ppc64le (BZ#1697867) > > * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) > > * Resizing an online EXT4 filesystem on a loopback device hangs > (BZ#1698110) > > * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) > > * [ESXi][RHEL7.6]After upgrade 
to kernel-3.10.0-957.el7, system is unable > to discover newly added VMware LSI Logic SAS virtual disks without a > reboot. (BZ#1699723) > > * kernel: zcrypt: fix specification exception on z196 at ap probe > (BZ#1700706) > > * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() > (BZ#1701293) > > * stime showed huge values related to wrong calculation of time deltas > (L3:) (BZ#1701743) > > * Kernel panic due to NULL pointer dereference at > sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using > hard-coded device (BZ#1701991) > > * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings > (BZ#1702282) > > * security: double-free attempted in security_inode_init_security() > (BZ#1702286) > > * Missing wakeup leaves task stuck waiting in blk_queue_enter() > (BZ#1702921) > > * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) > > * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) > > * md_clear flag missing from /proc/cpuinfo on late microcode update > (BZ#1712993) > > * MDS mitigations are not enabled after double microcode update > (BZ#1712998) > > * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 > __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) > > Users of kernel are advised to upgrade to these updated packages, which > fix these bugs. > > Full details and references: > > *https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2* > > > Revision History: > > Issue Date: 2019-06-04 > Updated: 2019-06-04 > > Regards Renar > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: *Renar.Grunenberg at huk-coburg.de* > Internet: *www.huk.de* > > ------------------------------ > > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. > ------------------------------ > > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > > *Von:* *gpfsug-discuss-bounces at spectrumscale.org* > [ > *mailto:gpfsug-discuss-bounces at spectrumscale.org* > ] *Im Auftrag von *Felipe Knop > *Gesendet:* Montag, 10. Juni 2019 06:41 > *An:* gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > *Betreff:* Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel > 3.10.0-957.21.2 > > Hi, > > Though we are still learning what workload results in the problem, it > appears that even minimal I/O on the file system may cause the OS to crash. > One pattern that we saw was 'mkdir'. There is a chance that the DR site was > not yet impacted because no I/O workload has been run there. In that case, > rolling back to the prior kernel level (one which has been tested before) > may be advisable. 
> > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: Inactive hide details for KG ---06/09/2019 09:38:55 AM---One of my > customer already upgraded their DR site. Is rollback advised]KG > ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR > site. Is rollback advised? They will be running from DR > > From: KG <*spectrumscale at kiranghag.com* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 06/09/2019 09:38 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > > > One of my customer already upgraded their DR site. > > Is rollback advised? They will be running from DR site for a day in > another week. > > On Sat, Jun 8, 2019, 03:37 Felipe Knop <*knop at us.ibm.com* > > wrote: > > Zach, > > This appears to be affecting all Scale versions, including > 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. > (3.10.0-957 is not impacted) > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of > Spectrum Scale versions are you referring to? 5.0.2-3? > --------------------------- > > From: Zachary Mance <*zmance at ucar.edu* > > To: gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > > Date: 06/07/2019 05:51 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with > RHEL7.6 kernel 3.10.0-957.21.2 > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > > > Which versions of Spectrum Scale versions are you referring > to? 5.0.2-3? 
> > --------------------------------------------------------------------------------------------------------------- > Zach Mance *zmance at ucar.edu* (303) 497-1883 > HPC Data Infrastructure Group / CISL / NCAR > --------------------------------------------------------------------------------------------------------------- > > > > On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop <*knop at us.ibm.com* > > wrote: > All, > > There have been reported issues > (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel > 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until > further information is provided. > > Thanks, > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > *[attachment > "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > 
_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From knop at us.ibm.com Thu Jun 13 20:25:16 2019 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 13 Jun 2019 15:25:16 -0400 Subject: [gpfsug-discuss] =?utf-8?q?WG=3A_Spectrum_Scale_with_RHEL7=2E6_ke?= =?utf-8?b?cm5lbAkzLjEwLjAtOTU3LjIxLjI=?= In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de><3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: Kiran, If SELinux is disabled (SELinux mode set to 'disabled') then the crash should not happen, and it should be OK to upgrade to (say) 3.10.0-957.21.2 or stay at that level. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: KG To: gpfsug main discussion list Date: 06/13/2019 12:56 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi As per the flash - https://www-01.ibm.com/support/docview.wss?uid=ibm10887213&myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E this bug doesnt appear if SELinux is disabled. If customer is willing to disable SELinux, will it be ok to upgrade (or stay on upgraded level and avoid downgrade)? On Tue, Jun 11, 2019 at 9:24 PM Felipe Knop wrote: Renar, With the change below, which is a retrofit of a change deployed in newer kernels, an inconsistency has taken place between the GPFS kernel portability layer and the kernel proper. A known result of that inconsistency is a kernel crash. 
One known sequence leading to the crash involves the mkdir() call. We are working on an official notification on the issue. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Inactive hide details for "Grunenberg, Renar" ---06/11/2019 08:28:07 AM---Hallo Felipe, can you explain is this a generic Probl"Grunenberg, Renar" ---06/11/2019 08:28:07 AM---Hallo Felipe, can you explain is this a generic Problem in rhel or only a scale related. Are there a From: "Grunenberg, Renar" To: gpfsug main discussion list Date: 06/11/2019 08:28 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, can you explain is this a generic Problem in rhel or only a scale related. Are there any cicumstance already available? We ask redhat but have no points that this are know to them? Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 15:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Renar, Thanks. Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Inactive hide details for "Grunenberg, Renar" ---06/10/2019 08:43:27 AM---Hallo Felipe, here are the change list:"Grunenberg, Renar" ---06/10/2019 08:43:27 AM---Hallo Felipe, here are the change list: From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" < gpfsug-discuss at spectrumscale.org> Date: 06/10/2019 08:43 AM Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, here are the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. 
This update fixes the following bugs:

* Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292)
* RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811)
* PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929)
* The nvme cli delete-ns command hangs indefinitely. (BZ#1690519)
* drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761)
* [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266)
* [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110)
* aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561)
* [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562)
* XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796)
* OVN requires IPv6 to be enabled (BZ#1694981)
* breaks DMA API for non-GPL drivers (BZ#1695511)
* ovl_create can return positive retval and crash the host (BZ#1696292)
* ceph: append mode is broken for sync/direct write (BZ#1696595)
* Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241)
* Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867)
* [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940)
* Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110)
* dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722)
* [ESXi][RHEL7.6] After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. (BZ#1699723)
* kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706)
* XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293)
* stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743)
* Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991)
* IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282)
* security: double-free attempted in security_inode_init_security() (BZ#1702286)
* Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921)
* Satellite Capsule sync triggers several XFS corruptions (BZ#1702922)
* BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923)
* md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993)
* MDS mitigations are not enabled after double microcode update (BZ#1712998)
* WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004)

Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. 
From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On behalf of Felipe Knop Sent: Monday, June 10, 2019 06:41 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org One of my customers already upgraded their DR site. Is rollback advised? 
They will be running from the DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted.) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Zachary Mance To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop < knop at us.ibm.com> wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Fri Jun 14 00:15:09 2019 From: valdis.kletnieks at vt.edu (Valdis Klētnieks) Date: Thu, 13 Jun 2019 19:15:09 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de><3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: <27309.1560467709@turing-police> On Thu, 13 Jun 2019 15:25:16 -0400, "Felipe Knop" said: > If SELinux is disabled (SELinux mode set to 'disabled') then the crash > should not happen, and it should be OK to upgrade to (say) 3.10.0-957.21.2 > or stay at that level. Note that if you have any plans to re-enable SELinux in the future, you'll have to do a relabel, which could take a while if you have large filesystems with tens or hundreds of millions of inodes.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From cblack at nygenome.org Mon Jun 17 17:24:54 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 17 Jun 2019 16:24:54 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance Message-ID: Our network team sometimes needs to take down sections of our network for maintenance. Our systems have dual paths thru pairs of switches, but often the maintenance will take down one of the two paths leaving all our nsd servers with half bandwidth. Some of our systems are transmitting at a higher rate than can be handled by half network (2x40Gb hosts with tx of 50Gb+). What can we do to gracefully handle network maintenance reducing bandwidth in half? Should we set maxMBpS for affected nodes to a lower value? (default on our ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) Any other ideas or comments? 
Our hope is that metadata operations are not affected much and users just see jobs and processes read or write at a slower rate. Best, Chris ________________________________ This message is for the recipient's use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex at calicolabs.com Mon Jun 17 17:31:38 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 17 Jun 2019 09:31:38 -0700 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: Message-ID: Hi Chris, I think the next thing to double-check is when the maxMBpS change takes effect. You may need to restart the nsds. Otherwise I think your plan is sound. Regards, Alex On Mon, Jun 17, 2019 at 9:24 AM Christopher Black wrote: > Our network team sometimes needs to take down sections of our network for > maintenance. Our systems have dual paths thru pairs of switches, but often > the maintenance will take down one of the two paths leaving all our nsd > servers with half bandwidth. > > Some of our systems are transmitting at a higher rate than can be handled > by half network (2x40Gb hosts with tx of 50Gb+). 
> What can we do to gracefully handle network maintenance reducing bandwidth > in half? > > Should we set maxMBpS for affected nodes to a lower value? (default on our > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) > > Any other ideas or comments? > > Our hope is that metadata operations are not affected much and users just > see jobs and processes read or write at a slower rate. > > Best, > Chris > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Jun 17 17:37:48 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 17 Jun 2019 16:37:48 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: Message-ID: Hi I would really look into QoS instead. -- Cheers > On 17 Jun 2019, at 19.33, Alex Chekholko wrote: > > Hi Chris, > > I think the next thing to double-check is when the maxMBpS change takes effect. You may need to restart the nsds. Otherwise I think your plan is sound. > > Regards, > Alex > > >> On Mon, Jun 17, 2019 at 9:24 AM Christopher Black wrote: >> Our network team sometimes needs to take down sections of our network for maintenance. Our systems have dual paths thru pairs of switches, but often the maintenance will take down one of the two paths leaving all our nsd servers with half bandwidth. >> >> Some of our systems are transmitting at a higher rate than can be handled by half network (2x40Gb hosts with tx of 50Gb+). 
>> What can we do to gracefully handle network maintenance reducing bandwidth in half? >> >> Should we set maxMBpS for affected nodes to a lower value? (default on our ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) >> >> Any other ideas or comments? >> >> Our hope is that metadata operations are not affected much and users just see jobs and processes read or write at a slower rate. >> >> Best, >> >> Chris >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edellä ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Jun 17 17:38:47 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 17 Jun 2019 16:38:47 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: Message-ID: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should use its in-memory buffers for read prefetches and dirty writes. On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: > Hi Chris, > > I think the next thing to double-check is when the maxMBpS change takes > effect. 
You may need to restart the nsds. Otherwise I think your plan is > sound. > > Regards, > Alex > [remainder of the quoted message trimmed] 
-- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From cblack at nygenome.org Mon Jun 17 17:47:54 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 17 Jun 2019 16:47:54 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: The man page indicates that maxMBpS can be used to "artificially limit how much I/O one node can put on all of the disk servers", but it might not be the best choice. The man page also says maxMBpS is in the class of mmchconfig changes that take effect immediately. We've only ever used QoS for throttling maintenance operations (restripes, etc.) and are unfamiliar with how to best use it to throttle client load. Best, Chris On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson" <skylar2 at uw.edu> wrote: IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should use its in-memory buffers for read prefetches and dirty writes. On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: > Hi Chris, > > I think the next thing to double-check is when the maxMBpS change takes > effect. You may need to restart the nsds. Otherwise I think your plan is > sound. 
> > Regards, > > Alex > [remainder of the quoted messages trimmed] -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
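For readers following the thread, the "~4000 for 32Gbps" figure in Chris's question is a plain unit conversion (maxMBpS is expressed in MB/s). The sketch below is illustrative only, not GPFS code, and note Skylar's caveat that maxMBpS is a hint rather than a hard cap:

```shell
# Convert a link rate in Gbit/s to the MB/s units used by maxMBpS
# (decimal units: 1 Gbit/s = 125 MB/s).
gbps_to_mbps() {
    echo $(( $1 * 1000 / 8 ))
}

gbps_to_mbps 80   # dual 40GbE paths at full capacity: prints 10000
gbps_to_mbps 32   # ~32 Gb/s usable during maintenance: prints 4000
```

With those numbers, dropping maxMBpS from the ESS default of 30000 to roughly 4000 on the affected nodes matches the single-path capacity discussed above.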
From alex at calicolabs.com Mon Jun 17 17:51:27 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 17 Jun 2019 09:51:27 -0700 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: Hi all, My experience with maxMBpS was in the other direction, but it did make a difference. We had lots of spare network bandwidth (that is, the network was not the bottleneck), and in the course of various GPFS tuning it also looked like the disks were not too busy, and the NSDs were not too busy, so bumping up the maxMBpS improved performance and allowed GPFS to do more. Of course, this was many years ago on a different GPFS version and hardware, but I think it would work in the other direction. It should also be very safe to try. Regards, Alex On Mon, Jun 17, 2019 at 9:47 AM Christopher Black wrote: > The man page indicates that maxMBpS can be used to "artificially limit how > much I/O one node can put on all of the disk servers", but it might not be > the best choice. Man page also says maxMBpS is in the class of mmchconfig > changes that take effect immediately. > We've only ever used QoS for throttling maint operations (restripes, etc) > and are unfamiliar with how to best use it to throttle client load. > > Best, > Chris > > On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Skylar Thompson" <skylar2 at uw.edu> wrote: > > IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS > should > use its in-memory buffers for read prefetches and dirty writes. > > On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: > > Hi Chris, > > > > I think the next thing to double-check is when the maxMBpS change > takes > > effect. You may need to restart the nsds. Otherwise I think your > plan is > > sound. 
> > Regards, > > Alex > > [remainder of the quoted messages trimmed] 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Jun 17 17:54:04 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 17 Jun 2019 16:54:04 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: Message-ID: Hi Writing from my phone, so excuse the typos. Assuming you have a system pool (metadata) and some other pool/s, you can set limits on the maintenance class (as you have done already) and on the other class, which would affect all the other ops. You can add those per node or node class, matched to whichever part/s of the network you are working on. Changes are online and immediate. And you can measure normal load just by having QoS activated and looking at the values for a few days. Hope the above makes some sense. -- Cheers > On 17 Jun 2019, at 19.48, Christopher Black wrote: > > The man page indicates that maxMBpS can be used to "artificially limit how much I/O one node can put on all of the disk servers", but it might not be the best choice. Man page also says maxMBpS is in the class of mmchconfig changes that take effect immediately. > We've only ever used QoS for throttling maint operations (restripes, etc) and are unfamiliar with how to best use it to throttle client load. > > Best, > Chris > > On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson" wrote: > > IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should > use its in-memory buffers for read prefetches and dirty writes. > >> On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: >> Hi Chris, >> >> I think the next thing to double-check is when the maxMBpS change takes >> effect. You may need to restart the nsds. 
Otherwise I think your plan is >> sound. >> >> Regards, >> Alex >> >> >> On Mon, Jun 17, 2019 at 9:24 AM Christopher Black >> wrote: >> >>> Our network team sometimes needs to take down sections of our network for >>> maintenance. Our systems have dual paths thru pairs of switches, but often >>> the maintenance will take down one of the two paths leaving all our nsd >>> servers with half bandwidth. >>> >>> Some of our systems are transmitting at a higher rate than can be handled >>> by half network (2x40Gb hosts with tx of 50Gb+). >>> >>> What can we do to gracefully handle network maintenance reducing bandwidth >>> in half? >>> >>> Should we set maxMBpS for affected nodes to a lower value? (default on our >>> ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) >>> >>> Any other ideas or comments? >>> >>> Our hope is that metadata operations are not affected much and users just >>> see jobs and processes read or write at a slower rate. >>> >>> >>> >>> Best, >>> >>> Chris >>> ------------------------------ >>> This message is for the recipient???s use only, and may contain >>> confidential, privileged or protected information. Any unauthorized use or >>> dissemination of this communication is prohibited. If you received this >>> message in error, please immediately notify the sender and destroy all >>> copies of this message. The recipient should check this email and any >>> attachments for the presence of viruses, as we accept no liability for any >>> damage caused by any virus transmitted by this email. 
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ________________________________ > > This message is for the recipient's use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.
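A rough sketch of the per-node QoS throttle Luis suggests above. The filesystem name, node class, and IOPS values here are illustrative assumptions; verify the exact mmchqos/mmlsqos syntax against the documentation for your release:

```shell
# Sketch only: throttle normal client I/O (the 'other' QoS class) for the
# nodes behind the degraded network path. Names and values are placeholders.

# Enable QoS with no caps and observe steady-state load for a few days first:
mmchqos gpfs0 --enable
mmlsqos gpfs0 --seconds 60

# During the maintenance window, cap the 'other' class for the affected nodes:
mmchqos gpfs0 --enable -N degradedNodes pool=*,other=5000IOPS

# Afterwards, lift the cap again:
mmchqos gpfs0 --enable -N degradedNodes pool=*,other=unlimited
```

A cap set too low can stall the whole cluster, so start conservatively and watch mmlsqos while the limit is in force.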
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edellä ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jun 17 20:39:46 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Jun 2019 15:39:46 -0400 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: Please note that the maxMBpS parameter of mmchconfig is not part of the QOS features of the mmchqos command. mmchqos can be used to precisely limit IOPS. You can even set different limits for NSD traffic originating at different nodes. However, use the "force" of QOS carefully! No doubt you can bring a system to a virtual standstill if you set the IOPS values incorrectly. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Tue Jun 18 20:30:53 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 18 Jun 2019 15:30:53 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available Message-ID: All, With respect to the issues (including kernel crashes) on Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just been released: https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 (as described in the link above) A fix is now available in efix form for both 4.2.3 and 5.0.x .
The fix should be included in the upcoming PTFs for 4.2.3 and 5.0.3. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: From roblogie at au1.ibm.com Wed Jun 19 00:23:37 2019 From: roblogie at au1.ibm.com (Rob Logie) Date: Tue, 18 Jun 2019 23:23:37 +0000 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD Message-ID: Hi We are doing an underlying hardware change that will result in the Linux device file names changing for attached storage. Hence I need to reconfigure the NSDs to use the new Linux device names. What is the best way to do this? Thanks in advance Regards, Rob Logie IT Specialist A/NZ GBS Ballarat CIC Office: +61-3-5339 7748 | Mobile: +61-411-021-029 | Tie-Line: 97748 E-mail: roblogie at au1.ibm.com | Lotus Notes: Rob Logie/Australia/IBM IBM Building, BA02 129 Gear Avenue, Mount Helen, Vic, 3350 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Jun 19 01:32:40 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 18 Jun 2019 20:32:40 -0400 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD In-Reply-To: References: Message-ID: <11132.1560904360@turing-police> On Tue, 18 Jun 2019 23:23:37 -0000, "Rob Logie" said: > We are doing an underlying hardware change that will result in the Linux > device file names changing for attached storage. > Hence I need to reconfigure the NSDs to use the new Linux device names. The only time GPFS cares about the Linux device names is when you go to actually create an NSD. After that, it just romps through /dev, finds anything that looks like a disk, and if it has an NSD on it at the appropriate offset, claims it as a GPFS device.
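That /dev scan can also be steered with the /var/mmfs/etc/nsddevices user exit (discussed further down this digest). A minimal sketch that restricts discovery to dm-multipath devices; the device glob and the exit-code convention are assumptions to check against the shipped sample in /usr/lpp/mmfs/samples/nsddevices.sample before deploying:

```shell
#!/bin/bash
# Hedged sketch of /var/mmfs/etc/nsddevices: have GPFS consider only
# dm-multipath devices instead of scanning everything under /dev.
# Output lines are "deviceName deviceType", with names relative to /dev.
for dev in /dev/mapper/mpath*; do
    [ -e "$dev" ] || continue
    echo "${dev#/dev/} dmm"
done
# By convention (verify against the sample), exit 0 means "use only the
# devices listed above"; a non-zero exit appends the built-in scan.
exit 0
```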
(Protip: Since in a cluster the same disk may not have enumerated to the same name on all NSD servers that have visibility to it, you're almost always better off initially doing an mmcrnsd specifying only one server, and then using mmchnsd to add the other servers to the server list for it) Heck, even without hardware changes, there's no guarantee that the disks enumerate in the same order across reboots (especially if you have a petabyte of LUNs and 8 or 16 paths to each LUN, though it's possible to tell the multipath daemon to have stable names for the multipath devices) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Jun 19 11:22:51 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 19 Jun 2019 11:22:51 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: References: Message-ID: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG From arc at b4restore.com Wed Jun 19 12:30:33 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 19 Jun 2019 11:30:33 +0000 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> References: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> Message-ID: Hi Jonathan Here is what IBM wrote when I asked them: "the term "...node running kernel versions 3.10.0-957.19.1 or higher" includes 21.3. The term "including 3.10.0-957.21.2" is just to make clear that the issue isn't limited to the 19.x kernel." I will receive an efix later today and try it on the 21.3 kernel. Venlig hilsen / Best Regards Andi Rhod Christiansen -----Oprindelig meddelelse----- Fra: gpfsug-discuss-bounces at spectrumscale.org På vegne af Jonathan Buzzard Sendt: Wednesday, June 19, 2019 12:23 PM Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From knop at us.ibm.com Wed Jun 19 13:22:40 2019 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 19 Jun 2019 08:22:40 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: References: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> Message-ID: Andi, Thank you. At least from the point of view of the change in the kernel (RHBA-2019:1337) that triggered the compatibility break between the GPFS kernel module and the kernel, the GPFS efix should work with the newer kernel. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Andi Rhod Christiansen To: gpfsug main discussion list Date: 06/19/2019 07:42 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathan Here is what IBM wrote when I asked them: "the term "...node running kernel versions 3.10.0-957.19.1 or higher" includes 21.3. The term "including 3.10.0-957.21.2" is just to make clear that the issue isn't limited to the 19.x kernel." I will receive an efix later today and try it on the 21.3 kernel. Venlig hilsen / Best Regards Andi Rhod Christiansen -----Oprindelig meddelelse----- Fra: gpfsug-discuss-bounces at spectrumscale.org På
vegne af Jonathan Buzzard Sendt: Wednesday, June 19, 2019 12:23 PM Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From INDULISB at uk.ibm.com Wed Jun 19 13:36:26 2019 From: INDULISB at uk.ibm.com (Indulis Bernsteins1) Date: Wed, 19 Jun 2019 13:36:26 +0100 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD Message-ID: You can also speed up the startup of Spectrum Scale (GPFS) by using the nsddevices exit to supplement or bypass the normal "scan all block devices" process by Spectrum Scale. Useful if you have lots of LUNs or other block devices which are not NSDs, or for multipath. Though later versions of Scale might have fixed the scan for multipath devices. Anyway, this is old but potentially useful https://mytravelingfamily.com/2009/03/03/making-gpfs-work-using-multipath-on-linux/ All the information, representations, statements, opinions and proposals in this document are correct and accurate to the best of our present knowledge but are not intended (and should not be taken) to be contractually binding unless and until they become the subject of separate, specific agreement between us. Any IBM Machines provided are subject to the Statements of Limited Warranty accompanying the applicable Machine. Any IBM Program Products provided are subject to their applicable license terms. Nothing herein, in whole or in part, shall be deemed to constitute a warranty. IBM products are subject to withdrawal from marketing and or service upon notice, and changes to product configurations, or follow-on products, may result in price changes. Any references in this document to "partner" or "partnership" do not constitute or imply a partnership in the sense of the Partnership Act 1890. IBM is not responsible for printing errors in this proposal that result in pricing or information inaccuracies. Regards, Indulis Bernsteins Systems Architect IBM New Generation Storage Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Thu Jun 20 23:18:01 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 20 Jun 2019 22:18:01 +0000 Subject: [gpfsug-discuss] AFM prefetch and eviction policy question Message-ID: <0D7782FD-5594-4D9D-8B2B-B0BF22A4CB5F@oarc.rutgers.edu> Hi there, Been reading the documentation and wikis and such this afternoon, but could use some assistance from someone who is more well-versed in AFM and policy writing to confirm that what I'm looking to do is actually feasible. Is it possible to: 1) Have a policy that, generally, continuously prefetches a single fileset of an AFM cache (make sure those files are there whenever possible)? 2) Generally prefer not to evict files from that fileset, unless it's necessary, opting to evict other stuff first? It seems to me that one can do a prefetch on the fileset, but that future files will not be prefetched, requiring you to run this periodically. Additionally, by default, it would seem as if these files would frequently be evicted in the case where it becomes necessary if they are infrequently used. Would like to avoid too much churn on this but provide fast access to these files (it's a software tree, not user files). Thanks in advance! I'd rather know that it's possible before digging too deeply into the how. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr.
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From quixote at us.ibm.com Fri Jun 21 13:06:35 2019 From: quixote at us.ibm.com (Chris Kempin) Date: Fri, 21 Jun 2019 08:06:35 -0400 Subject: [gpfsug-discuss] AFM prefetch and eviction policy question Message-ID: Ryan: 1) You will need to just regularly run a prefetch to bring over the latest files .. you could either just run it regularly on the cache (probably using the --directory flag to scan the whole fileset for uncached files) or, with a little bit of scripting, you might be able to drive the prefetch from home if you know what files have been created/changed by shipping over to the cache a list of files to prefetch and have something prefetch that list when it arrives. 2) As to eviction, just set afmEnableAutoEviction=no and don't evict. Is there a storage constraint on the cache that would force you to evict? I was using AFM in a more interactive application, with many small files and performance was not an issue in terms of "fast" access to files, but things to consider: What is the network latency between home and cache? How big are the files you are dealing with? If you have very large files, you may want multiple gateways so they can fetch in parallel. How much change is there in the files? How many new/changed files a day are we talking about? Are existing files fairly stable? Regards, Chris Chris Kempin IBM Cloud - Site Reliability Engineering -------------- next part -------------- An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Tue Jun 25 12:38:28 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Tue, 25 Jun 2019 11:38:28 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Hello, I wonder if anyone has seen this...
I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I've checked the NSDs via mmlsnsd and mmlsdisk commands and they are all 'ready' and 'up'. The multipaths to these NSDs are all fine too. Is there a way of finding out what 'access' (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access - 'mmnsdrediscover' returns nothing and runs really fast (contrary to the statement 'This may take a while' when it runs)? Any ideas appreciated!
URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 25 13:10:53 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 25 Jun 2019 12:10:53 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: Hallo Son, you can check the access to the nsd with mmlsdisk -m. This give you a colum like ?IO performed on node?. On NSD-Server you should see localhost, on nsd-client you see the hostig nsd-server per device. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Son Truong Gesendet: Dienstag, 25. Juni 2019 13:38 An: gpfsug-discuss at spectrumscale.org Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Hello, I wonder if anyone has seen this? I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I?ve checked the NSDs via mmlsnsd and mmlsdisk commands and they are all ?ready? and ?up?. The multipaths to these NSDs are all fine too. Is there a way of finding out what ?access? (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access ? ?mmnsdrediscover? returns nothing and run really fast (contrary to the statement ?This may take a while? when it runs)? Any ideas appreciated! 
Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Tue Jun 25 13:01:11 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Tue, 25 Jun 2019 14:01:11 +0200 Subject: [gpfsug-discuss] Charts Decks - User Meeting along ISC Frankfurt In-Reply-To: References: Message-ID: The chart decks of the user meeting along ISC are now available: https://spectrumscale.org/presentations/ Thanks to all speaker and participants. -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: gpfsug main discussion list Date: 05/06/2019 10:44 Subject: [EXTERNAL] [gpfsug-discuss] Agenda - User Meeting along ISC Frankfurt Sent by: gpfsug-discuss-bounces at spectrumscale.org The agenda is now published: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-isc/ Please use the registration link to attend. Looking forward to meet many of you there. 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 Inactive hide details for "Ulf Troppens" ---22/05/2019 10:55:48---Greetings: IBM will host a joint "IBM Spectrum Scale and IBM "Ulf Troppens" ---22/05/2019 10:55:48---Greetings: IBM will host a joint "IBM Spectrum Scale and IBM Spectrum LSF User From: "Ulf Troppens" To: "gpfsug main discussion list" Date: 22/05/2019 10:55 Subject: [EXTERNAL] [gpfsug-discuss] Save the date - User Meeting along ISC Frankfurt Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings: IBM will host a joint "IBM Spectrum Scale and IBM Spectrum LSF User Meeting" at ISC. As with other user group meetings, the agenda will include user stories, updates on IBM Spectrum Scale & IBM Spectrum LSF, and access to IBM experts and your peers. We are still looking for customers to talk about their experience with Spectrum Scale and/or Spectrum LSF. Please send me a personal mail, if you are interested to talk. The meeting is planned for: Monday June 17th, 2019 - 1pm-5.30pm ISC Frankfurt, Germany I will send more details later. 
Best, Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=oSzGEkM6PXf5XfF3fAOrsCpqjyrt-ukWcaq3_Ldy_P4&s=GiOkq0F1T3eVSb1IeWaD7gKImm1gEVwhGaa1eIHDhD8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From son.truong at bristol.ac.uk Tue Jun 25 16:02:20 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Tue, 25 Jun 2019 15:02:20 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Hello Renar, Thanks for that command, very useful and I can now see the problematic NSDs are all served remotely. I have double checked the multipath and devices and I can see these NSDs are available locally. How do I get GPFS to recognise this and server them out via 'localhost'? mmnsddiscover -d seemed to have brought two of the four problematic NSDs back to being served locally, but the other two are not behaving. I have double checked the availability of these devices and their multipaths but everything on that side seems fine. Any more ideas? 
Regards, Son --------------------------- Message: 2 Date: Tue, 25 Jun 2019 12:10:53 +0000 From: "Grunenberg, Renar" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Content-Type: text/plain; charset="utf-8" Hallo Son, you can check the access to the nsd with mmlsdisk -m. This give you a colum like ?IO performed on node?. On NSD-Server you should see localhost, on nsd-client you see the hostig nsd-server per device. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Son Truong Gesendet: Dienstag, 25. Juni 2019 13:38 An: gpfsug-discuss at spectrumscale.org Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Hello, I wonder if anyone has seen this? I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I?ve checked the NSDs via mmlsnsd and mmlsdisk commands and they are all ?ready? and ?up?. The multipaths to these NSDs are all fine too. Is there a way of finding out what ?access? (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access ? ?mmnsdrediscover? returns nothing and run really fast (contrary to the statement ?This may take a while? when it runs)? Any ideas appreciated! 
Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 89, Issue 26 ********************************************** From janfrode at tanso.net Tue Jun 25 18:13:12 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 25 Jun 2019 19:13:12 +0200 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: I've had a situation recently where mmnsddiscover didn't help, but mmshutdown/mmstartup on that node did fix it. This was with v5.0.2-3 on ppc64le. -jf On Tue, 25 Jun 2019 at 17:02, Son Truong wrote: > > Hello Renar, > > Thanks for that command, very useful and I can now see the problematic > NSDs are all served remotely. > > I have double-checked the multipath and devices and I can see these NSDs > are available locally. > > How do I get GPFS to recognise this and serve them out via 'localhost'? > > mmnsddiscover -d seemed to have brought two of the four problematic > NSDs back to being served locally, but the other two are not behaving. I > have double-checked the availability of these devices and their multipaths > but everything on that side seems fine. > > Any more ideas? 
> > Regards, > Son > > > --------------------------- > > Message: 2 > Date: Tue, 25 Jun 2019 12:10:53 +0000 > From: "Grunenberg, Renar" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to > NSD failed with EIO, switching to access the disk remotely." > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hallo Son, > > you can check the access to the nsd with mmlsdisk -m. This gives > you a column like "IO performed on node". On NSD-Server you should see > localhost, on nsd-client you see the hosting nsd-server per device. > > Regards Renar > > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. > 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. > ________________________________ > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich > erhalten haben, informieren Sie bitte sofort den Absender und vernichten > Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the sender immediately and destroy this information. 
> Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ________________________________ > Von: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> Im Auftrag von Son Truong > Gesendet: Dienstag, 25. Juni 2019 13:38 > An: gpfsug-discuss at spectrumscale.org > Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD > failed with EIO, switching to access the disk remotely." > > Hello, > > I wonder if anyone has seen this? I am (not) having fun with the > rescan-scsi-bus.sh command especially with the -r switch. Even though there > are no devices removed the script seems to interrupt currently working NSDs > and these messages appear in the mmfs.logs: > > 2019-06-25_06:30:48.706+0100: [I] Connected to > 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [N] Connecting to > 2019-06-25_06:30:51.195+0100: [I] Connected to > 2019-06-25_06:30:59.857+0100: [N] Connecting to > 2019-06-25_06:30:59.863+0100: [I] Connected to > 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > > These messages appear roughly at the same time each day and I've checked > the NSDs via mmlsnsd and mmlsdisk commands and they are all "ready" and > "up". The multipaths to these NSDs are all fine too. > > Is there a way of finding out what "access" (local or remote) a particular > node has to an NSD? And is there a command to force it to switch to local > access ? "mmnsdrediscover" 
returns nothing and runs really fast (contrary to > the statement "This may take a while" when it runs)? > > Any ideas appreciated! > > Regards, > Son > > Son V Truong - Senior Storage Administrator Advanced Computing Research > Centre IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190625/db704f88/attachment.html > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 89, Issue 26 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Tue Jun 25 18:21:17 2019 From: salut4tions at gmail.com (Jordan Robertson) Date: Tue, 25 Jun 2019 13:21:17 -0400 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: It may depend on which state the NSDs are in with respect to the node in question. If from that node you use 'mmfsadm dump nsd | egrep "moved|error|broken" ' and see anything, that might be it. One or two of those states can be fixed by mmnsddiscover, the other(s) require a kick of mmfsd to get the NSDs back. I never remember which is which. -Jordan On Tue, Jun 25, 2019, 13:13 Jan-Frode Myklebust wrote: > I've had a situation recently where mmnsddiscover didn't help, but > mmshutdown/mmstartup on that node did fix it. 
> > This was with v5.0.2-3 on ppc64le. > > > -jf > > On Tue, 25 Jun 2019 at 17:02, Son Truong wrote: > >> >> Hello Renar, >> >> Thanks for that command, very useful and I can now see the problematic >> NSDs are all served remotely. >> >> I have double-checked the multipath and devices and I can see these NSDs >> are available locally. >> >> How do I get GPFS to recognise this and serve them out via 'localhost'? >> >> mmnsddiscover -d seemed to have brought two of the four problematic >> NSDs back to being served locally, but the other two are not behaving. I >> have double-checked the availability of these devices and their multipaths >> but everything on that side seems fine. >> >> Any more ideas? >> >> Regards, >> Son >> >> >> --------------------------- >> >> Message: 2 >> Date: Tue, 25 Jun 2019 12:10:53 +0000 >> From: "Grunenberg, Renar" >> To: "gpfsug-discuss at spectrumscale.org" >> >> Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to >> NSD failed with EIO, switching to access the disk remotely." >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Hallo Son, >> >> you can check the access to the nsd with mmlsdisk -m. This gives >> you a column like "IO performed on node". On NSD-Server you should see >> localhost, on nsd-client you see the hosting nsd-server per device. >> >> Regards Renar >> >> >> Renar Grunenberg >> Abteilung Informatik - Betrieb >> >> HUK-COBURG >> Bahnhofsplatz >> 96444 Coburg >> Telefon: 09561 96-44110 >> Telefax: 09561 96-44104 >> E-Mail: Renar.Grunenberg at huk-coburg.de >> Internet: www.huk.de >> ________________________________ >> HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter >> Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. >> 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg >> Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. >> Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans >> Olav Herøy, Dr. 
>> Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. >> ________________________________ >> Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte >> Informationen. >> Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich >> erhalten haben, informieren Sie bitte sofort den Absender und vernichten >> Sie diese Nachricht. >> Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht >> ist nicht gestattet. >> >> This information may contain confidential and/or privileged information. >> If you are not the intended recipient (or have received this information >> in error) please notify the sender immediately and destroy this information. >> Any unauthorized copying, disclosure or distribution of the material in >> this information is strictly forbidden. >> ________________________________ >> Von: gpfsug-discuss-bounces at spectrumscale.org < >> gpfsug-discuss-bounces at spectrumscale.org> Im Auftrag von Son Truong >> Gesendet: Dienstag, 25. Juni 2019 13:38 >> An: gpfsug-discuss at spectrumscale.org >> Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD >> failed with EIO, switching to access the disk remotely." >> >> Hello, >> >> I wonder if anyone has seen this? I am (not) having fun with the >> rescan-scsi-bus.sh command especially with the -r switch. Even though there >> are no devices removed the script seems to interrupt currently working NSDs >> and these messages appear in the mmfs.logs: >> >> 2019-06-25_06:30:48.706+0100: [I] Connected to >> 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. 
>> 2019-06-25_06:30:51.188+0100: [N] Connecting to >> 2019-06-25_06:30:51.195+0100: [I] Connected to >> 2019-06-25_06:30:59.857+0100: [N] Connecting to >> 2019-06-25_06:30:59.863+0100: [I] Connected to >> 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> >> These messages appear roughly at the same time each day and I've checked >> the NSDs via mmlsnsd and mmlsdisk commands and they are all "ready" and >> "up". The multipaths to these NSDs are all fine too. >> >> Is there a way of finding out what "access" (local or remote) a >> particular node has to an NSD? And is there a command to force it to switch >> to local access ? "mmnsdrediscover" returns nothing and runs really fast >> (contrary to the statement "This may take a while" when it runs)? >> >> Any ideas appreciated! >> >> Regards, >> Son >> >> Son V Truong - Senior Storage Administrator Advanced Computing Research >> Centre IT Services, University of Bristol >> Email: son.truong at bristol.ac.uk >> Tel: Mobile: +44 (0) 7732 257 232 >> Address: 31 Great George Street, Bristol, BS1 5QD >> >> >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: < >> http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190625/db704f88/attachment.html >> > >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 89, Issue 26 >> ********************************************** >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 25 20:05:01 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 25 Jun 2019 19:05:01 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: <832868CF-82CE-457E-91C7-2488B5C03D74@huk-coburg.de> Hallo Son, Please run mmnsddiscover -a -N all. Do all NSDs have their server stanza definitions? Sent from my iPhone Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. 
Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ======================================================================= Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= > Am 25.06.2019 um 17:02 schrieb Son Truong : > > > Hello Renar, > > Thanks for that command, very useful and I can now see the problematic NSDs are all served remotely. > > I have double-checked the multipath and devices and I can see these NSDs are available locally. > > How do I get GPFS to recognise this and serve them out via 'localhost'? > > mmnsddiscover -d seemed to have brought two of the four problematic NSDs back to being served locally, but the other two are not behaving. I have double-checked the availability of these devices and their multipaths but everything on that side seems fine. > > Any more ideas? > > Regards, > Son > > > --------------------------- > > Message: 2 > Date: Tue, 25 Jun 2019 12:10:53 +0000 > From: "Grunenberg, Renar" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to > NSD failed with EIO, switching to access the disk remotely." > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hallo Son, > > you can check the access to the nsd with mmlsdisk -m. 
This gives you a column like "IO performed on node". On NSD-Server you should see localhost, on nsd-client you see the hosting nsd-server per device. > > Regards Renar > > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. > ________________________________ > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. > ________________________________ > Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Son Truong > Gesendet: Dienstag, 25. Juni 2019 13:38 > An: gpfsug-discuss at spectrumscale.org > Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." > > Hello, > > I wonder if anyone has seen this? 
I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: > > 2019-06-25_06:30:48.706+0100: [I] Connected to > 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [N] Connecting to > 2019-06-25_06:30:51.195+0100: [I] Connected to > 2019-06-25_06:30:59.857+0100: [N] Connecting to > 2019-06-25_06:30:59.863+0100: [I] Connected to > 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > > These messages appear roughly at the same time each day and I've checked the NSDs via mmlsnsd and mmlsdisk commands and they are all "ready" and "up". The multipaths to these NSDs are all fine too. > > Is there a way of finding out what "access" (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access ? "mmnsdrediscover" returns nothing and runs really fast (contrary to the statement "This may take a while" when it runs)? > > Any ideas appreciated! > > Regards, > Son > > Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 89, Issue 26 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TROPPENS at de.ibm.com Wed Jun 26 09:58:09 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 26 Jun 2019 10:58:09 +0200 Subject: [gpfsug-discuss] Meet-up in Buenos Aires Message-ID: IBM will host an "IBM Spectrum Scale Meet-up" alongside the IBM Technical University Buenos Aires. This is the first user meeting in South America. All sessions will be in Spanish. https://www.spectrumscale.org/event/spectrum-scale-meet-up-in-buenos-aires/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alvise.dorigo at psi.ch Wed Jun 26 10:17:28 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 26 Jun 2019 09:17:28 +0000 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch> Hello, after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI: sf-gpfs.psi.ch sf-ems1.psi.ch gui_refresh_task_failed NODE sf-ems1.psi.ch WARNING The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY The upgrade procedure was successful and all the post-upgrade checks were also successful. Also /usr/lpp/mmfs/gui/cli/runtask on those tasks is successful. Any idea how to investigate and solve this? Thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.roth at de.ibm.com Wed Jun 26 15:48:34 2019 From: stefan.roth at de.ibm.com (Stefan Roth) Date: Wed, 26 Jun 2019 16:48:34 +0200 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch> Message-ID: Hello Alvise, the problem will most likely be fixed after installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm. The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend upgrading to this one. No other additional rpm packages have to be upgraded. 
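The fix levels Stefan compares here (an installed 5.0.2-3.x against the latest 5.0.2-3.9) are easy to misorder with plain string comparison once a component reaches two digits. A small illustrative sketch, not an IBM tool — the only assumption is version strings of the "5.0.2-3.9" form quoted in this thread — that compares fix levels numerically:

```python
# Illustrative helper: decide whether a GUI rpm fix level needs an upgrade.
# Version strings follow the "5.0.2-3.9" style quoted in the thread; this is
# not an IBM utility, just a numeric tuple comparison.
def level(ver: str) -> tuple[int, ...]:
    """Turn '5.0.2-3.9' into (5, 0, 2, 3, 9) so ordering is numeric."""
    return tuple(int(p) for p in ver.replace("-", ".").split("."))

def needs_upgrade(installed: str, latest: str) -> bool:
    # Tuple comparison is element-wise, so 3.10 correctly sorts after 3.9.
    return level(installed) < level(latest)

if __name__ == "__main__":
    print(needs_upgrade("5.0.2-3.7", "5.0.2-3.9"))
```

String comparison would also happen to work for 3.7 vs 3.9, but would wrongly rank "3.10" below "3.9"; the tuple form avoids that.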
Mit freundlichen Grüßen / Kind regards

Stefan Roth
Spectrum Scale GUI Development

Phone: +49-7034-643-1362
E-Mail: stefan.roth at de.ibm.com
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 26.06.2019 11:25 Subject: [EXTERNAL] [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hello, after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI:

sf-gpfs.psi.ch | sf-ems1.psi.ch | gui_refresh_task_failed | NODE | sf-ems1.psi.ch | WARNING | The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY

The upgrade procedure was successful and all the post-upgrade checks were also successful. Also /usr/lpp/mmfs/gui/cli/runtask on those tasks is successful. Any idea how to investigate and solve this?
Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Fri Jun 28 08:25:24 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Fri, 28 Jun 2019 07:25:24 +0000 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch> The 5.0.2-3 tarball we have has neither the .7 nor the .9 version. And I guess I cannot install just the newer gpfs.gui rpm on top of a 5.0.2-3 installation. Should I open a case with IBM to download that specific rpm version? 
thanks,

   Alvise

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Stefan Roth [stefan.roth at de.ibm.com]
Sent: Wednesday, June 26, 2019 4:48 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed

Hello Alvise,
the problem will most likely be fixed by installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm. The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend upgrading to that one. No other rpm packages have to be upgraded.
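Whether an installed GUI package sits below the recommended fix level can be decided with a plain version sort. The version strings below are the ones from this thread, hard-coded for illustration; on the EMS node the installed string would come from `rpm -q gpfs.gui`:

```shell
# Compare the installed gpfs.gui level against the recommended fix level.
recommended="5.0.2-3.9"
installed="5.0.2-3"      # example: the level shipped in the 5.0.2-3 tarball
highest=$(printf '%s\n%s\n' "$recommended" "$installed" | sort -V | tail -n1)
if [ "$highest" = "$recommended" ] && [ "$installed" != "$recommended" ]; then
  echo "upgrade needed: $installed -> $recommended"
else
  echo "already at or above $recommended"
fi
```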
was successful and all the post-upgrade checks were also successful. Also the /usr/lpp/mmfs/gui/cli/runtask on those task is successful. Any idea about how to deeply investigate on and solve this ? Thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: ecblank.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18272088.gif Type: image/gif Size: 156 bytes Desc: 18272088.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18436932.gif Type: image/gif Size: 1851 bytes Desc: 18436932.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18298022.gif Type: image/gif Size: 63 bytes Desc: 18298022.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From alvise.dorigo at psi.ch Fri Jun 28 08:32:42 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Fri, 28 Jun 2019 07:32:42 +0000 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>, , <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch> Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6EF9B@MBX214.d.ethz.ch> ops, and I made a double mistake: Currently I've 5.0.2-1 (not -3) on my GL2, and in house we only have x86_64, so I definitely need to download specific rpm from somewhere if it is compatible with 5.0.2-1. 
Alvise

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Dorigo Alvise (PSI) [alvise.dorigo at psi.ch]
Sent: Friday, June 28, 2019 9:25 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed

The tarball 5.0.2-3 we have doesn't have the .7 nor the .9 version, and I guess I cannot install just the gpfs.gui 5.0.3 rpm on top of a 5.0.2-3 installation. Should I open a case with IBM to download that specific rpm version?

thanks,

   Alvise

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Stefan Roth [stefan.roth at de.ibm.com]
Sent: Wednesday, June 26, 2019 4:48 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed

Hello Alvise,
the problem will most likely be fixed by installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm. The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend upgrading to that one. No other rpm packages have to be upgraded.
Mit freundlichen Grüßen / Kind regards

Stefan Roth
Spectrum Scale GUI Development

Phone: +49-7034-643-1362   E-Mail: stefan.roth at de.ibm.com
IBM Deutschland, Am Weiher 24, 65451 Kelsterbach, Germany
IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

From: "Dorigo Alvise (PSI)"
To: "gpfsug-discuss at spectrumscale.org"
Date: 26.06.2019 11:25
Subject: [EXTERNAL] [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hello,
after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI:

  sf-gpfs.psi.ch | sf-ems1.psi.ch | gui_refresh_task_failed | NODE | sf-ems1.psi.ch | WARNING | The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY

The upgrade procedure was successful and all the post-upgrade checks were also successful. Running /usr/lpp/mmfs/gui/cli/runtask on those tasks also succeeds. Any idea how to investigate this further and solve it?

Thanks,

   Alvise
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From knop at us.ibm.com  Fri Jun 7 22:45:31 2019
From: knop at us.ibm.com (Felipe Knop)
Date: Fri, 7 Jun 2019 17:45:31 -0400
Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Message-ID:

All,

There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided.

Thanks,

Felipe

----
Felipe Knop   knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From zmance at ucar.edu  Fri Jun 7 22:51:13 2019
From: zmance at ucar.edu (Zachary Mance)
Date: Fri, 7 Jun 2019 15:51:13 -0600
Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To:
Message-ID:

Which version of Spectrum Scale are you referring to? 5.0.2-3?
---------------------------------------------------------------------------------------------------------------
Zach Mance  zmance at ucar.edu  (303) 497-1883
HPC Data Infrastructure Group / CISL / NCAR
---------------------------------------------------------------------------------------------------------------

On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote:

> All,
>
> There have been reported issues (including kernel crashes) on Spectrum
> Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider
> delaying upgrades to this kernel until further information is provided.
>
> Thanks,
>
> Felipe
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From knop at us.ibm.com  Fri Jun 7 23:07:49 2019
From: knop at us.ibm.com (Felipe Knop)
Date: Fri, 7 Jun 2019 18:07:49 -0400
Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To:
Message-ID:

Zach,

This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted)

Felipe

----
Felipe Knop   knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: Zachary Mance
To: gpfsug main discussion list
Date: 06/07/2019 05:51 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Which version of Spectrum Scale are you referring to? 5.0.2-3?
---------------------------------------------------------------------------------------------------------------
Zach Mance  zmance at ucar.edu  (303) 497-1883
HPC Data Infrastructure Group / CISL / NCAR
---------------------------------------------------------------------------------------------------------------

On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote:

All,

There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided.

Thanks,

Felipe
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From Robert.Oesterlin at nuance.com  Sat Jun 8 18:22:12 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Sat, 8 Jun 2019 17:22:12 +0000
Subject: [gpfsug-discuss] Forcing an internal mount to complete
Message-ID:

I have a few file systems that are showing "internal mount" on my NSD servers, even though they are not mounted. I'd like to force them to release, without having to restart GPFS on those nodes - any options?

Not mounted on any other (local cluster) nodes.
Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From aaron.knister at gmail.com  Sun Jun 9 02:16:08 2019
From: aaron.knister at gmail.com (Aaron Knister)
Date: Sat, 8 Jun 2019 21:16:08 -0400
Subject: [gpfsug-discuss] Forcing an internal mount to complete
In-Reply-To:
Message-ID: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com>

Bob, I wonder if something like an "mmdf" or an "mmchmgr" would trigger the internal mounts to release.

Sent from my iPhone

> On Jun 8, 2019, at 13:22, Oesterlin, Robert wrote:
>
> I have a few file systems that are showing "internal mount" on my NSD
> servers, even though they are not mounted. I'd like to force them, without
> having to restart GPFS on those nodes - any options?
>
> Not mounted on any other (local cluster) nodes.
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance

From salut4tions at gmail.com  Sun Jun 9 04:24:47 2019
From: salut4tions at gmail.com (Jordan Robertson)
Date: Sat, 8 Jun 2019 23:24:47 -0400
Subject: [gpfsug-discuss] Forcing an internal mount to complete
In-Reply-To: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com>
Message-ID:

Hey Bob,

Ditto on what Aaron said; it sounds as if the last fs manager might need a nudge. Things can get weird when a filesystem isn't mounted anywhere but a manager is needed for an operation, so I would keep an eye on the RAS logs of the cluster manager during the kick, just to make sure the management duty isn't bouncing (which in turn can cause waiters).

-Jordan

On Sat, Jun 8, 2019 at 9:16 PM Aaron Knister wrote:

> Bob, I wonder if something like an "mmdf" or an "mmchmgr"
> would trigger the internal mounts to release.
>
> On Jun 8, 2019, at 13:22, Oesterlin, Robert wrote:
>
> I have a few file systems that are showing "internal mount" on my NSD
> servers, even though they are not mounted. I'd like to force them, without
> having to restart GPFS on those nodes - any options?
>
> Not mounted on any other (local cluster) nodes.
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance

From Robert.Oesterlin at nuance.com  Sun Jun 9 13:18:39 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Sun, 9 Jun 2019 12:18:39 +0000
Subject: [gpfsug-discuss] [EXTERNAL] Re: Forcing an internal mount to complete
In-Reply-To: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com>
Message-ID:

Thanks for the suggestions - as it turns out, it was the *remote* mounts causing the issue - which surprises me. I wanted to run "mmchfs" on the local cluster to change the default mount point. Why would GPFS care if it's remote mounted?

Oh - well.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance
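For the archive, the sequence being attempted here looks roughly like the following. The filesystem name and mount point are hypothetical, `-T` is the mmchfs option that sets the default mount point, and the commands are only printed (not run) since they need a live cluster - and, per this thread, any remote cluster mounting the filesystem has to unmount it first:

```shell
# Assemble (not execute) the steps to change a default mount point.
fs="fs1"                 # hypothetical filesystem name
newmp="/gpfs/fs1-new"    # hypothetical new default mount point
steps="mmunmount $fs -a          # unmount across the local cluster
mmunmount $fs -a          # repeat on any remote cluster that mounts it
mmchfs $fs -T $newmp      # change the default mount point
mmmount $fs -a            # remount locally"
echo "$steps"
```

Checking `mmlsmount fs1 -L` between the unmount and the mmchfs would confirm no internal mounts linger anywhere.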
From salut4tions at gmail.com  Sun Jun 9 14:20:28 2019
From: salut4tions at gmail.com (Jordan Robertson)
Date: Sun, 9 Jun 2019 09:20:28 -0400
Subject: [gpfsug-discuss] [EXTERNAL] Re: Forcing an internal mount to complete
Message-ID:

If there's any I/O going to the filesystem at all, GPFS has to keep it internally mounted on at least a few nodes, such as the token managers and fs manager. I *believe* that holds true even for remote clusters, in that they still need to reach back to the "local" cluster when operating on the filesystem. I could be wrong about that, though.

On Sun, Jun 9, 2019, 09:06 Oesterlin, Robert wrote:

> Thanks for the suggestions - as it turns out, it was the *remote*
> mounts causing the issue - which surprises me. I wanted to run "mmchfs"
> on the local cluster to change the default mount point. Why would GPFS
> care if it's remote mounted?
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance

From spectrumscale at kiranghag.com  Sun Jun 9 14:38:29 2019
From: spectrumscale at kiranghag.com (KG)
Date: Sun, 9 Jun 2019 19:08:29 +0530
Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To:
Message-ID:

One of my customers already upgraded their DR site. Is rollback advised? They will be running from the DR site for a day in another week.

On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote:

> Zach,
>
> This appears to be affecting all Scale versions, including 5.0.2 -- but
> only when moving to the new 3.10.0-957.21.2 kernel.
> (3.10.0-957 is not impacted)
>
> Felipe
>
> From: Zachary Mance
> To: gpfsug main discussion list
> Date: 06/07/2019 05:51 PM
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6
> kernel 3.10.0-957.21.2
>
> Which version of Spectrum Scale are you referring to? 5.0.2-3?
>
> On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote:
>
> All,
>
> There have been reported issues (including kernel crashes) on Spectrum
> Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider
> delaying upgrades to this kernel until further information is provided.
> Thanks,
>
> Felipe
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From scottg at emailhosting.com  Sun Jun 9 18:32:24 2019
From: scottg at emailhosting.com (Scott Goldman)
Date: Sun, 09 Jun 2019 18:32:24 +0100
Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To:
Message-ID:

An HTML attachment was scrubbed...
URL:

From knop at us.ibm.com  Mon Jun 10 05:29:14 2019
From: knop at us.ibm.com (Felipe Knop)
Date: Mon, 10 Jun 2019 00:29:14 -0400
Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To:
Message-ID:

Scott,

Currently, we are only aware of the problem with 3.10.0-957.21.2. We are not yet aware of the same problems also affecting 3.10.0-957.12.1, but hope to find out more shortly.
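A quick way to sort hosts by kernel build, given the two levels discussed here: the kernel string below is hard-coded for illustration (in practice it would come from `uname -r`), and the verdict strings simply restate this thread's guidance, not an official IBM flash:

```shell
# Classify a RHEL 7.6 kernel build against the levels discussed in this thread.
kernel="3.10.0-957.21.2.el7.x86_64"   # example; normally $(uname -r)
case "$kernel" in
  3.10.0-957.21.2.*) verdict="affected: hold off per Felipe's advisory" ;;
  3.10.0-957.12.1.*) verdict="status unknown; watch the list" ;;
  *)                 verdict="not a flagged build" ;;
esac
echo "$verdict"
```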
Felipe

----
Felipe Knop   knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: Scott Goldman
To: gpfsug main discussion list
Date: 06/09/2019 01:50 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

And to be clear... there is a .12 version: 3.10.0-957.12.1.el7.x86_64

Did you mean the .12 version or the .21? Conveniently, the kernel numbers are easily transposed!

Sent from my BlackBerry - the most secure mobile device

From: spectrumscale at kiranghag.com
Sent: June 9, 2019 2:38 PM
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2

One of my customers already upgraded their DR site. Is rollback advised? They will be running from the DR site for a day in another week.

On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote:

> Zach,
>
> This appears to be affecting all Scale versions, including 5.0.2 -- but
> only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not
> impacted)
>
> Felipe
>
> From: Zachary Mance
> To: gpfsug main discussion list
> Date: 06/07/2019 05:51 PM
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6
> kernel 3.10.0-957.21.2
>
> Which version of Spectrum Scale are you referring to? 5.0.2-3?
---------------------------------------------------------------------------------------------------------------
Zach Mance  zmance at ucar.edu  (303) 497-1883
HPC Data Infrastructure Group / CISL / NCAR
---------------------------------------------------------------------------------------------------------------

On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote:

> All,
>
> There have been reported issues (including kernel crashes) on Spectrum
> Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider
> delaying upgrades to this kernel until further information is provided.
>
> Thanks,
>
> Felipe
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From knop at us.ibm.com  Mon Jun 10 05:41:29 2019
From: knop at us.ibm.com (Felipe Knop)
Date: Mon, 10 Jun 2019 00:41:29 -0400
Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To:
Message-ID:

Hi,

Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable.

Felipe

----
Felipe Knop   knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: KG
To: gpfsug main discussion list
Date: 06/09/2019 09:38 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

One of my customers already upgraded their DR site. Is rollback advised? They will be running from the DR site for a day in another week.
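On RHEL 7 the rollback suggested above usually amounts to selecting the previous kernel's GRUB entry. A sketch follows; the entry index is hypothetical and must be checked against the menu listing first, so the commands are only printed here rather than run:

```shell
# Print (not run) the usual RHEL 7 steps to boot a prior kernel.
out=$(cat <<'EOF'
grep ^menuentry /etc/grub2.cfg | nl -v 0   # locate the 3.10.0-957 entry
grub2-set-default 1                        # hypothetical index - verify first!
reboot
EOF
)
echo "$out"
```

Keeping the new kernel installed while booting the old one leaves an easy path back once a fixed level is confirmed on the list.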
From: Zachary Mance
To: gpfsug main discussion list
Date: 06/07/2019 05:51 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Which version of Spectrum Scale are you referring to? 5.0.2-3?

On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote:

All,

There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided.

Thanks,

Felipe
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL: From scottg at emailhosting.com Mon Jun 10 06:02:19 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Mon, 10 Jun 2019 06:02:19 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: Message-ID: <3uok4eacuqj53g26epedg19j.1560142939257@emailhosting.com> An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 10 13:43:02 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 10 Jun 2019 12:43:02 +0000 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: Hallo Felipe, here is the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system.
This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a 
reboot. (BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised]KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised?
They will be running from DR From: KG > To: gpfsug main discussion list > Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop > wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop > wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] From kraemerf at de.ibm.com Mon Jun 10 13:47:46 2019 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 10 Jun 2019 14:47:46 +0200 Subject: [gpfsug-discuss] *NEWS* - IBM Spectrum Scale Erasure Code Edition v5.0.3 Message-ID: FYI - What is IBM Spectrum Scale Erasure Code Edition, and why should I consider it? IBM Spectrum Scale Erasure Code Edition provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale on the customer's own choice of commodity hardware with the added benefit of network-dispersed IBM Spectrum Scale RAID, and all of its features providing data protection, storage efficiency, and the ability to manage storage in hyperscale environments. SAS, NL-SAS, and NVMe drives are supported right now.
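[Editor's note: the capacity/protection trade-off behind ECE's network-dispersed erasure coding can be estimated quickly. An m+pP code stores m data strips plus p parity strips, so roughly m/(m+p) of raw capacity is usable for data, versus 1/n for n-way replication. The sketch below is illustrative only; real ECE overhead also includes spare space and metadata, so treat these numbers as upper bounds, not IBM-published figures.]

```python
# Hedged sketch: approximate usable-capacity fraction for the erasure
# codes IBM Spectrum Scale Erasure Code Edition supports, compared with
# replication. Exact on-disk overhead is larger in practice (spare
# space, metadata), so these are idealized upper bounds.
def usable_fraction(data_strips: int, parity_strips: int) -> float:
    """Fraction of raw capacity holding user data in an m+pP code."""
    return data_strips / (data_strips + parity_strips)

# Code names as listed in the ECE announcement: m+pP and n-way replication.
ECE_CODES = {"4+2P": (4, 2), "4+3P": (4, 3), "8+2P": (8, 2), "8+3P": (8, 3)}
REPLICATION = {"3-way": 3, "4-way": 4}

for name, (m, p) in ECE_CODES.items():
    print(f"{name}: ~{usable_fraction(m, p):.0%} of raw capacity usable")
for name, n in REPLICATION.items():
    # n-way replication keeps one data copy out of n total copies.
    print(f"{name} replication: ~{1 / n:.0%} of raw capacity usable")
```

As the output suggests, the wider 8+2P/8+3P codes are more space-efficient than 4+2P/4+3P at the same parity count, which is one of the "several factors" the planning documentation weighs against rebuild cost and node count.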
IBM Spectrum Scale Erasure Code Edition supports 4 different erasure codes: 4+2P, 4+3P, 8+2P, and 8+3P, in addition to 3- and 4-way replication. Choosing an erasure code involves considering several factors. For more details on IBM Spectrum Scale Erasure Code Edition, see section 18 in the Scale FAQ on the web: https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Each IBM Spectrum Scale Erasure Code Edition recovery group can have 4 - 32 storage nodes, and there can be up to 128 storage nodes in an IBM Spectrum Scale cluster using IBM Spectrum Scale Erasure Code Edition. For more information, see Planning for erasure code selection in the IBM Spectrum Scale Erasure Code Edition Knowledge Center. https://www.ibm.com/support/knowledgecenter/en/STXKQY_ECE_5.0.3/ibmspectrumscaleece503_welcome.html For the minimum requirements for IBM Spectrum Scale Erasure Code Edition, see: https://www.ibm.com/support/knowledgecenter/STXKQY_ECE_5.0.3/com.ibm.spectrum.scale.ece.v5r03.doc/b1lece_min_hwrequirements.htm The hardware and network precheck tools can be downloaded from the following links: Hardware precheck: https://github.com/IBM/SpectrumScale_ECE_OS_READINESS Network precheck: https://github.com/IBM/SpectrumScale_NETWORK_READINESS The network can be either Ethernet or InfiniBand, and must be at least 25 Gbps bandwidth, with an average latency of 1.0 msec or less between any two storage nodes. -frank- -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 10 14:43:10 2019 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 10 Jun 2019 09:43:10 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: Renar, Thanks.
Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/10/2019 08:43 AM Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, here are the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. 
(BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. 
(BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Inactive hide details for KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advisedKG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org One of my customer already upgraded their DR site. Is rollback advised?
They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] From Renar.Grunenberg at huk-coburg.de Tue Jun 11 13:27:46 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 11 Jun 2019 12:27:46 +0000 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Hallo Felipe, can you explain whether this is a generic problem in RHEL or only Scale-related?
Are there any further circumstances known already? We asked Red Hat, but have no indication that this is known to them. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 15:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Renar, Thanks. Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted.
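A node-local sanity check can be derived from that statement: compare the running kernel release against the first impacted build using a version-aware sort. The sketch below is only illustrative and uses the threshold quoted in this thread (3.10.0-957.19.1); treat IBM's official notification as authoritative:

```shell
#!/bin/sh
# Illustrative check: is this node's kernel in the impacted range?
# The threshold comes from the analysis above; verify it against the
# official advisory before relying on it.
IMPACTED_FROM="3.10.0-957.19.1"

# version_ge A B: true when A sorts at or after B in version order
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | tail -n 1)" = "$1" ]
}

check_kernel() {
    if version_ge "$1" "$IMPACTED_FROM"; then
        echo "WARNING: kernel $1 is in the impacted range (>= $IMPACTED_FROM)"
    else
        echo "OK: kernel $1 predates the impacted range"
    fi
}

check_kernel "$(uname -r)"
```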
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/10/2019 08:43 AM Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hallo Felipe, here is the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely.
(BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. 
(BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314
From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ One of my customers already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Which versions of Spectrum Scale are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided.
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] From knop at us.ibm.com Tue Jun 11 16:54:03 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 11 Jun 2019 11:54:03 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: Renar, With the change below, which is a retrofit of a change deployed in newer kernels, an inconsistency has taken place between the GPFS kernel portability layer and the kernel proper. A known result of that inconsistency is a kernel crash.
One known sequence leading to the crash involves the mkdir() call. We are working on an official notification on the issue. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jun 11 18:55:36 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 11 Jun 2019 17:55:36 +0000 Subject: [gpfsug-discuss] About new Lenovo DSS Software Release In-Reply-To: <0081EB235765E14395278B9AE1DF34180FE897CC@MBX214.d.ethz.ch> References: <0081EB235765E14395278B9AE1DF34180FE897CC@MBX214.d.ethz.ch> Message-ID: Hi Marc, In case you didn't see, Lenovo released DSS-G 2.3a today. From the release notes: - IBM Spectrum Scale RAID * updated release 5.0 to 5.0.2-PTF3-efix0.1 (5.0.2-3.0.1) * updated release 4.2 to 4.2.3-PTF14 (4.2.3-14) Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of marc.caubet at psi.ch [marc.caubet at psi.ch] Sent: 03 June 2019 09:51 To: gpfsug main discussion list Subject: [gpfsug-discuss] About new Lenovo DSS Software Release Dear all, this question mostly targets Lenovo Engineers and customers. Is there any update about the release date for the new software for Lenovo DSS G-Series? Also, I would like to know which version of GPFS will come with this software. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed...
URL: From novosirj at rutgers.edu Tue Jun 11 20:32:41 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:32:41 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> Message-ID: <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This is not a change I like much either, though I can obviously adapt to it. We have used "mmfsadm test verbs status" in NHC (https://github.com/mej/nhc) on our compute nodes to confirm that RDMA is working, and just for a quick check on the command line. Yes, there are the usual caveats, and yes the information is available another way, but a) it's the removal of a convenience that I'm quite sure -- caveats aside -- is not dangerous (it runs every 5 minutes on our system), b) it doesn't match the usage printed out on the command line, c) any other method produces quite a bit more information that then has to be parsed (perhaps also not as light a touch, but I don't know the code), and d) there doesn't seem to be a way now that works on both GPFS V4 and V5 (I confirmed that mmfsadm saferdump verbs | grep verbsRdmaStarted does not work on V4). You'd also mentioned we really shouldn't be using mmfsadm regularly. Is there a way to get this information out of mmdiag, if that is the supported command? Is there a way to do this that works for both V4 and V5? Philosophy of using mmfsadm aside, we aren't supposed to rely on the syntax of these commands remaining the same, but aren't we supposed to be able to rely on commands not falsely reporting syntax in their own usage message? I'd think, at the very least, that's a bug in the "usage" text. On 12/19/18 5:35 AM, Tomer Perry wrote: > Hi, > > So, with all the usual disclaimers... mmfsadm saferdump verbs is > not enough?
or even mmfsadm saferdump verbs | grep > VerbsRdmaStarted > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 12:22 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > I'd like just one line that says "RDMA ON" or "RMDA OFF" (as was > reported more or less by mmfsadm). > > I can get info about RMDA using mmdiag, but is much more output to > parse (e.g. by a nagios script or just a human eye). Ok, never > mind, I understand your explanation and it is not definitely a big > issue... it was, above all, a curiosity to understand if the > command was modified to get the same behavior as before, but in a > different way. > > Cheers, > > Alvise > > ---------------------------------------------------------------------- - -- > > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer > Perry [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 11:05 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Changed means it provides some functions/information in a different > way. So, I guess the question is what information do you need? ( > and "officially" why isn't mmdiag good enough - what is missing. As > you probably know, mmfsadm might cause crashes and deadlock from > time to time, this is why we're trying to provide "safe ways" to > get the required information). 
> > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 11:53 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hi Tomer, "changed" makes me suppose that it is still possible, but > in a different way... am I correct ? if yes, what it is ? > > thanks, > > Alvise > > ---------------------------------------------------------------------- - -- > > * > From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer > Perry [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 10:47 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Hi, > > Yes, as part of the RDMA enhancements in 5.0.X much of the hidden > test commands were changed. And since mmfsadm is not externalized > none of them is documented ( and the help messages are not > consistent as well). > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: Simon Thompson To: > gpfsug main discussion list > Date: 19/12/2018 11:29 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hmm interesting ? 
> > # mmfsadm test verbs usage: {udapl | verbs} { status | skipio | > noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut } > > # mmfsadm test verbs status usage: {udapl | verbs} { status | > skipio | noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut > | config | conn | conndetails | stats | resetstats | ibcntreset | > ibcntr | ia | pz | psp | evd | lmr | break | qps | inject op cnt > err | breakqperr | qperridx idx | breakidx idx} > > mmfsadm test verbs config still works though (which includes > RdmaStarted flag) > > Simon* > > From: * on behalf of > "alvise.dorigo at psi.ch" * Reply-To: > *"gpfsug-discuss at spectrumscale.org" > * Date: *Wednesday, 19 December > 2018 at 08:51* To: *"gpfsug-discuss at spectrumscale.org" > * Subject: *[gpfsug-discuss] > verbs status not working in 5.0.2 > > Hi, in GPFS 5.0.2 I cannot run anymore "mmfsadm test verbs > status": > > [root at sf-dss-1 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "4.2.3.7 ". Built on > Feb 15 2018 at 11:38:38 Running 62 days 11 hours 24 minutes 35 > secs, pid 7510 VERBS RDMA status: started > > [root at sf-export-2 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "5.0.2.1 ". Built on > Oct 24 2018 at 21:23:46 Running 10 minutes 24 secs, pid 3570 usage: > {udapl | verbs} { status | skipio | noskipio | dump | maxRpcsOut | > maxReplysOut | maxRdmasOut | config | conn | conndetails | stats | > resetstats | ibcntreset | ibcntr | ia | pz | psp | evd | lmr | > break | qps | inject op cnt err | breakqperr | qperridx idx | > breakidx idx} > > > Is it a known problem or am I doing something wrong ? 
> > Thanks, > > Alvise_______________________________________________ > gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAB1AAKCRCZv6Bp0Ryx vhPDAKCZFKcsFcbNk8MBZvfr6Oz8C3+C5wCgvwXwHwX0S6SKI7NoRTszLPR2n/E= =Qxja -----END PGP SIGNATURE-----

From bbanister at jumptrading.com Tue Jun 11 20:37:52 2019
From: bbanister at jumptrading.com (Bryan Banister)
Date: Tue, 11 Jun 2019 19:37:52 +0000
Subject: [gpfsug-discuss] verbs status not working in 5.0.2
In-Reply-To: <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu>
References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu>
Message-ID: 

This has been broken for a long time... we too were checking that `mmfsadm test verbs status` reported that RDMA is working. We don't want nodes that are not using RDMA running in the cluster. 
We have decided to just look for the log entry like this:

test_gpfs_rdma_active() {
    [[ "$(grep -c "VERBS RDMA started" /var/adm/ras/mmfs.log.latest)" == "1" ]]
}

Hope that helps,
-Bryan

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski
Sent: Tuesday, June 11, 2019 2:33 PM
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] verbs status not working in 5.0.2

[EXTERNAL EMAIL]

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

This is not a change I like much either, though can obviously adapt to it. We have used "mmfsadm test verbs status" to confirm that RDMA is working by NHC (https://github.com/mej/nhc) on our compute nodes, and just for a quick check on the command line. Yes, there are the usual caveats, and yes the information is available another way, but a) it's the removal of a convenience that I'm quite sure that -- caveats aside -- is not dangerous (it runs every 5 minutes on our system) b) it doesn't match the usage printed out on the command line and c) any other methods are quite a bit more information that then has to be parsed (perhaps also not as light a touch, but I don't know the code), and d) there doesn't seem to be a way now that works on both GPFS V4 and V5 (I confirmed that mmfsadm saferdump verbs | grep verbsRdmaStarted does not on V4). You'd also mentioned we really shouldn't be using mmfsadm regularly. Is there a way to get this information out of mmdiag if that is the supported command? Is there a way to do this that works for both V4 and V5? Philosophy of using mmfsadm aside though, we aren't supposed to rely on syntax for these commands remaining the same, but aren't we supposed to be able to rely on commands not falsely reporting syntax in their own usage message? I'd think at the very least, that's a bug in the "usage" text.

On 12/19/18 5:35 AM, Tomer Perry wrote: > Hi, > > So, with all the usual disclaimers... mmfsadm saferdump verbs is not > enough? 
or even mmfsadm saferdump verbs | grep VerbsRdmaStarted > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 12:22 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > I'd like just one line that says "RDMA ON" or "RMDA OFF" (as was > reported more or less by mmfsadm). > > I can get info about RMDA using mmdiag, but is much more output to > parse (e.g. by a nagios script or just a human eye). Ok, never mind, I > understand your explanation and it is not definitely a big issue... it > was, above all, a curiosity to understand if the command was modified > to get the same behavior as before, but in a different way. > > Cheers, > > Alvise > > ---------------------------------------------------------------------- - -- > > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer Perry > [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 11:05 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Changed means it provides some functions/information in a different > way. So, I guess the question is what information do you need? ( and > "officially" why isn't mmdiag good enough - what is missing. As you > probably know, mmfsadm might cause crashes and deadlock from time to > time, this is why we're trying to provide "safe ways" to get the > required information). 
> > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 11:53 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hi Tomer, "changed" makes me suppose that it is still possible, but in > a different way... am I correct ? if yes, what it is ? > > thanks, > > Alvise > > ---------------------------------------------------------------------- - -- > > * > From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer Perry > [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 10:47 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Hi, > > Yes, as part of the RDMA enhancements in 5.0.X much of the hidden test > commands were changed. And since mmfsadm is not externalized none of > them is documented ( and the help messages are not consistent as > well). > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: Simon Thompson To: > gpfsug main discussion list > Date: 19/12/2018 11:29 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hmm interesting ? 
> > # mmfsadm test verbs usage: {udapl | verbs} { status | skipio | > noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut } > > # mmfsadm test verbs status usage: {udapl | verbs} { status | skipio | > noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut > | config | conn | conndetails | stats | resetstats | ibcntreset | > ibcntr | ia | pz | psp | evd | lmr | break | qps | inject op cnt err | > breakqperr | qperridx idx | breakidx idx} > > mmfsadm test verbs config still works though (which includes > RdmaStarted flag) > > Simon* > > From: * on behalf of > "alvise.dorigo at psi.ch" * Reply-To: > *"gpfsug-discuss at spectrumscale.org" > * Date: *Wednesday, 19 December > 2018 at 08:51* To: *"gpfsug-discuss at spectrumscale.org" > * Subject: *[gpfsug-discuss] verbs > status not working in 5.0.2 > > Hi, in GPFS 5.0.2 I cannot run anymore "mmfsadm test verbs > status": > > [root at sf-dss-1 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "4.2.3.7 ". Built on Feb > 15 2018 at 11:38:38 Running 62 days 11 hours 24 minutes 35 secs, pid > 7510 VERBS RDMA status: started > > [root at sf-export-2 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "5.0.2.1 ". Built on Oct > 24 2018 at 21:23:46 Running 10 minutes 24 secs, pid 3570 usage: > {udapl | verbs} { status | skipio | noskipio | dump | maxRpcsOut | > maxReplysOut | maxRdmasOut | config | conn | conndetails | stats | > resetstats | ibcntreset | ibcntr | ia | pz | psp | evd | lmr | break | > qps | inject op cnt err | breakqperr | qperridx idx | breakidx idx} > > > Is it a known problem or am I doing something wrong ? 
> > Thanks, > > Alvise_______________________________________________ > gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss mailing > list gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ gpfsug-discuss mailing > list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ gpfsug-discuss mailing > list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAB1AAKCRCZv6Bp0Ryx vhPDAKCZFKcsFcbNk8MBZvfr6Oz8C3+C5wCgvwXwHwX0S6SKI7NoRTszLPR2n/E= =Qxja -----END PGP SIGNATURE----- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Tue Jun 11 20:45:40 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:45:40 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thanks -- this was originally how Lenovo told us to check this, and I came across `mmfsadm test verbs status` on my own. 
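Putting the two approaches from this thread together, a version-tolerant check might look like the following sketch. The command names and output strings ("VERBS RDMA status: started", "VERBS RDMA started") are the ones quoted in these messages; behavior on GPFS levels other than those discussed here, and the "stopped" wording, are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: prefer the live query, fall back to the mmfs.log grep.

MMFSADM=${MMFSADM:-/usr/lpp/mmfs/bin/mmfsadm}
MMFS_LOG=${MMFS_LOG:-/var/adm/ras/mmfs.log.latest}

# Pure helper: classify the output of `mmfsadm test verbs status`.
# On 5.0.2 the command only prints its usage text, which contains no
# "VERBS RDMA status" line, so that case falls through to "unknown".
parse_verbs_status() {
    case "$1" in
        *"VERBS RDMA status: started"*) echo "up" ;;
        *"VERBS RDMA status"*)          echo "down" ;;
        *)                              echo "unknown" ;;
    esac
}

# Live probe, with Bryan's log grep as a fallback when the live query is
# unusable (e.g. 5.0.2.x). The log only proves RDMA started at boot, not
# that it is still up.
rdma_started() {
    local state
    state=$(parse_verbs_status "$("$MMFSADM" test verbs status 2>/dev/null)")
    case "$state" in
        up)      return 0 ;;
        down)    return 1 ;;
        unknown) grep -q "VERBS RDMA started" "$MMFS_LOG" ;;
    esac
}
```

Something of this shape could drop into an NHC check, since it degrades to the log grep on releases where the live query only prints usage text.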
I'm thinking, though, isn't there some risk that if RDMA went down somehow, that wouldn't be caught by your script? I can't say that I normally see that as the failure mode (it's most often booting up without), nor do I know what happens to `mmfsadm test verbs status` if you pull a cable or something. On 6/11/19 3:37 PM, Bryan Banister wrote: > This has been brocket for a long time... we too were checking that > `mmfsadm test verbs status` reported that RDMA is working. We > don't want nodes that are not using RDMA running in the cluster. > > We have decided to just look for the log entry like this: > test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" > /var/adm/ras/mmfs.log.latest)" == "1" ]] } > > Hope that helps, -Bryan - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAE3gAKCRCZv6Bp0Ryx vpvpAJ9KnVX79aXNu3oclxM6swYfZ5wKjQCeJF3s94tS7+2JtTlkc5OXV/E8LnI= =kBtE -----END PGP SIGNATURE----- From kums at us.ibm.com Tue Jun 11 20:49:12 2019 From: kums at us.ibm.com (Kumaran Rajaram) Date: Tue, 11 Jun 2019 15:49:12 -0400 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk><83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch><83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch><812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: Hi, This issue is resolved in the latest 5.0.3.1 release. # mmfsadm dump version | grep Build Build branch "5.0.3.1 ". 
# mmfsadm test verbs status VERBS RDMA status: started Regards, -Kums From: Ryan Novosielski To: "gpfsug-discuss at spectrumscale.org" Date: 06/11/2019 03:46 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] verbs status not working in 5.0.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thanks -- this was originally how Lenovo told us to check this, and I came across `mmfsadm test verbs status` on my own. I'm thinking, though, isn't there some risk that if RDMA went down somehow, that wouldn't be caught by your script? I can't say that I normally see that as the failure mode (it's most often booting up without), nor do I know what happens to `mmfsadm test verbs status` if you pull a cable or something. On 6/11/19 3:37 PM, Bryan Banister wrote: > This has been brocket for a long time... we too were checking that > `mmfsadm test verbs status` reported that RDMA is working. We > don't want nodes that are not using RDMA running in the cluster. > > We have decided to just look for the log entry like this: > test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" > /var/adm/ras/mmfs.log.latest)" == "1" ]] } > > Hope that helps, -Bryan - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAE3gAKCRCZv6Bp0Ryx vpvpAJ9KnVX79aXNu3oclxM6swYfZ5wKjQCeJF3s94tS7+2JtTlkc5OXV/E8LnI= =kBtE -----END PGP SIGNATURE----- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Tue Jun 11 20:50:49 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:50:49 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thank you, that's great news. Now we just have to wait for that to make it to the DSS-G release. :-| On 6/11/19 3:49 PM, Kumaran Rajaram wrote: > Hi, > > This issue is resolved in the latest 5.0.3.1 release. > > /# mmfsadm dump version | grep Build/ */Build/*/branch "5.0.3.1 > "./ > > /# mmfsadm test verbs status/ /VERBS RDMA status: started/ > > Regards, -Kums > > > > Inactive hide details for Ryan Novosielski ---06/11/2019 03:46:54 > PM--------BEGIN PGP SIGNED MESSAGE----- Hash: SHA1Ryan Novosielski > ---06/11/2019 03:46:54 PM--------BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > From: Ryan Novosielski To: > "gpfsug-discuss at spectrumscale.org" > Date: 06/11/2019 03:46 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] verbs status not working > in 5.0.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ---------------------------------------------------------------------- - -- > > > > > -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > > Thanks -- this was originally how Lenovo told us to check this, and > I came across `mmfsadm test verbs status` on my own. > > I'm thinking, though, isn't there some risk that if RDMA went down > somehow, that wouldn't be caught by your script? I can't say that > I normally see that as the failure mode (it's most often booting > up without), nor do I know what happens to `mmfsadm test verbs > status` if you pull a cable or something. 
> > On 6/11/19 3:37 PM, Bryan Banister wrote: >> This has been brocket for a long time... we too were checking >> that `mmfsadm test verbs status` reported that RDMA is working. >> We don't want nodes that are not using RDMA running in the >> cluster. >> >> We have decided to just look for the log entry like this: >> test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" >> /var/adm/ras/mmfs.log.latest)" == "1" ]] } >> >> Hope that helps, -Bryan > > - -- ____ || \\UTGERS, > |----------------------*O*------------------------ ||_// the State > | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. > Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | > Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP > SIGNATURE----- > > iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAE3gAKCRCZv6Bp0Ryx > vpvpAJ9KnVX79aXNu3oclxM6swYfZ5wKjQCeJF3s94tS7+2JtTlkc5OXV/E8LnI= > =kBtE -----END PGP SIGNATURE----- > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAGFAAKCRCZv6Bp0Ryx vhGoAKDHtV4vNboVxdfrp7DLLBKp6+m60QCfQJRvJ+xEoXgpDO2VBbSBu0bMDwM= =aOrz -----END PGP SIGNATURE-----

From p.childs at qmul.ac.uk Wed Jun 12 09:50:29 2019
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Wed, 12 Jun 2019 08:50:29 +0000
Subject: [gpfsug-discuss] Odd behavior using sudo for mmchconfig
Message-ID: 

Yesterday, I updated some GPFS config using

sudo /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=200000,maxStatCache=800000

which looked to have worked fine; however, later other machines started reporting issues with permissions while running mmlsquota as a user:

cannot open file `/var/mmfs/gen/mmfs.cfg.ls' for reading (Permission denied)
cannot open file `/var/mmfs/gen/mmfs.cfg' for reading (Permission denied)

This was corrected by re-running the command from the same machine within a root session:

sudo -s
/usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=20000,maxStatCache=80000
/usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=200000,maxStatCache=800000
exit

I suspect an environment issue from within sudo caused the GPFS config to have its permissions changed, but I've done similar before with no bad effects, so I'm a little confused. We're looking at tightening up our security to reduce the need for root-based passwordless access from non-admin nodes, but I've never understood the exact requirements for setting this up correctly, and I periodically see issues with our root known_hosts files when we update our admin hosts; hence I often end up going around with 'mmdsh -N all echo ""' to clear the old entries, but I always find this less than ideal and would prefer a better solution. Thanks for any ideas to get this right and avoid future issues. I'm more than happy to open an IBM ticket on this issue, but I feel community feedback might get me further to start with. 
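One way to chase the suspected sudo environment difference is a sketch like the following. The hypothesis that a plain `sudo` command and a full root shell differ in umask or environment is an assumption, not a confirmed root cause; the file paths are the ones from the error messages above.

```shell
#!/usr/bin/env bash
# Diagnostic sketch; nothing here is specific to GPFS internals.

# Pure helper: does an `ls -l` mode string grant world read?
# e.g. world_readable "-rw-r--r--" succeeds, world_readable "-rw-------" fails.
world_readable() {
    [ "${1:7:1}" = "r" ]
}

# Compare what a one-shot sudo command sees with a root login shell.
compare_sudo_env() {
    echo "umask via plain sudo: $(sudo sh -c umask)"
    echo "umask via root shell: $(sudo -i sh -c umask)"
    # Environment differences (PATH, HOME, locale) are common suspects.
    diff <(sudo env | sort) <(sudo -i env | sort) || true
}

# After re-running mmchconfig, confirm the config files are readable again.
check_cfg_modes() {
    local f mode
    for f in /var/mmfs/gen/mmfs.cfg /var/mmfs/gen/mmfs.cfg.ls; do
        mode=$(ls -l "$f" | awk '{print $1}')
        if world_readable "$mode"; then
            echo "$f: ok ($mode)"
        else
            echo "$f: NOT world-readable ($mode)"
        fi
    done
}
```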
Thanks

--
Peter Childs
ITS Research Storage
Queen Mary, University of London

From spectrumscale at kiranghag.com Thu Jun 13 17:55:07 2019
From: spectrumscale at kiranghag.com (KG)
Date: Thu, 13 Jun 2019 22:25:07 +0530
Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To: 
References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de>
Message-ID: 

Hi

As per the flash - https://www-01.ibm.com/support/docview.wss?uid=ibm10887213&myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E this bug doesn't appear if SELinux is disabled.

If the customer is willing to disable SELinux, will it be ok to upgrade (or stay on upgraded level and avoid downgrade)?

On Tue, Jun 11, 2019 at 9:24 PM Felipe Knop wrote:

> Renar,
>
> With the change below, which is a retrofit of a change deployed in newer
> kernels, an inconsistency has taken place between the GPFS kernel
> portability layer and the kernel proper. A known result of that
> inconsistency is a kernel crash. One known sequence leading to the crash
> involves the mkdir() call.
>
> We are working on an official notification on the issue.
>
> Felipe
>
> ----
> Felipe Knop knop at us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314 T/L 293-9314
>
> From: "Grunenberg, Renar"
> To: gpfsug main discussion list
> Date: 06/11/2019 08:28 AM
> Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6
> kernel 3.10.0-957.21.2
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
> Hallo Felipe,
> can you explain is this a generic Problem in rhel or only a scale related.
> Are there any cicumstance already available? We ask redhat but have no
> points that this are know to them?
>
> Regards Renar
>
> Renar Grunenberg
> Abteilung Informatik - Betrieb
>
> HUK-COBURG
> Bahnhofsplatz
> 96444 Coburg
> Telefon: 09561 96-44110
> Telefax: 09561 96-44104
> E-Mail: Renar.Grunenberg at huk-coburg.de
> Internet: www.huk.de
>
> ------------------------------
> HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter
> Deutschlands a. G. in Coburg
> Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
> Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
> Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
> Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav
> Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
> ------------------------------
> Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte
> Informationen.
> Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich
> erhalten haben,
> informieren Sie bitte sofort den Absender und vernichten Sie diese
> Nachricht.
> Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht
> ist nicht gestattet.
>
> This information may contain confidential and/or privileged information.
> If you are not the intended recipient (or have received this information
> in error) please notify the
> sender immediately and destroy this information.
> Any unauthorized copying, disclosure or distribution of the material in
> this information is strictly forbidden. 
> ------------------------------
> *Von:* gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> *Im Auftrag von *Felipe Knop
> *Gesendet:* Montag, 10. Juni 2019 15:43
> *An:* gpfsug main discussion list
> *Betreff:* Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel
> 3.10.0-957.21.2
>
> Renar,
>
> Thanks. Of the changes below, it appears that
>
> * security: double-free attempted in security_inode_init_security()
> (BZ#1702286)
>
> was the one that ended up triggering the problem. Our investigations now
> show that RHEL kernels >= 3.10.0-957.19.1 are impacted.
>
> Felipe
>
> ----
> Felipe Knop *knop at us.ibm.com*
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314 T/L 293-9314
>
> From: "Grunenberg, Renar" <*Renar.Grunenberg at huk-coburg.de*>
> To: "'gpfsug-discuss at spectrumscale.org'" <*gpfsug-discuss at spectrumscale.org*>
> Date: 06/10/2019 08:43 AM
> Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6
> kernel 3.10.0-957.21.2
> Sent by: *gpfsug-discuss-bounces at spectrumscale.org*
>
> ------------------------------
>
> Hallo Felipe,
>
> here are the change list:
> RHBA-2019:1337 kernel bug fix update
>
> Summary:
>
> Updated kernel packages that fix various bugs are now available for Red
> Hat Enterprise Linux 7.
>
> The kernel packages contain the Linux kernel, the core of any Linux
> operating system. 
> > This update fixes the following bugs: > > * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) > > * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with > SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server > should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked > delegations (BZ#1689811) > > * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx > mtip_init_cmd_header routine (BZ#1689929) > > * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) > > * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal > cards (Regression from 1584963) - Need to flush fb writes when rewinding > push buffer (BZ#1690761) > > * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel > client issue (BZ#1692266) > > * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan > trunk and header rewrite (BZ#1693110) > > * aio O_DIRECT writes to non-page-aligned file locations on ext4 can > result in the overlapped portion of the page containing zeros (BZ#1693561) > > * [HP WS 7.6 bug] Audio driver does not recognize multi function audio > jack microphone input (BZ#1693562) > > * XFS returns ENOSPC when using extent size hint with space still > available (BZ#1693796) > > * OVN requires IPv6 to be enabled (BZ#1694981) > > * breaks DMA API for non-GPL drivers (BZ#1695511) > > * ovl_create can return positive retval and crash the host (BZ#1696292) > > * ceph: append mode is broken for sync/direct write (BZ#1696595) > > * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL > (BZ#1697241) > > * Failed to load kpatch module after install the rpm package occasionally > on ppc64le (BZ#1697867) > > * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) > > * Resizing an online EXT4 filesystem on a loopback device hangs > (BZ#1698110) > > * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) > > * [ESXi][RHEL7.6]After upgrade 
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knop at us.ibm.com Thu Jun 13 20:25:16 2019
From: knop at us.ibm.com (Felipe Knop)
Date: Thu, 13 Jun 2019 15:25:16 -0400
Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de><3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de>
Message-ID:

Kiran,

If SELinux is disabled (SELinux mode set to 'disabled') then the crash should not happen, and it should be OK to upgrade to (say) 3.10.0-957.21.2 or stay at that level.

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: KG
To: gpfsug main discussion list
Date: 06/13/2019 12:56 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi

As per the flash - https://www-01.ibm.com/support/docview.wss?uid=ibm10887213&myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E this bug doesn't appear if SELinux is disabled.

If the customer is willing to disable SELinux, will it be OK to upgrade (or stay on the upgraded level and avoid a downgrade)?

On Tue, Jun 11, 2019 at 9:24 PM Felipe Knop wrote:

Renar,

With the change below, which is a retrofit of a change deployed in newer kernels, an inconsistency has taken place between the GPFS kernel portability layer and the kernel proper. A known result of that inconsistency is a kernel crash.
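Felipe's replies boil down to two checks before touching a node: is the kernel at or past the first affected build, and is SELinux genuinely disabled. A rough pre-flight sketch of that logic (assumptions: GNU coreutils `sort -V` is available; the first affected build is 3.10.0-957.19.1 as reported elsewhere in this thread; the `.el7` dist-tag strip and the sample kernel strings are purely illustrative):

```shell
# Sketch of the guidance in this thread. Assumptions: GNU 'sort -V';
# first affected build per Felipe is 3.10.0-957.19.1; stripping the
# '.el7*' dist tag keeps the sort -V comparison in line with rpm ordering.
first_affected="3.10.0-957.19.1"

kernel_affected() {
    # true when the dist-tag-stripped kernel release $1 sorts at or
    # after the first affected build
    v="${1%%.el7*}"
    [ "$(printf '%s\n' "$first_affected" "$v" | sort -V | head -n 1)" = "$first_affected" ]
}

selinux_safe() {
    # per the flash, only an outright 'Disabled' avoids the crash path
    [ "$1" = "Disabled" ]
}

ok_to_run() {
    # $1 = kernel release, $2 = getenforce output
    ! kernel_affected "$1" || selinux_safe "$2"
}

# On a live node the inputs would be "$(uname -r)" and "$(getenforce)".
ok_to_run "3.10.0-957.el7.x86_64" "Enforcing" && echo "base 957 kernel: below the affected range"
ok_to_run "3.10.0-957.21.2.el7"   "Disabled"  && echo "957.21.2 with SELinux disabled: OK per Felipe"
ok_to_run "3.10.0-957.21.2.el7"   "Enforcing" || echo "957.21.2 with SELinux active: hold or roll back"
```

Treat this as a hint only, not a substitute for the official flash.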
One known sequence leading to the crash involves the mkdir() call. We are working on an official notification on the issue.

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: "Grunenberg, Renar"
To: gpfsug main discussion list
Date: 06/11/2019 08:28 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hallo Felipe, can you explain whether this is a generic problem in RHEL or only Scale-related? Are there any circumstances already known? We asked Red Hat, but they gave no indication that this is known to them.

Regards Renar

Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de

HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.

Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.

Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop
Gesendet: Montag, 10. Juni 2019 15:43
An: gpfsug main discussion list
Betreff: Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2

Renar,

Thanks. Of the changes below, it appears that

* security: double-free attempted in security_inode_init_security() (BZ#1702286)

was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted.

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: "Grunenberg, Renar"
To: "'gpfsug-discuss at spectrumscale.org'" <gpfsug-discuss at spectrumscale.org>
Date: 06/10/2019 08:43 AM
Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hallo Felipe, here is the change list:

RHBA-2019:1337 kernel bug fix update

Summary:

Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7.

The kernel packages contain the Linux kernel, the core of any Linux operating system.
This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a 
reboot. (BZ#1699723)

* kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706)

* XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293)

* stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743)

* Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991)

* IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282)

* security: double-free attempted in security_inode_init_security() (BZ#1702286)

* Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921)

* Satellite Capsule sync triggers several XFS corruptions (BZ#1702922)

* BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923)

* md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993)

* MDS mitigations are not enabled after double microcode update (BZ#1712998)

* WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004)

Users of kernel are advised to upgrade to these updated packages, which fix these bugs.

Full details and references:

https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2

Revision History:

Issue Date: 2019-06-04
Updated: 2019-06-04

Regards Renar

Renar Grunenberg
Abteilung Informatik - Betrieb

Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop
Gesendet: Montag, 10. Juni 2019 06:41
An: gpfsug main discussion list
Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2

Hi,

Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable.

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: KG
To: gpfsug main discussion list
Date: 06/09/2019 09:38 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

One of my customers already upgraded their DR site. Is rollback advised?
They will be running from the DR site for a day in another week.

On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote:

Zach,

This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted)

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: Zachary Mance
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 06/07/2019 05:51 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Which versions of Spectrum Scale are you referring to? 5.0.2-3?

---------------------------------------------------------------------------------------------------------------
Zach Mance zmance at ucar.edu (303) 497-1883
HPC Data Infrastructure Group / CISL / NCAR
---------------------------------------------------------------------------------------------------------------

On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop <knop at us.ibm.com> wrote:

All,

There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided.
Thanks,

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From valdis.kletnieks at vt.edu Fri Jun 14 00:15:09 2019
From: valdis.kletnieks at vt.edu (Valdis Klētnieks)
Date: Thu, 13 Jun 2019 19:15:09 -0400
Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2
In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de><3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de>
Message-ID: <27309.1560467709@turing-police>

On Thu, 13 Jun 2019 15:25:16 -0400, "Felipe Knop" said:
> If SELinux is disabled (SELinux mode set to 'disabled') then the crash
> should not happen, and it should be OK to upgrade to (say) 3.10.0-957.21.2
> or stay at that level.

Note that if you have any plans to re-enable SELinux in the future, you'll have to do a relabel, which could take a while if you have large filesystems with tens or hundreds of millions of inodes....
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL:

From cblack at nygenome.org Mon Jun 17 17:24:54 2019
From: cblack at nygenome.org (Christopher Black)
Date: Mon, 17 Jun 2019 16:24:54 +0000
Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance
Message-ID:

Our network team sometimes needs to take down sections of our network for maintenance. Our systems have dual paths thru pairs of switches, but often the maintenance will take down one of the two paths leaving all our nsd servers with half bandwidth.

Some of our systems are transmitting at a higher rate than can be handled by half network (2x40Gb hosts with tx of 50Gb+).

What can we do to gracefully handle network maintenance reducing bandwidth in half?

Should we set maxMBpS for affected nodes to a lower value? (default on our ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?)

Any other ideas or comments?
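If the maxMBpS route is taken, a minimal sketch might look like the lines below. The node class name `nsdNodes` is hypothetical, 4000 is just the ~32 Gbps figure from the question above, and the `-i` flag asks mmchconfig to apply the change immediately (and persistently):

```shell
# Sketch only: 'nsdNodes' is a placeholder node class for the affected
# NSD servers; size the value to the halved link (32 Gbps is ~4000 MB/s).
mmchconfig maxMBpS=4000 -i -N nsdNodes
mmlsconfig maxMBpS                        # confirm the override

# ...network maintenance window...

mmchconfig maxMBpS=30000 -i -N nsdNodes   # restore the ESS default noted above
```

Since maxMBpS is more of a hint than a hard cap, it is worth watching actual transmit rates after the change rather than assuming the limit holds.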
Our hope is that metadata operations are not affected much and users just see jobs and processes read or write at a slower rate.

Best,
Chris
________________________________
This message is for the recipient's use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alex at calicolabs.com Mon Jun 17 17:31:38 2019
From: alex at calicolabs.com (Alex Chekholko)
Date: Mon, 17 Jun 2019 09:31:38 -0700
Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance
In-Reply-To: References: Message-ID:

Hi Chris,

I think the next thing to double-check is when the maxMBpS change takes effect. You may need to restart the nsds. Otherwise I think your plan is sound.

Regards,
Alex

On Mon, Jun 17, 2019 at 9:24 AM Christopher Black wrote:

> Our network team sometimes needs to take down sections of our network for
> maintenance. Our systems have dual paths thru pairs of switches, but often
> the maintenance will take down one of the two paths leaving all our nsd
> servers with half bandwidth.
>
> Some of our systems are transmitting at a higher rate than can be handled
> by half network (2x40Gb hosts with tx of 50Gb+).
> > Our hope is that metadata operations are not affected much and users just > see jobs and processes read or write at a slower rate. > > > > Best, > > Chris > ------------------------------ > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Jun 17 17:37:48 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 17 Jun 2019 16:37:48 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: Message-ID: Hi I would really look into QoS instead. -- Cheers > On 17 Jun 2019, at 19.33, Alex Chekholko wrote: > > Hi Chris, > > I think the next thing to double-check is when the maxMBpS change takes effect. You may need to restart the nsds. Otherwise I think your plan is sound. > > Regards, > Alex > > >> On Mon, Jun 17, 2019 at 9:24 AM Christopher Black wrote: >> Our network team sometimes needs to take down sections of our network for maintenance. Our systems have dual paths thru pairs of switches, but often the maintenance will take down one of the two paths leaving all our nsd servers with half bandwidth. >> >> Some of our systems are transmitting at a higher rate than can be handled by half network (2x40Gb hosts with tx of 50Gb+). 
>> What can we do to gracefully handle network maintenance reducing bandwidth in half?
>>
>> Should we set maxMBpS for affected nodes to a lower value? (default on our ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?)
>>
>> Any other ideas or comments?
>>
>> Our hope is that metadata operations are not affected much and users just see jobs and processes read or write at a slower rate.
>>
>> Best,
>> Chris

Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From skylar2 at uw.edu Mon Jun 17 17:38:47 2019
From: skylar2 at uw.edu (Skylar Thompson)
Date: Mon, 17 Jun 2019 16:38:47 +0000
Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance
In-Reply-To: References: Message-ID: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu>

IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should use its in-memory buffers for read prefetches and dirty writes.

On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote:
> Hi Chris,
>
> I think the next thing to double-check is when the maxMBpS change takes
> effect.
You may need to restart the nsds. Otherwise I think your plan is
> sound.
>
> Regards,
> Alex
>
> On Mon, Jun 17, 2019 at 9:24 AM Christopher Black
> wrote:
>
> > Our network team sometimes needs to take down sections of our network for
> > maintenance. Our systems have dual paths thru pairs of switches, but often
> > the maintenance will take down one of the two paths leaving all our nsd
> > servers with half bandwidth.
> >
> > Some of our systems are transmitting at a higher rate than can be handled
> > by half network (2x40Gb hosts with tx of 50Gb+).
> >
> > What can we do to gracefully handle network maintenance reducing bandwidth
> > in half?
> >
> > Should we set maxMBpS for affected nodes to a lower value? (default on our
> > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?)
> >
> > Any other ideas or comments?
> >
> > Our hope is that metadata operations are not affected much and users just
> > see jobs and processes read or write at a slower rate.
> >
> > Best,
> >
> > Chris
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine

From cblack at nygenome.org Mon Jun 17 17:47:54 2019
From: cblack at nygenome.org (Christopher Black)
Date: Mon, 17 Jun 2019 16:47:54 +0000
Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance
In-Reply-To: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu>
References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu>
Message-ID:

The man page indicates that maxMBpS can be used to "artificially limit how much I/O one node can put on all of the disk servers", but it might not be the best choice. The man page also says maxMBpS is in the class of mmchconfig changes that take place immediately.
We've only ever used QoS for throttling maint operations (restripes, etc) and are unfamiliar with how to best use it to throttle client load.

Best,
Chris

On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson" <gpfsug-discuss-bounces at spectrumscale.org on behalf of skylar2 at uw.edu> wrote:

    IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should
    use its in-memory buffers for read prefetches and dirty writes.

    On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote:
    > Hi Chris,
    >
    > I think the next thing to double-check is when the maxMBpS change takes
    > effect. You may need to restart the nsds. Otherwise I think your plan is
    > sound.
    >
    > Regards,
    > Alex
    >
    > On Mon, Jun 17, 2019 at 9:24 AM Christopher Black wrote:
    >
    > > Our network team sometimes needs to take down sections of our network for
    > > maintenance. Our systems have dual paths thru pairs of switches, but often
    > > the maintenance will take down one of the two paths leaving all our nsd
    > > servers with half bandwidth.
    > >
    > > Some of our systems are transmitting at a higher rate than can be handled
    > > by half network (2x40Gb hosts with tx of 50Gb+).
    > >
    > > What can we do to gracefully handle network maintenance reducing bandwidth
    > > in half?
    > >
    > > Should we set maxMBpS for affected nodes to a lower value? (default on our
    > > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?)
    > >
    > > Any other ideas or comments?
    > >
    > > Our hope is that metadata operations are not affected much and users just
    > > see jobs and processes read or write at a slower rate.
    > >
    > > Best,
    > > Chris
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
________________________________
This message is for the recipient's use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.
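The QoS route that Luis suggests and Chris asks about would look roughly like the sketch below. The file system name `gpfs0` and both IOPS figures are placeholders; normal client traffic is charged to the `other` QoS class, so that is the cap that sheds load during the window:

```shell
# Sketch only: throttle the 'other' class (normal client I/O) while one
# network path is down; 'gpfs0' and the IOPS values are placeholders.
mmchqos gpfs0 --enable pool=system,other=10000IOPS,maintenance=5000IOPS
mmlsqos gpfs0 --seconds 60    # watch per-class consumption during the window
mmchqos gpfs0 --disable       # lift the caps once both paths are back
```

mmlsqos shows per-class consumption, which answers whether the caps are actually biting; disabling QoS afterwards removes the throttle entirely.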
From alex at calicolabs.com Mon Jun 17 17:51:27 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 17 Jun 2019 09:51:27 -0700 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: Hi all, My experience with MaxMBpS was in the other direction but it did make a difference. We had lots of spare network bandwidth (that is, the network was not the bottleneck) and in the course of various GPFS tuning it also looked like the disks were not too busy, and the NSDs were not too busy, so bumping up the MaxMBpS improved performance and allowed GPFS to do more. Of course, this was many years ago on a different GPFS version and hardware, but I think it would work in the other direction. It should also be very safe to try. Regards, Alex On Mon, Jun 17, 2019 at 9:47 AM Christopher Black wrote: > The man page indicates that maxMBpS can be used to "artificially limit how > much I/O one node can put on all of the disk servers", but it might not be > the best choice. The man page also says maxMBpS is in the class of mmchconfig > changes that take place immediately. > We've only ever used QoS for throttling maint operations (restripes, etc) > and are unfamiliar with how to best use it to throttle client load. > > Best, > Chris > > On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Skylar Thompson" behalf of skylar2 at uw.edu> wrote: > > IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS > should > use its in-memory buffers for read prefetches and dirty writes. > > On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: > > Hi Chris, > > > > I think the next thing to double-check is when the maxMBpS change > takes > > effect. You may need to restart the nsds. Otherwise I think your > plan is > > sound.
> > > > Regards, > > Alex > > > > > > On Mon, Jun 17, 2019 at 9:24 AM Christopher Black < > cblack at nygenome.org> > > wrote: > > > > > Our network team sometimes needs to take down sections of our > network for > > > maintenance. Our systems have dual paths thru pairs of switches, > but often > > > the maintenance will take down one of the two paths leaving all > our nsd > > > servers with half bandwidth. > > > > > > Some of our systems are transmitting at a higher rate than can be > handled > > > by half network (2x40Gb hosts with tx of 50Gb+). > > > > > > What can we do to gracefully handle network maintenance reducing > bandwidth > > > in half? > > > > > > Should we set maxMBpS for affected nodes to a lower value? > (default on our > > > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 > for 32Gbps?) > > > > > > Any other ideas or comments? > > > > > > Our hope is that metadata operations are not affected much and > users just > > > see jobs and processes read or write at a slower rate. > > > > > > > > > > > > Best, > > > > > > Chris > > > ------------------------------ > > > This message is for the recipient's use only, and may contain > > > confidential, privileged or protected information. Any > unauthorized use or > > > dissemination of this communication is prohibited. If you received > this > > > message in error, please immediately notify the sender and destroy > all > > > copies of this message. The recipient should check this email and > any > > > attachments for the presence of viruses, as we accept no liability > for any > > > damage caused by any virus transmitted by this email.
> > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > ________________________________ > > This message is for the recipient's use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email.
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Jun 17 17:54:04 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 17 Jun 2019 16:54:04 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: Message-ID: Hi Writing from my phone, so excuse the typos. Assuming you have a system pool (metadata) and some other pool(s), you can set limits on the maintenance class (which you have done already) and on the other class, which would affect all the other ops. You can apply those per node or node class, which can be matched to whatever parts of the network you are working with. Changes are online and immediate. And you can measure normal load just by having QoS activated and looking at the values for a few days. Hope the above makes some sense. -- Cheers > On 17 Jun 2019, at 19.48, Christopher Black wrote: > > The man page indicates that maxMBpS can be used to "artificially limit how much I/O one node can put on all of the disk servers", but it might not be the best choice. The man page also says maxMBpS is in the class of mmchconfig changes that take place immediately. > We've only ever used QoS for throttling maint operations (restripes, etc) and are unfamiliar with how to best use it to throttle client load. > > Best, > Chris > > On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson" wrote: > > IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should > use its in-memory buffers for read prefetches and dirty writes. > >> On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: >> Hi Chris, >> >> I think the next thing to double-check is when the maxMBpS change takes >> effect. You may need to restart the nsds.
Otherwise I think your plan is >> sound. >> >> Regards, >> Alex >> >> >> On Mon, Jun 17, 2019 at 9:24 AM Christopher Black >> wrote: >> >>> Our network team sometimes needs to take down sections of our network for >>> maintenance. Our systems have dual paths thru pairs of switches, but often >>> the maintenance will take down one of the two paths leaving all our nsd >>> servers with half bandwidth. >>> >>> Some of our systems are transmitting at a higher rate than can be handled >>> by half network (2x40Gb hosts with tx of 50Gb+). >>> >>> What can we do to gracefully handle network maintenance reducing bandwidth >>> in half? >>> >>> Should we set maxMBpS for affected nodes to a lower value? (default on our >>> ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) >>> >>> Any other ideas or comments? >>> >>> Our hope is that metadata operations are not affected much and users just >>> see jobs and processes read or write at a slower rate. >>> >>> >>> >>> Best, >>> >>> Chris >>> ------------------------------ >>> This message is for the recipient's use only, and may contain >>> confidential, privileged or protected information. Any unauthorized use or >>> dissemination of this communication is prohibited. If you received this >>> message in error, please immediately notify the sender and destroy all >>> copies of this message. The recipient should check this email and any >>> attachments for the presence of viruses, as we accept no liability for any >>> damage caused by any virus transmitted by this email.
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= >>> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > ________________________________ > > This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=zyyij5eDMGGtTC00mplr-3aAR3dbStZGhwocBYKIyUg&s=dlSFGfd_CW47EaNE-5X9tMCkmqZ8WayaLCGI1sTzpkA&e= > Ellei edellä ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jun 17 20:39:46 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Jun 2019 15:39:46 -0400 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: Please note that the maxMBpS parameter of mmchconfig is not part of the QOS features of the mmchqos command. mmchqos can be used to precisely limit IOPS. You can even set different limits for NSD traffic originating at different nodes. However, use the "force" of QOS carefully! No doubt you can bring a system to a virtual standstill if you set the IOPS values incorrectly. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Tue Jun 18 20:30:53 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 18 Jun 2019 15:30:53 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available Message-ID: All, With respect to the issues (including kernel crashes) on Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just been released: https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 (as described in the link above) A fix is now available in efix form for both 4.2.3 and 5.0.x.
The fix should be included in the upcoming PTFs for 4.2.3 and 5.0.3. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: From roblogie at au1.ibm.com Wed Jun 19 00:23:37 2019 From: roblogie at au1.ibm.com (Rob Logie) Date: Tue, 18 Jun 2019 23:23:37 +0000 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD Message-ID: Hi We are doing an underlying hardware change that will result in the Linux device file names changing for attached storage. Hence I need to reconfigure the NSDs to use the new Linux device names. What is the best way to do this? Thanks in advance Regards, Rob Logie IT Specialist A/NZ GBS Ballarat CIC Office: +61-3-5339 7748| Mobile: +61-411-021-029| Tie-Line: 97748 E-mail: roblogie at au1.ibm.com | Lotus Notes: Rob Logie/Australia/IBM IBM Building, BA02 129 Gear Avenue, Mount Helen, Vic, 3350 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Jun 19 01:32:40 2019 From: valdis.kletnieks at vt.edu (Valdis Klētnieks) Date: Tue, 18 Jun 2019 20:32:40 -0400 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD In-Reply-To: References: Message-ID: <11132.1560904360@turing-police> On Tue, 18 Jun 2019 23:23:37 -0000, "Rob Logie" said: > We are doing an underlying hardware change that will result in the Linux > device file names changing for attached storage. > Hence I need to reconfigure the NSDs to use the new Linux device names. The only time GPFS cares about the Linux device names is when you go to actually create an NSD. After that, it just romps through /dev, finds anything that looks like a disk, and if it has an NSD on it at the appropriate offset, claims it as a GPFS device.
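Because GPFS identifies NSDs by the on-disk descriptor rather than the device name, the /var/mmfs/etc/nsddevices user exit can hand GPFS an explicit candidate list instead of letting it scan everything under /dev, which is handy with multipath where you want the mpath pseudo-devices probed rather than each individual path. A minimal sketch, modeled on the shipped sample in /usr/lpp/mmfs/samples/nsddevices.sample (the mpath* glob is illustrative, adjust it to your multipath naming):

```shell
#!/bin/sh
# /var/mmfs/etc/nsddevices -- user exit: emit "deviceName deviceType" pairs.
# Device names are relative to /dev; "dmm" is the device-mapper multipath type.
for dev in /dev/mapper/mpath*; do
    [ -e "$dev" ] || continue       # glob matched nothing on this node
    echo "${dev#/dev/} dmm"         # e.g. "mapper/mpatha dmm"
done
# exit 0: GPFS uses only this list; nonzero: it also runs the default /dev scan
exit 0
```

Mark it executable on every NSD server; with exit 0 only the listed devices are considered, which also speeds up startup on systems with many LUNs and avoids GPFS grabbing a single path instead of the multipath device.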
(Protip: Since in a cluster the same disk may not have enumerated to the same name on all NSD servers that have visibility to it, you're almost always better off initially doing an mmcrnsd specifying only one server, and then using mmchnsd to add the other servers to the server list for it) Heck, even without hardware changes, there's no guarantee that the disks enumerate in the same order across reboots (especially if you have a petabyte of LUNs and 8 or 16 paths to each LUN, though it's possible to tell the multipath daemon to have stable names for the multipath devices) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Jun 19 11:22:51 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 19 Jun 2019 11:22:51 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: References: Message-ID: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG From arc at b4restore.com Wed Jun 19 12:30:33 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 19 Jun 2019 11:30:33 +0000 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> References: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> Message-ID: Hi Jonathan Here is what IBM wrote when I asked them: "the term "...node running kernel versions 3.10.0-957.19.1 or higher" includes 21.3. The term "including 3.10.0-957.21.2" is just to make clear that the issue isn't limited to the 19.x kernel." I will receive an efix later today and try it on the 21.3 kernel. Venlig hilsen / Best Regards Andi Rhod Christiansen -----Original message----- From: gpfsug-discuss-bounces at spectrumscale.org On behalf of Jonathan Buzzard Sent: Wednesday, June 19, 2019 12:23 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From knop at us.ibm.com Wed Jun 19 13:22:40 2019 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 19 Jun 2019 08:22:40 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: References: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> Message-ID: Andi, Thank you. At least from the point of view of the change in the kernel (RHBA-2019:1337) that triggered the compatibility break between the GPFS kernel module and the kernel, the GPFS efix should work with the newer kernel. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Andi Rhod Christiansen To: gpfsug main discussion list Date: 06/19/2019 07:42 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathan Here is what IBM wrote when I asked them: "the term "...node running kernel versions 3.10.0-957.19.1 or higher" includes 21.3. The term "including 3.10.0-957.21.2" is just to make clear that the issue isn't limited to the 19.x kernel." I will receive an efix later today and try it on the 21.3 kernel. Venlig hilsen / Best Regards Andi Rhod Christiansen -----Original message----- From: gpfsug-discuss-bounces at spectrumscale.org
On behalf of Jonathan Buzzard Sent: Wednesday, June 19, 2019 12:23 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=i6sKmBjs765x8OUlvipm4PXQbXYHEZ7q27eWcfIUuA0&s=s-83FfH6qlM-yNbeFE92Xe_yMfWAGYm5ocLEKcBX3VA&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=i6sKmBjs765x8OUlvipm4PXQbXYHEZ7q27eWcfIUuA0&s=s-83FfH6qlM-yNbeFE92Xe_yMfWAGYm5ocLEKcBX3VA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From INDULISB at uk.ibm.com Wed Jun 19 13:36:26 2019 From: INDULISB at uk.ibm.com (Indulis Bernsteins1) Date: Wed, 19 Jun 2019 13:36:26 +0100 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD Message-ID: You can also speed up the startup of Spectrum Scale (GPFS) by using the nsddevices exit to supplement or bypass the normal "scan all block devices" process by Spectrum Scale. Useful if you have lots of LUNs or other block devices which are not NSDs, or for multipath. Though later versions of Scale might have fixed the scan for multipath devices. Anyway, this is old but potentially useful https://mytravelingfamily.com/2009/03/03/making-gpfs-work-using-multipath-on-linux/ All the information, representations, statements, opinions and proposals in this document are correct and accurate to the best of our present knowledge but are not intended (and should not be taken) to be contractually binding unless and until they become the subject of separate, specific agreement between us. Any IBM Machines provided are subject to the Statements of Limited Warranty accompanying the applicable Machine. Any IBM Program Products provided are subject to their applicable license terms. Nothing herein, in whole or in part, shall be deemed to constitute a warranty. IBM products are subject to withdrawal from marketing and or service upon notice, and changes to product configurations, or follow-on products, may result in price changes. Any references in this document to "partner" or "partnership" do not constitute or imply a partnership in the sense of the Partnership Act 1890. IBM is not responsible for printing errors in this proposal that result in pricing or information inaccuracies. Regards, Indulis Bernsteins Systems Architect IBM New Generation Storage Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Thu Jun 20 23:18:01 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 20 Jun 2019 22:18:01 +0000 Subject: [gpfsug-discuss] AFM prefetch and eviction policy question Message-ID: <0D7782FD-5594-4D9D-8B2B-B0BF22A4CB5F@oarc.rutgers.edu> Hi there, Been reading the documentation and wikis and such this afternoon, but could use some assistance from someone who is more well-versed in AFM and policy writing to confirm that what I'm looking to do is actually feasible. Is it possible to: 1) Have a policy that, generally, continuously prefetches a single fileset of an AFM cache (make sure those files are there whenever possible)? 2) Generally prefer not to evict files from that fileset, unless it's necessary, opting to evict other stuff first? It seems to me that one can do a prefetch on the fileset, but that future files will not be prefetched, requiring you to run this periodically. Additionally, it would seem that, by default, these files would frequently be evicted when eviction becomes necessary if they are infrequently used. Would like to avoid too much churn on this but provide fast access to these files (it's a software tree, not user files). Thanks in advance! I'd rather know that it's possible before digging too deeply into the how. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr.
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From quixote at us.ibm.com Fri Jun 21 13:06:35 2019 From: quixote at us.ibm.com (Chris Kempin) Date: Fri, 21 Jun 2019 08:06:35 -0400 Subject: [gpfsug-discuss] AFM prefetch and eviction policy question Message-ID: Ryan: 1) You will need to just regularly run a prefetch to bring over the latest files... you could either just run it regularly on the cache (probably using the --directory flag to scan the whole fileset for uncached files) or, with a little bit of scripting, you might be able to drive the prefetch from home if you know what files have been created/changed, by shipping over to the cache a list of files to prefetch and having something prefetch that list when it arrives. 2) As to eviction, just set afmEnableAutoEviction=no and don't evict. Is there a storage constraint on the cache that would force you to evict? I was using AFM in a more interactive application, with many small files, and performance was not an issue in terms of "fast" access to files, but things to consider: What is the network latency between home and cache? How big are the files you are dealing with? If you have very large files, you may want multiple gateways so they can fetch in parallel. How much change is there in the files? How many new/changed files a day are we talking about? Are existing files fairly stable? Regards, Chris Chris Kempin IBM Cloud - Site Reliability Engineering -------------- next part -------------- An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Tue Jun 25 12:38:28 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Tue, 25 Jun 2019 11:38:28 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Hello, I wonder if anyone has seen this...
I am (not) having fun with the rescan-scsi-bus.sh command, especially with the -r switch. Even though there are no devices removed, the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I've checked the NSDs via the mmlsnsd and mmlsdisk commands and they are all 'ready' and 'up'. The multipaths to these NSDs are all fine too. Is there a way of finding out what 'access' (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access - 'mmnsdrediscover' returns nothing and runs really fast (contrary to the statement 'This may take a while' when it runs)? Any ideas appreciated! Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 25 13:10:53 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 25 Jun 2019 12:10:53 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: Hello Son, you can check the access to the NSD with mmlsdisk -m. This gives you a column like "IO performed on node". On an NSD server you should see localhost; on an NSD client you see the hosting NSD server for each device. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________ From: gpfsug-discuss-bounces at spectrumscale.org On behalf of Son Truong Sent: Tuesday, 25 June 2019 13:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Hello, I wonder if anyone has seen this... I am (not) having fun with the rescan-scsi-bus.sh command, especially with the -r switch. Even though there are no devices removed, the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I've checked the NSDs via the mmlsnsd and mmlsdisk commands and they are all 'ready' and 'up'. The multipaths to these NSDs are all fine too. Is there a way of finding out what 'access' (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access - 'mmnsdrediscover' returns nothing and runs really fast (contrary to the statement 'This may take a while' when it runs)? Any ideas appreciated!
Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Tue Jun 25 13:01:11 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Tue, 25 Jun 2019 14:01:11 +0200 Subject: [gpfsug-discuss] Charts Decks - User Meeting along ISC Frankfurt In-Reply-To: References: Message-ID: The chart decks of the user meeting along ISC are now available: https://spectrumscale.org/presentations/ Thanks to all speakers and participants. -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: gpfsug main discussion list Date: 05/06/2019 10:44 Subject: [EXTERNAL] [gpfsug-discuss] Agenda - User Meeting along ISC Frankfurt Sent by: gpfsug-discuss-bounces at spectrumscale.org The agenda is now published: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-isc/ Please use the registration link to attend. Looking forward to meeting many of you there. 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: "gpfsug main discussion list" Date: 22/05/2019 10:55 Subject: [EXTERNAL] [gpfsug-discuss] Save the date - User Meeting along ISC Frankfurt Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings: IBM will host a joint "IBM Spectrum Scale and IBM Spectrum LSF User Meeting" at ISC. As with other user group meetings, the agenda will include user stories, updates on IBM Spectrum Scale & IBM Spectrum LSF, and access to IBM experts and your peers. We are still looking for customers to talk about their experience with Spectrum Scale and/or Spectrum LSF. Please send me a personal mail if you are interested in talking. The meeting is planned for: Monday June 17th, 2019 - 1pm-5.30pm ISC Frankfurt, Germany I will send more details later. 
Best, Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=oSzGEkM6PXf5XfF3fAOrsCpqjyrt-ukWcaq3_Ldy_P4&s=GiOkq0F1T3eVSb1IeWaD7gKImm1gEVwhGaa1eIHDhD8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From son.truong at bristol.ac.uk Tue Jun 25 16:02:20 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Tue, 25 Jun 2019 15:02:20 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Hello Renar, Thanks for that command, very useful, and I can now see the problematic NSDs are all served remotely. I have double-checked the multipaths and devices and I can see these NSDs are available locally. How do I get GPFS to recognise this and serve them out via 'localhost'? mmnsddiscover -d seemed to have brought two of the four problematic NSDs back to being served locally, but the other two are not behaving. I have double-checked the availability of these devices and their multipaths but everything on that side seems fine. Any more ideas? 
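One way to retry the stubborn disks is a small per-NSD loop, sketched here as a dry run (the NSD names are placeholders, and DRY_RUN=1 makes the script echo the GPFS commands instead of executing them — clear it only on a node where you actually want them to run):

```shell
# Dry-run sketch: echo the GPFS commands rather than executing them.
DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Placeholder NSD names; substitute the disks that "mmlsdisk -m" reports
# as served remotely on this node.
for nsd in nsd2 nsd4; do
    run mmnsddiscover -d "$nsd"
done

# Last resort reported elsewhere in this thread: restart GPFS on the node.
run mmshutdown
run mmstartup
```

With DRY_RUN=1 the loop only prints what it would do, which is a safe way to review the command sequence before touching a production node.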
Regards, Son --------------------------- Message: 2 Date: Tue, 25 Jun 2019 12:10:53 +0000 From: "Grunenberg, Renar" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Content-Type: text/plain; charset="utf-8" Hallo Son, you can check the access to the nsd with mmlsdisk -m. This give you a colum like ?IO performed on node?. On NSD-Server you should see localhost, on nsd-client you see the hostig nsd-server per device. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Son Truong Gesendet: Dienstag, 25. Juni 2019 13:38 An: gpfsug-discuss at spectrumscale.org Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Hello, I wonder if anyone has seen this? I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I?ve checked the NSDs via mmlsnsd and mmlsdisk commands and they are all ?ready? and ?up?. The multipaths to these NSDs are all fine too. Is there a way of finding out what ?access? (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access ? ?mmnsdrediscover? returns nothing and run really fast (contrary to the statement ?This may take a while? when it runs)? Any ideas appreciated! 
Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 89, Issue 26 ********************************************** From janfrode at tanso.net Tue Jun 25 18:13:12 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 25 Jun 2019 19:13:12 +0200 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: I've had a situation recently where mmnsddiscover didn't help, but mmshutdown/mmstartup on that node did fix it. 
> > Regards, > Son > > > --------------------------- > > Message: 2 > Date: Tue, 25 Jun 2019 12:10:53 +0000 > From: "Grunenberg, Renar" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to > NSD failed with EIO, switching to access the disk remotely." > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hallo Son, > > you can check the access to the nsd with mmlsdisk -m. This give > you a colum like ?IO performed on node?. On NSD-Server you should see > localhost, on nsd-client you see the hostig nsd-server per device. > > Regards Renar > > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. > 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich > erhalten haben, informieren Sie bitte sofort den Absender und vernichten > Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the sender immediately and destroy this information. 
> Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ________________________________ > Von: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> Im Auftrag von Son Truong > Gesendet: Dienstag, 25. Juni 2019 13:38 > An: gpfsug-discuss at spectrumscale.org > Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD > failed with EIO, switching to access the disk remotely." > > Hello, > > I wonder if anyone has seen this? I am (not) having fun with the > rescan-scsi-bus.sh command especially with the -r switch. Even though there > are no devices removed the script seems to interrupt currently working NSDs > and these messages appear in the mmfs.logs: > > 2019-06-25_06:30:48.706+0100: [I] Connected to > 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [N] Connecting to > 2019-06-25_06:30:51.195+0100: [I] Connected to > 2019-06-25_06:30:59.857+0100: [N] Connecting to > 2019-06-25_06:30:59.863+0100: [I] Connected to > 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > > These messages appear roughly at the same time each day and I?ve checked > the NSDs via mmlsnsd and mmlsdisk commands and they are all ?ready? and > ?up?. The multipaths to these NSDs are all fine too. > > Is there a way of finding out what ?access? (local or remote) a particular > node has to an NSD? And is there a command to force it to switch to local > access ? ?mmnsdrediscover? 
returns nothing and run really fast (contrary to > the statement ?This may take a while? when it runs)? > > Any ideas appreciated! > > Regards, > Son > > Son V Truong - Senior Storage Administrator Advanced Computing Research > Centre IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190625/db704f88/attachment.html > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 89, Issue 26 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Tue Jun 25 18:21:17 2019 From: salut4tions at gmail.com (Jordan Robertson) Date: Tue, 25 Jun 2019 13:21:17 -0400 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: It may depend on which state the NSDs are in with respect to the node in question. If from that node you use 'mmfsadm dump nsd | egrep "moved|error|broken" ' and see anything, that might be it. One or two of those states can be fixed by mmnsddiscover, the other(s) require a kick of mmfsd to get the NSDs back. I never remember which is which. -Jordan On Tue, Jun 25, 2019, 13:13 Jan-Frode Myklebust wrote: > I?ve had a situation recently where mmnsddiscover didn?t help, but > mmshutdown/mmstartup on that node did fix it. 
> > This was with v5.0.2-3 on ppc64le. > > > -jf > > tir. 25. jun. 2019 kl. 17:02 skrev Son Truong : > >> >> Hello Renar, >> >> Thanks for that command, very useful and I can now see the problematic >> NSDs are all served remotely. >> >> I have double checked the multipath and devices and I can see these NSDs >> are available locally. >> >> How do I get GPFS to recognise this and server them out via 'localhost'? >> >> mmnsddiscover -d seemed to have brought two of the four problematic >> NSDs back to being served locally, but the other two are not behaving. I >> have double checked the availability of these devices and their multipaths >> but everything on that side seems fine. >> >> Any more ideas? >> >> Regards, >> Son >> >> >> --------------------------- >> >> Message: 2 >> Date: Tue, 25 Jun 2019 12:10:53 +0000 >> From: "Grunenberg, Renar" >> To: "gpfsug-discuss at spectrumscale.org" >> >> Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to >> NSD failed with EIO, switching to access the disk remotely." >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Hallo Son, >> >> you can check the access to the nsd with mmlsdisk -m. This give >> you a colum like ?IO performed on node?. On NSD-Server you should see >> localhost, on nsd-client you see the hostig nsd-server per device. >> >> Regards Renar >> >> >> Renar Grunenberg >> Abteilung Informatik - Betrieb >> >> HUK-COBURG >> Bahnhofsplatz >> 96444 Coburg >> Telefon: 09561 96-44110 >> Telefax: 09561 96-44104 >> E-Mail: Renar.Grunenberg at huk-coburg.de >> Internet: www.huk.de >> ________________________________ >> HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter >> Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. >> 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg >> Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. >> Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans >> Olav Her?y, Dr. 
J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. >> ________________________________ >> Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte >> Informationen. >> Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich >> erhalten haben, informieren Sie bitte sofort den Absender und vernichten >> Sie diese Nachricht. >> Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht >> ist nicht gestattet. >> >> This information may contain confidential and/or privileged information. >> If you are not the intended recipient (or have received this information >> in error) please notify the sender immediately and destroy this information. >> Any unauthorized copying, disclosure or distribution of the material in >> this information is strictly forbidden. >> ________________________________ >> Von: gpfsug-discuss-bounces at spectrumscale.org < >> gpfsug-discuss-bounces at spectrumscale.org> Im Auftrag von Son Truong >> Gesendet: Dienstag, 25. Juni 2019 13:38 >> An: gpfsug-discuss at spectrumscale.org >> Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD >> failed with EIO, switching to access the disk remotely." >> >> Hello, >> >> I wonder if anyone has seen this? I am (not) having fun with the >> rescan-scsi-bus.sh command especially with the -r switch. Even though there >> are no devices removed the script seems to interrupt currently working NSDs >> and these messages appear in the mmfs.logs: >> >> 2019-06-25_06:30:48.706+0100: [I] Connected to >> 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. 
>> 2019-06-25_06:30:51.188+0100: [N] Connecting to >> 2019-06-25_06:30:51.195+0100: [I] Connected to >> 2019-06-25_06:30:59.857+0100: [N] Connecting to >> 2019-06-25_06:30:59.863+0100: [I] Connected to >> 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> >> These messages appear roughly at the same time each day and I?ve checked >> the NSDs via mmlsnsd and mmlsdisk commands and they are all ?ready? and >> ?up?. The multipaths to these NSDs are all fine too. >> >> Is there a way of finding out what ?access? (local or remote) a >> particular node has to an NSD? And is there a command to force it to switch >> to local access ? ?mmnsdrediscover? returns nothing and run really fast >> (contrary to the statement ?This may take a while? when it runs)? >> >> Any ideas appreciated! >> >> Regards, >> Son >> >> Son V Truong - Senior Storage Administrator Advanced Computing Research >> Centre IT Services, University of Bristol >> Email: son.truong at bristol.ac.uk >> Tel: Mobile: +44 (0) 7732 257 232 >> Address: 31 Great George Street, Bristol, BS1 5QD >> >> >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: < >> http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190625/db704f88/attachment.html >> > >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 89, Issue 26 >> ********************************************** >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 25 20:05:01 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 25 Jun 2019 19:05:01 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: <832868CF-82CE-457E-91C7-2488B5C03D74@huk-coburg.de> Hello Son, please try mmnsddiscover -a -N all. Do all NSDs have their server stanza definitions? Sent from my iPhone Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. 
J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= > Am 25.06.2019 um 17:02 schrieb Son Truong : > > > Hello Renar, > > Thanks for that command, very useful and I can now see the problematic NSDs are all served remotely. > > I have double checked the multipath and devices and I can see these NSDs are available locally. > > How do I get GPFS to recognise this and server them out via 'localhost'? > > mmnsddiscover -d seemed to have brought two of the four problematic NSDs back to being served locally, but the other two are not behaving. I have double checked the availability of these devices and their multipaths but everything on that side seems fine. > > Any more ideas? > > Regards, > Son > > > --------------------------- > > Message: 2 > Date: Tue, 25 Jun 2019 12:10:53 +0000 > From: "Grunenberg, Renar" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to > NSD failed with EIO, switching to access the disk remotely." > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hallo Son, > > you can check the access to the nsd with mmlsdisk -m. 
This give you a colum like ?IO performed on node?. On NSD-Server you should see localhost, on nsd-client you see the hostig nsd-server per device. > > Regards Renar > > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. > ________________________________ > Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. > ________________________________ > Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Son Truong > Gesendet: Dienstag, 25. Juni 2019 13:38 > An: gpfsug-discuss at spectrumscale.org > Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." > > Hello, > > I wonder if anyone has seen this? 
I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: > > 2019-06-25_06:30:48.706+0100: [I] Connected to > 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [N] Connecting to > 2019-06-25_06:30:51.195+0100: [I] Connected to > 2019-06-25_06:30:59.857+0100: [N] Connecting to > 2019-06-25_06:30:59.863+0100: [I] Connected to > 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. > > These messages appear roughly at the same time each day and I?ve checked the NSDs via mmlsnsd and mmlsdisk commands and they are all ?ready? and ?up?. The multipaths to these NSDs are all fine too. > > Is there a way of finding out what ?access? (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access ? ?mmnsdrediscover? returns nothing and run really fast (contrary to the statement ?This may take a while? when it runs)? > > Any ideas appreciated! > > Regards, > Son > > Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 89, Issue 26 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TROPPENS at de.ibm.com Wed Jun 26 09:58:09 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 26 Jun 2019 10:58:09 +0200 Subject: [gpfsug-discuss] Meet-up in Buenos Aires Message-ID: IBM will host an ?IBM Spectrum Scale Meet-up? along IBM Technical University Buenos Aires. This is the first user meeting in South America. All sessions will be in Spanish. https://www.spectrumscale.org/event/spectrum-scale-meet-up-in-buenos-aires/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alvise.dorigo at psi.ch Wed Jun 26 10:17:28 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 26 Jun 2019 09:17:28 +0000 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch> Hello, after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI: sf-gpfs.psi.ch sf-ems1.psi.ch gui_refresh_task_failed NODE sf-ems1.psi.ch WARNING The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY The upgrade procedure was successful and all the post-upgrade checks were also successful. Running /usr/lpp/mmfs/gui/cli/runtask on those tasks by hand also succeeds. Any idea how to investigate this more deeply and solve it? Thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.roth at de.ibm.com Wed Jun 26 15:48:34 2019 From: stefan.roth at de.ibm.com (Stefan Roth) Date: Wed, 26 Jun 2019 16:48:34 +0200 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch> Message-ID: Hello Alvise, the problem will most likely be fixed after installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm. The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend upgrading to that one. No other additional rpm packages have to be upgraded. 
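The manual re-run Alvise describes can be wrapped in a small loop over the two task names from the warning. A dry-run sketch (DRY_RUN=1 echoes the GUI CLI call instead of executing it; only the runtask path itself comes from the messages above):

```shell
# Dry-run sketch: re-run the refresh tasks named in the GUI warning and
# report any non-zero exit codes. Clear DRY_RUN only on the GUI node.
DRY_RUN=1
for task in HEALTH_TRIGGERED HW_INVENTORY; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would run: /usr/lpp/mmfs/gui/cli/runtask $task"
    else
        /usr/lpp/mmfs/gui/cli/runtask "$task" || echo "task $task failed with rc=$?"
    fi
done
```

If the manual runs keep succeeding while the scheduled refresh keeps failing, that mismatch itself points at the GUI scheduler rather than the tasks, which matches the fix-by-upgrade answer below.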
Mit freundlichen Grüßen / Kind regards

Stefan Roth
Spectrum Scale GUI Development
Phone: +49-7034-643-1362
E-Mail: stefan.roth at de.ibm.com
IBM Deutschland, Am Weiher 24, 65451 Kelsterbach, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 26.06.2019 11:25 Subject: [EXTERNAL] [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI: sf-gpfs.psi.ch sf-ems1.psi.ch gui_refresh_task_failed NODE sf-ems1.psi.ch WARNING The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY The upgrade procedure was successful and all the post-upgrade checks were also successful. Also the /usr/lpp/mmfs/gui/cli/runtask on those task is successful. Any idea about how to deeply investigate on and solve this ?
Thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=1hMcHf9tfLS9nVUCQfQf4fELpZIFL9TdA8K3SBitL-w&s=SKPyQNlbW1HgGUGioHZhTr9gNlqdqpAV2SVJew0oLX0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Fri Jun 28 08:25:24 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Fri, 28 Jun 2019 07:25:24 +0000 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch> The tarball 5.0.2-3 we have has neither the .7 nor the .9 version, and I guess I cannot install just the gpfs.gui 5.0.3 rpm on top of a 5.0.2-3 installation. Should I open a case with IBM to download that specific rpm version?
thanks, Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Stefan Roth [stefan.roth at de.ibm.com] Sent: Wednesday, June 26, 2019 4:48 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Hello Alvise, the problem will be most-likely fixed after installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm. The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend to upgrade to this one. No other additional rpm packages have to be upgraded. Mit freundlichen Grüßen / Kind regards Stefan Roth Spectrum Scale GUI Development Phone: +49-7034-643-1362 IBM Deutschland E-Mail: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 26.06.2019 11:25 Subject: [EXTERNAL] [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI: sf-gpfs.psi.ch sf-ems1.psi.ch gui_refresh_task_failed NODE sf-ems1.psi.ch WARNING The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY The upgrade procedure
was successful and all the post-upgrade checks were also successful. Also the /usr/lpp/mmfs/gui/cli/runtask on those task is successful. Any idea about how to deeply investigate on and solve this ? Thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Fri Jun 28 08:32:42 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Fri, 28 Jun 2019 07:32:42 +0000 Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>, , <83A6EEB0EC738F459A39439733AE80452BE6EF9B@MBX214.d.ethz.ch> Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6EF9B@MBX214.d.ethz.ch> Oops, and I made a double mistake: currently I have 5.0.2-1 (not -3) on my GL2, and in house we only have x86_64, so I definitely need to download the specific rpm from somewhere, if it is compatible with 5.0.2-1.
Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Dorigo Alvise (PSI) [alvise.dorigo at psi.ch] Sent: Friday, June 28, 2019 9:25 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed The tarball 5.0.2-3 we have doesn't have the .7 neither the .9 version. And I guess I cannot install just the gpfsgui 5.0.3 on to of an installation 5.0.2-3. Should I open a case to IBM to download that specific version rpm ? thanks, Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Stefan Roth [stefan.roth at de.ibm.com] Sent: Wednesday, June 26, 2019 4:48 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Hello Alvise, the problem will be most-likely fixed after installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm. The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend to upgrade to this one. No other additional rpm packages have to be upgraded. 
Mit freundlichen Grüßen / Kind regards Stefan Roth Spectrum Scale GUI Development Phone: +49-7034-643-1362 IBM Deutschland E-Mail: stefan.roth at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 26.06.2019 11:25 Subject: [EXTERNAL] [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI: sf-gpfs.psi.ch sf-ems1.psi.ch gui_refresh_task_failed NODE sf-ems1.psi.ch WARNING The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY The upgrade procedure was successful and all the post-upgrade checks were also successful. Also the /usr/lpp/mmfs/gui/cli/runtask on those task is successful. Any idea about how to deeply investigate on and solve this ? Thanks, Alvise_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: ecblank.gif URL: From knop at us.ibm.com Fri Jun 7 22:45:31 2019 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 7 Jun 2019 17:45:31 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Message-ID: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zmance at ucar.edu Fri Jun 7 22:51:13 2019 From: zmance at ucar.edu (Zachary Mance) Date: Fri, 7 Jun 2019 15:51:13 -0600 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Which versions of Spectrum Scale versions are you referring to? 5.0.2-3?
--------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: > All, > > There have been reported issues (including kernel crashes) on Spectrum > Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider > delaying upgrades to this kernel until further information is provided. > > Thanks, > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Jun 7 23:07:49 2019 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 7 Jun 2019 18:07:49 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are you referring to? 5.0.2-3?
--------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=ZcS98SBJVzdDsVcuu7KjSr64rfzEBaFDD86UkLkp8Vw&s=mjERh67H5DB6dfP0I1KES4-9Ku25AVoQxHoB5gArxR4&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Sat Jun 8 18:22:12 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sat, 8 Jun 2019 17:22:12 +0000 Subject: [gpfsug-discuss] Forcing an internal mount to complete Message-ID: I have a few file systems that are showing "internal mount" on my NSD servers, even though they are not mounted. I'd like to force them, without having to restart GPFS on those nodes - any options? Not mounted on any other (local cluster) nodes.
Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sun Jun 9 02:16:08 2019 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 8 Jun 2019 21:16:08 -0400 Subject: [gpfsug-discuss] Forcing an internal mount to complete In-Reply-To: References: Message-ID: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Bob, I wonder if something like an "mmdf" or an "mmchmgr" would trigger the internal mounts to release. Sent from my iPhone > On Jun 8, 2019, at 13:22, Oesterlin, Robert wrote: > > I have a few file systems that are showing "internal mount" on my NSD servers, even though they are not mounted. I'd like to force them, without have to restart GPFS on those nodes - any options? > > Not mounted on any other (local cluster) nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Sun Jun 9 04:24:47 2019 From: salut4tions at gmail.com (Jordan Robertson) Date: Sat, 8 Jun 2019 23:24:47 -0400 Subject: [gpfsug-discuss] Forcing an internal mount to complete In-Reply-To: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> References: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Message-ID: Hey Bob, Ditto on what Aaron said, it sounds as if the last fs manager might need a nudge. Things can get weird when a filesystem isn't mounted anywhere but a manager is needed for an operation though, so I would keep an eye on the ras logs of the cluster manager during the kick just to make sure the management duty isn't bouncing (which in turn can cause waiters). -Jordan On Sat, Jun 8, 2019 at 9:16 PM Aaron Knister wrote: > Bob, I wonder if something like an "mmdf" or an "mmchmgr"
would trigger > the internal mounts to release. > > Sent from my iPhone > > On Jun 8, 2019, at 13:22, Oesterlin, Robert > wrote: > > I have a few file systems that are showing "internal mount" on my NSD > servers, even though they are not mounted. I'd like to force them, without > have to restart GPFS on those nodes - any options? > > Not mounted on any other (local cluster) nodes. > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Jun 9 13:18:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 9 Jun 2019 12:18:39 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Forcing an internal mount to complete In-Reply-To: References: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Message-ID: Thanks for the suggestions - as it turns out, it was the *remote* mounts causing the issues - which surprises me. I wanted to do a "mmchfs" on the local cluster, to change the default mount point. Why would GPFS care if it's remote mounted? Oh - well... Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed...
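For anyone debugging the same situation, `mmlsmount <fs> -L` lists the nodes on which a filesystem is mounted, which helps spot lingering mounts on manager nodes before forcing anything. A small filter like the one below could narrow long output to just the node names; the output layout assumed in the comment is from memory, so verify it against your release:

```shell
# Hypothetical filter over `mmlsmount gpfs1 -L` output, assumed to look like:
#   File system gpfs1 is mounted on 2 nodes:
#     192.168.1.10   nsd01
#     192.168.1.11   nsd02
# Prints only the node names (second field of the per-node lines).
mount_nodes() {
    # Keep lines whose first field is an IP-like token; print the node name.
    awk 'NF >= 2 && $1 ~ /^[0-9]+\./ { print $2 }'
}
```

Typical use would be `mmlsmount gpfs1 -L | mount_nodes` on a node with admin access to the cluster.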
URL: From salut4tions at gmail.com Sun Jun 9 14:20:28 2019 From: salut4tions at gmail.com (Jordan Robertson) Date: Sun, 9 Jun 2019 09:20:28 -0400 Subject: [gpfsug-discuss] [EXTERNAL] Re: Forcing an internal mount to complete In-Reply-To: References: <9892D4F1-2A0F-4D4E-BA63-F72A80442BEF@gmail.com> Message-ID: If there's any I/O going to the filesystem at all, GPFS has to keep it internally mounted on at least a few nodes such as the token managers and fs manager. I *believe* that holds true even for remote clusters, in that they still need to reach back to the "local" cluster when operating on the filesystem. I could be wrong about that though. On Sun, Jun 9, 2019, 09:06 Oesterlin, Robert wrote: > Thanks for the suggestions - as it turns out, it was the **remote** > mounts causing the issues - which surprises me. I wanted to do a "mmchfs" > on the local cluster, to change the default mount point. Why would GPFS > care if it's remote mounted? > > Oh - well... > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Sun Jun 9 14:38:29 2019 From: spectrumscale at kiranghag.com (KG) Date: Sun, 9 Jun 2019 19:08:29 +0530 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: One of my customers has already upgraded their DR site. Is a rollback advised? They will be running from the DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: > Zach, > > This appears to be affecting all Scale versions, including 5.0.2 -- but > only when moving to the new 3.10.0-957.21.2 kernel.
(3.10.0-957 is not > impacted) > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? > > --------------------------------------------------------------------------------------------------------------- > Zach Mance *zmance at ucar.edu* (303) 497-1883 > > HPC Data Infrastructure Group / CISL / NCAR > > --------------------------------------------------------------------------------------------------------------- > > > On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop <*knop at us.ibm.com* > > wrote: > > All, > > There have been reported issues (including kernel crashes) on Spectrum > Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider > delaying upgrades to this kernel until further information is provided.
> > Thanks, > > Felipe > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scottg at emailhosting.com Sun Jun 9 18:32:24 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Sun, 09 Jun 2019 18:32:24 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 10 05:29:14 2019 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 10 Jun 2019 00:29:14 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Scott, Currently, we are only aware of the problem with 3.10.0-957.21.2. We are not yet aware of the same problems also affecting 3.10.0-957.12.1, but hope to find out more shortly.
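Sites holding back on the affected kernel could guard against accidental upgrades with a simple version check before starting GPFS workloads. A minimal sketch; the affected build string is taken from Felipe's note above, and the guard itself is a hypothetical local-policy script:

```shell
# Affected kernel build reported on the list; earlier 3.10.0-957 builds
# were stated as not impacted.
AFFECTED_PREFIX="3.10.0-957.21.2"

kernel_is_affected() {
    # $1 = a kernel release string, e.g. "$(uname -r)"
    case "$1" in
        "${AFFECTED_PREFIX}"*) return 0 ;;
        *)                     return 1 ;;
    esac
}

if kernel_is_affected "$(uname -r)"; then
    echo "WARNING: running kernel $(uname -r); consider delaying GPFS workloads"
fi
```

The prefix match deliberately catches arch-qualified strings such as 3.10.0-957.21.2.el7.x86_64 while leaving 3.10.0-957.12.1 and plain 3.10.0-957 untouched.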
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Scott Goldman To: gpfsug main discussion list Date: 06/09/2019 01:50 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org And to be clear: there is a .12 version: 3.10.0-957.12.1.el7.x86_64 Did you mean the .12 version or the .21? Conveniently, the kernel numbers are easily transposed! Sent from my BlackBerry - the most secure mobile device From: spectrumscale at kiranghag.com Sent: June 9, 2019 2:38 PM To: gpfsug-discuss at spectrumscale.org Reply-to: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 One of my customers has already upgraded their DR site. Is rollback advised? They will be running from the DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale are you referring to? 5.0.2-3? 
--------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=fQfU5Pw8BtsrqD8JCFskfMdm8ZIGWtDY-gMtk_iljwU&s=vVEdtvFYxwXzh3n52YWo4_XJIh4IvWzRl3NaAkmA-9E&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From knop at us.ibm.com Mon Jun 10 05:41:29 2019 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 10 Jun 2019 00:41:29 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org One of my customers has already upgraded their DR site. Is rollback advised? They will be running from the DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale are you referring to? 5.0.2-3? 
--------------------------- From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
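For sites following the advice above to delay this kernel or roll a node back, a minimal sketch for a RHEL 7 node, assuming yum and grub2 are in use; the script only prints the commands so that nothing is changed by accident:

```shell
#!/bin/sh
# Sketch for RHEL 7 nodes, assuming yum and grub2. print_plan only prints
# the commands; an admin would review and run them deliberately, per node.
print_plan() {
    # 1. Keep taking other errata while holding back kernel packages.
    #    (For a persistent hold, add 'exclude=kernel*' under [main] in
    #    /etc/yum.conf instead of passing it on the command line.)
    echo "yum update --exclude='kernel*'"
    # 2. Roll back a node that already installed the new kernel by booting
    #    the prior, already-tested kernel by default. Index 1 is commonly
    #    the previously installed kernel, but verify the menu first with:
    #    awk -F\' '/^menuentry/ {print i++ ": " $2}' /etc/grub2.cfg
    echo "grub2-set-default 1"
    echo "reboot"
}
print_plan
```

The grub2 menu index varies per system, so listing the `menuentry` lines before `grub2-set-default` is the safe order.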
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scottg at emailhosting.com Mon Jun 10 06:02:19 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Mon, 10 Jun 2019 06:02:19 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: Message-ID: <3uok4eacuqj53g26epedg19j.1560142939257@emailhosting.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 10 13:24:52 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 10 Jun 2019 12:24:52 +0000 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: Message-ID: Hallo Felipe, here is the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. 
(BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. 
(BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. 
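For cross-checking whether a particular fix from an errata like this is present in an installed kernel, the RPM changelog can be grepped for the Bugzilla number. A self-contained sketch, using an inline excerpt of the notes above in place of real `rpm -q --changelog kernel` output (the exact wording of changelog entries varies, so treat the pattern as illustrative):

```shell
#!/bin/sh
# Sketch: check a kernel changelog for a specific Bugzilla number. On a
# real node this would be:  rpm -q --changelog kernel | grep 1702286
# Here an inline excerpt of the RHBA-2019:1337 notes stands in for rpm
# output; real changelog entries may word the reference differently.
changelog='security: double-free attempted in security_inode_init_security() (BZ#1702286)
dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722)'

bz_present() {
    printf '%s\n' "$changelog" | grep -q "BZ#$1"
}

if bz_present 1702286; then
    echo "BZ#1702286 fix is listed in this changelog"
else
    echo "BZ#1702286 is not listed in this changelog"
fi
```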
________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 KG ---06/09/2019 09:38:55 AM---One of my customers has already upgraded their DR site. Is rollback advised? 
They will be running from DR From: KG > To: gpfsug main discussion list > Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop > wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop > wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 10 13:43:02 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 10 Jun 2019 12:43:02 +0000 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: Hallo Felipe, here are the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. 
This update fixes the following bugs: * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811) * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929) * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761) * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266) * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110) * aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561) * [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562) * XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796) * OVN requires IPv6 to be enabled (BZ#1694981) * breaks DMA API for non-GPL drivers (BZ#1695511) * ovl_create can return positive retval and crash the host (BZ#1696292) * ceph: append mode is broken for sync/direct write (BZ#1696595) * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241) * Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867) * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) * Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110) * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) * [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a 
reboot. (BZ#1699723) * kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706) * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293) * stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743) * Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991) * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282) * security: double-free attempted in security_inode_init_security() (BZ#1702286) * Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921) * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) * md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993) * MDS mitigations are not enabled after double microcode update (BZ#1712998) * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. 
________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised]KG ---06/09/2019 09:38:55 AM---One of my customer already upgraded their DR site. Is rollback advised? 
They will be running from DR From: KG > To: gpfsug main discussion list > Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ One of my customer already upgraded their DR site. Is rollback advised? They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop > wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance > To: gpfsug main discussion list > Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop > wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From kraemerf at de.ibm.com Mon Jun 10 13:47:46 2019 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 10 Jun 2019 14:47:46 +0200 Subject: [gpfsug-discuss] *NEWS* - IBM Spectrum Scale Erasure Code Edition v5.0.3 Message-ID: FYI - What is IBM Spectrum Scale Erasure Code Edition, and why should I consider it? IBM Spectrum Scale Erasure Code Edition provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale on the customer?s own choice of commodity hardware with the added benefit of network-dispersed IBM Spectrum Scale RAID, and all of its features providing data protection, storage efficiency, and the ability to manage storage in hyperscale environments. SAS, NL-SAS, and NVMe drives are supported right now. 
IBM Spectrum Scale Erasure Code Edition supports 4 different erasure codes: 4+2P, 4+3P, 8+2P, and 8+3P, in addition to 3- and 4-way replication. Choosing an erasure code involves considering several factors; for more details, see section 18 in the Scale FAQ on the web: https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Each IBM Spectrum Scale Erasure Code Edition recovery group can have 4 - 32 storage nodes, and there can be up to 128 storage nodes in an IBM Spectrum Scale cluster using IBM Spectrum Scale Erasure Code Edition. For more information, see Planning for erasure code selection in the IBM Spectrum Scale Erasure Code Edition Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_ECE_5.0.3/ibmspectrumscaleece503_welcome.html For the minimum requirements for IBM Spectrum Scale Erasure Code Edition, see: https://www.ibm.com/support/knowledgecenter/STXKQY_ECE_5.0.3/com.ibm.spectrum.scale.ece.v5r03.doc/b1lece_min_hwrequirements.htm The hardware and network precheck tools can be downloaded from the following links: Hardware precheck: https://github.com/IBM/SpectrumScale_ECE_OS_READINESS Network precheck: https://github.com/IBM/SpectrumScale_NETWORK_READINESS The network can be either Ethernet or InfiniBand, and must provide at least 25 Gbps of bandwidth, with an average latency of 1.0 msec or less between any two storage nodes. -frank- -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 10 14:43:10 2019 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 10 Jun 2019 09:43:10 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: Renar, Thanks. 
Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/10/2019 08:43 AM Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, here are the change list: RHBA-2019:1337 kernel bug fix update Summary: Updated kernel packages that fix various bugs are now available for Red Hat Enterprise Linux 7. The kernel packages contain the Linux kernel, the core of any Linux operating system. This update fixes the following bugs:
* Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292)
* RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked delegations (BZ#1689811)
* PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx mtip_init_cmd_header routine (BZ#1689929)
* The nvme cli delete-ns command hangs indefinitely. (BZ#1690519)
* drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal cards (Regression from 1584963) - Need to flush fb writes when rewinding push buffer (BZ#1690761)
* [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel client issue (BZ#1692266)
* [Mellanox OVS offload] tc fails to calculate the checksum in case vlan trunk and header rewrite (BZ#1693110)
* aio O_DIRECT writes to non-page-aligned file locations on ext4 can result in the overlapped portion of the page containing zeros (BZ#1693561)
* [HP WS 7.6 bug] Audio driver does not recognize multi function audio jack microphone input (BZ#1693562)
* XFS returns ENOSPC when using extent size hint with space still available (BZ#1693796)
* OVN requires IPv6 to be enabled (BZ#1694981)
* breaks DMA API for non-GPL drivers (BZ#1695511)
* ovl_create can return positive retval and crash the host (BZ#1696292)
* ceph: append mode is broken for sync/direct write (BZ#1696595)
* Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL (BZ#1697241)
* Failed to load kpatch module after install the rpm package occasionally on ppc64le (BZ#1697867)
* [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940)
* Resizing an online EXT4 filesystem on a loopback device hangs (BZ#1698110)
* dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722)
* [ESXi][RHEL7.6]After upgrade to kernel-3.10.0-957.el7, system is unable to discover newly added VMware LSI Logic SAS virtual disks without a reboot. (BZ#1699723)
* kernel: zcrypt: fix specification exception on z196 at ap probe (BZ#1700706)
* XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() (BZ#1701293)
* stime showed huge values related to wrong calculation of time deltas (L3:) (BZ#1701743)
* Kernel panic due to NULL pointer dereference at sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using hard-coded device (BZ#1701991)
* IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings (BZ#1702282)
* security: double-free attempted in security_inode_init_security() (BZ#1702286)
* Missing wakeup leaves task stuck waiting in blk_queue_enter() (BZ#1702921)
* Satellite Capsule sync triggers several XFS corruptions (BZ#1702922)
* BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923)
* md_clear flag missing from /proc/cpuinfo on late microcode update (BZ#1712993)
* MDS mitigations are not enabled after double microcode update (BZ#1712998)
* WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 __static_key_slow_dec+0xa6/0xb0 (BZ#1713004)
Users of kernel are advised to upgrade to these updated packages, which fix these bugs. Full details and references: https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2 Revision History: Issue Date: 2019-06-04 Updated: 2019-06-04 Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 06:41 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Hi, Though we are still learning what workload results in the problem, it appears that even minimal I/O on the file system may cause the OS to crash. One pattern that we saw was 'mkdir'. There is a chance that the DR site was not yet impacted because no I/O workload has been run there. In that case, rolling back to the prior kernel level (one which has been tested before) may be advisable. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: KG To: gpfsug main discussion list Date: 06/09/2019 09:38 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org One of my customer already upgraded their DR site. Is rollback advised?
They will be running from DR site for a day in another week. On Sat, Jun 8, 2019, 03:37 Felipe Knop wrote: Zach, This appears to be affecting all Scale versions, including 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. (3.10.0-957 is not impacted) Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 Zachary Mance ---06/07/2019 05:51:37 PM---Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------- From: Zachary Mance To: gpfsug main discussion list Date: 06/07/2019 05:51 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Which versions of Spectrum Scale versions are you referring to? 5.0.2-3? --------------------------------------------------------------------------------------------------------------- Zach Mance zmance at ucar.edu (303) 497-1883 HPC Data Infrastructure Group / CISL / NCAR --------------------------------------------------------------------------------------------------------------- On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop wrote: All, There have been reported issues (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until further information is provided. 
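Based on the kernel levels reported elsewhere in this thread (3.10.0-957 unaffected, kernels >= 3.10.0-957.19.1 impacted), a node can be screened with a simple version comparison before deciding on rollback. The helper below is an illustrative sketch using `sort -V`; it is not an official IBM or Red Hat check, and the threshold should be re-verified against the eventual flash notification:

```shell
# Sketch: flag kernels at or above the level reported as impacted in this
# thread (3.10.0-957.19.1). Illustrative only -- confirm against official
# advisories before acting on the result.
AFFECTED_FROM="3.10.0-957.19.1"

kernel_affected() {
    # $1: kernel release string, e.g. "3.10.0-957.21.2.el7.x86_64"
    ver="${1%%.el7*}"   # drop the .el7... distro suffix
    first=$(printf '%s\n%s\n' "$AFFECTED_FROM" "$ver" | sort -V | head -n 1)
    # If the threshold sorts first (or ties), $ver is >= the threshold.
    [ "$first" = "$AFFECTED_FROM" ]
}

if kernel_affected "$(uname -r)"; then
    echo "WARNING: $(uname -r) is in the reported problem range"
else
    echo "$(uname -r) predates the reported problem range"
fi
```

On a node that reports a WARNING, rolling back to a previously tested kernel and pinning it until a fix is announced would follow the advice given above for the DR site.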
Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 11 13:27:46 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 11 Jun 2019 12:27:46 +0000 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> Message-ID: <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Hallo Felipe, can you explain whether this is a generic problem in RHEL or only Scale-related?
Are there any further details available yet? We have asked Red Hat, but so far have no indication that this issue is known to them. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Felipe Knop Gesendet: Montag, 10. Juni 2019 15:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Renar, Thanks. Of the changes below, it appears that * security: double-free attempted in security_inode_init_security() (BZ#1702286) was the one that ended up triggering the problem. Our investigations now show that RHEL kernels >= 3.10.0-957.19.1 are impacted.
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From knop at us.ibm.com Tue Jun 11 16:54:03 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 11 Jun 2019 11:54:03 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: Renar, With the change below, which is a retrofit of a change deployed in newer kernels, an inconsistency has taken place between the GPFS kernel portability layer and the kernel proper. A known result of that inconsistency is a kernel crash.
One known sequence leading to the crash involves the mkdir() call. We are working on an official notification on the issue. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Grunenberg, Renar" To: gpfsug main discussion list Date: 06/11/2019 08:28 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Felipe, can you explain whether this is a generic problem in RHEL or only Scale-related? Are there any further details available yet? We have asked Red Hat, but so far have no indication that this issue is known to them. Regards Renar
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jun 11 18:55:36 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 11 Jun 2019 17:55:36 +0000 Subject: [gpfsug-discuss] About new Lenovo DSS Software Release In-Reply-To: <0081EB235765E14395278B9AE1DF34180FE897CC@MBX214.d.ethz.ch> References: <0081EB235765E14395278B9AE1DF34180FE897CC@MBX214.d.ethz.ch> Message-ID: Hi Marc, In case you didn't see, Lenovo released DSS-G 2.3a today. From the release notes: - IBM Spectrum Scale RAID * updated release 5.0 to 5.0.2-PTF3-efix0.1 (5.0.2-3.0.1) * updated release 4.2 to 4.2.3-PTF14 (4.2.3-14) Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Tue Jun 11 20:32:41 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:32:41 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> Message-ID: <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This is not a change I like much either, though can obviously adapt to it. We have used "mmfsadm test verbs status" to confirm that RDMA is working by NHC (https://github.com/mej/nhc) on our compute nodes, and just for a quick check on the command line. Yes, there are the usual caveats, and yes the information is available another way, but a) it's the removal of a convenience that I'm quite sure that -- caveats aside - -- is not dangerous (it runs every 5 minutes on our system) b) it doesn't match the usage printed out on the command line and c) any other methods are quite a bit more information that then has to be parsed (perhaps also not as light a touch, but I don't know the code), and d) there doesn't seem to be a way now that works on both GPFS V4 and V5 (I confirmed that mmfsadm saferdump verbs | grep verbsRdmaStarted does not on V4). You'd also mentioned we really shouldn't be using mmfsadm regularly. Is there a way to get this information out of mmdiag if that is the supported command? Is there a way to do this that works for both V4 and V5? Philosophy of using mmfsadm aside though, we aren't supposed to rely on syntax for these commands remaining the same, but aren't we supposed to be able to rely on commands not falsely reporting syntax in their own usage message? I'd think at the very least, that's a bug in the "usage" text. On 12/19/18 5:35 AM, Tomer Perry wrote: > Hi, > > So, with all the usual disclaimers... mmfsadm saferdump verbs is > not enough? 
or even mmfsadm saferdump verbs | grep > VerbsRdmaStarted > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 12:22 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > I'd like just one line that says "RDMA ON" or "RMDA OFF" (as was > reported more or less by mmfsadm). > > I can get info about RMDA using mmdiag, but is much more output to > parse (e.g. by a nagios script or just a human eye). Ok, never > mind, I understand your explanation and it is not definitely a big > issue... it was, above all, a curiosity to understand if the > command was modified to get the same behavior as before, but in a > different way. > > Cheers, > > Alvise > > ---------------------------------------------------------------------- - -- > > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer > Perry [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 11:05 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Changed means it provides some functions/information in a different > way. So, I guess the question is what information do you need? ( > and "officially" why isn't mmdiag good enough - what is missing. As > you probably know, mmfsadm might cause crashes and deadlock from > time to time, this is why we're trying to provide "safe ways" to > get the required information). 
> > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: "Dorigo Alvise (PSI)" To: > gpfsug main discussion list > Date: 19/12/2018 11:53 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hi Tomer, "changed" makes me suppose that it is still possible, but > in a different way... am I correct ? if yes, what it is ? > > thanks, > > Alvise > > ---------------------------------------------------------------------- - -- > > * > From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Tomer > Perry [TOMP at il.ibm.com]* Sent:* Wednesday, December 19, 2018 10:47 > AM* To:* gpfsug main discussion list* Subject:* Re: > [gpfsug-discuss] verbs status not working in 5.0.2 > > Hi, > > Yes, as part of the RDMA enhancements in 5.0.X much of the hidden > test commands were changed. And since mmfsadm is not externalized > none of them is documented ( and the help messages are not > consistent as well). > > Regards, > > Tomer Perry Scalable I/O Development (Spectrum Scale) email: > tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global > Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: > +972 52 2554625 > > > > > From: Simon Thompson To: > gpfsug main discussion list > Date: 19/12/2018 11:29 Subject: Re: [gpfsug-discuss] > verbs status not working in 5.0.2 Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- - -- > > > > > Hmm interesting ? 
> > # mmfsadm test verbs usage: {udapl | verbs} { status | skipio | > noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut } > > # mmfsadm test verbs status usage: {udapl | verbs} { status | > skipio | noskipio | dump | maxRpcsOut | maxReplysOut | maxRdmasOut > | config | conn | conndetails | stats | resetstats | ibcntreset | > ibcntr | ia | pz | psp | evd | lmr | break | qps | inject op cnt > err | breakqperr | qperridx idx | breakidx idx} > > mmfsadm test verbs config still works though (which includes > RdmaStarted flag) > > Simon* > > From: * on behalf of > "alvise.dorigo at psi.ch" * Reply-To: > *"gpfsug-discuss at spectrumscale.org" > * Date: *Wednesday, 19 December > 2018 at 08:51* To: *"gpfsug-discuss at spectrumscale.org" > * Subject: *[gpfsug-discuss] > verbs status not working in 5.0.2 > > Hi, in GPFS 5.0.2 I cannot run anymore "mmfsadm test verbs > status": > > [root at sf-dss-1 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "4.2.3.7 ". Built on > Feb 15 2018 at 11:38:38 Running 62 days 11 hours 24 minutes 35 > secs, pid 7510 VERBS RDMA status: started > > [root at sf-export-2 ~]# mmdiag --version ; mmfsadm test verbs status > > === mmdiag: version === Current GPFS build: "5.0.2.1 ". Built on > Oct 24 2018 at 21:23:46 Running 10 minutes 24 secs, pid 3570 usage: > {udapl | verbs} { status | skipio | noskipio | dump | maxRpcsOut | > maxReplysOut | maxRdmasOut | config | conn | conndetails | stats | > resetstats | ibcntreset | ibcntr | ia | pz | psp | evd | lmr | > break | qps | inject op cnt err | breakqperr | qperridx idx | > breakidx idx} > > > Is it a known problem or am I doing something wrong ? 
> > Thanks, > > Alvise _______________________________________________ > gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAB1AAKCRCZv6Bp0Ryx vhPDAKCZFKcsFcbNk8MBZvfr6Oz8C3+C5wCgvwXwHwX0S6SKI7NoRTszLPR2n/E= =Qxja -----END PGP SIGNATURE----- From bbanister at jumptrading.com Tue Jun 11 20:37:52 2019 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 11 Jun 2019 19:37:52 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: This has been broken for a long time... we too were checking that `mmfsadm test verbs status` reported that RDMA is working. We don't want nodes that are not using RDMA running in the cluster. 
We have decided to just look for the log entry like this: test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" /var/adm/ras/mmfs.log.latest)" == "1" ]] } Hope that helps, -Bryan _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Tue Jun 11 20:45:40 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:45:40 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: 
I'm thinking, though, isn't there some risk that if RDMA went down somehow, that wouldn't be caught by your script? I can't say that I normally see that as the failure mode (it's most often booting up without), nor do I know what happens to `mmfsadm test verbs status` if you pull a cable or something. On 6/11/19 3:37 PM, Bryan Banister wrote: > This has been brocket for a long time... we too were checking that > `mmfsadm test verbs status` reported that RDMA is working. We > don't want nodes that are not using RDMA running in the cluster. > > We have decided to just look for the log entry like this: > test_gpfs_rdma_active() { [[ "$(grep -c "VERBS RDMA started" > /var/adm/ras/mmfs.log.latest)" == "1" ]] } > > Hope that helps, -Bryan - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAE3gAKCRCZv6Bp0Ryx vpvpAJ9KnVX79aXNu3oclxM6swYfZ5wKjQCeJF3s94tS7+2JtTlkc5OXV/E8LnI= =kBtE -----END PGP SIGNATURE----- From kums at us.ibm.com Tue Jun 11 20:49:12 2019 From: kums at us.ibm.com (Kumaran Rajaram) Date: Tue, 11 Jun 2019 15:49:12 -0400 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk><83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch><83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch><812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: Hi, This issue is resolved in the latest 5.0.3.1 release. # mmfsadm dump version | grep Build Build branch "5.0.3.1 ". 
# mmfsadm test verbs status VERBS RDMA status: started Regards, -Kums _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Tue Jun 11 20:50:49 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 11 Jun 2019 19:50:49 +0000 Subject: [gpfsug-discuss] verbs status not working in 5.0.2 In-Reply-To: References: <7E7FA160-345E-4580-8DFC-6E13AAACE9BD@bham.ac.uk> <83A6EEB0EC738F459A39439733AE8045267C2330@MBX114.d.ethz.ch> <83A6EEB0EC738F459A39439733AE8045267C2354@MBX114.d.ethz.ch> <812009aa-1017-99fa-a82f-852135681b1d@rutgers.edu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thank you, that's great news. Now we just have to wait for that to make it to the DSS-G release. :-| On 6/11/19 3:49 PM, Kumaran Rajaram wrote: > Hi, > > This issue is resolved in the latest 5.0.3.1 release. > > # mmfsadm dump version | grep Build > Build branch "5.0.3.1 ". > > # mmfsadm test verbs status > VERBS RDMA status: started > > Regards, -Kums - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXQAGFAAKCRCZv6Bp0Ryx vhGoAKDHtV4vNboVxdfrp7DLLBKp6+m60QCfQJRvJ+xEoXgpDO2VBbSBu0bMDwM= =aOrz -----END PGP SIGNATURE----- From p.childs at qmul.ac.uk Wed Jun 12 09:50:29 2019 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 12 Jun 2019 08:50:29 +0000 Subject: [gpfsug-discuss] Odd behavior using sudo for mmchconfig Message-ID: Yesterday, I updated some GPFS config using sudo /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=200000,maxStatCache=800000 which looked to have worked fine; however, later other machines started reporting permission issues when running mmlsquota as a user: cannot open file `/var/mmfs/gen/mmfs.cfg.ls' for reading (Permission denied) cannot open file `/var/mmfs/gen/mmfs.cfg' for reading (Permission denied) This was corrected by re-running the command from the same machine within a root session: sudo -s /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=20000,maxStatCache=80000 /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=200000,maxStatCache=800000 exit I suspect an environment issue within sudo caused the GPFS config files to have their permissions changed, but I've done similar before with no bad effects, so I'm a little confused. We're looking at tightening up our security to reduce the need for root-based passwordless access from non-admin nodes, but I've never understood the exact requirements for setting this up correctly, and I periodically see issues with our root known_hosts files when we update our admin hosts; hence I often end up going around with 'mmdsh -N all echo ""' to clear the old entries, which I always find less than ideal, and I would prefer a better solution. Thanks for any ideas to get this right and avoid future issues. I'm more than happy to open an IBM ticket on this issue, but I feel community feedback might get me further to start with. 
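One mechanism that can produce exactly this symptom is a differing umask between the two invocations: a file rewritten under a restrictive umask silently loses its world-read bit, which would explain ordinary users' mmlsquota failures on /var/mmfs/gen/mmfs.cfg. This is only a hypothesis about the case above, not a confirmed diagnosis, but the effect itself is easy to demonstrate:

```shell
#!/bin/sh
# Illustration: creating the same file under two umasks yields different
# modes. If a config file were rewritten under a 077 umask, ordinary
# users would lose read access -- matching the errors reported above.
tmpdir=$(mktemp -d)
( umask 022; touch "$tmpdir/relaxed" )   # 666 & ~022 -> mode 644, world-readable
( umask 077; touch "$tmpdir/strict" )    # 666 & ~077 -> mode 600, owner-only
stat -c '%a %n' "$tmpdir/relaxed" "$tmpdir/strict"
rm -rf "$tmpdir"
```

Comparing `sudo sh -c umask` against `umask` in a root login shell on the affected machine would quickly confirm or rule this out.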
Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London From spectrumscale at kiranghag.com Thu Jun 13 17:55:07 2019 From: spectrumscale at kiranghag.com (KG) Date: Thu, 13 Jun 2019 22:25:07 +0530 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de> <3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: Hi As per the flash - https://www-01.ibm.com/support/docview.wss?uid=ibm10887213&myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E this bug doesn't appear if SELinux is disabled. If the customer is willing to disable SELinux, will it be OK to upgrade (or stay on the upgraded level and avoid a downgrade)? On Tue, Jun 11, 2019 at 9:24 PM Felipe Knop wrote: > Renar, > > With the change below, which is a retrofit of a change deployed in newer > kernels, an inconsistency has taken place between the GPFS kernel > portability layer and the kernel proper. A known result of that > inconsistency is a kernel crash. One known sequence leading to the crash > involves the mkdir() call. > > We are working on an official notification on the issue. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > From: "Grunenberg, Renar" > To: gpfsug main discussion list > Date: 06/11/2019 08:28 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > Hallo Felipe, > can you explain whether this is a generic problem in RHEL or only Scale-related? > Are there any circumstances already known? We asked Red Hat, but have no > indication that this is known to them. > > Regards Renar > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > ------------------------------ > HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. > ------------------------------ > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. 
> ------------------------------ > > *Von:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Im Auftrag von *Felipe Knop > *Gesendet:* Montag, 10. Juni 2019 15:43 > *An:* gpfsug main discussion list > *Betreff:* Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel > 3.10.0-957.21.2 > > Renar, > > Thanks. Of the changes below, it appears that > > * security: double-free attempted in security_inode_init_security() > (BZ#1702286) > > was the one that ended up triggering the problem. Our investigations now > show that RHEL kernels >= 3.10.0-957.19.1 are impacted. > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > From: "Grunenberg, Renar" <*Renar.Grunenberg at huk-coburg.de* > > > To: "'gpfsug-discuss at spectrumscale.org'" < > *gpfsug-discuss at spectrumscale.org* > > Date: 06/10/2019 08:43 AM > Subject: [EXTERNAL] [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > ------------------------------ > > Hallo Felipe, > > here is the change list: > RHBA-2019:1337 kernel bug fix update > > Summary: > > Updated kernel packages that fix various bugs are now available for Red > Hat Enterprise Linux 7. > > The kernel packages contain the Linux kernel, the core of any Linux > operating system.
> > This update fixes the following bugs: > > * Mellanox CX-5 MAC learning with OVS H/W offload not working (BZ#1686292) > > * RHEL7.4 NFS4.1 client and server repeated SEQUENCE / TEST_STATEIDs with > SEQUENCE Reply has SEQ4_STATUS_RECALLABLE_STATE_REVOKED set - NFS server > should return NFS4ERR_DELEG_REVOKED or NFS4ERR_BAD_STATEID for revoked > delegations (BZ#1689811) > > * PANIC: "BUG: unable to handle kernel paging request" in the mtip32xx > mtip_init_cmd_header routine (BZ#1689929) > > * The nvme cli delete-ns command hangs indefinitely. (BZ#1690519) > > * drm/nouveau: nv50 - Graphics become sluggish or frozen for nvidia Pascal > cards (Regression from 1584963) - Need to flush fb writes when rewinding > push buffer (BZ#1690761) > > * [CEE/SD] Ceph+NFS server crashed and rebooted due to CephFS kernel > client issue (BZ#1692266) > > * [Mellanox OVS offload] tc fails to calculate the checksum in case vlan > trunk and header rewrite (BZ#1693110) > > * aio O_DIRECT writes to non-page-aligned file locations on ext4 can > result in the overlapped portion of the page containing zeros (BZ#1693561) > > * [HP WS 7.6 bug] Audio driver does not recognize multi function audio > jack microphone input (BZ#1693562) > > * XFS returns ENOSPC when using extent size hint with space still > available (BZ#1693796) > > * OVN requires IPv6 to be enabled (BZ#1694981) > > * breaks DMA API for non-GPL drivers (BZ#1695511) > > * ovl_create can return positive retval and crash the host (BZ#1696292) > > * ceph: append mode is broken for sync/direct write (BZ#1696595) > > * Problem building module due to -EXPORT_SYMBOL_GPL/-EXPORT_SYMBOL > (BZ#1697241) > > * Failed to load kpatch module after install the rpm package occasionally > on ppc64le (BZ#1697867) > > * [Hyper-V][RHEL7] Stop suppressing PCID bit (BZ#1697940) > > * Resizing an online EXT4 filesystem on a loopback device hangs > (BZ#1698110) > > * dm table: propagate BDI_CAP_STABLE_WRITES (BZ#1699722) > > * [ESXi][RHEL7.6]After upgrade 
to kernel-3.10.0-957.el7, system is unable > to discover newly added VMware LSI Logic SAS virtual disks without a > reboot. (BZ#1699723) > > * kernel: zcrypt: fix specification exception on z196 at ap probe > (BZ#1700706) > > * XFS: Metadata corruption detected at xfs_attr3_leaf_write_verify() > (BZ#1701293) > > * stime showed huge values related to wrong calculation of time deltas > (L3:) (BZ#1701743) > > * Kernel panic due to NULL pointer dereference at > sysfs_do_create_link_sd.isra.2+0x34 while loading [ipmi_si] module using > hard-coded device (BZ#1701991) > > * IPv6 ECMP modulo N hashing inefficient when X^2 rt6i_nsiblings > (BZ#1702282) > > * security: double-free attempted in security_inode_init_security() > (BZ#1702286) > > * Missing wakeup leaves task stuck waiting in blk_queue_enter() > (BZ#1702921) > > * Satellite Capsule sync triggers several XFS corruptions (BZ#1702922) > > * BUG: SELinux doesn't handle NFS crossmnt well (BZ#1702923) > > * md_clear flag missing from /proc/cpuinfo on late microcode update > (BZ#1712993) > > * MDS mitigations are not enabled after double microcode update > (BZ#1712998) > > * WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:90 > __static_key_slow_dec+0xa6/0xb0 (BZ#1713004) > > Users of kernel are advised to upgrade to these updated packages, which > fix these bugs. > > Full details and references: > > *https://access.redhat.com/errata/RHBA-2019:1337?sc_cid=701600000006NHXAA2* > > > Revision History: > > Issue Date: 2019-06-04 > Updated: 2019-06-04 > > Regards Renar > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: *Renar.Grunenberg at huk-coburg.de* > Internet: *www.huk.de* > > ------------------------------ > > HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 
9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. > ------------------------------ > > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese > Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ------------------------------ > > *Von:* *gpfsug-discuss-bounces at spectrumscale.org* > [*mailto:gpfsug-discuss-bounces at spectrumscale.org*] *Im Auftrag von *Felipe Knop > *Gesendet:* Montag, 10. Juni 2019 06:41 > *An:* gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org*> > *Betreff:* Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel > 3.10.0-957.21.2 > > Hi, > > Though we are still learning what workload results in the problem, it > appears that even minimal I/O on the file system may cause the OS to crash. > One pattern that we saw was 'mkdir'. There is a chance that the DR site was > not yet impacted because no I/O workload has been run there. In that case, > rolling back to the prior kernel level (one which has been tested before) > may be advisable.
> > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > From: KG <*spectrumscale at kiranghag.com* > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 06/09/2019 09:38 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 > kernel 3.10.0-957.21.2 > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > ------------------------------ > > One of my customers has already upgraded their DR site. > > Is rollback advised? They will be running from the DR site for a day in > another week. > > On Sat, Jun 8, 2019, 03:37 Felipe Knop <*knop at us.ibm.com* > > wrote: > > Zach, > > This appears to be affecting all Scale versions, including > 5.0.2 -- but only when moving to the new 3.10.0-957.21.2 kernel. > (3.10.0-957 is not impacted) > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > From: Zachary Mance <*zmance at ucar.edu* > > To: gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > > Date: 06/07/2019 05:51 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with > RHEL7.6 kernel 3.10.0-957.21.2 > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > ------------------------------ > > Which Spectrum Scale versions are you referring > to? 5.0.2-3?
> > --------------------------------------------------------------------------------------------------------------- > Zach Mance *zmance at ucar.edu* (303) 497-1883 > HPC Data Infrastructure Group / CISL / NCAR > --------------------------------------------------------------------------------------------------------------- > > > > On Fri, Jun 7, 2019 at 3:45 PM Felipe Knop <*knop at us.ibm.com* > > wrote: > All, > > There have been reported issues > (including kernel crashes) on Spectrum Scale with the latest RHEL7.6 kernel > 3.10.0-957.21.2. Please consider delaying upgrades to this kernel until > further information is provided. > > Thanks, > > Felipe > > ---- > Felipe Knop *knop at us.ibm.com* > > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > *[attachment > "graycol.gif" deleted by Felipe Knop/Poughkeepsie/IBM] * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > 
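Taken together with the flash cited above, the exposure has two parts: an el7 kernel at 3.10.0-957.19.1 or later, and SELinux not disabled. A per-node sketch of that test, assuming the usual el7 release-string layout (the function name, parsing, and yes/no output are my own, not IBM tooling, and the cutoff reflects Felipe's statement at the time of this thread):

```shell
# $1 = kernel release (uname -r), $2 = SELinux mode (getenforce, lower-cased).
# An empty or unknown mode is conservatively treated as enabled.
crash_risk() {
    if [ "$2" = "disabled" ]; then
        echo no; return 0        # per the flash, the crash needs SELinux enabled
    fi
    case "$1" in
        3.10.0-*) ;;             # only the RHEL 7 3.10.0 series is in question
        *) echo no; return 0 ;;
    esac
    build=${1#3.10.0-}
    build=${build%%.el*}         # e.g. "957.21.2.el7.x86_64" -> "957.21.2"
    # GNU 'sort -V' orders version strings; impacted when build >= 957.19.1
    lowest=$(printf '%s\n' "$build" 957.19.1 | sort -V | head -n 1)
    if [ "$lowest" = "957.19.1" ]; then echo yes; else echo no; fi
}

crash_risk "$(uname -r)" "$(getenforce 2>/dev/null | tr '[:upper:]' '[:lower:]')"
```

The two arguments correspond to `uname -r` and `getenforce` output on the node being checked.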
_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From knop at us.ibm.com Thu Jun 13 20:25:16 2019 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 13 Jun 2019 15:25:16 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de><3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: Kiran, If SELinux is disabled (SELinux mode set to 'disabled') then the crash should not happen, and it should be OK to upgrade to (say) 3.10.0-957.21.2 or stay at that level. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: KG To: gpfsug main discussion list Date: 06/13/2019 12:56 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi As per the flash - https://www-01.ibm.com/support/docview.wss?uid=ibm10887213&myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E this bug doesn't appear if SELinux is disabled. If the customer is willing to disable SELinux, will it be OK to upgrade (or stay on the upgraded level and avoid a downgrade)? On Tue, Jun 11, 2019 at 9:24 PM Felipe Knop wrote: Renar, With the change below, which is a retrofit of a change deployed in newer kernels, an inconsistency has taken place between the GPFS kernel portability layer and the kernel proper. A known result of that inconsistency is a kernel crash.
One known sequence leading to the crash involves the mkdir() call. We are working on an official notification on the issue. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Fri Jun 14 00:15:09 2019 From: valdis.kletnieks at vt.edu (Valdis Klētnieks) Date: Thu, 13 Jun 2019 19:15:09 -0400 Subject: [gpfsug-discuss] WG: Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2 In-Reply-To: References: <1848549064bc481fb5cb4dcc24c51376@SMXRF105.msg.hukrf.de><3f86d441820d4c5a84a8bbf6fad5d000@SMXRF105.msg.hukrf.de> Message-ID: <27309.1560467709@turing-police> On Thu, 13 Jun 2019 15:25:16 -0400, "Felipe Knop" said: > If SELinux is disabled (SELinux mode set to 'disabled') then the crash > should not happen, and it should be OK to upgrade to (say) 3.10.0-957.21.2 > or stay at that level. Note that if you have any plans to re-enable SELinux in the future, you'll have to do a relabel, which could take a while if you have large filesystems with tens or hundreds of millions of inodes.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From cblack at nygenome.org Mon Jun 17 17:24:54 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 17 Jun 2019 16:24:54 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance Message-ID: Our network team sometimes needs to take down sections of our network for maintenance. Our systems have dual paths thru pairs of switches, but often the maintenance will take down one of the two paths leaving all our nsd servers with half bandwidth. Some of our systems are transmitting at a higher rate than can be handled by half network (2x40Gb hosts with tx of 50Gb+). What can we do to gracefully handle network maintenance reducing bandwidth in half? Should we set maxMBpS for affected nodes to a lower value? (default on our ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) Any other ideas or comments?
Our hope is that metadata operations are not affected much and users just see jobs and processes read or write at a slower rate. Best, Chris ________________________________ This message is for the recipient's use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex at calicolabs.com Mon Jun 17 17:31:38 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 17 Jun 2019 09:31:38 -0700 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: Message-ID: Hi Chris, I think the next thing to double-check is when the maxMBpS change takes effect. You may need to restart the nsds. Otherwise I think your plan is sound. Regards, Alex On Mon, Jun 17, 2019 at 9:24 AM Christopher Black wrote: > Our network team sometimes needs to take down sections of our network for > maintenance. Our systems have dual paths thru pairs of switches, but often > the maintenance will take down one of the two paths leaving all our nsd > servers with half bandwidth. > > Some of our systems are transmitting at a higher rate than can be handled > by half network (2x40Gb hosts with tx of 50Gb+). > > What can we do to gracefully handle network maintenance reducing bandwidth > in half? > > Should we set maxMBpS for affected nodes to a lower value? (default on our > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) > > Any other ideas or comments?
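The ~4000 figure above is plain unit conversion: maxMBpS is expressed in megabytes per second while link speeds are quoted in gigabits per second, so 32 Gbit/s works out to 4000 MB/s. A quick sanity check (the helper name is mine):

```shell
# 1 Gbit/s = 1000 Mbit/s; 8 bits per byte, so MB/s = Gbit/s * 1000 / 8
gbps_to_mbytes() {
    echo $(( $1 * 1000 / 8 ))
}

gbps_to_mbytes 32    # prints 4000, matching the proposed maxMBpS for one surviving path
gbps_to_mbytes 80    # prints 10000 for the full 2x40GbE
```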
> > Our hope is that metadata operations are not affected much and users just > see jobs and processes read or write at a slower rate. > > Best, > Chris > ------------------------------ > This message is for the recipient's use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Jun 17 17:37:48 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 17 Jun 2019 16:37:48 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: Message-ID: Hi I would really look into QoS instead. -- Cheers > On 17 Jun 2019, at 19.33, Alex Chekholko wrote: > > Hi Chris, > > I think the next thing to double-check is when the maxMBpS change takes effect. You may need to restart the nsds. Otherwise I think your plan is sound. > > Regards, > Alex > > >> On Mon, Jun 17, 2019 at 9:24 AM Christopher Black wrote: >> Our network team sometimes needs to take down sections of our network for maintenance. Our systems have dual paths thru pairs of switches, but often the maintenance will take down one of the two paths leaving all our nsd servers with half bandwidth. >> >> Some of our systems are transmitting at a higher rate than can be handled by half network (2x40Gb hosts with tx of 50Gb+).
>> >> What can we do to gracefully handle network maintenance reducing bandwidth in half? >> >> Should we set maxMBpS for affected nodes to a lower value? (default on our ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) >> >> Any other ideas or comments? >> >> Our hope is that metadata operations are not affected much and users just see jobs and processes read or write at a slower rate. >> >> >> >> Best, >> >> Chris >> >> This message is for the recipient's use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edellä ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Mon Jun 17 17:38:47 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 17 Jun 2019 16:38:47 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: Message-ID: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should use its in-memory buffers for read prefetches and dirty writes. On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: > Hi Chris, > > I think the next thing to double-check is when the maxMBpS change takes > effect. 
You may need to restart the nsds. Otherwise I think your plan is > sound. > > Regards, > Alex > > > On Mon, Jun 17, 2019 at 9:24 AM Christopher Black > wrote: > > > Our network team sometimes needs to take down sections of our network for > > maintenance. Our systems have dual paths thru pairs of switches, but often > > the maintenance will take down one of the two paths leaving all our nsd > > servers with half bandwidth. > > > > Some of our systems are transmitting at a higher rate than can be handled > > by half network (2x40Gb hosts with tx of 50Gb+). > > > > What can we do to gracefully handle network maintenance reducing bandwidth > > in half? > > > > Should we set maxMBpS for affected nodes to a lower value? (default on our > > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) > > > > Any other ideas or comments? > > > > Our hope is that metadata operations are not affected much and users just > > see jobs and processes read or write at a slower rate. > > > > > > > > Best, > > > > Chris > > ------------------------------ > > This message is for the recipient???s use only, and may contain > > confidential, privileged or protected information. Any unauthorized use or > > dissemination of this communication is prohibited. If you received this > > message in error, please immediately notify the sender and destroy all > > copies of this message. The recipient should check this email and any > > attachments for the presence of viruses, as we accept no liability for any > > damage caused by any virus transmitted by this email. 
> > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From cblack at nygenome.org Mon Jun 17 17:47:54 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 17 Jun 2019 16:47:54 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: The man page indicates that maxMBpS can be used to "artificially limit how much I/O one node can put on all of the disk servers", but it might not be the best choice. The man page also says maxMBpS is in the class of mmchconfig changes that take place immediately. We've only ever used QoS for throttling maint operations (restripes, etc) and are unfamiliar with how to best use it to throttle client load. Best, Chris On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson" wrote: IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should use its in-memory buffers for read prefetches and dirty writes. On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: > Hi Chris, > > I think the next thing to double-check is when the maxMBpS change takes > effect. You may need to restart the nsds. Otherwise I think your plan is > sound. 
> > Regards, > Alex > > > On Mon, Jun 17, 2019 at 9:24 AM Christopher Black > wrote: > > > Our network team sometimes needs to take down sections of our network for > > maintenance. Our systems have dual paths thru pairs of switches, but often > > the maintenance will take down one of the two paths leaving all our nsd > > servers with half bandwidth. > > > > Some of our systems are transmitting at a higher rate than can be handled > > by half network (2x40Gb hosts with tx of 50Gb+). > > > > What can we do to gracefully handle network maintenance reducing bandwidth > > in half? > > > > Should we set maxMBpS for affected nodes to a lower value? (default on our > > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) > > > > Any other ideas or comments? > > > > Our hope is that metadata operations are not affected much and users just > > see jobs and processes read or write at a slower rate. > > > > > > > > Best, > > > > Chris > > ------------------------------ > > This message is for the recipient???s use only, and may contain > > confidential, privileged or protected information. Any unauthorized use or > > dissemination of this communication is prohibited. If you received this > > message in error, please immediately notify the sender and destroy all > > copies of this message. The recipient should check this email and any > > attachments for the presence of viruses, as we accept no liability for any > > damage caused by any virus transmitted by this email. 
> > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. 
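Taken together, the two throttling approaches discussed in this thread look roughly like the sketch below. This is illustrative only: the file system name gpfs0, the node names, and the numeric limits are placeholders to adapt, and (as noted above) maxMBpS is a hint to GPFS's prefetch/write-behind tuning rather than a hard cap, so QoS is the more precise throttle.

```shell
# Option 1: temporarily lower maxMBpS on the affected NSD servers.
# -N scopes the change to the listed nodes; -i makes it take effect
# immediately and persist across restarts.
mmchconfig maxMBpS=4000 -N nsdserver1,nsdserver2 -i
# ...network maintenance window...
mmchconfig maxMBpS=30000 -N nsdserver1,nsdserver2 -i

# Option 2: enable QoS and cap the "other" class (normal client I/O,
# as distinct from the "maintenance" class used by restripes etc.).
mmchqos gpfs0 --enable pool=*,other=10000IOPS
# Watch actual consumption for a while first to pick sane limits:
mmlsqos gpfs0 --seconds 60
# Lift the cap once the network is back to full bandwidth:
mmchqos gpfs0 --enable pool=*,other=unlimited
```

Measuring normal load with mmlsqos before setting any limit is the safer order of operations; a cap far below real demand will slow everything down, metadata included.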
From alex at calicolabs.com Mon Jun 17 17:51:27 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 17 Jun 2019 09:51:27 -0700 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: Hi all, My experience with maxMBpS was in the other direction but it did make a difference. We had lots of spare network bandwidth (that is, the network was not the bottleneck) and in the course of various GPFS tuning it also looked like the disks were not too busy, and the NSDs were not too busy, so bumping up the maxMBpS improved performance and allowed GPFS to do more. Of course, this was many years ago on a different GPFS version and hardware, but I think it would work in the other direction. It should also be very safe to try. Regards, Alex On Mon, Jun 17, 2019 at 9:47 AM Christopher Black wrote: > The man page indicates that maxMBpS can be used to "artificially limit how > much I/O one node can put on all of the disk servers", but it might not be > the best choice. Man page also says maxMBpS is in the class of mmchconfig > changes that take place immediately. > We've only ever used QoS for throttling maint operations (restripes, etc) > and are unfamiliar with how to best use it to throttle client load. > > Best, > > Chris > > On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Skylar Thompson" behalf of skylar2 at uw.edu> wrote: > > IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS > should > use its in-memory buffers for read prefetches and dirty writes. > > On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: > > Hi Chris, > > > > I think the next thing to double-check is when the maxMBpS change > takes > > effect. You may need to restart the nsds. Otherwise I think your > plan is > > sound. 
> > > > Regards, > > Alex > > > > > > On Mon, Jun 17, 2019 at 9:24 AM Christopher Black < > cblack at nygenome.org> > > wrote: > > > > > Our network team sometimes needs to take down sections of our > network for > > > maintenance. Our systems have dual paths thru pairs of switches, > but often > > > the maintenance will take down one of the two paths leaving all > our nsd > > > servers with half bandwidth. > > > > > > Some of our systems are transmitting at a higher rate than can be > handled > > > by half network (2x40Gb hosts with tx of 50Gb+). > > > > > > What can we do to gracefully handle network maintenance reducing > bandwidth > > > in half? > > > > > > Should we set maxMBpS for affected nodes to a lower value? > (default on our > > > ess appears to be maxMBpS = 30000, would I reduce this to ~4000 > for 32Gbps?) > > > > > > Any other ideas or comments? > > > > > > Our hope is that metadata operations are not affected much and > users just > > > see jobs and processes read or write at a slower rate. > > > > > > > > > > > > Best, > > > > > > Chris > > > ------------------------------ > > > This message is for the recipient???s use only, and may contain > > > confidential, privileged or protected information. Any > unauthorized use or > > > dissemination of this communication is prohibited. If you received > this > > > message in error, please immediately notify the sender and destroy > all > > > copies of this message. The recipient should check this email and > any > > > attachments for the presence of viruses, as we accept no liability > for any > > > damage caused by any virus transmitted by this email. 
> > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > ________________________________ > > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Mon Jun 17 17:54:04 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 17 Jun 2019 16:54:04 +0000 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: Message-ID: Hi, Writing from a phone, so excuse the typos. Assuming you have a system pool (metadata) and some other pool/s, you can set limits on the maintenance class (which you have done already) and on the other class, which would affect all the other ops. You can add those per node or node class, matched to whichever part/s of the network you are working on. Changes are online and immediate. And you can measure normal load just by having QoS activated and looking at the values for a few days. Hope the above makes some sense. -- Cheers > On 17 Jun 2019, at 19.48, Christopher Black wrote: > > The man page indicates that maxMBpS can be used to "artificially limit how much I/O one node can put on all of the disk servers", but it might not be the best choice. Man page also says maxmbps is in the class of mmchconfig changes take place immediately. > We've only ever used QoS for throttling maint operations (restripes, etc) and are unfamiliar with how to best use it to throttle client load. > > Best, > Chris > > On 6/17/19, 12:40 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Skylar Thompson" wrote: > > IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS should > use its in-memory buffers for read prefetches and dirty writes. > >> On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote: >> Hi Chris, >> >> I think the next thing to double-check is when the maxMBpS change takes >> effect. You may need to restart the nsds. 
Otherwise I think your plan is >> sound. >> >> Regards, >> Alex >> >> >> On Mon, Jun 17, 2019 at 9:24 AM Christopher Black >> wrote: >> >>> Our network team sometimes needs to take down sections of our network for >>> maintenance. Our systems have dual paths thru pairs of switches, but often >>> the maintenance will take down one of the two paths leaving all our nsd >>> servers with half bandwidth. >>> >>> Some of our systems are transmitting at a higher rate than can be handled >>> by half network (2x40Gb hosts with tx of 50Gb+). >>> >>> What can we do to gracefully handle network maintenance reducing bandwidth >>> in half? >>> >>> Should we set maxMBpS for affected nodes to a lower value? (default on our >>> ess appears to be maxMBpS = 30000, would I reduce this to ~4000 for 32Gbps?) >>> >>> Any other ideas or comments? >>> >>> Our hope is that metadata operations are not affected much and users just >>> see jobs and processes read or write at a slower rate. >>> >>> >>> >>> Best, >>> >>> Chris >>> ------------------------------ >>> This message is for the recipient???s use only, and may contain >>> confidential, privileged or protected information. Any unauthorized use or >>> dissemination of this communication is prohibited. If you received this >>> message in error, please immediately notify the sender and destroy all >>> copies of this message. The recipient should check this email and any >>> attachments for the presence of viruses, as we accept no liability for any >>> damage caused by any virus transmitted by this email. 
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= >>> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=2ioq3oT4gzOlIvyQRqkdZF0GWKv1APEBmstC48AyVdo&s=fvxPTdT1cVT7av_-vR5-3wVgjIzEpUP8OY8vGx0i5kc&e= > > > ________________________________ > > This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=zyyij5eDMGGtTC00mplr-3aAR3dbStZGhwocBYKIyUg&s=dlSFGfd_CW47EaNE-5X9tMCkmqZ8WayaLCGI1sTzpkA&e= > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jun 17 20:39:46 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 17 Jun 2019 15:39:46 -0400 Subject: [gpfsug-discuss] Steps for gracefully handling bandwidth reduction during network maintenance In-Reply-To: References: <20190617163847.mcmfoegbibd5ffr5@utumno.gs.washington.edu> Message-ID: Please note that the maxmbps parameter of mmchconfig is not part of the QOS features of the mmchqos command. mmchqos can be used to precisely limit IOPs. You can even set different limits for NSD traffic originating at different nodes. However, use the "force" of QOS carefully! No doubt you can bring a system to a virtual standstill if you set the IOPS values incorrectly. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Tue Jun 18 20:30:53 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 18 Jun 2019 15:30:53 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available Message-ID: All, With respect to the issues (including kernel crashes) on Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just been released: https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 (as described in the link above) A fix is now available in efix form for both 4.2.3 and 5.0.x . 
The fix should be included in the upcoming PTFs for 4.2.3 and 5.0.3. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: From roblogie at au1.ibm.com Wed Jun 19 00:23:37 2019 From: roblogie at au1.ibm.com (Rob Logie) Date: Tue, 18 Jun 2019 23:23:37 +0000 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD Message-ID: Hi We are doing an underlying hardware change that will result in the Linux device file names changing for attached storage. Hence I need to reconfigure the NSDs to use the new Linux device names. What is the best way to do this? Thanks in advance Regards, Rob Logie IT Specialist A/NZ GBS Ballarat CIC Office: +61-3-5339 7748| Mobile: +61-411-021-029| Tie-Line: 97748 E-mail: roblogie at au1.ibm.com | Lotus Notes: Rob Logie/Australia/IBM IBM Building, BA02 129 Gear Avenue, Mount Helen, Vic, 3350 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Jun 19 01:32:40 2019 From: valdis.kletnieks at vt.edu (Valdis Klētnieks) Date: Tue, 18 Jun 2019 20:32:40 -0400 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD In-Reply-To: References: Message-ID: <11132.1560904360@turing-police> On Tue, 18 Jun 2019 23:23:37 -0000, "Rob Logie" said: > We are doing an underlying hardware change that will result in the Linux > device file names changing for attached storage. > Hence I need to reconfigure the NSDs to use the new Linux device names. The only time GPFS cares about the Linux device names is when you go to actually create an NSD. After that, it just romps through /dev, finds anything that looks like a disk, and if it has an NSD on it at the appropriate offset, claims it as a GPFS device. 
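That discovery can be sanity-checked before and after the hardware change; in a sketch (the server name here is a placeholder):

```shell
# Show which local device path each NSD was discovered on, per node:
mmlsnsd -X
# If a server no longer sees one of its disks locally after the change,
# re-run NSD device discovery on that node:
mmnsddiscover -a -N nsdserver1
```

If mmlsnsd -X reports the NSDs on their new device names, nothing needs renaming on the GPFS side.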
(Protip: Since in a cluster the same disk may not have enumerated to the same name on all NSD servers that have visibility to it, you're almost always better off initially doing an mmcrnsd specifying only one server, and then using mmchnsd to add the other servers to the server list for it) Heck, even without hardware changes, there's no guarantee that the disks enumerate in the same order across reboots (especially if you have a petabyte of LUNs and 8 or 16 paths to each LUN, though it's possible to tell the multipath daemon to have stable names for the multipath devices) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Jun 19 11:22:51 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 19 Jun 2019 11:22:51 +0100 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: References: Message-ID: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From arc at b4restore.com Wed Jun 19 12:30:33 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 19 Jun 2019 11:30:33 +0000 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> References: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> Message-ID: Hi Jonathan Here is what IBM wrote when I asked them: "the term "...node running kernel versions 3.10.0-957.19.1 or higher" includes 21.3. The term "including 3.10.0-957.21.2" is just to make clear, that the issue isnt limited to the 19.x kernel." I will receive an efix later today and try it on the 21.3 kernel. Venlig hilsen / Best Regards Andi Rhod Christiansen -----Oprindelig meddelelse----- Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Jonathan Buzzard Sendt: Wednesday, June 19, 2019 12:23 PM Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From knop at us.ibm.com Wed Jun 19 13:22:40 2019 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 19 Jun 2019 08:22:40 -0400 Subject: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available In-Reply-To: References: <9f12a616f5eb3729b5e12fa7f65478b60ac6b8e2.camel@strath.ac.uk> Message-ID: Andi, Thank you. At least from the point of view of the change in the kernel (RHBA-2019:1337) that triggered the compatibility break between the GPFS kernel module and the kernel, the GPFS efix should work with the newer kernel. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Andi Rhod Christiansen To: gpfsug main discussion list Date: 06/19/2019 07:42 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathan Here is what IBM wrote when I asked them: "the term "...node running kernel versions 3.10.0-957.19.1 or higher" includes 21.3. The term "including 3.10.0-957.21.2" is just to make clear, that the issue isnt limited to the 19.x kernel." I will receive an efix later today and try it on the 21.3 kernel. Venlig hilsen / Best Regards Andi Rhod Christiansen -----Oprindelig meddelelse----- Fra: gpfsug-discuss-bounces at spectrumscale.org P? 
vegne af Jonathan Buzzard Sendt: Wednesday, June 19, 2019 12:23 PM Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale with RHEL7.6 kernel 3.10.0-957.21.2: fix available On Tue, 2019-06-18 at 15:30 -0400, Felipe Knop wrote: > All, > > With respect to the issues (including kernel crashes) on Spectrum > Scale with RHEL7.6 kernel 3.10.0-957.21.2, an updated flash has just > been released: > https://www-01.ibm.com/support/docview.wss?uid=ibm10887729 > > (as described in the link above) A fix is now available in efix form > for both 4.2.3 and 5.0.x . The fix should be included in the upcoming > PTFs for 4.2.3 and 5.0.3. > Anyone know if it works with 3.10.0-957.21.3 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=i6sKmBjs765x8OUlvipm4PXQbXYHEZ7q27eWcfIUuA0&s=s-83FfH6qlM-yNbeFE92Xe_yMfWAGYm5ocLEKcBX3VA&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=i6sKmBjs765x8OUlvipm4PXQbXYHEZ7q27eWcfIUuA0&s=s-83FfH6qlM-yNbeFE92Xe_yMfWAGYm5ocLEKcBX3VA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From INDULISB at uk.ibm.com Wed Jun 19 13:36:26 2019 From: INDULISB at uk.ibm.com (Indulis Bernsteins1) Date: Wed, 19 Jun 2019 13:36:26 +0100 Subject: [gpfsug-discuss] Renaming Linux device used by a NSD Message-ID: You can also speed up the startup of Spectrum Scale (GPFS) by using the nsddevices exit to supplement or bypass the normal "scan all block devices" process by Spectrum Scale. Useful if you have lots of LUNs or other block devices which are not NSDs, or for multipath. Though later versions of Scale might have fixed the scan for multipath devices. Anyway, this is old but potentially useful https://mytravelingfamily.com/2009/03/03/making-gpfs-work-using-multipath-on-linux/ All the information, representations, statements, opinions and proposals in this document are correct and accurate to the best of our present knowledge but are not intended (and should not be taken) to be contractually binding unless and until they become the subject of separate, specific agreement between us. Any IBM Machines provided are subject to the Statements of Limited Warranty accompanying the applicable Machine. Any IBM Program Products provided are subject to their applicable license terms. Nothing herein, in whole or in part, shall be deemed to constitute a warranty. IBM products are subject to withdrawal from marketing and or service upon notice, and changes to product configurations, or follow-on products, may result in price changes. Any references in this document to "partner" or "partnership" do not constitute or imply a partnership in the sense of the Partnership Act 1890. IBM is not responsible for printing errors in this proposal that result in pricing or information inaccuracies. Regards, Indulis Bernsteins Systems Architect IBM New Generation Storage Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Thu Jun 20 23:18:01 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 20 Jun 2019 22:18:01 +0000 Subject: [gpfsug-discuss] AFM prefetch and eviction policy question Message-ID: <0D7782FD-5594-4D9D-8B2B-B0BF22A4CB5F@oarc.rutgers.edu> Hi there, Been reading the documentation and wikis and such this afternoon, but could use some assistance from someone who is more well-versed in AFM and policy writing to confirm that what I'm looking to do is actually feasible. Is it possible to: 1) Have a policy that, generally, continuously prefetches a single fileset of an AFM cache (make sure those files are there whenever possible)? 2) Generally prefer not to evict files from that fileset, unless it's necessary, opting to evict other stuff first? It seems to me that one can do a prefetch on the fileset, but that future files will not be prefetched, requiring you to run this periodically. Additionally, by default, it would seem as if these files would frequently be evicted in the case where it becomes necessary if they are infrequently used. Would like to avoid too much churn on this but provide fast access to these files (it's a software tree, not user files). Thanks in advance! I'd rather know that it's possible before digging too deeply into the how. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr.
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From quixote at us.ibm.com Fri Jun 21 13:06:35 2019 From: quixote at us.ibm.com (Chris Kempin) Date: Fri, 21 Jun 2019 08:06:35 -0400 Subject: [gpfsug-discuss] AFM prefetch and eviction policy question Message-ID: Ryan: 1) You will need to just regularly run a prefetch to bring over the latest files .. you could either just run it regularly on the cache (probably using the --directory flag to scan the whole fileset for uncached files) or, with a little bit of scripting, you might be able to drive the prefetch from home if you know what files have been created/changed by shipping over to the cache a list of files to prefetch and have something prefetch that list when it arrives. 2) As to eviction, just set afmEnableAutoEviction=no and don't evict. Is there a storage constraint on the cache that would force you to evict? I was using AFM in a more interactive application, with many small files and performance was not an issue in terms of "fast" access to files, but things to consider: What is the network latency between home and cache? How big are the files you are dealing with? If you have very large files, you may want multiple gateways so they can fetch in parallel. How much change is there in the files? How many new/changed files a day are we talking about? Are existing files fairly stable? Regards, Chris Chris Kempin IBM Cloud - Site Reliability Engineering -------------- next part -------------- An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Tue Jun 25 12:38:28 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Tue, 25 Jun 2019 11:38:28 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Hello, I wonder if anyone has seen this...
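Returning to the AFM thread above: the list-driven prefetch Chris suggests (shipping a list of changed files from home to the cache) can be sketched as follows. The filesystem, fileset, and path names are purely illustrative, and the mmafmctl prefetch invocation in the comment is an assumption to verify against your Scale release.

```shell
#!/bin/bash
# Hedged sketch of driving AFM prefetch from a file list.  Nothing here
# is the documented procedure; it only illustrates the shape of it.

# Write to $2 the files under $1 changed since the last run (tracked via
# the marker file $3), one path per line -- the shape a prefetch list
# file generally takes.
build_prefetch_list() {
    local src_dir="$1" list_file="$2" marker="$3"
    if [ -e "$marker" ]; then
        find "$src_dir" -type f -newer "$marker" > "$list_file"
    else
        find "$src_dir" -type f > "$list_file"   # first run: take everything
    fi
    touch "$marker"  # subsequent runs only pick up newer files
}

# A cron job might then run something like (names and syntax assumed):
#   build_prefetch_list /gpfs/home/swtree /tmp/prefetch.list /var/tmp/last-prefetch
#   mmafmctl gpfs0 prefetch -j swtree --list-file=/tmp/prefetch.list
```

Shipping only the delta keeps each prefetch pass cheap compared with rescanning the whole fileset with --directory.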
I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I've checked the NSDs via mmlsnsd and mmlsdisk commands and they are all 'ready' and 'up'. The multipaths to these NSDs are all fine too. Is there a way of finding out what 'access' (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access - 'mmnsdrediscover' returns nothing and runs really fast (contrary to the statement 'This may take a while' when it runs)? Any ideas appreciated! Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 25 13:10:53 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 25 Jun 2019 12:10:53 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: Hallo Son, you can check the access to the NSD with mmlsdisk -m. This gives you a column like "IO performed on node". On an NSD server you should see localhost; on an NSD client you see the hosting NSD server per device. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
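Renar's mmlsdisk -m check can be scripted. This is a hedged sketch that assumes the typical column layout (disk name first, "IO performed on node" last); confirm it against your own cluster's output before relying on it.

```shell
#!/bin/bash
# Sketch: list NSDs whose I/O this node performs remotely, by parsing
# `mmlsdisk <fsname> -m` output.  Column positions are an assumption
# based on typical output, not a guaranteed interface.

# Reads mmlsdisk -m style output on stdin; prints the names of disks not
# served via localhost (i.e. disks accessed through another NSD server).
remote_nsds() {
    awk 'NR > 2 && NF >= 2 && $NF != "localhost" { print $1 }'
}

# Typical use on an NSD server, where every locally attached disk should
# show localhost:
#   mmlsdisk gpfs0 -m | remote_nsds
```

An empty result on an NSD server would mean all its disks are being accessed locally, which is the state Son is trying to get back to.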
________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Son Truong Gesendet: Dienstag, 25. Juni 2019 13:38 An: gpfsug-discuss at spectrumscale.org Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Hello, I wonder if anyone has seen this... I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I've checked the NSDs via mmlsnsd and mmlsdisk commands and they are all 'ready' and 'up'. The multipaths to these NSDs are all fine too. Is there a way of finding out what 'access' (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access - 'mmnsdrediscover' returns nothing and runs really fast (contrary to the statement 'This may take a while' when it runs)? Any ideas appreciated!
Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Tue Jun 25 13:01:11 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Tue, 25 Jun 2019 14:01:11 +0200 Subject: [gpfsug-discuss] Charts Decks - User Meeting along ISC Frankfurt In-Reply-To: References: Message-ID: The chart decks of the user meeting along ISC are now available: https://spectrumscale.org/presentations/ Thanks to all speakers and participants. -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: gpfsug main discussion list Date: 05/06/2019 10:44 Subject: [EXTERNAL] [gpfsug-discuss] Agenda - User Meeting along ISC Frankfurt Sent by: gpfsug-discuss-bounces at spectrumscale.org The agenda is now published: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-isc/ Please use the registration link to attend. Looking forward to meeting many of you there.
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: "gpfsug main discussion list" Date: 22/05/2019 10:55 Subject: [EXTERNAL] [gpfsug-discuss] Save the date - User Meeting along ISC Frankfurt Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings: IBM will host a joint "IBM Spectrum Scale and IBM Spectrum LSF User Meeting" at ISC. As with other user group meetings, the agenda will include user stories, updates on IBM Spectrum Scale & IBM Spectrum LSF, and access to IBM experts and your peers. We are still looking for customers to talk about their experience with Spectrum Scale and/or Spectrum LSF. Please send me a personal mail if you are interested to talk. The meeting is planned for: Monday June 17th, 2019 - 1pm-5.30pm ISC Frankfurt, Germany I will send more details later.
Best, Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=oSzGEkM6PXf5XfF3fAOrsCpqjyrt-ukWcaq3_Ldy_P4&s=GiOkq0F1T3eVSb1IeWaD7gKImm1gEVwhGaa1eIHDhD8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From son.truong at bristol.ac.uk Tue Jun 25 16:02:20 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Tue, 25 Jun 2019 15:02:20 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Hello Renar, Thanks for that command, very useful and I can now see the problematic NSDs are all served remotely. I have double checked the multipath and devices and I can see these NSDs are available locally. How do I get GPFS to recognise this and serve them out via 'localhost'? mmnsddiscover -d seemed to have brought two of the four problematic NSDs back to being served locally, but the other two are not behaving. I have double checked the availability of these devices and their multipaths but everything on that side seems fine. Any more ideas?
Regards, Son --------------------------- Message: 2 Date: Tue, 25 Jun 2019 12:10:53 +0000 From: "Grunenberg, Renar" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Message-ID: Content-Type: text/plain; charset="utf-8" Hallo Son, you can check the access to the NSD with mmlsdisk -m. This gives you a column like "IO performed on node". On an NSD server you should see localhost; on an NSD client you see the hosting NSD server per device. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Son Truong Gesendet: Dienstag, 25. Juni 2019 13:38 An: gpfsug-discuss at spectrumscale.org Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." Hello, I wonder if anyone has seen this... I am (not) having fun with the rescan-scsi-bus.sh command especially with the -r switch. Even though there are no devices removed the script seems to interrupt currently working NSDs and these messages appear in the mmfs.logs: 2019-06-25_06:30:48.706+0100: [I] Connected to 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:30:51.188+0100: [N] Connecting to 2019-06-25_06:30:51.195+0100: [I] Connected to 2019-06-25_06:30:59.857+0100: [N] Connecting to 2019-06-25_06:30:59.863+0100: [I] Connected to 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely. 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely. These messages appear roughly at the same time each day and I've checked the NSDs via mmlsnsd and mmlsdisk commands and they are all 'ready' and 'up'. The multipaths to these NSDs are all fine too. Is there a way of finding out what 'access' (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access - 'mmnsdrediscover' returns nothing and runs really fast (contrary to the statement 'This may take a while' when it runs)? Any ideas appreciated!
Regards, Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 89, Issue 26 ********************************************** From janfrode at tanso.net Tue Jun 25 18:13:12 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 25 Jun 2019 19:13:12 +0200 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: I've had a situation recently where mmnsddiscover didn't help, but mmshutdown/mmstartup on that node did fix it. This was with v5.0.2-3 on ppc64le. -jf tir. 25. jun. 2019 kl. 17:02 skrev Son Truong : > > Hello Renar, > > Thanks for that command, very useful and I can now see the problematic > NSDs are all served remotely. > > I have double checked the multipath and devices and I can see these NSDs > are available locally. > > How do I get GPFS to recognise this and serve them out via 'localhost'? > > mmnsddiscover -d seemed to have brought two of the four problematic > NSDs back to being served locally, but the other two are not behaving. I > have double checked the availability of these devices and their multipaths > but everything on that side seems fine. > > Any more ideas?
> > Regards, > Son > > > --------------------------- > > Message: 2 > Date: Tue, 25 Jun 2019 12:10:53 +0000 > From: "Grunenberg, Renar" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to > NSD failed with EIO, switching to access the disk remotely." > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hallo Son, > > you can check the access to the NSD with mmlsdisk -m. This gives > you a column like "IO performed on node". On an NSD server you should see > localhost; on an NSD client you see the hosting NSD server per device. > > Regards Renar > > > Renar Grunenberg > Abteilung Informatik - Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > ________________________________ > HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter > Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. > 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. > ________________________________ > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte > Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich > erhalten haben, informieren Sie bitte sofort den Absender und vernichten > Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht > ist nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information > in error) please notify the sender immediately and destroy this information.
> Any unauthorized copying, disclosure or distribution of the material in > this information is strictly forbidden. > ________________________________ > Von: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> Im Auftrag von Son Truong > Gesendet: Dienstag, 25. Juni 2019 13:38 > An: gpfsug-discuss at spectrumscale.org > Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD > failed with EIO, switching to access the disk remotely." > > Hello, > > I wonder if anyone has seen this... I am (not) having fun with the > rescan-scsi-bus.sh command especially with the -r switch. Even though there > are no devices removed the script seems to interrupt currently working NSDs > and these messages appear in the mmfs.logs: > > 2019-06-25_06:30:48.706+0100: [I] Connected to > 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:30:51.188+0100: [N] Connecting to > 2019-06-25_06:30:51.195+0100: [I] Connected to > 2019-06-25_06:30:59.857+0100: [N] Connecting to > 2019-06-25_06:30:59.863+0100: [I] Connected to > 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, > switching to access the disk remotely. > > These messages appear roughly at the same time each day and I've checked > the NSDs via mmlsnsd and mmlsdisk commands and they are all 'ready' and > 'up'. The multipaths to these NSDs are all fine too. > > Is there a way of finding out what 'access' (local or remote) a particular > node has to an NSD? And is there a command to force it to switch to local > access - 'mmnsdrediscover'
returns nothing and runs really fast (contrary to > the statement 'This may take a while' when it runs)? > > Any ideas appreciated! > > Regards, > Son > > Son V Truong - Senior Storage Administrator Advanced Computing Research > Centre IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190625/db704f88/attachment.html > > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 89, Issue 26 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Tue Jun 25 18:21:17 2019 From: salut4tions at gmail.com (Jordan Robertson) Date: Tue, 25 Jun 2019 13:21:17 -0400 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: It may depend on which state the NSDs are in with respect to the node in question. If from that node you use 'mmfsadm dump nsd | egrep "moved|error|broken" ' and see anything, that might be it. One or two of those states can be fixed by mmnsddiscover, the other(s) require a kick of mmfsd to get the NSDs back. I never remember which is which. -Jordan On Tue, Jun 25, 2019, 13:13 Jan-Frode Myklebust wrote: > I've had a situation recently where mmnsddiscover didn't help, but > mmshutdown/mmstartup on that node did fix it.
> > This was with v5.0.2-3 on ppc64le. > > > -jf > > tir. 25. jun. 2019 kl. 17:02 skrev Son Truong : > >> >> Hello Renar, >> >> Thanks for that command, very useful and I can now see the problematic >> NSDs are all served remotely. >> >> I have double checked the multipath and devices and I can see these NSDs >> are available locally. >> >> How do I get GPFS to recognise this and serve them out via 'localhost'? >> >> mmnsddiscover -d seemed to have brought two of the four problematic >> NSDs back to being served locally, but the other two are not behaving. I >> have double checked the availability of these devices and their multipaths >> but everything on that side seems fine. >> >> Any more ideas? >> >> Regards, >> Son >> >> >> --------------------------- >> >> Message: 2 >> Date: Tue, 25 Jun 2019 12:10:53 +0000 >> From: "Grunenberg, Renar" >> To: "gpfsug-discuss at spectrumscale.org" >> >> Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to >> NSD failed with EIO, switching to access the disk remotely." >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Hallo Son, >> >> you can check the access to the NSD with mmlsdisk -m. This gives >> you a column like "IO performed on node". On an NSD server you should see >> localhost; on an NSD client you see the hosting NSD server per device. >> >> Regards Renar >> >> >> Renar Grunenberg >> Abteilung Informatik - Betrieb >> >> HUK-COBURG >> Bahnhofsplatz >> 96444 Coburg >> Telefon: 09561 96-44110 >> Telefax: 09561 96-44104 >> E-Mail: Renar.Grunenberg at huk-coburg.de >> Internet: www.huk.de >> ________________________________ >> HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter >> Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. >> 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg >> Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. >> Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans >> Olav Herøy, Dr.
Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. >> ________________________________ >> Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte >> Informationen. >> Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich >> erhalten haben, informieren Sie bitte sofort den Absender und vernichten >> Sie diese Nachricht. >> Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht >> ist nicht gestattet. >> >> This information may contain confidential and/or privileged information. >> If you are not the intended recipient (or have received this information >> in error) please notify the sender immediately and destroy this information. >> Any unauthorized copying, disclosure or distribution of the material in >> this information is strictly forbidden. >> ________________________________ >> Von: gpfsug-discuss-bounces at spectrumscale.org < >> gpfsug-discuss-bounces at spectrumscale.org> Im Auftrag von Son Truong >> Gesendet: Dienstag, 25. Juni 2019 13:38 >> An: gpfsug-discuss at spectrumscale.org >> Betreff: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD >> failed with EIO, switching to access the disk remotely." >> >> Hello, >> >> I wonder if anyone has seen this... I am (not) having fun with the >> rescan-scsi-bus.sh command especially with the -r switch. Even though there >> are no devices removed the script seems to interrupt currently working NSDs >> and these messages appear in the mmfs.logs: >> >> 2019-06-25_06:30:48.706+0100: [I] Connected to >> 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely.
>> 2019-06-25_06:30:51.188+0100: [N] Connecting to >> 2019-06-25_06:30:51.195+0100: [I] Connected to >> 2019-06-25_06:30:59.857+0100: [N] Connecting to >> 2019-06-25_06:30:59.863+0100: [I] Connected to >> 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, >> switching to access the disk remotely. >> >> These messages appear roughly at the same time each day and I've checked >> the NSDs via mmlsnsd and mmlsdisk commands and they are all 'ready' and >> 'up'. The multipaths to these NSDs are all fine too. >> >> Is there a way of finding out what 'access' (local or >> remote) a particular node has to an NSD? And is there a command to force it to switch >> to local access - 'mmnsdrediscover' returns nothing and runs really fast >> (contrary to the statement 'This may take a while' when it runs)? >> >> Any ideas appreciated! >> >> Regards, >> Son >> >> Son V Truong - Senior Storage Administrator Advanced Computing Research >> Centre IT Services, University of Bristol >> Email: son.truong at bristol.ac.uk >> Tel: Mobile: +44 (0) 7732 257 232 >> Address: 31 Great George Street, Bristol, BS1 5QD >> >> >> -------------- next part -------------- >> An HTML attachment was scrubbed...
>> URL: < >> http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190625/db704f88/attachment.html >> > >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 89, Issue 26 >> ********************************************** >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 25 20:05:01 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 25 Jun 2019 19:05:01 +0000 Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely." In-Reply-To: References: Message-ID: <832868CF-82CE-457E-91C7-2488B5C03D74@huk-coburg.de> Hallo Son, Please run mmnsddiscover -a -N all. Do all NSDs have their server stanza definition? Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr.
J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= > Am 25.06.2019 um 17:02 schrieb Son Truong : > > > Hello Renar, > > Thanks for that command, very useful and I can now see the problematic NSDs are all served remotely. > > I have double checked the multipath and devices and I can see these NSDs are available locally. > > How do I get GPFS to recognise this and server them out via 'localhost'? > > mmnsddiscover -d seemed to have brought two of the four problematic NSDs back to being served locally, but the other two are not behaving. I have double checked the availability of these devices and their multipaths but everything on that side seems fine. > > Any more ideas? > > Regards, > Son > > > --------------------------- > > Message: 2 > Date: Tue, 25 Jun 2019 12:10:53 +0000 > From: "Grunenberg, Renar" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to > NSD failed with EIO, switching to access the disk remotely." > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hallo Son, > > you can check the access to the nsd with mmlsdisk -m. 
This gives you a column 'IO performed on node'. On an NSD server you should see localhost; on an NSD client you see the hosting NSD server per device.
>
> Regards Renar
>
> Renar Grunenberg
> Abteilung Informatik - Betrieb
>
> HUK-COBURG
> Bahnhofsplatz
> 96444 Coburg
> Telefon: 09561 96-44110
> Telefax: 09561 96-44104
> E-Mail: Renar.Grunenberg at huk-coburg.de
> Internet: www.huk.de
> ________________________________
> HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
> Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
> Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
> Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
> Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
> ________________________________
> Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen.
> Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
> Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.
>
> This information may contain confidential and/or privileged information.
> If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information.
> Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
> ________________________________
> From: gpfsug-discuss-bounces at spectrumscale.org On behalf of Son Truong
> Sent: Tuesday, 25 June 2019 13:38
> To: gpfsug-discuss at spectrumscale.org
> Subject: [gpfsug-discuss] rescan-scsi-bus.sh and "Local access to NSD failed with EIO, switching to access the disk remotely."
>
> Hello,
>
> I wonder if anyone has seen this? I am (not) having fun with the rescan-scsi-bus.sh command, especially with the -r switch. Even though there are no devices removed, the script seems to interrupt currently working NSDs, and these messages appear in the mmfs.logs:
>
> 2019-06-25_06:30:48.706+0100: [I] Connected to
> 2019-06-25_06:30:48.764+0100: [E] Local access to failed with EIO, switching to access the disk remotely.
> 2019-06-25_06:30:51.187+0100: [E] Local access to failed with EIO, switching to access the disk remotely.
> 2019-06-25_06:30:51.188+0100: [E] Local access to failed with EIO, switching to access the disk remotely.
> 2019-06-25_06:30:51.188+0100: [N] Connecting to
> 2019-06-25_06:30:51.195+0100: [I] Connected to
> 2019-06-25_06:30:59.857+0100: [N] Connecting to
> 2019-06-25_06:30:59.863+0100: [I] Connected to
> 2019-06-25_06:33:30.134+0100: [E] Local access to failed with EIO, switching to access the disk remotely.
> 2019-06-25_06:33:30.151+0100: [E] Local access to failed with EIO, switching to access the disk remotely.
>
> These messages appear roughly at the same time each day and I've checked the NSDs via mmlsnsd and mmlsdisk commands and they are all 'ready' and 'up'. The multipaths to these NSDs are all fine too.
>
> Is there a way of finding out what 'access' (local or remote) a particular node has to an NSD? And is there a command to force it to switch to local access? 'mmnsdrediscover' returns nothing and runs really fast (contrary to the statement 'This may take a while' when it runs)?
>
> Any ideas appreciated!
>
> Regards,
> Son
>
> Son V Truong - Senior Storage Administrator
> Advanced Computing Research Centre
> IT Services, University of Bristol
> Email: son.truong at bristol.ac.uk
> Tel: Mobile: +44 (0) 7732 257 232
> Address: 31 Great George Street, Bristol, BS1 5QD
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
>
> ------------------------------
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> End of gpfsug-discuss Digest, Vol 89, Issue 26
> **********************************************
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From TROPPENS at de.ibm.com Wed Jun 26 09:58:09 2019
From: TROPPENS at de.ibm.com (Ulf Troppens)
Date: Wed, 26 Jun 2019 10:58:09 +0200
Subject: [gpfsug-discuss] Meet-up in Buenos Aires
Message-ID:

IBM will host an 'IBM Spectrum Scale Meet-up' alongside IBM Technical University Buenos Aires. This is the first user meeting in South America. All sessions will be in Spanish.

https://www.spectrumscale.org/event/spectrum-scale-meet-up-in-buenos-aires/

--
IBM Spectrum Scale Development - Client Engagements & Solutions Delivery
Consulting IT Specialist
Author "Storage Networks Explained"

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alvise.dorigo at psi.ch Wed Jun 26 10:17:28 2019
From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI))
Date: Wed, 26 Jun 2019 09:17:28 +0000
Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed
Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>

Hello,
after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI:

sf-gpfs.psi.ch sf-ems1.psi.ch gui_refresh_task_failed NODE sf-ems1.psi.ch WARNING The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY

The upgrade procedure was successful and all the post-upgrade checks were also successful. Also the /usr/lpp/mmfs/gui/cli/runtask on those tasks is successful.
Any idea about how to investigate this more deeply and solve it?

Thanks,

Alvise
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefan.roth at de.ibm.com Wed Jun 26 15:48:34 2019
From: stefan.roth at de.ibm.com (Stefan Roth)
Date: Wed, 26 Jun 2019 16:48:34 +0200
Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed
In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>
References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>
Message-ID:

Hello Alvise,
the problem will most likely be fixed after installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm.
The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend upgrading to this one. No other additional rpm packages have to be upgraded.
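A quick way to check whether the installed GUI rpm is older than the recommended fix level is to compare the two version strings with GNU `sort -V`. This is only a sketch: the package name gpfs.gui and both versions come from this thread, the `older_than` helper is made up for illustration, and the installed value would normally be read with `rpm -q` instead of being hard-coded.

```shell
#!/bin/sh
# Sketch: decide whether the installed gpfs.gui rpm needs an upgrade.
# INSTALLED would normally come from:
#   rpm -q --qf '%{VERSION}-%{RELEASE}\n' gpfs.gui
INSTALLED="5.0.2-3.7"     # assumed example value
RECOMMENDED="5.0.2-3.9"   # fix level recommended above

# older_than A B: succeeds if A sorts strictly before B in version order
older_than() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if older_than "$INSTALLED" "$RECOMMENDED"; then
    echo "upgrade gpfs.gui to $RECOMMENDED"
else
    echo "gpfs.gui is up to date"
fi
```

With the example values above this prints "upgrade gpfs.gui to 5.0.2-3.9"; swapping in the real `rpm -q` output turns it into a usable check.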
Mit freundlichen Grüßen / Kind regards

Stefan Roth
Spectrum Scale GUI Development

Phone: +49-7034-643-1362
E-Mail: stefan.roth at de.ibm.com

IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Germany

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

From: "Dorigo Alvise (PSI)"
To: "gpfsug-discuss at spectrumscale.org"
Date: 26.06.2019 11:25
Subject: [EXTERNAL] [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hello,
after upgrading my GL2 to ESS 5.3.2-1 I started to periodically get this warning from the GUI:

sf-gpfs.psi.ch sf-ems1.psi.ch gui_refresh_task_failed NODE sf-ems1.psi.ch WARNING The following GUI refresh task(s) failed: HEALTH_TRIGGERED;HW_INVENTORY

The upgrade procedure was successful and all the post-upgrade checks were also successful. Also the /usr/lpp/mmfs/gui/cli/runtask on those tasks is successful.
Any idea about how to investigate this more deeply and solve it?
Thanks,

Alvise
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alvise.dorigo at psi.ch Fri Jun 28 08:25:24 2019
From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI))
Date: Fri, 28 Jun 2019 07:25:24 +0000
Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed
In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>,
Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch>

The tarball 5.0.2-3 we have includes neither the .7 nor the .9 version. And I guess I cannot install just the gpfs.gui 5.0.3 rpm on top of a 5.0.2-3 installation. Should I open a case with IBM to download that specific version rpm?
thanks,

Alvise
________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Stefan Roth [stefan.roth at de.ibm.com]
Sent: Wednesday, June 26, 2019 4:48 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed

Hello Alvise,
the problem will most likely be fixed after installing the gpfs.gui-5.0.2-3.7.noarch.rpm GUI rpm.
The latest available GUI rpm for your release is 5.0.2-3.9, so I recommend upgrading to this one. No other additional rpm packages have to be upgraded.

Mit freundlichen Grüßen / Kind regards

Stefan Roth
Spectrum Scale GUI Development
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From alvise.dorigo at psi.ch Fri Jun 28 08:32:42 2019
From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI))
Date: Fri, 28 Jun 2019 07:32:42 +0000
Subject: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed
In-Reply-To: <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch>
References: <83A6EEB0EC738F459A39439733AE80452BE6A0E1@MBX214.d.ethz.ch>, , <83A6EEB0EC738F459A39439733AE80452BE6EF79@MBX214.d.ethz.ch>
Message-ID: <83A6EEB0EC738F459A39439733AE80452BE6EF9B@MBX214.d.ethz.ch>

Oops, and I made a double mistake: currently I have 5.0.2-1 (not -3) on my GL2, and in house we only have x86_64, so I definitely need to download a specific rpm from somewhere, if it is compatible with 5.0.2-1.
Alvise
________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Dorigo Alvise (PSI) [alvise.dorigo at psi.ch]
Sent: Friday, June 28, 2019 9:25 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Problem with GUI reporting gui_refresh_task_failed

The tarball 5.0.2-3 we have includes neither the .7 nor the .9 version. And I guess I cannot install just the gpfs.gui 5.0.3 rpm on top of a 5.0.2-3 installation. Should I open a case with IBM to download that specific version rpm?

thanks,

Alvise
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: