From S.J.Thompson at bham.ac.uk Mon Jun 4 12:21:36 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 4 Jun 2018 11:21:36 +0000 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: Message-ID: So - I have another question on support. We've just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm) which is 4.x based. I don't see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0. The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June).
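A quick way to capture what a node is actually running before opening such a PMR - a minimal sketch, assuming a standard RHEL install of Scale; nothing below is specific to any cluster in this thread:
uname -r                 # kernel currently booted on this node
rpm -qa | grep '^gpfs'   # Spectrum Scale packages installed
mmdiag --version         # build level as reported by the GPFS tooling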
Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 4 16:47:01 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 4 Jun 2018 11:47:01 -0400 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: Message-ID: Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org So - I have another question on support. We've just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel ( https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm ) which is 4.x based. I don't see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0. The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June).
Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From p.childs at qmul.ac.uk Mon Jun 4 22:26:25 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 4 Jun 2018 21:26:25 +0000 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: , , Message-ID: <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> We have 2 power 9 nodes, The rest of our cluster is running centos 7.4 and spectrum scale 4.2.3-8 (x86 based) The power 9 nodes are running spectrum scale 5.0.0-0 currently as we couldn't get the gplbin for 4.2.3 to compile, where as spectrum scale 5 worked on power 9 our of the box. They are running rhel7.5 but on an old kernel I guess. I'm not sure that 4.2.3 works on power 9 we've asked the IBM power 9 out reach team but heard nothing back. If we can get 4.2.3 running on the power 9 nodes it would put us in a more consistent setup. Of course our current plan b is to upgrade everything to 5.0.1, but we can't do that as our storage appliance doesn't (officially) support spectrum scale 5 yet. These are my experiences of what works and nothing whatsoever to do with what's supported, except I want to keep us as close to a supported setup as possible given what we've found to actually work. (now that's an interesting spin on a disclaimer) Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 when the x86 7.5 release is also made? Simon * Insert standard IBM disclaimer about the meaning of intent etc etc ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of knop at us.ibm.com [knop at us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on s]"Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So ? I have another question on support. 
We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 4 22:48:45 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 4 Jun 2018 17:48:45 -0400 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> References: , , <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> Message-ID: Peter, Simon, While I believe Power9 / RHEL 7.5 will be supported with the upcoming PTFs on 4.2.3 and 5.0.1 later in June, I'm working on getting confirmation for that. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Peter Childs To: gpfsug main discussion list Date: 06/04/2018 05:26 PM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org We have 2 power 9 nodes, The rest of our cluster is running centos 7.4 and spectrum scale 4.2.3-8 (x86 based) The power 9 nodes are running spectrum scale 5.0.0-0 currently as we couldn't get the gplbin for 4.2.3 to compile, where as spectrum scale 5 worked on power 9 our of the box. They are running rhel7.5 but on an old kernel I guess. I'm not sure that 4.2.3 works on power 9 we've asked the IBM power 9 out reach team but heard nothing back. If we can get 4.2.3 running on the power 9 nodes it would put us in a more consistent setup. Of course our current plan b is to upgrade everything to 5.0.1, but we can't do that as our storage appliance doesn't (officially) support spectrum scale 5 yet. These are my experiences of what works and nothing whatsoever to do with what's supported, except I want to keep us as close to a supported setup as possible given what we've found to actually work. (now that's an interesting spin on a disclaimer) Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 when the x86 7.5 release is also made? 
Simon * Insert standard IBM disclaimer about the meaning of intent etc etc ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of knop at us.ibm.com [knop at us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on s]"Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel ( https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm ) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Tue Jun 5 12:39:08 2018 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Tue, 5 Jun 2018 11:39:08 +0000 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Message-ID: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so the alarm is correct. However, I would like to define different limits. Is it possible to increase them? 'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Wed Jun 6 08:40:07 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 6 Jun 2018 09:40:07 +0200 Subject: [gpfsug-discuss] recommendations for gpfs 5.x GUI and perf/health monitoring collector nodes In-Reply-To: References: Message-ID: Hi, when it comes to clusters of this size then the 150-nodes-per-collector rule of thumb is a good way to start, so 3-4 collector nodes should be OK for your setup. The GUI(s) can also be installed on those nodes. Collector nodes mainly need a good amount of RAM, as all 'current' incoming sensor data is kept there. Local disk is typically not stressed heavily; plain HDD or simple onboard RAID is sufficient, plan for 20-50 GB of disk space on each node. For the network there are no special requirements; the default should be whatever is used in the cluster anyway. Mit freundlichen Grüßen / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz / Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: David Johnson To: gpfsug main discussion list Date: 31/05/2018 20:22 Subject: [gpfsug-discuss] recommendations for gpfs 5.x GUI and perf/health monitoring collector nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org We are planning to bring up the new ZIMon tools on our 450+ node cluster, and need to purchase new nodes to run the collector federation and GUI function on. What would you choose as a platform for this? - memory size? - local disk space - SSD? shared? - net attach - 10Gig? 25Gig? IB? - CPU horse power - single or dual socket? I think I remember somebody at the Cambridge UG meeting saying 150 nodes per collector as a rule of thumb, so we're guessing a federation of 4 nodes would do it.
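For reference, a four-collector setup of the kind being sized here is normally stood up with the standard perfmon tooling; a rough sketch, assuming Scale 4.2.2 or later, with placeholder collector names:
mmperfmon config generate --collectors col1,col2,col3,col4   # define the collectors; federation peers are derived from this list
mmchnode --perfmon -N all                                    # enable the sensors on the nodes to be monitored
The resulting peer/federation settings land in /opt/IBM/zimon/ZIMonCollector.cfg on the collector nodes and are worth double-checking there.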
Does this include the GUI host(s) or are those separate? Finally, we're still using the client/server based licensing model, do these nodes count as clients? Thanks, -- ddj Dave Johnson Brown University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NSCHULD at de.ibm.com Wed Jun 6 09:00:06 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 6 Jun 2018 10:00:06 +0200 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds In-Reply-To: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> References: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> Message-ID: Hi, assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Mit freundlichen Grüßen / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz / Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 05/06/2018 13:45 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots.
Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so the alarm is correct. However, I would like to define different limits. Is it possible to increase them? 'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Wed Jun 6 10:37:02 2018 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Wed, 6 Jun 2018 09:37:02 +0000 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds In-Reply-To: References: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch>, Message-ID: <0081EB235765E14395278B9AE1DF34180A65067E@MBX214.d.ethz.ch> Hi Norbert, thanks a lot, it worked. I tried the same before for the same rules, but it did not work. Now I realized that this was because the remaining disk space and metadata were even smaller than when I checked the first time, so nothing had changed.
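The same delete-and-re-add approach works for the data pool rule; a sketch that mirrors the MetaDataCapUtil_Rule change above, using the rule and metric names from that listing (the 95/85 levels are just an example):
# mmhealth thresholds delete DataCapUtil_Rule
# mmhealth thresholds add DataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name DataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name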
Thanks a lot for your help, Marc _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Norbert Schuld [NSCHULD at de.ibm.com] Sent: Wednesday, June 06, 2018 10:00 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Hi, assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Mit freundlichen Grüßen / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz / Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 05/06/2018 13:45 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots.
Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so the alarm is correct. However, I would like to define different limits. Is it possible to increase them? 'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 15:16:43 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 14:16:43 +0000 Subject: [gpfsug-discuss] Capacity pool filling Message-ID: Hi All, First off, I'm on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ... with that disclaimer out of the way... We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes, atime - of more than 90 days get migrated to by a script that runs out of cron each weekend). However - this morning the free space in the gpfs23capacity pool is dropping - I'm down to 0.5 TB free in a 582 TB pool - and I cannot figure out why. The migration script is NOT running - in fact, it's currently disabled. So I can only think of two possible explanations for this: 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that - i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array. We're working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it's not recoverable. We did run "mmfileid" to identify the files that have one or more blocks on the down NSD, but there are so many that what we're doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don't need. But shouldn't all of that be going to the gpfs23data pool? I.e. even if we're restoring files that are in the gpfs23capacity pool, shouldn't the fact that we're restoring to an alternate path (i.e.
not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I'm not thinking of? Thanks... -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Jun 7 15:51:49 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 7 Jun 2018 10:51:49 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: <6A8A18B6-8578-4C00-A8AC-8A04EF93361F@ulmer.org> > On Jun 7, 2018, at 10:16 AM, Buterbaugh, Kevin L wrote: > > Hi All, > > First off, I'm on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ... with that disclaimer out of the way... > > We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes, atime - of more than 90 days get migrated to by a script that runs out of cron each weekend). > > However - this morning the free space in the gpfs23capacity pool is dropping - I'm down to 0.5 TB free in a 582 TB pool - and I cannot figure out why. The migration script is NOT running - in fact, it's currently disabled. So I can only think of two possible explanations for this: > > 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that - i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) Any files that have been opened in that pool will have a recent atime (you're moving them there because they have a not-recent atime, so this should be an anomaly). Further, they should have an mtime that is older than 90 days, too. You could ask the policy engine which ones have been open/written in the last day-ish and maybe see a pattern? > 2. We are doing a large volume of restores right now because one of the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array. We're working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it's not recoverable. We did run "mmfileid" to identify the files that have one or more blocks on the down NSD, but there are so many that what we're doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don't need. But shouldn't all of that be going to the gpfs23data pool? I.e. even if we're restoring files that are in the gpfs23capacity pool, shouldn't the fact that we're restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? > If you are restoring them (as opposed to recalling them), they are different files that happen to have similar contents to some other files. > --
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 7 16:08:15 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 7 Jun 2018 17:08:15 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: Hm, RULE 'list_updated_in_capacity_pool' LIST 'updated_in_capacity_pool' FROM POOL 'gpfs23capacity' WHERE CURRENT_TIMESTAMP - MODIFICATION_TIME < INTERVAL '7' DAYS From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/06/2018 16:25 Subject: [gpfsug-discuss] Capacity pool filling Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I'm on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ... with that disclaimer out of the way... We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes, atime - of more than 90 days get migrated to by a script that runs out of cron each weekend). However - this morning the free space in the gpfs23capacity pool is dropping - I'm down to 0.5 TB free in a 582 TB pool - and I cannot figure out why. The migration script is NOT running - in fact, it's currently disabled. So I can only think of two possible explanations for this: 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that - i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array. We're working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it's not recoverable. We did run "mmfileid" to identify the files that have one or more blocks on the down NSD, but there are so many that what we're doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don't need. But shouldn't all of that be going to the gpfs23data pool? I.e. even if we're restoring files that are in the gpfs23capacity pool, shouldn't the fact that we're restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I'm not thinking of? Thanks... --
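For reference, a listing rule like the one above would normally be paired with an external-list stub and driven by mmapplypolicy; a sketch, with the policy file path and output prefix as placeholders:
RULE EXTERNAL LIST 'updated_in_capacity_pool' EXEC ''
RULE 'list_updated_in_capacity_pool' LIST 'updated_in_capacity_pool' FROM POOL 'gpfs23capacity' WHERE CURRENT_TIMESTAMP - MODIFICATION_TIME < INTERVAL '7' DAYS
# with EXEC '' and -I defer, the matches are only written out, to /tmp/cap.list.updated_in_capacity_pool
mmapplypolicy /gpfs23 -P /tmp/list_updated.pol -I defer -f /tmp/cap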
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 16:56:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 15:56:34 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: Hi All, So in trying to prove Jaime wrong I proved him half right - the cron job is stopped: #13 22 * * 5 /root/bin/gpfs_migration.sh However, I took a look in one of the restore directories under /gpfs23/RESTORE using mmlsattr and I see files in all 3 pools! So that explains why the capacity pool is filling, but mmlspolicy says: Policy for file system '/dev/gpfs23': Installed by root at gpfsmgr on Wed Jan 25 10:17:01 2017. First line of policy 'gpfs23.policy' is: RULE 'DEFAULT' SET POOL 'gpfs23data' So - I don't think GPFS is doing this, but the next thing I am going to do is follow up with our tape software vendor - I bet they preserve the pool attribute on files and - like Jaime said - old stuff is therefore hitting the gpfs23capacity pool. Thanks Jaime and everyone else who has responded so far... Kevin > On Jun 7, 2018, at 9:53 AM, Jaime Pinto wrote: > > I think the restore is bringing back a lot of material with atime > 90, so it is passing through gpfs23data and going directly to gpfs23capacity. > > I also think you may not have stopped the crontab script as you believe you did. > > Jaime > > Quoting "Buterbaugh, Kevin L" : > >> Hi All, >> >> First off, I'm on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ... with that disclaimer out of the way... >> >> We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes, atime - of more than 90 days get migrated to by a script that runs out of cron each weekend). >> >> However - this morning the free space in the gpfs23capacity pool is dropping - I'm down to 0.5 TB free in a 582 TB pool - and I cannot figure out why. The migration script is NOT running - in fact, it's currently disabled. So I can only think of two possible explanations for this: >> >> 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that - i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) >> >> 2. We are doing a large volume of restores right now because one of the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array. We're working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it's not recoverable. We did run "mmfileid"
to identify the files that have one or more blocks on the down NSD, but there are so many that what we're doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don't need. But shouldn't all of that be going to the gpfs23data pool? I.e. even if we're restoring files that are in the gpfs23capacity pool, shouldn't the fact that we're restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? >> >> Is there a third explanation I'm not thinking of? >> >> Thanks... >> >> -- >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > From pinto at scinet.utoronto.ca Thu Jun 7 15:53:16 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 07 Jun 2018 10:53:16 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> I think the restore is bringing back a lot of material with atime > 90, so it is passing through gpfs23data and going directly to gpfs23capacity. I also think you may not have stopped the crontab script as you believe you did. Jaime Quoting "Buterbaugh, Kevin L" : > Hi All, > > First off, I'm on day 8 of dealing with two different > mini-catastrophes at work and am therefore very sleep deprived and > possibly missing something obvious ... with that disclaimer out of the > way... > > We have a filesystem with 3 pools: 1) system (metadata only), 2) > gpfs23data (the default pool if I run mmlspolicy), and 3) > gpfs23capacity (where files with an atime - yes, atime - of more than > 90 days get migrated to by a script that runs out of cron each > weekend). > > However - this morning the free space in the gpfs23capacity pool is > dropping - I'm down to 0.5 TB free in a 582 TB pool - and I cannot > figure out why. The migration script is NOT running -
in fact, it's > currently disabled. So I can only think of two possible > explanations for this: > > 1. There are one or more files already in the gpfs23capacity pool > that someone has started updating. Is there a way to check for that > - i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but > restricted to only files in the gpfs23capacity pool. Marc Kaplan - > can mmfind do that?? ;-) > > 2. We are doing a large volume of restores right now because one of > the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) > down due to an issue with the storage array. We're working with the > vendor to try to resolve that but are not optimistic, so we have > started doing restores in case they come back and tell us it's not > recoverable. We did run "mmfileid" to identify the files that have > one or more blocks on the down NSD, but there are so many that what > we're doing is actually restoring all the files to an alternate path > (easier for our tape system), then replacing the corrupted files, > then deleting any restores we don't need. But shouldn't all of that > be going to the gpfs23data pool? I.e. even if we're restoring > files that are in the gpfs23capacity pool, shouldn't the fact that > we're restoring to an alternate path (i.e. not overwriting files > with the tape restores) and the default pool is the gpfs23data pool > mean that nothing is being restored to the gpfs23capacity pool??? > > Is there a third explanation I'm not thinking of? > > Thanks... > > -- > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 15:45:52 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 14:45:52 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <68EC0249928AAD56.9D6058B5-0CA1-4A01-BAB3-FF615745B845@mail.outlook.com> References: <68EC0249928AAD56.9D6058B5-0CA1-4A01-BAB3-FF615745B845@mail.outlook.com> Message-ID: <065F97AD-9C82-4B13-A519-E090CD175305@vanderbilt.edu> Hi again all, I received a direct response and am not sure whether that means the sender did not want to be identified, but they asked good questions that I wanted to answer on list... No, we do not use snapshots on this filesystem. No, we're not using HSM - our tape backup system is a traditional backup system not named TSM. We've created a top level directory in the filesystem called "RESTORE" and are restoring everything under that - then doing our moves / deletes of what we've restored - so I *think* that means all of that should be written to the gpfs23data pool?!? On the "plus" side, I may figure this out myself soon when someone / something starts getting I/O errors! :-O In the meantime, other ideas are much appreciated! Kevin Do you have a job that's creating snapshots? That's an easy one to overlook. Not sure if you are using an HSM.
Any new file that gets generated should follow the default rule in ILM unless it meets a placement condition. It would only be if you're using an HSM that files would be placed in a non-placement location pool, but that is purely because the file location has already been updated to the capacity pool. On Thu, Jun 7, 2018 at 8:17 AM -0600, "Buterbaugh, Kevin L" > wrote: Hi All, First off, I'm on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ... with that disclaimer out of the way... We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes, atime - of more than 90 days get migrated to by a script that runs out of cron each weekend). However - this morning the free space in the gpfs23capacity pool is dropping - I'm down to 0.5 TB free in a 582 TB pool - and I cannot figure out why. The migration script is NOT running - in fact, it's currently disabled. So I can only think of two possible explanations for this: 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that - i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array. We're working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it's not recoverable. We did run "mmfileid" to identify the files that have one or more blocks on the down NSD, but there are so many that what we're doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don't need. But shouldn't all of that be going to the gpfs23data pool? I.e. even if we're restoring files that are in the gpfs23capacity pool, shouldn't the fact that we're restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I'm not thinking of? Thanks... -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 7 19:34:16 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 7 Jun 2018 20:34:16 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: > However, I took a look in one of the restore directories under > /gpfs23/RESTORE using mmlsattr and I see files in all 3 pools! > So - I don't think GPFS is doing this but the next thing I am > going to do is follow up with our tape software vendor - I bet > they preserve the pool attribute on files and - like Jaime said - > old stuff is therefore hitting the gpfs23capacity pool. Hm, then the backup/restore must be doing very funny things.
Usually, GPFS should rule the placement of new files, and I assume that a restore of a file, in particular under a different name, creates a new file. So, if your backup tool does override that GPFS placement, it must be very intimate with Scale :-). I'd do some list scans of the capacity pool just to see what the files appearing there from tape have in common. If it's really that these files' data were on the capacity pool at the last backup, they should not be affected by your dead NSD and a restore is in vain anyway. If that doesn't help or give no clue, then, if the data pool has some more free space, you might try to run an upward/backward migration from capacity to data. And, yeah, as GPFS tends to stripe over all NSDs, all files in data large enough plus some smaller ones would have data on your broken NSD. That's the drawback of parallelization. Maybe you'd ask the storage vendor whether they supply some more storage for the fault of their (redundant?) device to alleviate your current storage shortage? Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Thomas Wolter, Sven Schooß Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 20:36:59 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 19:36:59 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Hi Uwe, Thanks for your response. So our restore software lays down the metadata first, then the data. While it has no specific knowledge of the extended attributes, it does back them up and restore them. So the only explanation that makes sense to me is that since the inode for the file says that the file should be in the gpfs23capacity pool, the data gets written there. Right now I don't have time to do an analysis of the "live" version of a fileset and the "restored" version of that same fileset to see if the placement of the files matches up. My quick and dirty checks seem to show files getting written to all 3 pools. Unfortunately, we have no way to tell our tape software to ignore files from the gpfs23capacity pool (and we're aware that we won't need those files). We've also determined that it is actually quicker to tell our tape system to restore all files from a fileset than to take the time to tell it to selectively restore only certain files - and the same amount of tape would have to be read in either case. Our SysAdmin who is primary on tape backup and restore was going on vacation the latter part of the week, so he decided to be helpful and just queue up all the restores to run one right after the other.
We didn't realize that, so we are solving our disk space issues by slowing down the restores until we can run more instances of the script that replaces the corrupted files and deletes the unneeded restored files. Thanks again... Kevin > On Jun 7, 2018, at 1:34 PM, Uwe Falke wrote: > >> However, I took a look in one of the restore directories under >> /gpfs23/RESTORE using mmlsattr and I see files in all 3 pools! > > >> So - I don't think GPFS is doing this but the next thing I am >> going to do is follow up with our tape software vendor - I bet >> they preserve the pool attribute on files and - like Jaime said - >> old stuff is therefore hitting the gpfs23capacity pool. > > Hm, then the backup/restore must be doing very funny things. Usually, GPFS > should rule the > placement of new files, and I assume that a restore of a file, in > particular under a different name, > creates a new file. So, if your backup tool does override that GPFS > placement, it must be very > intimate with Scale :-). > I'd do some list scans of the capacity pool just to see what the files > appearing there from tape have in common. > If it's really that these files' data were on the capacity pool at the > last backup, they should not be affected by your dead NSD and a restore is > in vain anyway. > > If that doesn't help or give no clue, then, if the data pool has some more > free space, you might try to run an upward/backward migration from > capacity to data. > > And, yeah, as GPFS tends to stripe over all NSDs, all files in data large > enough plus some smaller ones would have data on your broken NSD. That's > the drawback of parallelization. > Maybe you'd ask the storage vendor whether they supply some more storage > for the fault of their (redundant?) device to alleviate your current > storage shortage? > > Mit freundlichen Grüßen / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: > Thomas Wolter, Sven Schooß
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cacad30699025407bc67b08d5cca54bca%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636639932669887596&sdata=vywTFbG4O0lquAIAVfa0csdC0HtpvfhY8%2FOjqm98fxI%3D&reserved=0 From makaplan at us.ibm.com Thu Jun 7 21:53:36 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 7 Jun 2018 16:53:36 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Message-ID: If your restore software uses the gpfs_fputattrs() or gpfs_fputattrswithpathname methods, notice there are some options to control the pool. AND there is also the possibility of using the little known "RESTORE" policy rule to algorithmically control the pool selection by different criteria than the SET POOL rule. When all else fails ... Read The Fine Manual ;-) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/07/2018 03:37 PM Subject: Re: [gpfsug-discuss] Capacity pool filling Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe, Thanks for your response. So our restore software lays down the metadata first, then the data. While it has no specific knowledge of the extended attributes, it does back them up and restore them. So the only explanation that makes sense to me is that since the inode for the file says that the file should be in the gpfs23capacity pool, the data gets written there. Right now I don?t have time to do an analysis of the ?live? version of a fileset and the ?restored? version of that same fileset to see if the placement of the files matches up. My quick and dirty checks seem to show files getting written to all 3 pools. Unfortunately, we have no way to tell our tape software to ignore files from the gpfs23capacity pool (and we?re aware that we won?t need those files). We?ve also determined that it is actually quicker to tell our tape system to restore all files from a fileset than to take the time to tell it to selectively restore only certain files ? and the same amount of tape would have to be read in either case. Our SysAdmin who is primary on tape backup and restore was going on vacation the latter part of the week, so he decided to be helpful and just queue up all the restores to run one right after the other. We didn?t realize that, so we are solving our disk space issues by slowing down the restores until we can run more instances of the script that replaces the corrupted files and deletes the unneeded restored files. Thanks again? Kevin > On Jun 7, 2018, at 1:34 PM, Uwe Falke wrote: > >> However, I took a look in one of the restore directories under >> /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! > > >> So ? I don?t think GPFS is doing this but the next thing I am >> going to do is follow up with our tape software vendor ? I bet >> they preserve the pool attribute on files and - like Jaime said - >> old stuff is therefore hitting the gpfs23capacity pool. > > Hm, then the backup/restore must be doing very funny things. 
Usually, GPFS > should rule the > placement of new files, and I assume that a restore of a file, in > particular under a different name, > creates a new file. So, if your backup tool does override that GPFS > placement, it must be very > intimate with Scale :-). > I'd do some list scans of the capacity pool just to see what the files > appearing there from tape have in common. > If it's really that these files' data were on the capacity pool at the > last backup, they should not be affected by your dead NSD and a restore is > in vain anyway. > > If that doesn't help or give no clue, then, if the data pool has some more > free space, you might try to run an upward/backward migration from > capacity to data . > > And, yeah, as GPFS tends to stripe over all NSDs, all files in data large > enough plus some smaller ones would have data on your broken NSD. That's > the drawback of parallelization. > Maybe you'd ask the storage vendor whether they supply some more storage > for the fault of their (redundant?) device to alleviate your current > storage shortage ? > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cacad30699025407bc67b08d5cca54bca%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636639932669887596&sdata=vywTFbG4O0lquAIAVfa0csdC0HtpvfhY8%2FOjqm98fxI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Jun 8 09:23:18 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 8 Jun 2018 10:23:18 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Message-ID: Hi Kevin, gpfsug-discuss-bounces at spectrumscale.org wrote on 07/06/2018 21:36:59: > From: "Buterbaugh, Kevin L" > So our restore software lays down the metadata first, then the data. > While it has no specific knowledge of the extended attributes, it > does back them up and restore them. So the only explanation that > makes sense to me is that since the inode for the file says that the > file should be in the gpfs23capacity pool, the data gets written there. Hm, fair enough. 
So it seems to extract and revise information from the inodes of backed-up files (since it cannot reuse the inode number, it must do so ...). Then, you could ask your SW vendor to include a functionality like "restore using GPFS placement" (ignoring pool info from inode), "restore data to pool XY" (all data restored,, but all to pool XY) or "restore only data from pool XY" (only data originally backed up from XY, and restored to XY), and LBNL "restore only data from pool XY to pool ZZ". The tapes could still do streaming reads, but all files not matching the condition would be ignored. Is a bit more sophisticated than just copying the inode content except some fields such as inode number. OTOH, how often are restores really needed ... so it might be over the top ... > > We?ve also determined that it is actually quicker to tell > our tape system to restore all files from a fileset than to take the > time to tell it to selectively restore only certain files ? and the > same amount of tape would have to be read in either case. Given that you know where the restored files are going to in the file system, you can also craft a policy that deletes all files which are in pool Capacity and have a path into the restore area. Running that every hour should keep your capacity pool from filling up. Just the tapes need to read more, but because they do it in streaming mode, it is probably not more expensive than shoe-shining. And that could also be applied to the third data pool which also retrieves files. But maybe your script is also sufficient Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From secretary at gpfsug.org Fri Jun 8 09:53:43 2018 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Fri, 08 Jun 2018 09:53:43 +0100 Subject: [gpfsug-discuss] Committee change Message-ID: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. 
Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Fri Jun 8 11:42:55 2018 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Fri, 08 Jun 2018 11:42:55 +0100 Subject: [gpfsug-discuss] Committee change In-Reply-To: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> References: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Message-ID: On behalf of the group, I?d like to thank Claire for her support of the group over the past 8 years and wish her well in the new role! Its grown from a few people round a table to a worldwide group with hundreds of members. I spoke with Claire yesterday, and she said the 1 key thing she has learnt about Spectrum Scale is that any issues are likely your network ? Simon Group Chair From: on behalf of "secretary at gpfsug.org" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 8 June 2018 at 09:53 To: gpfsug main discussion list Cc: "secretary at spectrumscaleug.org" , Chair Subject: [gpfsug-discuss] Committee change Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From colinb at mellanox.com Fri Jun 8 12:18:25 2018 From: colinb at mellanox.com (Colin Bridger) Date: Fri, 8 Jun 2018 11:18:25 +0000 Subject: [gpfsug-discuss] Committee change In-Reply-To: References: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Message-ID: I?d also like to wish Claire all the best as well. As a sponsor for a large number of the events, she has been so organized and easy to work with ?and arranged some great after events? so thank you! Tongue firmly in cheek, I?d also like to agree with Claire on the 1 key thing she has learnt and point her towards the Chair of Spectrum-Scale UG for his solution ? All the best Claire! Colin Colin Bridger Mellanox Technologies Mobile: +44 7917 017737 Email: colinb at mellanox.com From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Spectrum Scale User Group Chair) Sent: Friday, June 8, 2018 11:43 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Committee change On behalf of the group, I?d like to thank Claire for her support of the group over the past 8 years and wish her well in the new role! Its grown from a few people round a table to a worldwide group with hundreds of members. 
I spoke with Claire yesterday, and she said the 1 key thing she has learnt about Spectrum Scale is that any issues are likely your network ? Simon Group Chair From: > on behalf of "secretary at gpfsug.org" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 8 June 2018 at 09:53 To: gpfsug main discussion list > Cc: "secretary at spectrumscaleug.org" >, Chair > Subject: [gpfsug-discuss] Committee change Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Jun 11 11:46:26 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 10:46:26 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 11 11:49:46 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 11 Jun 2018 10:49:46 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Jun 11 11:59:11 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 10:59:11 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. 
I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jun 11 12:52:25 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 11 Jun 2018 07:52:25 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
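For anyone tripping over the same thing: the sequence that avoids the accidental jump is to pin the release and then clear the cached repodata before updating. A rough sketch using standard RHEL tooling (the kernel exclude at the end is optional belt-and-braces, not something the thread prescribes):

  subscription-manager release --set=7.4
  yum clean all && rm -rf /var/cache/yum     # stale 7.5 metadata in the cache is what bites here
  yum update --exclude='kernel*'             # or drop the exclude once you are happy with the target kernel
  cat /etc/redhat-release                    # verify where you actually landed
  uname -r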
URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jun 11 12:56:43 2018 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Mon, 11 Jun 2018 13:56:43 +0200 Subject: [gpfsug-discuss] GPFS-GUI and Collector Message-ID: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> Hello, we have GPFS-GUI and Clients at 4.2.3.7 and my clients to not show any performance data in the gui. All clients are running pmsensor and the gui is running pmcollector. I can see in tcpdump that the server receives data but i can not see in the the gui. " Performance collector did not return any data. " Do you have any idea how i can debug it further?? Kind regards ?Philipp Rehs -------------- next part -------------- A non-text attachment was scrubbed... Name: pEpkey.asc Type: application/pgp-keys Size: 1786 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jun 11 13:17:04 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 12:17:04 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Thanks Fred. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 11 June 2018 12:52 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Mon Jun 11 13:46:10 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Mon, 11 Jun 2018 14:46:10 +0200 Subject: [gpfsug-discuss] GPFS-GUI and Collector In-Reply-To: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> References: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> Message-ID: Hello, there could be several reasons why data is not shown in the GUI. There are some knobs in the performance data collection that could prevent it. Some common things to check: 1 Are you getting data at all? Some nodes missing? Check with the CLI and expect data: mmperfmon query compareNodes cpu_user -b 3600 -n 2 Legend: ?1:???? cache-11.novalocal|CPU|cpu_user ?2:???? cache-12.novalocal|CPU|cpu_user ?3:???? cache-13.novalocal|CPU|cpu_user Row?????????? Timestamp cache-11 cache-12 cache-13 ? 1 2018-06-11-14:00:00 1.260611 9.447619 4.134019 ? 2 2018-06-11-15:00:00 1.306165 9.026577 4.062405 2. Are specific nodes missing? Check communications between sensors and collectors. 3. Is specific data missing? For Capacity date see here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guicapacityinfoissue.htm 4. How does the sensor config look like? Call mmperfmon config show Can all sensors talk to the collector registered as colCandidates? colCandidates = "cache-11.novalocal" colRedundancy = 1 You can also contact me by PN. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Philipp Helo Rehs To: gpfsug-discuss at spectrumscale.org Date: 11.06.2018 14:05 Subject: [gpfsug-discuss] GPFS-GUI and Collector Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, we have GPFS-GUI and Clients at 4.2.3.7 and my clients to not show any performance data in the gui. All clients are running pmsensor and the gui is running pmcollector. I can see in tcpdump that the server receives data but i can not see in the the gui. " Performance collector did not return any data. " Do you have any idea how i can debug it further? Kind regards ?Philipp Rehs (See attached file: pEpkey.asc) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19393134.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
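Condensing the checks above into something copy-and-pasteable (a sketch; the sensor config path is the usual default and may differ on your install):

  # On a client that shows no data: is the sensor up and pointed at the collector?
  systemctl status pmsensors
  cat /opt/IBM/zimon/ZIMonSensors.cfg        # check the collectors section names the GUI/collector node
  # On the collector/GUI node:
  systemctl status pmcollector
  mmperfmon config show                      # which sensors are configured, who the colCandidates are
  mmperfmon query compareNodes cpu_user -b 3600 -n 2    # every healthy node should return rows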
Name: pEpkey.asc Type: application/octet-stream Size: 1817 bytes Desc: not available URL: From ulmer at ulmer.org Mon Jun 11 13:47:58 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 11 Jun 2018 08:47:58 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: <1EF433B1-14DD-48AB-B1B4-07EF88E48EDF@ulmer.org> So is it better to pin with the subscription manager, or in our case to pin the kernel version with yum (because you always have something to do when the kernel changes)? What is the consensus? -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Jun 11, 2018, at 6:59 AM, Sobey, Richard A wrote: > > Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: > > [root@ ~]# subscription-manager release > Release: 7.4 > [root@ ~]# cat /etc/redhat-release > Red Hat Enterprise Linux Server release 7.5 (Maipo) > > Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. > > Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! > > Cheers > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) > Sent: 11 June 2018 11:50 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 > > We have on our DSS-G ? > > Have you looked at: > https://access.redhat.com/solutions/238533 > > ? > > Simon > > From: on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 > To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 > > Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? > > Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 11 14:52:16 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 11 Jun 2018 09:52:16 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Fred, Correct. The FAQ should be updated shortly. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Frederick Stock" To: gpfsug main discussion list Date: 06/11/2018 07:52 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? 
This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From JRLang at uwyo.edu Mon Jun 11 16:01:48 2018 From: JRLang at uwyo.edu (Jeffrey R. Lang) Date: Mon, 11 Jun 2018 15:01:48 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Yes, I recently had this happen. It was determined that the caches had been updated to the 7.5 packages, before I set the release to 7.4/ Since I didn't clear and delete the cache it used what it had and did the update to 7.5. So always clear and remove the cache before an update. Jeff From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sobey, Richard A Sent: Monday, June 11, 2018 5:46 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jun 12 11:42:32 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 12 Jun 2018 10:42:32 +0000 Subject: [gpfsug-discuss] Lroc on NVME Message-ID: <687c534347c7e02365cb3c5de4532a60f8a296fb.camel@qmul.ac.uk> We have a new computer, which has an nvme drive that is appearing as /dev/nvme0 and we'd like to put lroc on /dev/nvme0p1p1. which is a partition on the drive. After doing the standard mmcrnsd to set it up Spectrum Scale fails to see it. I've added a script /var/mmfs/etc/nsddevices so that gpfs scans them, and it does work now. What "type" should I set the nvme drives too? 
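For anyone who has not seen it, the /var/mmfs/etc/nsddevices user exit mentioned here is only a few lines of shell, modelled on /usr/lpp/mmfs/samples/nsddevices.sample. A sketch (the device-name pattern is an assumption chosen to match a partition like nvme0p1p1, and whether "generic" is the right type is exactly the open question in this thread):

  #!/bin/ksh
  # /var/mmfs/etc/nsddevices -- print "<device> <type>" pairs, device names relative to /dev
  for dev in $(cd /dev && ls nvme*p1 2>/dev/null); do
      echo "$dev generic"
  done
  # Non-zero exit: GPFS still runs its built-in device discovery in addition to this list.
  # Exit 0 instead if this list should be authoritative.
  exit 1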
I've currently set it to "generic" I want to do some tidying of my script, but has anyone else tried to get lroc running on nvme and how well does it work. We're running CentOs 7.4 and Spectrum Scale 4.2.3-8 currently. Thanks in advance. -- Peter Childs ITS Research Storage Queen Mary, University of London From truongv at us.ibm.com Tue Jun 12 14:53:15 2018 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 12 Jun 2018 09:53:15 -0400 Subject: [gpfsug-discuss] Lroc on NVME In-Reply-To: References: Message-ID: Yes, older versions of GPFS don't recognize /dev/nvme*. So you would need /var/mmfs/etc/nsddevices user exit. On newer GPFS versions, the nvme devices are also generic. So, it is good that you are using the same NSD sub-type. Cheers, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 06/12/2018 06:47 AM Subject: gpfsug-discuss Digest, Vol 77, Issue 15 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: RHEL updated to 7.5 instead of 7.4 (Felipe Knop) 2. Re: RHEL updated to 7.5 instead of 7.4 (Jeffrey R. Lang) 3. Lroc on NVME (Peter Childs) ---------------------------------------------------------------------- Message: 1 Date: Mon, 11 Jun 2018 09:52:16 -0400 From: "Felipe Knop" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: Content-Type: text/plain; charset="utf-8" Fred, Correct. The FAQ should be updated shortly. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Frederick Stock" To: gpfsug main discussion list Date: 06/11/2018 07:52 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? 
Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180611/d13470c2/attachment-0001.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180611/d13470c2/attachment-0001.gif > ------------------------------ Message: 2 Date: Mon, 11 Jun 2018 15:01:48 +0000 From: "Jeffrey R. Lang" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: Content-Type: text/plain; charset="us-ascii" Yes, I recently had this happen. It was determined that the caches had been updated to the 7.5 packages, before I set the release to 7.4/ Since I didn't clear and delete the cache it used what it had and did the update to 7.5. So always clear and remove the cache before an update. Jeff From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sobey, Richard A Sent: Monday, June 11, 2018 5:46 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180611/f085e78e/attachment-0001.html > ------------------------------ Message: 3 Date: Tue, 12 Jun 2018 10:42:32 +0000 From: Peter Childs To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Lroc on NVME Message-ID: <687c534347c7e02365cb3c5de4532a60f8a296fb.camel at qmul.ac.uk> Content-Type: text/plain; charset="utf-8" We have a new computer, which has an nvme drive that is appearing as /dev/nvme0 and we'd like to put lroc on /dev/nvme0p1p1. which is a partition on the drive. After doing the standard mmcrnsd to set it up Spectrum Scale fails to see it. I've added a script /var/mmfs/etc/nsddevices so that gpfs scans them, and it does work now. What "type" should I set the nvme drives too? I've currently set it to "generic" I want to do some tidying of my script, but has anyone else tried to get lroc running on nvme and how well does it work. We're running CentOs 7.4 and Spectrum Scale 4.2.3-8 currently. Thanks in advance. 
-- Peter Childs ITS Research Storage Queen Mary, University of London ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 15 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kums at us.ibm.com Tue Jun 12 23:25:53 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Tue, 12 Jun 2018 22:25:53 +0000 Subject: [gpfsug-discuss] Lroc on NVME In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB0839DFD827D68f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB0839DFD827D68f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Jun 13 10:10:28 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 13 Jun 2018 11:10:28 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> Hello, did anyone encountered an error with RHEL 7.5 kernel 3.10.0-862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? I'm getting random errors: Unknown error 521. It means EBADHANDLE. Not sure whether it is due to kernel or GPFS. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jonathan.buzzard at strath.ac.uk Wed Jun 13 10:32:44 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 13 Jun 2018 10:32:44 +0100 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> Message-ID: <1528882364.26036.3.camel@strath.ac.uk> On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From r.sobey at imperial.ac.uk Wed Jun 13 10:33:49 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 13 Jun 2018 09:33:49 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882364.26036.3.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet however. 
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Jun 13 10:37:56 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 13 Jun 2018 10:37:56 +0100 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: <1528882676.26036.4.camel@strath.ac.uk> On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From TOMP at il.ibm.com Wed Jun 13 10:48:14 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 12:48:14 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882676.26036.4.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). knfs and cNFS can't coexist with CES in the same environment. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jonathan Buzzard To: gpfsug main discussion list Date: 13/06/2018 12:38 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
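For the record, a quick way to see which NFS flavour a given cluster/node is actually serving (harmless to run even where a particular stack is not configured):

  mmlscluster --cnfs                   # lists CNFS member nodes, if cNFS is in use
  mmces service list -a 2>/dev/null    # CES protocol services; NFS here means Ganesha
  systemctl status nfs-server          # plain kernel NFS (knfs) on this node
  exportfs -v                          # what knfs is exporting, if anything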
URL: From david_johnson at brown.edu Wed Jun 13 11:07:52 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 13 Jun 2018 06:07:52 -0400 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? -- ddj Dave Johnson > On Jun 13, 2018, at 5:48 AM, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm). > > knfs and cNFS can't coexist with CES in the same environment. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 13/06/2018 12:38 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > > however. > > > > Then we are down to kernel NFS not been supported then? > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Jun 13 11:11:26 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 13 Jun 2018 12:11:26 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From TOMP at il.ibm.com Wed Jun 13 11:32:28 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 13:32:28 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk><1528882676.26036.4.camel@strath.ac.uk> <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Message-ID: Hi, :-) I explicitly used the term "same environment". The simple answer would be NO, but: While the code will only enforce not configuring CES and CNFS on the same cluster - it wouldn't know to do that between clusters - so I don't believe anything will prevent you from configuring it. That said, there might be implications on recovery that might lead to data corruption ( imagine two systems that don't know about the other locks for the reclaim process). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: david_johnson at brown.edu To: gpfsug main discussion list Date: 13/06/2018 13:13 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? -- ddj Dave Johnson On Jun 13, 2018, at 5:48 AM, Tomer Perry wrote: knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). knfs and cNFS can't coexist with CES in the same environment. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jonathan Buzzard To: gpfsug main discussion list Date: 13/06/2018 12:38 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Wed Jun 13 11:38:37 2018 From: david_johnson at brown.edu (David D Johnson) Date: Wed, 13 Jun 2018 06:38:37 -0400 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Message-ID: So first, apologies for hijacking the thread, but this is a hot issue as we are planning 4.2.x to 5.x.y upgrade in the unspecified future, and are currently running CNFS and clustered CIFS. Those exporter nodes are in need of replacement, and I am unsure of the future status of CNFS and CIFS (are they even in 5.x?). Is there a way to roll out protocols while still offering CNFS/Clustered CIFS, and cut over when it's ready for prime time? > On Jun 13, 2018, at 6:32 AM, Tomer Perry wrote: > > Hi, > > :-) I explicitly used the term "same environment". > > The simple answer would be NO, but: > While the code will only enforce not configuring CES and CNFS on the same cluster - it wouldn't know to do that between clusters - so I don't believe anything will prevent you from configuring it. > That said, there might be implications on recovery that might lead to data corruption ( imagine two systems that don't know about the other locks for the reclaim process). > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: david_johnson at brown.edu > To: gpfsug main discussion list > Date: 13/06/2018 13:13 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? > > -- ddj > Dave Johnson > > On Jun 13, 2018, at 5:48 AM, Tomer Perry > wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). > > knfs and cNFS can't coexist with CES in the same environment. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Jonathan Buzzard > > To: gpfsug main discussion list > > Date: 13/06/2018 12:38 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > > however. > > > > Then we are down to kernel NFS not been supported then? > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Jun 13 15:45:44 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 17:45:44 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk><1528882676.26036.4.camel@strath.ac.uk> <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> Message-ID: Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Wed Jun 13 16:14:53 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 13 Jun 2018 15:14:53 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882364.26036.3.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? 
> > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pprandive at rediffmail.com Thu Jun 14 15:22:09 2018 From: pprandive at rediffmail.com (Prafulla) Date: 14 Jun 2018 14:22:09 -0000 Subject: [gpfsug-discuss] =?utf-8?q?GPFS_support_for_latest_stable_release?= =?utf-8?q?_of_OpenStack_=28called_Queens_https=3A//www=2Eopenstack?= =?utf-8?q?=2Eorg/software/queens/=29?= Message-ID: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Hello Guys,Greetings!Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens?I have few queries around that,1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)?2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose?Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance!Regards,pR -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jun 14 15:56:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 14 Jun 2018 14:56:28 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: <7282ECEB-75F0-45AF-A36C-57D3B5930CBA@bham.ac.uk> That probably depends on your definition of support? Object as part of the CES stack is currently on Pike (AFAIK). If you wanted to run swift and Queens then I don?t think that would be supported as part of CES. I believe that cinder/manilla/glance integration is written by IBM developers, but I?m not sure if there was ever a formal support statement from IBM about this, (in the sense of a guaranteed support with a PMR). Simon From: on behalf of "pprandive at rediffmail.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 14 June 2018 at 15:49 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? 
Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Thu Jun 14 16:04:00 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Thu, 14 Jun 2018 11:04:00 -0400 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: Brian is probably best able to answer this question. Lyle From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.b.mills at nasa.gov Thu Jun 14 16:09:57 2018 From: jonathan.b.mills at nasa.gov (Mills, Jonathan B. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 14 Jun 2018 15:09:57 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: I can?t speak for the GUI integration with Horizon, but I use GPFS 4.2.3.8 just fine with OpenStack Pike (for Glance, Cinder, and Nova). I?d be surprised if it worked any differently in Queens. From: on behalf of Lyle Gayne Reply-To: gpfsug main discussion list Date: Thursday, June 14, 2018 at 11:05 AM To: gpfsug main discussion list Cc: Brian Nelson Subject: Re: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Brian is probably best able to answer this question. 
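To make the Cinder side of this thread concrete, here is a hedged sketch of what a GPFS-backed Cinder stanza commonly looks like. The section name, backend name and mount path are invented examples, and the option names should be double-checked against the Cinder documentation for the release actually deployed (they date from roughly the Liberty/Mitaka-era drivers referred to later in this thread):

# /etc/cinder/cinder.conf -- illustrative only; paths and names are placeholders
[DEFAULT]
enabled_backends = gpfs

[gpfs]
volume_driver = cinder.volume.drivers.ibm.gpfs.GPFSDriver
gpfs_mount_point_base = /gpfs/fs0/openstack/cinder/volumes
volume_backend_name = GPFS

Pointing the Glance filesystem store at a directory on the same GPFS file system is what enables the driver's copy-on-write cloning of images into volumes.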
Lyle [Inactive hide details for "Prafulla" ---06/14/2018 11:01:19 AM---Hello Guys,Greetings!Could you please help me figure out the l]"Prafulla" ---06/14/2018 11:01:19 AM---Hello Guys,Greetings!Could you please help me figure out the level of GPFS's support for latest From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From brnelson at us.ibm.com Fri Jun 15 04:36:19 2018 From: brnelson at us.ibm.com (Brian Nelson) Date: Thu, 14 Jun 2018 22:36:19 -0500 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: The only OpenStack component that GPFS explicitly ships is Swift, which is used for the Object protocol of the Protocols Support capability. The latest version included is Swift at the Pike release. That was first made available in the GPFS 5.0.1.0 release. The other way that GPFS can be used is as the backing store for many OpenStack components, as you can see in this table: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1ins_openstackusecase.htm The GPFS drivers for those components were written in the Liberty/Mitaka timeframe. We generally do not certify every OpenStack release against GPFS. However, we have not had any compatibility issues with later releases, and I would expect Queens to also work fine with GPFS storage. -Brian =================================== Brian Nelson 512-286-7735 (T/L) 363-7735 IBM Spectrum Scale brnelson at us.ibm.com From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? 
Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From cabrillo at ifca.unican.es Fri Jun 15 13:01:07 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Fri, 15 Jun 2018 14:01:07 +0200 (CEST) Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Message-ID: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... [root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 0 this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 
00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From anobre at br.ibm.com Fri Jun 15 15:49:14 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 15 Jun 2018 14:49:14 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Message-ID: An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Fri Jun 15 16:16:18 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Fri, 15 Jun 2018 17:16:18 +0200 (CEST) Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Message-ID: <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> Hi Anderson, Comments are in line From: "Anderson Ferreira Nobre" To: "gpfsug-discuss" Cc: "gpfsug-discuss" Sent: Friday, 15 June, 2018 16:49:14 Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Hi Iban, I think it's necessary more information to be able to help you. Here they are: - Redhat version: Which is 7.2, 7.3 or 7.4? CentOS Linux release 7.5.1804 (Core) - Redhat kernel version: In the FAQ of GPFS has the recommended kernel levels - Platform: Is it x86_64? Yes it is - Is there a reason for you stay in 4.2.3-6? Could you update to 4.2.3-9 or 5.0.1? No, that wasthe default version we get from our costumer we could upgrade to 4.2.3-9 with time... - How is the name resolution? Can you do test ping from one node to another and it's reverse? yes resolution works fine in both directions (there is no firewall or icmp filter) using ethernet private network (not IB) - TCP/IP tuning: What is the TCP/IP parameters you are using? 
I have used for 7.4 the following: [root at XXXX sysctl.d]# cat 99-ibmscale.conf net.core.somaxconn = 10000 net.core.netdev_max_backlog = 250000 net.ipv4.ip_local_port_range = 2000 65535 net.ipv4.tcp_rfc1337 = 1 net.ipv4.tcp_max_tw_buckets = 1440000 net.ipv4.tcp_mtu_probing = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 10 net.core.rmem_default = 4194304 net.core.rmem_max = 4194304 net.core.wmem_default = 4194304 net.core.wmem_max = 4194304 net.core.optmem_max = 4194304 net.ipv4.tcp_rmem=4096 87380 16777216 net.ipv4.tcp_wmem=4096 65536 16777216 vm.min_free_kbytes = 512000 kernel.panic_on_oops = 0 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 vm.swappiness = 0 vm.dirty_ratio = 10 That is mine: net.ipv4.conf.default.accept_source_route = 0 net.core.somaxconn = 8192 net.ipv4.tcp_fin_timeout = 30 kernel.sysrq = 1 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 13491064832 kernel.shmall = 4294967296 net.ipv4.neigh.default.gc_stale_time = 120 net.ipv4.tcp_synack_retries = 10 net.ipv4.tcp_sack = 0 net.ipv4.icmp_echo_ignore_broadcasts = 1 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1 net.core.netdev_max_backlog = 250000 net.core.rmem_default = 16777216 net.core.wmem_default = 16777216 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_mem = 16777216 16777216 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 87380 16777216 net.ipv4.tcp_adv_win_scale = 2 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_reordering = 3 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_max_syn_backlog = 8192 net.ipv4.neigh.default.gc_thresh1 = 30000 net.ipv4.neigh.default.gc_thresh2 = 32000 net.ipv4.neigh.default.gc_thresh3 = 32768 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.ib0.mcast_solicit = 18 vm.oom_dump_tasks = 1 vm.min_free_kbytes = 524288 Since we disabled ipv6, we had to rebuild the kernel image with the following command: [root at XXXX ~]# dracut -f -v I did that on Wns but no on GPFS servers... - GPFS tuning parameters: Can you list them? - Spectrum Scale status: Can you send the following outputs: mmgetstate -a -L mmlscluster [root at gpfs01 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: gpfsgui.ifca.es GPFS cluster id: 8574383285738337182 GPFS UID domain: gpfsgui.ifca.es Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon 9 cloudprv-02-9.ifca.es 10.10.140.26 cloudprv-02-9.ifca.es 10 cloudprv-02-8.ifca.es 10.10.140.25 cloudprv-02-8.ifca.es 13 node1.ifca.es 10.10.151.3 node3.ifca.es ...... 44 node24.ifca.es 10.10.151.24 node24.ifca.es ..... 
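As an aside before the health and log output below: going back to the original symptom (tens of thousands of CLOSE_WAIT sockets piling up on gpfs01), it can help to see which remote peers are holding them. A rough one-liner over the same netstat output already used in this thread, nothing GPFS-specific, just grouping by foreign address:

# CLOSE_WAIT sockets touching the GPFS daemon port (1191), counted per remote peer
netstat -tan | awk '$6=="CLOSE_WAIT" && ($4 ~ /:1191$/ || $5 ~ /:1191$/) {split($5,a,":"); print a[1]}' | sort | uniq -c | sort -rn | head

If the counts concentrate on a handful of clients, that points at those nodes (or the network path to them) rather than at the NSD server itself.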
mmhealth cluster show (It was shoutdown by hand) [root at gpfs01 ~]# mmhealth cluster show --verbose Error: The monitoring service is down and does not respond, please restart it. mmhealth cluster show --verbose mmhealth node eventlog 2018-06-12 23:31:31.487471 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-12 23:31:52.856082 CET ccr_local_server_ok INFO The local GPFS CCR server is reachable PC_LOCAL_SERVER 2018-06-12 23:33:06.397125 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-12 23:33:06.400622 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-12 23:33:06.787556 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-12 23:33:22.670023 CET quorum_up INFO Quorum achieved 2018-06-13 14:01:51.376885 CET service_removed INFO On the node gpfs01.ifca.es the threshold monitor was removed 2018-06-13 14:01:51.385115 CET service_removed INFO On the node gpfs01.ifca.es the perfmon monitor was removed 2018-06-13 18:41:55.846893 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-13 18:42:39.217545 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-13 18:42:39.221455 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-13 18:42:39.653778 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-13 18:42:55.956125 CET quorum_up INFO Quorum achieved 2018-06-13 18:43:17.448980 CET service_running INFO The service perfmon is running on node gpfs01.ifca.es 2018-06-13 18:51:14.157351 CET service_running INFO The service threshold is running on node gpfs01.ifca.es 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized 2018-06-14 08:04:30.216689 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-14 08:05:10.836900 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-14 08:05:27.135275 CET quorum_up INFO Quorum achieved 2018-06-14 08:05:40.446601 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-14 08:05:40.881064 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-14 08:08:56.455851 CET ib_rdma_nic_recognized INFO IB RDMA NIC mlx5_0/1 was recognized 2018-06-14 12:29:58.772033 CET ccr_quorum_nodes_warn WARNING At least one quorum node is not reachable Item=PC_QUORUM_NODES,ErrMsg='Ping CCR quorum nodes failed',Failed='10.10.0.112' 2018-06-14 15:41:57.860925 CET ccr_quorum_nodes_ok INFO All quorum nodes are reachable PC_QUORUM_NODES 2018-06-15 13:04:41.403505 CET pmcollector_down ERROR pmcollector service should be started and is stopped 2018-06-15 15:23:00.121760 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-15 15:23:43.616075 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-15 15:23:43.619593 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-15 15:23:44.053493 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-15 15:24:00.219003 CET quorum_up INFO Quorum achieved [root at gpfs02 ~]# mmhealth node eventlog Error: The monitoring service is down and does not respond, please restart it. mmlsnode -L -N waiters non default parameters: [root at gpfs01 ~]# mmdiag --config | grep ! ! ccrEnabled 1 ! cipherList AUTHONLY ! clusterId 8574383285738337182 ! clusterName gpfsgui.ifca.es ! dmapiFileHandleSize 32 ! idleSocketTimeout 0 ! 
ignorePrefetchLUNCount 1 ! maxblocksize 16777216 ! maxFilesToCache 4000 ! maxInodeDeallocPrefetch 64 ! maxMBpS 6000 ! maxStatCache 512 ! minReleaseLevel 1700 ! myNodeConfigNumber 1 ! pagepool 17179869184 ! socketMaxListenConnections 512 ! socketRcvBufferSize 131072 ! socketSndBufferSize 65536 ! verbsPorts mlx5_0/1 ! verbsRdma enable ! worker1Threads 256 Regards, I Abra?os / Regards / Saludos, Anderson Nobre AIX & Power Consultant Master Certified IT Specialist IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services Phone: 55-19-2132-4317 E-mail: anobre at br.ibm.com ----- Original message ----- From: Iban Cabrillo Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Date: Fri, Jun 15, 2018 9:12 AM Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... [root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 0 this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 
00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Fri Jun 15 16:23:43 2018 From: chrisjscott at gmail.com (Chris Scott) Date: Fri, 15 Jun 2018 16:23:43 +0100 Subject: [gpfsug-discuss] Employment vacancy: Research Computing Specialist at University of Dundee, Scotland Message-ID: Hi All This is an employment opportunity to work with Spectrum Scale and its integration features with Spectrum Protect. Please see or forward along the following link to an employment vacancy in my team for a Research Computing Specialist here at the University of Dundee: https://www.jobs.dundee.ac.uk/fe/tpl_uod01.asp?s=4A515F4E5A565B1A&jobid=102157,4132345688&key=135360005&c=54715623342377&pagestamp=sepirmfpbecljxwhkl Cheers Chris [image: University of Dundee shield logo] *Chris Scott* Research Computing Manager School of Life Sciences, UoD IT, University of Dundee +44 (0)1382 386250 | C.Y.Scott at dundee.ac.uk [image: University of Dundee Facebook] [image: University of Dundee Twitter] [image: University of Dundee LinkedIn] [image: University of Dundee YouTube] [image: University of Dundee Instagram] [image: University of Dundee Snapchat] *One of the world's top 200 universities* Times Higher Education World University Rankings 2018 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Jun 15 16:25:50 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 15 Jun 2018 11:25:50 -0400 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> Message-ID: Assuming CentOS 7.5 parallels RHEL 7.5 then you would need Spectrum Scale 4.2.3.9 because that is the release version (along with 5.0.1 PTF1) that supports RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Iban Cabrillo To: gpfsug-discuss Date: 06/15/2018 11:16 AM Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Anderson, Comments are in line From: "Anderson Ferreira Nobre" To: "gpfsug-discuss" Cc: "gpfsug-discuss" Sent: Friday, 15 June, 2018 16:49:14 Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Hi Iban, I think it's necessary more information to be able to help you. 
Here they are: - Redhat version: Which is 7.2, 7.3 or 7.4? CentOS Linux release 7.5.1804 (Core) - Redhat kernel version: In the FAQ of GPFS has the recommended kernel levels - Platform: Is it x86_64? Yes it is - Is there a reason for you stay in 4.2.3-6? Could you update to 4.2.3-9 or 5.0.1? No, that wasthe default version we get from our costumer we could upgrade to 4.2.3-9 with time... - How is the name resolution? Can you do test ping from one node to another and it's reverse? yes resolution works fine in both directions (there is no firewall or icmp filter) using ethernet private network (not IB) - TCP/IP tuning: What is the TCP/IP parameters you are using? I have used for 7.4 the following: [root at XXXX sysctl.d]# cat 99-ibmscale.conf net.core.somaxconn = 10000 net.core.netdev_max_backlog = 250000 net.ipv4.ip_local_port_range = 2000 65535 net.ipv4.tcp_rfc1337 = 1 net.ipv4.tcp_max_tw_buckets = 1440000 net.ipv4.tcp_mtu_probing = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 10 net.core.rmem_default = 4194304 net.core.rmem_max = 4194304 net.core.wmem_default = 4194304 net.core.wmem_max = 4194304 net.core.optmem_max = 4194304 net.ipv4.tcp_rmem=4096 87380 16777216 net.ipv4.tcp_wmem=4096 65536 16777216 vm.min_free_kbytes = 512000 kernel.panic_on_oops = 0 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 vm.swappiness = 0 vm.dirty_ratio = 10 That is mine: net.ipv4.conf.default.accept_source_route = 0 net.core.somaxconn = 8192 net.ipv4.tcp_fin_timeout = 30 kernel.sysrq = 1 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 13491064832 kernel.shmall = 4294967296 net.ipv4.neigh.default.gc_stale_time = 120 net.ipv4.tcp_synack_retries = 10 net.ipv4.tcp_sack = 0 net.ipv4.icmp_echo_ignore_broadcasts = 1 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1 net.core.netdev_max_backlog = 250000 net.core.rmem_default = 16777216 net.core.wmem_default = 16777216 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_mem = 16777216 16777216 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 87380 16777216 net.ipv4.tcp_adv_win_scale = 2 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_reordering = 3 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_max_syn_backlog = 8192 net.ipv4.neigh.default.gc_thresh1 = 30000 net.ipv4.neigh.default.gc_thresh2 = 32000 net.ipv4.neigh.default.gc_thresh3 = 32768 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.ib0.mcast_solicit = 18 vm.oom_dump_tasks = 1 vm.min_free_kbytes = 524288 Since we disabled ipv6, we had to rebuild the kernel image with the following command: [root at XXXX ~]# dracut -f -v I did that on Wns but no on GPFS servers... - GPFS tuning parameters: Can you list them? 
- Spectrum Scale status: Can you send the following outputs: mmgetstate -a -L mmlscluster [root at gpfs01 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: gpfsgui.ifca.es GPFS cluster id: 8574383285738337182 GPFS UID domain: gpfsgui.ifca.es Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon 9 cloudprv-02-9.ifca.es 10.10.140.26 cloudprv-02-9.ifca.es 10 cloudprv-02-8.ifca.es 10.10.140.25 cloudprv-02-8.ifca.es 13 node1.ifca.es 10.10.151.3 node3.ifca.es ...... 44 node24.ifca.es 10.10.151.24 node24.ifca.es ..... mmhealth cluster show (It was shoutdown by hand) [root at gpfs01 ~]# mmhealth cluster show --verbose Error: The monitoring service is down and does not respond, please restart it. mmhealth cluster show --verbose mmhealth node eventlog 2018-06-12 23:31:31.487471 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-12 23:31:52.856082 CET ccr_local_server_ok INFO The local GPFS CCR server is reachable PC_LOCAL_SERVER 2018-06-12 23:33:06.397125 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-12 23:33:06.400622 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-12 23:33:06.787556 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-12 23:33:22.670023 CET quorum_up INFO Quorum achieved 2018-06-13 14:01:51.376885 CET service_removed INFO On the node gpfs01.ifca.es the threshold monitor was removed 2018-06-13 14:01:51.385115 CET service_removed INFO On the node gpfs01.ifca.es the perfmon monitor was removed 2018-06-13 18:41:55.846893 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-13 18:42:39.217545 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-13 18:42:39.221455 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-13 18:42:39.653778 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-13 18:42:55.956125 CET quorum_up INFO Quorum achieved 2018-06-13 18:43:17.448980 CET service_running INFO The service perfmon is running on node gpfs01.ifca.es 2018-06-13 18:51:14.157351 CET service_running INFO The service threshold is running on node gpfs01.ifca.es 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized 2018-06-14 08:04:30.216689 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 
2018-06-14 08:05:10.836900 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-14 08:05:27.135275 CET quorum_up INFO Quorum achieved 2018-06-14 08:05:40.446601 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-14 08:05:40.881064 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-14 08:08:56.455851 CET ib_rdma_nic_recognized INFO IB RDMA NIC mlx5_0/1 was recognized 2018-06-14 12:29:58.772033 CET ccr_quorum_nodes_warn WARNING At least one quorum node is not reachable Item=PC_QUORUM_NODES,ErrMsg='Ping CCR quorum nodes failed',Failed='10.10.0.112' 2018-06-14 15:41:57.860925 CET ccr_quorum_nodes_ok INFO All quorum nodes are reachable PC_QUORUM_NODES 2018-06-15 13:04:41.403505 CET pmcollector_down ERROR pmcollector service should be started and is stopped 2018-06-15 15:23:00.121760 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-15 15:23:43.616075 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-15 15:23:43.619593 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-15 15:23:44.053493 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-15 15:24:00.219003 CET quorum_up INFO Quorum achieved [root at gpfs02 ~]# mmhealth node eventlog Error: The monitoring service is down and does not respond, please restart it. mmlsnode -L -N waiters non default parameters: [root at gpfs01 ~]# mmdiag --config | grep ! ! ccrEnabled 1 ! cipherList AUTHONLY ! clusterId 8574383285738337182 ! clusterName gpfsgui.ifca.es ! dmapiFileHandleSize 32 ! idleSocketTimeout 0 ! ignorePrefetchLUNCount 1 ! maxblocksize 16777216 ! maxFilesToCache 4000 ! maxInodeDeallocPrefetch 64 ! maxMBpS 6000 ! maxStatCache 512 ! minReleaseLevel 1700 ! myNodeConfigNumber 1 ! pagepool 17179869184 ! socketMaxListenConnections 512 ! socketRcvBufferSize 131072 ! socketSndBufferSize 65536 ! verbsPorts mlx5_0/1 ! verbsRdma enable ! worker1Threads 256 Regards, I Abra?os / Regards / Saludos, Anderson Nobre AIX & Power Consultant Master Certified IT Specialist IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services Phone: 55-19-2132-4317 E-mail: anobre at br.ibm.com ----- Original message ----- From: Iban Cabrillo Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Date: Fri, Jun 15, 2018 9:12 AM Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... 
[root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 5698 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Jun 15 17:17:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 15 Jun 2018 16:17:48 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Message-ID: <4D6C04F4-266A-47AC-BC9A-C0CA9AA2B123@bham.ac.uk> This: ?2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized? 
Looks like you are telling GPFS to use an MLX card that doesn?t exist on the node, this is set with verbsPorts, it?s probably not your issue here, but you are better using nodeclasses and assigning the config option to those nodeclasses that have the correct card installed (I?d also encourage you to use a fabric number, we do this even if there is only 1 fabric currently in the cluster as we?ve added other fabrics over time or over multiple locations). Have you tried using mmnetverify at all? It?s been getting better in the newer releases and will give you a good indication if you have a comms issue due to something like name resolution in addition to testing between nodes? Simon From: on behalf of "cabrillo at ifca.unican.es" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 15 June 2018 at 16:16 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Mon Jun 18 11:43:38 2018 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 18 Jun 2018 12:43:38 +0200 Subject: [gpfsug-discuss] Fw: User Group Meeting at ISC2018 Frankfurt - Agenda Update Message-ID: Greetings: Here is the refined agenda for the joint "IBM Spectrum Scale and IBM Spectrum LSF User Group Meeting" at ISC in Frankfurt, Germany. If not yet done - please register here to attend so that we can have an accurate count of attendees: https://www-01.ibm.com/events/wwe/grp/grp308.nsf/Registration.xsp?openform&seminar=AA4A99ES Looking forward to see you there, Ulf Monday June 25th, 2018 - 14:00-17:30 - Conference Room Applaus 14:00-14:15 Welcome Gabor Samu (IBM) / Ulf Troppens (IBM) 14:15-14:45 What is new in Spectrum Scale? Mathias Dietz (IBM) 14:45-15:00 What is new in ESS? Christopher Maestas (IBM) 15:00-15:15 High Capacity File Storage Oliver Kill (pro-com) 15:15-15:35 Site Report: CSCS Stefano Gorini (CSCS) 15:35-15:55 Site Report: University of Birmingham Simon Thompson (University of Birmingham) 15:55-16:25 What is new in Spectrum Computing? Bill McMillan (IBM) 16:25-16:55 Deep Dive on one Spectrum Scale Feature Olaf Weiser (IBM) 16:55-17:25 Spectrum Scale enhancements for CORAL Sven Oehme (IBM) 17:25-17:30 Wrap-up Gabor Samu (IBM) / Ulf Troppens (IBM) -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 18.06.2018 12:33 ----- From: Ulf Troppens/Germany/IBM To: gpfsug-discuss at spectrumscale.org Date: 28.05.2018 09:59 Subject: User Group Meeting at ISC2018 Frankfurt Greetings: IBM is happy to announce the agenda for the joint "IBM Spectrum Scale and IBM Spectrum LSF User Group Meeting" at ISC in Frankfurt, Germany. We will finish on time to attend the opening reception. As with other user group meetings, the agenda includes user stories, updates on IBM Spectrum Scale & IBM Spectrum LSF, and access to IBM experts and your peers. Please join us! 
To attend please register here so that we can have an accurate count of attendees: https://www-01.ibm.com/events/wwe/grp/grp308.nsf/Registration.xsp?openform&seminar=AA4A99ES We are still looking for two customers to talk about their experience with Spectrum Scale and/or Spectrum LSF. Please send me a personal mail, if you are interested to talk. Monday June 25th, 2018 - 14:00-17:30 - Conference Room Applaus 14:00-14:15 Welcome Gabor Samu (IBM) / Ulf Troppens (IBM) 14:15-14:45 What is new in Spectrum Scale? Mathias Dietz (IBM) 14:45-15:00 News from Lenovo Storage Michael Hennicke (Lenovo) 15:00-15:15 What is new in ESS? Christopher Maestas (IBM) 15:15-15:35 Customer talk 1 TBD 15:35-15:55 Customer talk 2 TBD 15:55-16:25 What is new in Spectrum Computing? Bill McMillan (IBM) 16:25-16:55 Field Update Olaf Weiser (IBM) 16:55-17:25 Spectrum Scale enhancements for CORAL Sven Oehme (IBM) 17:25-17:30 Wrap-up Gabor Samu (IBM) / Ulf Troppens (IBM) Looking forward to see some of you there. Best, Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From PPOD at de.ibm.com Mon Jun 18 14:59:16 2018 From: PPOD at de.ibm.com (Przemyslaw Podfigurny1) Date: Mon, 18 Jun 2018 13:59:16 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043380.png Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043381.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043382.png Type: image/png Size: 1167 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Jun 18 16:53:51 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jun 2018 15:53:51 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Message-ID: Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
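Returning briefly to Simon's suggestion in the CLOSE_WAIT thread above (scope verbsPorts through a node class, add a fabric number, and try mmnetverify), here is a rough sketch of what that could look like. The node-class name and the port string are assumptions for illustration, and the available mmnetverify operations should be checked against the man page for the installed release:

# Group the nodes that actually have the Mellanox HCA and scope verbsPorts to them,
# using the device/port/fabric form (mlx5_0, port 1, fabric 1 are example values)
mmcrnodeclass ibnodes -N gpfs01,gpfs02
mmchconfig verbsPorts="mlx5_0/1/1" -N ibnodes

# Run the full set of network checks (name resolution, ping, daemon connectivity, ...)
mmnetverify all -N all

Note that a verbsPorts change only takes effect after the GPFS daemon is restarted on the affected nodes.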
URL: From neil.wilson at metoffice.gov.uk Mon Jun 18 17:05:35 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Mon, 18 Jun 2018 16:05:35 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution In-Reply-To: References: Message-ID: I think it?s caused by the ID mapping not being configured properly. Found this on the redhat knowledge base. Environment * Red Hat Enterprise Linux 5 * Red Hat Enterprise Linux 6 * Red Hat Enterprise Linux 7 * NFSv4 share being exported from an NFSv4 capable NFS server Issue * From the client, the mounted NFSv4 share has ownership for all files and directories listed as nobody:nobody instead of the actual user that owns them on the NFSv4 server, or who created the new file and directory. * Seeing nobody:nobody permissions on nfsv4 shares on the nfs client. Also seeing the following error in /var/log/messages: * How to configure Idmapping for NFSv4 Raw nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Resolution * Modify the /etc/idmapd.conf with the proper domain (FQDN), on both the client and server. In this example, the proper domain is "example.com" so the "Domain =" directive within /etc/idmapd.conf should be modified to read: Raw Domain = example.com * Note: * If using a NetApp Filer, the NFS.V4.ID.DOMAIN parameter must be set to match the "Domain =" parameter on the client. * If using a Solaris machine as the NFS server, the NFSMAPID_DOMAIN value in /etc/default/nfs must match the RHEL clients Domain. * On Red Hat Enterprise Linux 6.2 and older, to put the changes into effect restart the rpcidmapd service and remount the NFSv4 filesystem : Raw # service rpcidmapd restart # mount -o remount /nfs/mnt/point NOTE: It is only necessary to restart rpc.idmapd service on systems where rpc.idmapd is actually performing the id mapping. On RHEL 6.3 and newer NFS CLIENTS, the maps are stored in the kernel keyring and the id mapping itself is performed by the /sbin/nfsidmap program. On older NFS CLIENTS (RHEL 6.2 and older) as well as on all NFS SERVERS running RHEL, the id mapping is performed by rpc.idmapd. * Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's. * On Red Hat Enterprise Linux 6.3 and higher, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required: Raw # nfsidmap -c NOTE: The above command is only necessary on systems that use the keyring-based id mapper, i.e. NFS CLIENTS running RHEL 6.3 and higher. On RHEL 6.2 and older NFS CLIENTS as well as all NFS SERVERS running RHEL, the cache should be cleared out when rpc.idmapd is restarted. * Another check, see if the passwd:, shadow: and group: settings are set correctly in the /etc/nsswitch.conf file on both Server and Client. Disabling idmapping NOTE: In order to properly disable idmapping, it must be disabled on both the NFS client and NFS server. 
- By default, RHEL6.3 and newer NFS clients and servers disable idmapping when utilizing the AUTH_SYS/UNIX authentication flavor by enabling the following booleans: Raw NFS client # echo 'Y' > /sys/module/nfs/parameters/nfs4_disable_idmapping NFS server # echo 'Y' > /sys/module/nfsd/parameters/nfs4_disable_idmapping * If using a NetApp filer, the options nfs.v4.id.allow_numerics on command can be used to disable idmapping. More information can be found here. * With this boolean enabled, NFS clients will instead send numeric UID/GID numbers in outgoing attribute calls and NFS servers will send numeric UID/GID numbers in outgoing attribute replies. ? If NFS clients sending numeric UID/GID values in a SETATTR call receive an NFS4ERR_BADOWNER reply from the NFS server clients will re-enable idmapping and send user at domain strings for that specific mount from that point forward. ? We can make the option nfs4_disable_idmapping persistent across reboot. ? After the above value has been changed, for the setting to take effect for any NFS server export mounted on the NFS client, you must unmount all NFS mount points for the given NFS server, and then re-mount them. If you have auto mounts stop all processes accessing the mounts and allow the automount daemon to unmount them. Once all NFS mount points are gone to the desired NFS server, remount the NFS mount points and the new setting should be in place. If this is too problematic, you may want to schedule a reboot of the NFS client. ? To verify the setting has been changed properly, you can look inside the /proc/self/mountstats file 'caps' line, which contains a hex value of 2 bytes (16 bits). This is the line that shows the NFS server's "capabilities", and the most significant bit #15 is the one which represents whether idmapping is disabled or not (the NFS_CAP_UIDGID_NOMAP bit - see the Root Cause section) Raw # cat /sys/module/nfs/parameters/nfs4_disable_idmapping Y # umount /mnt # mount rhel6u6-node2:/exports/nfs4 /mnt # grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0xffff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ * Example of nfs4_disable_idmapping = 'N' Raw [root at rhel6u3-node1 ~]# echo N > /sys/module/nfs/parameters/nfs4_disable_idmapping [root at rhel6u3-node1 ~]# umount /mnt [root at rhel6u3-node1 ~]# mount rhel6u6-node2:/exports/nfs4 /mnt [root at rhel6u3-node1 ~]# grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0x7fff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ NOTE: To force ONLY numeric IDs to be used on the client, add RPCIDMAPDARGS="-C" to the etc/sysconfig/nfs file and restart the rpcidmapd service. See man rpc.idmapd for more information. NOTE: This option can only be used with AUTH_SYS/UNIX authentication flavors, if you wish to use something like Kerberos, idmapping must be used. Root Cause * NFSv4 utilizes ID mapping to ensure permissions are set properly on exported shares, if the domains of the client and server do not match then the permissions are mapped to nobody:nobody. NFS_CAP_UIDGID_NOMAP bit * The nfs4_disable_idmapping is a module parameter which is read only one time, at the point at which the kernel sets up the data structure that represents an NFS server. Once it is read, a flag is set in the nfs_server structure NFS_CAP_UIDGID_NOMAP. 
Raw #define NFS_CAP_UIDGID_NOMAP (1U << 15) static int nfs4_init_server(struct nfs_server *server, const struct nfs_parsed_mount_data *data) { struct rpc_timeout timeparms; int error; dprintk("--> nfs4_init_server()\n"); nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, data->timeo, data->retrans); /* Initialise the client representation from the mount data */ server->flags = data->flags; server->caps |= NFS_CAP_ATOMIC_OPEN|NFS_CAP_CHANGE_ATTR|NFS_CAP_POSIX_LOCK; if (!(data->flags & NFS_MOUNT_NORDIRPLUS)) server->caps |= NFS_CAP_READDIRPLUS; server->options = data->options; /* Get a client record */ error = nfs4_set_client(server, data->nfs_server.hostname, (const struct sockaddr *)&data->nfs_server.address, data->nfs_server.addrlen, data->client_address, data->auth_flavors[0], data->nfs_server.protocol, &timeparms, data->minorversion); if (error < 0) goto error; /* * Don't use NFS uid/gid mapping if we're using AUTH_SYS or lower * authentication. */ if (nfs4_disable_idmapping && data->auth_flavors[0] == RPC_AUTH_UNIX) <--- set a flag based on the module parameter server->caps |= NFS_CAP_UIDGID_NOMAP; <-------------------------- flag set if (data->rsize) server->rsize = nfs_block_size(data->rsize, NULL); if (data->wsize) server->wsize = nfs_block_size(data->wsize, NULL); server->acregmin = data->acregmin * HZ; server->acregmax = data->acregmax * HZ; server->acdirmin = data->acdirmin * HZ; server->acdirmax = data->acdirmax * HZ; server->port = data->nfs_server.port; error = nfs_init_server_rpcclient(server, &timeparms, data->auth_flavors[0]); error: /* Done */ dprintk("<-- nfs4_init_server() = %d\n", error); return error; } * This flag is later checked when deciding whether to use numeric uid or gids or to use idmapping. Raw int nfs_map_uid_to_name(const struct nfs_server *server, __u32 uid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(uid, "user", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap->idmap_user_hash, uid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(uid, buf, buflen); return ret; } int nfs_map_gid_to_group(const struct nfs_server *server, __u32 gid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(gid, "group", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap->idmap_group_hash, gid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(gid, buf, buflen); return ret; } "fs/nfs/idmap.c" 872L, 21804C * For more information on NFSv4 ID mapping in Red Hat Enterprise Linux, see https://access.redhat.com/articles/2252881 Diagnostic Steps * Debugging/verbosity can be enabled by editing /etc/sysconfig/nfs: Raw RPCIDMAPDARGS="-vvv" * The following output is shown in /var/log/messages when the mount has been completed and the system shows nobody:nobody as user and group permissions on directories and files: Raw Jun 3 20:22:08 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Jun 3 20:25:44 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' * Collect a tcpdump of the mount attempt: Raw # tcpdump -s0 -i {INTERFACE} host {NFS.SERVER.IP} -w /tmp/{casenumber}-$(hostname)-$(date 
+"%Y-%m-%d-%H-%M-%S").pcap & * If a TCP packet capture has been obtained, check for a nfs.nfsstat4 packet that has returned a non-zero response equivalent to 10039 (NFSV4ERR_BADOWNER). * From the NFSv4 RFC: Raw NFS4ERR_BADOWNER = 10039,/* owner translation bad */ NFS4ERR_BADOWNER An owner, owner_group, or ACL attribute value can not be translated to local representation. Hope this helps. Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: 18 June 2018 16:54 To: gpfsug main discussion list Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From chetkulk at in.ibm.com Mon Jun 18 17:20:29 2018 From: chetkulk at in.ibm.com (Chetan R Kulkarni) Date: Mon, 18 Jun 2018 21:50:29 +0530 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution In-Reply-To: References: Message-ID: Please make sure NFSv4 ID Mapping value matches on client and server (e.g. test.com; may vary on your setup). server: mmnfs config change IDMAPD_DOMAIN=test.com client: e.g. RHEL NFS client; set Domain attribute in /etc/idmapd.conf file and restart idmap service. # egrep ^Domain /etc/idmapd.conf Domain = test.com # service nfs-idmap restart reference Link: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/b1ladm_authconsidfornfsv4access.htm Thanks, Chetan. From: "Wilson, Neil" To: gpfsug main discussion list Date: 06/18/2018 09:35 PM Subject: Re: [gpfsug-discuss] CES-NFS: UID and GID resolution Sent by: gpfsug-discuss-bounces at spectrumscale.org I think it?s caused by the ID mapping not being configured properly. Found this on the redhat knowledge base. Environment Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterprise Linux 7 NFSv4 share being exported from an NFSv4 capable NFS server Issue From the client, the mounted NFSv4 share has ownership for all files and directories listed as nobody:nobody instead of the actual user that owns them on the NFSv4 server, or who created the new file and directory. Seeing nobody:nobody permissions on nfsv4 shares on the nfs client. Also seeing the following error in /var/log/messages: How to configure Idmapping for NFSv4 Raw nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Resolution Modify the /etc/idmapd.conf with the proper domain (FQDN), on both the client and server. In this example, the proper domain is "example.com" so the "Domain =" directive within /etc/idmapd.conf should be modified to read: Raw Domain = example.com Note: If using a NetApp Filer, the NFS.V4.ID.DOMAIN parameter must be set to match the "Domain =" parameter on the client. 
If using a Solaris machine as the NFS server, the NFSMAPID_DOMAIN value in /etc/default/nfs must match the RHEL clients Domain. On Red Hat Enterprise Linux 6.2 and older, to put the changes into effect restart the rpcidmapd service and remount the NFSv4 filesystem : Raw # service rpcidmapd restart # mount -o remount /nfs/mnt/point NOTE: It is only necessary to restart rpc.idmapd service on systems where rpc.idmapd is actually performing the id mapping. On RHEL 6.3 and newer NFS CLIENTS, the maps are stored in the kernel keyring and the id mapping itself is performed by the /sbin/nfsidmap program. On older NFS CLIENTS (RHEL 6.2 and older) as well as on all NFS SERVERS running RHEL, the id mapping is performed by rpc.idmapd. Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's. On Red Hat Enterprise Linux 6.3 and higher, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required: Raw # nfsidmap -c NOTE: The above command is only necessary on systems that use the keyring-based id mapper, i.e. NFS CLIENTS running RHEL 6.3 and higher. On RHEL 6.2 and older NFS CLIENTS as well as all NFS SERVERS running RHEL, the cache should be cleared out when rpc.idmapd is restarted. Another check, see if the passwd:, shadow: and group: settings are set correctly in the /etc/nsswitch.conf file on both Server and Client. Disabling idmapping NOTE: In order to properly disable idmapping, it must be disabled on both the NFS client and NFS server. - By default, RHEL6.3 and newer NFS clients and servers disable idmapping when utilizing the AUTH_SYS/UNIX authentication flavor by enabling the following booleans: Raw NFS client # echo 'Y' > /sys/module/nfs/parameters/nfs4_disable_idmapping NFS server # echo 'Y' > /sys/module/nfsd/parameters/nfs4_disable_idmapping If using a NetApp filer, the options nfs.v4.id.allow_numerics on command can be used to disable idmapping. More information can be found here. With this boolean enabled, NFS clients will instead send numeric UID/GID numbers in outgoing attribute calls and NFS servers will send numeric UID/GID numbers in outgoing attribute replies. ? If NFS clients sending numeric UID/GID values in a SETATTR call receive an NFS4ERR_BADOWNER reply from the NFS server clients will re-enable idmapping and send user at domain strings for that specific mount from that point forward. ? We can make the option nfs4_disable_idmapping persistent across reboot. ? After the above value has been changed, for the setting to take effect for any NFS server export mounted on the NFS client, you must unmount all NFS mount points for the given NFS server, and then re-mount them. If you have auto mounts stop all processes accessing the mounts and allow the automount daemon to unmount them. Once all NFS mount points are gone to the desired NFS server, remount the NFS mount points and the new setting should be in place. If this is too problematic, you may want to schedule a reboot of the NFS client. ? To verify the setting has been changed properly, you can look inside the /proc/self/mountstats file 'caps' line, which contains a hex value of 2 bytes (16 bits). 
This is the line that shows the NFS server's "capabilities", and the most significant bit #15 is the one which represents whether idmapping is disabled or not (the NFS_CAP_UIDGID_NOMAP bit - see the Root Cause section) Raw # cat /sys/module/nfs/parameters/nfs4_disable_idmapping Y # umount /mnt # mount rhel6u6-node2:/exports/nfs4 /mnt # grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2| caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0xffff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ Example of nfs4_disable_idmapping = 'N' Raw [root at rhel6u3-node1 ~]# echo N > /sys/module/nfs/parameters/nfs4_disable_idmapping [root at rhel6u3-node1 ~]# umount /mnt [root at rhel6u3-node1 ~]# mount rhel6u6-node2:/exports/nfs4 /mnt [root at rhel6u3-node1 ~]# grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0x7fff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ NOTE: To force ONLY numeric IDs to be used on the client, add RPCIDMAPDARGS="-C" to the etc/sysconfig/nfs file and restart the rpcidmapd service. See man rpc.idmapd for more information. NOTE: This option can only be used with AUTH_SYS/UNIX authentication flavors, if you wish to use something like Kerberos, idmapping must be used. Root Cause NFSv4 utilizes ID mapping to ensure permissions are set properly on exported shares, if the domains of the client and server do not match then the permissions are mapped to nobody:nobody. NFS_CAP_UIDGID_NOMAP bit The nfs4_disable_idmapping is a module parameter which is read only one time, at the point at which the kernel sets up the data structure that represents an NFS server. Once it is read, a flag is set in the nfs_server structure NFS_CAP_UIDGID_NOMAP. Raw #define NFS_CAP_UIDGID_NOMAP (1U << 15) static int nfs4_init_server(struct nfs_server *server, const struct nfs_parsed_mount_data *data) { struct rpc_timeout timeparms; int error; dprintk("--> nfs4_init_server()\n"); nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, data->timeo, data->retrans); /* Initialise the client representation from the mount data */ server->flags = data->flags; server->caps |= NFS_CAP_ATOMIC_OPEN|NFS_CAP_CHANGE_ATTR| NFS_CAP_POSIX_LOCK; if (!(data->flags & NFS_MOUNT_NORDIRPLUS)) server->caps |= NFS_CAP_READDIRPLUS; server->options = data->options; /* Get a client record */ error = nfs4_set_client(server, data->nfs_server.hostname, (const struct sockaddr *)&data->nfs_server.address, data->nfs_server.addrlen, data->client_address, data->auth_flavors[0], data->nfs_server.protocol, &timeparms, data->minorversion); if (error < 0) goto error; /* * Don't use NFS uid/gid mapping if we're using AUTH_SYS or lower * authentication. 
*/ if (nfs4_disable_idmapping && data->auth_flavors[0] == RPC_AUTH_UNIX) <--- set a flag based on the module parameter server->caps |= NFS_CAP_UIDGID_NOMAP; <-------------------------- flag set if (data->rsize) server->rsize = nfs_block_size(data->rsize, NULL); if (data->wsize) server->wsize = nfs_block_size(data->wsize, NULL); server->acregmin = data->acregmin * HZ; server->acregmax = data->acregmax * HZ; server->acdirmin = data->acdirmin * HZ; server->acdirmax = data->acdirmax * HZ; server->port = data->nfs_server.port; error = nfs_init_server_rpcclient(server, &timeparms, data-> auth_flavors[0]); error: /* Done */ dprintk("<-- nfs4_init_server() = %d\n", error); return error; } This flag is later checked when deciding whether to use numeric uid or gids or to use idmapping. Raw int nfs_map_uid_to_name(const struct nfs_server *server, __u32 uid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(uid, "user", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap-> idmap_user_hash, uid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(uid, buf, buflen); return ret; } int nfs_map_gid_to_group(const struct nfs_server *server, __u32 gid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(gid, "group", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap-> idmap_group_hash, gid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(gid, buf, buflen); return ret; } "fs/nfs/idmap.c" 872L, 21804C For more information on NFSv4 ID mapping in Red Hat Enterprise Linux, see https://access.redhat.com/articles/2252881 Diagnostic Steps Debugging/verbosity can be enabled by editing /etc/sysconfig/nfs: Raw RPCIDMAPDARGS="-vvv" The following output is shown in /var/log/messages when the mount has been completed and the system shows nobody:nobody as user and group permissions on directories and files: Raw Jun 3 20:22:08 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Jun 3 20:25:44 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Collect a tcpdump of the mount attempt: Raw # tcpdump -s0 -i {INTERFACE} host {NFS.SERVER.IP} -w /tmp/{casenumber}-$ (hostname)-$(date +"%Y-%m-%d-%H-%M-%S").pcap & If a TCP packet capture has been obtained, check for a nfs.nfsstat4 packet that has returned a non-zero response equivalent to 10039 (NFSV4ERR_BADOWNER). From the NFSv4 RFC: Raw NFS4ERR_BADOWNER = 10039,/* owner translation bad */ NFS4ERR_BADOWNER An owner, owner_group, or ACL attribute value can not be translated to local representation. Hope this helps. Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: 18 June 2018 16:54 To: gpfsug main discussion list Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. 
On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Jun 18 17:56:55 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jun 2018 16:56:55 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Message-ID: <8B8EB415-1221-454B-A08C-5B029C4F8BF8@nuance.com> That was it, thanks! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Chetan R Kulkarni Reply-To: gpfsug main discussion list Date: Monday, June 18, 2018 at 11:21 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] CES-NFS: UID and GID resolution Please make sure NFSv4 ID Mapping value matches on client and server (e.g. test.com; may vary on your setup). server: mmnfs config change IDMAPD_DOMAIN=test.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jun 18 23:21:30 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 18 Jun 2018 15:21:30 -0700 Subject: [gpfsug-discuss] Save the Date September 19-20 2018 GPFS/SS Users Group Meeting at ORNL Message-ID: <5670B56F-AF19-4A90-8BDF-24B865231EC1@lbl.gov> Hello all, There is an event being planned for the week of September 16, 2018 at Oak Ridge National Laboratory (ORNL). This GPFS/Spectrum Scale UG meeting will be in conjunction with the HPCXXL User Group. We have done events like this in the past, typically in NYC, however, with the announcement of Summit (https://www.ornl.gov/news/ornl-launches-summit-supercomputer ) and it?s 250 PB, 2.5 TB/s GPFS installaion it is an exciting time to have ORNL as the venue. Per usual, the GPFS day will be free, however, this time the event will be split across two days, Wednesday (19th) afternoon and Thursday (20th) morning This way, if you want to travel out Wednesday morning and back Thursday afternoon it?s very do-able. If you want to stay around Thursday afternoon there will be a data center tour available. There will be some additional approval processes to attend at ORNL and we?ll share those details and more in the coming weeks. If you are interested in presenting something your site is working on, please let us know. User talks are always well received. Save a space on your calendar and hope to see you there. Best, Kristy PS - We will, as usual, also have an event at SC18, more on that soon as well. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 20 15:08:09 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 20 Jun 2018 14:08:09 +0000 Subject: [gpfsug-discuss] mmbackup issue Message-ID: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> Hallo All, we are working since two weeks(or more) on a PMR that mmbackup has problems with the MC Class in TSM. The result is that we have defined a version exist of 5. But with each run, the policy engine generate a expire list (where the mentioned files already selected) and at the end we see only (in every case) 2 Backup versions of a file. We are at: GPFS 5.0.1.1 TSM-Server 8.1.1.0 TSM-Client 7.1.6.2 After some testing we found the reason: Our mmbackup Test is performed with vi , to change a files content and restart the next mmbackup testcycle. The Problem that we found here with the defaults in vi (set backupcopy=no, attention if no a backupcopy are generatetd) There are after each test (change of the content) the file became every time a new inode number. This behavior is the reason why the shadowfile think(or the policyengine) the old file is never existent And generate an delete request in the expire policy files for dsmc (correct me if I wrong here) . Ok vi is not the problem but we had also Applications that had the same dataset handling (as ex. SAS) At SAS the data file will updated with a xx.data.new file and after the close the xx.data.new will be renamed to the original name xx.data again. And the miss interpretation of different inodes happen again. The question now are there code in the mmbackup or in gpfs for the shadow file to check or ignore the inode change for the same file. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.holliday at crick.ac.uk Wed Jun 20 15:19:13 2018 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 20 Jun 2018 14:19:13 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount Message-ID: Hi All, We've being trying to get the windows system to mount GPFS. 
We've set the drive letter on the file system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing - GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS Windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Jun 20 15:45:23 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 20 Jun 2018 10:45:23 -0400 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> Message-ID: <9471.1529505923@turing-police.cc.vt.edu> On Wed, 20 Jun 2018 14:08:09 -0000, "Grunenberg, Renar" said: > There are after each test (change of the content) the file became every time > a new inode number. This behavior is the reason why the shadowfile think(or the > policyengine) the old file is never existent That's because as far as the system is concerned, this is a new file that happens to have the same name. > At SAS the data file will updated with a xx.data.new file and after the close > the xx.data.new will be renamed to the original name xx.data again. And the > miss interpretation of different inodes happen again. Note that all the interesting information about a file is contained in the inode (the size, the owner/group, the permissions, creation time, disk blocks allocated, and so on). The *name* of the file is pretty much the only thing about a file that isn't in the inode - and that's because it's not a unique value for the file (there can be more than one link to a file). The name(s) of the file are stored in the parent directory as inode/name pairs. So here's what happens. You have the original file xx.data. It has an inode number 9934 or whatever. In the parent directory, there's an entry "name xx.data -> inode 9934". SAS creates a new file xx.data.new with inode number 83425 or whatever. Different file - the creation time, blocks allocated on disk, etc are all different than the file described by inode 9934. The directory now has "name xx.data -> 9934" "name xx.data.new -> inode 83425". SAS then renames xx.data.new - and rename is defined as "change the name entry for this inode, removing any old mappings for the same name". So... 0) 'rename xx.data.new xx.data'. 1) Find 'xx.data.new' in this directory. "xx.data.new -> 83425". So we're working with that inode. 2) Check for occurrences of the new name. Aha. There's 'xx.data -> 9934'. Remove it. (2a) This may or may not actually make the file go away, as there may be other links and/or open file references to it. 3) The directory now only has 'xx.data.new -> 83425'. 4) We now change the name. The directory now has 'xx.data -> 83425'.
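To see the same sequence from a shell (a minimal illustrative sketch; the inode numbers are invented and will differ on any real system):

    $ echo one > xx.data
    $ ls -i xx.data
    9934 xx.data
    $ echo two > xx.data.new        # SAS-style "write a new file, then rename"
    $ mv xx.data.new xx.data        # rename replaces the directory entry for xx.data
    $ ls -i xx.data
    83425 xx.data                   # same name, different inode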
And your backup program quite rightly concludes that this is a new file by a name that was previously used - because it *is* a new file. Created at a different time, different blocks on disk, and so on. The only time that writing a "new" file keeps the same inode number is if the program actually opens the old file for writing and overwrites the old contents. However, this isn't actually done by many programs (including vi and SAS, as you've noticed) because if writing out the file encounters an error, you now have lost the contents - the old version has been overwritten, and the new version isn't complete and correct. So many programs write to a truly new file and then rename, because if writing the new file fails, the old version is still available on disk.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From anobre at br.ibm.com Wed Jun 20 16:11:03 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Wed, 20 Jun 2018 15:11:03 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jun 20 15:52:09 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 20 Jun 2018 10:52:09 -0400 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: <638e2070-2e99-e6dc-b843-1fd368c21bc0@nasa.gov> We've used the Windows client here @ NASA (I think we have in the neighborhood of between 15 and 20 clients). I'm guessing when you say GPFS shows no errors you've dumped waiters and grabbed dump tscomm output and that's clean? -Aaron On 6/20/18 10:19 AM, Michael Holliday wrote: > Hi All, > > We?ve being trying to get the windows system to mount GPFS.? We?ve set > the drive letter on the files system, and we can get the system added to > the GPFS cluster and showing as active. > > When we try to mount the file system ?the system just sits and does > nothing ? GPFS shows no errors or issues, there are no problems in the > log files. The firewalls are stopped and as far as we can tell it should > work. > > Does anyone have any experience with the GPFS windows client that may > help us? > > Michael > > Michael Holliday RITTech MBCS > > Senior HPC & Research Data Systems Engineer | eMedLab Operations Team > > Scientific Computing | IT&S | The Francis Crick Institute > > 1, Midland Road| London | NW1 1AT| United Kingdom > > Tel: 0203 796 3167 > > The Francis Crick Institute Limited is a registered charity in England > and Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From YARD at il.ibm.com Wed Jun 20 16:30:37 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 20 Jun 2018 18:30:37 +0300 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From YARD at il.ibm.com Wed Jun 20 16:35:57 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 20 Jun 2018 18:35:57 +0300 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Also what does mmdiag --network + mmgetstate -a show ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Yaron Daniel" To: gpfsug main discussion list Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 20 17:00:03 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 20 Jun 2018 16:00:03 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <9471.1529505923@turing-police.cc.vt.edu> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> Message-ID: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Hallo Valdis, first thanks for the explanation we understand that, but this problem generate only 2 Version at tsm server for the same file, in the same directory. This mean that mmbackup and the .shadow... has no possibility to have for the same file in the same directory more then 2 backup versions with tsm. The native ba-client manage this. (Here are there already different inode numbers existent.) But at TSM-Server side the file that are selected at 'ba incr' are merged to the right filespace and will be binded to the mcclass >2 Version exist. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von valdis.kletnieks at vt.edu Gesendet: Mittwoch, 20. Juni 2018 16:45 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmbackup issue On Wed, 20 Jun 2018 14:08:09 -0000, "Grunenberg, Renar" said: > There are after each test (change of the content) the file became every time > a new inode number. This behavior is the reason why the shadowfile think(or the > policyengine) the old file is never existent That's because as far as the system is concerned, this is a new file that happens to have the same name. 
> At SAS the data file will updated with a xx.data.new file and after the close > the xx.data.new will be renamed to the original name xx.data again. And the > miss interpretation of different inodes happen again. Note that all the interesting information about a file is contained in the inode (the size, the owner/group, the permissions, creation time, disk blocks allocated, and so on). The *name* of the file is pretty much the only thing about a file that isn't in the inode - and that's because it's not a unique value for the file (there can be more than one link to a file). The name(s) of the file are stored in the parent directory as inode/name pairs. So here's what happens. You have the original file xx.data. It has an inode number 9934 or whatever. In the parent directory, there's an entry "name xx.data -> inode 9934". SAS creates a new file xx.data.new with inode number 83425 or whatever. Different file - the creation time, blocks allocated on disk, etc are all different than the file described by inode 9934. The directory now has "name xx.data -> 9934" "name xx.data.new -> inode 83425". SAS then renames xx.data.new - and rename is defined as "change the name entry for this inode, removing any old mappings for the same name" . So... 0) 'rename xx.data.new xx.data'. 1) Find 'xx.data.new' in this directory. "xx.data.new -> 83425" . So we're working with that inode. 2) Check for occurrences of the new name. Aha. There's 'xxx.data -> 9934'. Remove it. (2a) This may or may not actually make the file go away, as there may be other links and/or open file references to it.) 3) The directory now only has '83425 xx.data.new -> 83425'. 4) We now change the name. The directory now has 'xx.data -> 83425'. And your backup program quite rightly concludes that this is a new file by a name that was previously used - because it *is* a new file. Created at a different time, different blocks on disk, and so on. The only time that writing a "new" file keeps the same inode number is if the program actually opens the old file for writing and overwrites the old contents. However, this isn't actually done by many programs (including vi and SAS, as you've noticed) because if writing out the file encounters an error, you now have lost the contents - the old version has been overwritten, and the new version isn't complete and correct. So many programs write to a truly new file and then rename, because if writing the new file fails, the old version is still available on disk.... From olaf.weiser at de.ibm.com Wed Jun 20 17:06:56 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 20 Jun 2018 18:06:56 +0200 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de><9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Jun 21 08:32:39 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 21 Jun 2018 08:32:39 +0100 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Message-ID: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. > I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file that you can keep, for both active and inactive (aka deleted). You can also define how long these version are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing it's own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Thu Jun 21 10:18:29 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 21 Jun 2018 09:18:29 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> Message-ID: <41b590c74c314bf38111c8cc17fde764@SMXRF105.msg.hukrf.de> Hallo JAB, the main problem here is that the inode is changeing for the same file in the same directory. The mmbackup generate and execute at first the expirelist from the same file with the old inode number and afterward the selective backup for the same file with the new inode number. We want to test now to increase the version deleted Parameter here. In contrast to the ba incr in a local fs TSM make these steps in one and handle these issue. My hope now the mmbackup people can enhance these to generate a comparison list is the Filename in the selection list and the filename already in expirelist and check these first, skip these file from expire list, before the expire list will be executed. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. 
Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathan Buzzard Gesendet: Donnerstag, 21. Juni 2018 09:33 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] mmbackup issue On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. > I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file that you can keep, for both active and inactive (aka deleted). You can also define how long these version are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing it's own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Isom.Crawford at ibm.com Thu Jun 21 15:48:02 2018 From: Isom.Crawford at ibm.com (Isom Crawford) Date: Thu, 21 Jun 2018 09:48:02 -0500 Subject: [gpfsug-discuss] GPFS Windows Mount Message-ID: Hi Michael, It's been a while, but I've run into similar issues with Scale on Windows. One possible issue is the GPFS administrative account configuration using the following steps: ---- 1. Create a domain user with the logon name root. 2. Add user root to the Domain Admins group or to the local Administrators group on each Windows node. 3. In root Properties/Profile/Home/LocalPath, define a HOME directory such as C:\Users\root\home that does not include spaces in the path name and is not the same as the profile path. 4. Give root the right to log on as a service as described in ?Allowing the GPFS administrative account to run as a service.? 
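A quick way to sanity-check step 3 from the Cygwin shell on the node (an illustrative sketch only; the account name root and the example path come from the steps above, and the exact output will vary with your AD setup):

    $ echo $HOME                 # e.g. /cygdrive/c/Users/root/home - no spaces
    $ cygpath -w "$HOME"         # should match the LocalPath set in step 3
    $ getent passwd root         # the home field should agree with the above

If the home directory Cygwin resolves does not match the LocalPath from step 3, or contains spaces, that is the first thing to revisit.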
Step 3 including consistent use of the HOME directory you define, is required for the Cygwin environment ---- I have botched step 3 before with the result being very similar to your experience. Carefule re-configuration of the cygwin root *home* directory fixed some of the problems. Hope this helps. Another tangle you may run into is disabling IPv6. I had to completely disable IPv6 on the Windows client by not only deselecting it on the network interface properties list, but also disabling it system-wide. The symptoms vary, but utilities like mmaddnode or mmchnode may fail due to invalid interface. Check the output of /usr/lpp/mmfs/bin/mmcmi host to be sure it's the host that Scale expects. (In my case, it returned ::1 until I completely disabled IPv6). My notes follow: This KB article tells us about a setting that affects what Windows prefers, emphasized in bold: In Registry Editor, locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6 \Parameters Double-click DisabledComponents to modify the DisabledComponents entry. Note: If the DisabledComponents entry is unavailable, you must create it. To do this, follow these steps: In the Edit menu, point to New, and then click DWORD (32-bit) Value. Type DisabledComponents, and then press ENTER. Double-click DisabledComponents. Type any one of the following values in the Value data: field to configure the IPv6 protocol to the desired state, and then click OK: Type 0 to enable all IPv6 components. (Windows default setting) Type 0xffffffff to disable all IPv6 components, except the IPv6 loopback interface. This value also configures Windows to prefer using Internet Protocol version 4 (IPv4) over IPv6 by modifying entries in the prefix policy table. For more information, see Source and Destination Address Selection. Type 0x20 to prefer IPv4 over IPv6 by modifying entries in the prefix policy table. Type 0x10 to disable IPv6 on all nontunnel interfaces (on both LAN and Point-to-Point Protocol [PPP] interfaces). Type 0x01 to disable IPv6 on all tunnel interfaces. These include Intra-Site Automatic Tunnel Addressing Protocol (ISATAP), 6to4, and Teredo. Type 0x11 to disable all IPv6 interfaces except for the IPv6 loopback interface. Restart the computer for this setting to take effect. Kind Regards, Isom L. Crawford Jr., PhD. NA SDI SME Team Software Defined Infrastructure 2700 Redwood Street Royse City, TX 75189 United States Phone: 214-707-4611 E-mail: isom.crawford at ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Thu Jun 21 22:42:30 2018 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 21 Jun 2018 21:42:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale GUI password Message-ID: I have a test cluster I setup months ago and then did nothing with. Now I need it again but for the life of me I can't remember the admin password to the GUI. Is there an easy way to reset it under the covers? I would hate to uninstall everything and start over. I can certainly admin everything from the cli but I use it to show others some things from time to time and it doesn't make sense to do that always from the command line. Thoughts? Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kevindjo at us.ibm.com Fri Jun 22 03:26:55 2018 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Fri, 22 Jun 2018 02:26:55 +0000 Subject: [gpfsug-discuss] Spectrum Scale GUI password In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jun 22 14:13:43 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 22 Jun 2018 13:13:43 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node Message-ID: Any idea why I can?t force the file system manager off this node? I turned off the manager on the node (mmchnode --client) and used mmchmgr to move the other file systems off, but I can?t move this one. There are 6 other good choices for file system managers. I?ve never seen this message before. [root at nrg1-gpfs01 ~]# mmchmgr dataeng The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 22 14:19:18 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 22 Jun 2018 13:19:18 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: References: Message-ID: <5C6312EE-A958-4CBF-9AAC-F342CE87DB70@vanderbilt.edu> Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? Kevin On Jun 22, 2018, at 8:13 AM, Oesterlin, Robert > wrote: Any idea why I can?t force the file system manager off this node? I turned off the manager on the node (mmchnode --client) and used mmchmgr to move the other file systems off, but I can?t move this one. There are 6 other good choices for file system managers. I?ve never seen this message before. [root at nrg1-gpfs01 ~]# mmchmgr dataeng The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C46935624ea7048a9471608d5d841feb5%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636652700325626997&sdata=Az9GZeDDG76lDLi02NSKYXsXK9EHy%2FT3vLAtaMrnpew%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jun 22 14:28:02 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 22 Jun 2018 13:28:02 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node Message-ID: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Yep. And nrg1-gpfs13 isn?t even a manager node anymore! [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. 
2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Friday, June 22, 2018 at 8:21 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] File system manager - won't change to new node Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Fri Jun 22 15:10:29 2018 From: salut4tions at gmail.com (Jordan Robertson) Date: Fri, 22 Jun 2018 10:10:29 -0400 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: Two thoughts: 1) Has your config data update fully propagated after the mmchnode? We've (rarely) seen some weird stuff happen when that process isn't complete yet, or if a node in question simply didn't get the update (try md5sum'ing the mmsdrfs file on nrg1-gpfs13 and compare to the cluster manager's md5sum, make sure the push process isn't still running, etc.). If you see discrepancies, you could try an mmsdrrestore to get that node back into spec. 2) If everything looks fine; what are the chances you could simply try restarting GPFS on nrg1-gpfs13? Might be particularly interesting to see what the cluster tries to do with the filesystem once that node is down. -Jordan On Fri, Jun 22, 2018 at 9:28 AM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Yep. And nrg1-gpfs13 isn?t even a manager node anymore! > > > > [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 > > Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). > > Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > > Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. > > > > 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng > nrg1-gpfs05.nrg1.us.grid.nuance.com > > 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned > as manager for dataeng. > > 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) > appointed as manager for dataeng. > > 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng > nrg1-gpfs05.nrg1.us.grid.nuance.com > > 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) > completed take over for dataeng. > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > > *From: * on behalf of > "Buterbaugh, Kevin L" > *Reply-To: *gpfsug main discussion list > *Date: *Friday, June 22, 2018 at 8:21 AM > *To: *gpfsug main discussion list > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] File system manager - won't > change to new node > > > > Hi Bob, > > > > Have you tried explicitly moving it to a specific manager node? That?s > what I always do ? I personally never let GPFS pick when I?m moving the > management functions for some reason. Thanks? 
> > > > Kevin > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Jun 22 15:38:05 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 22 Jun 2018 14:38:05 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: <78d4f2d963134e87af9b123891da2c47@jumptrading.com> Hi Bob, Also tracing waiters on the cluster can help you understand if there is something that is blocking this kind of operation. Beyond the command output, which is usually too terse to understand what is actually happening, do the logs on the nodes in the cluster give you any further details about the operation? Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jordan Robertson Sent: Friday, June 22, 2018 9:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] File system manager - won't change to new node Note: External Email ________________________________ Two thoughts: 1) Has your config data update fully propagated after the mmchnode? We've (rarely) seen some weird stuff happen when that process isn't complete yet, or if a node in question simply didn't get the update (try md5sum'ing the mmsdrfs file on nrg1-gpfs13 and compare to the cluster manager's md5sum, make sure the push process isn't still running, etc.). If you see discrepancies, you could try an mmsdrrestore to get that node back into spec. 2) If everything looks fine; what are the chances you could simply try restarting GPFS on nrg1-gpfs13? Might be particularly interesting to see what the cluster tries to do with the filesystem once that node is down. -Jordan On Fri, Jun 22, 2018 at 9:28 AM, Oesterlin, Robert > wrote: Yep. And nrg1-gpfs13 isn?t even a manager node anymore! [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Friday, June 22, 2018 at 8:21 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] File system manager - won't change to new node Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? 
Kevin _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Fri Jun 22 20:03:52 2018 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 22 Jun 2018 14:03:52 -0500 Subject: [gpfsug-discuss] mmfsadddisk command interrupted Message-ID: We were adding disks to one of our larger filesystems today. During the "checking allocation map for storage pool system" we had to interrupt the command since it was causing slow downs on our filesystem. Now commands like mmrepquota, mmdf, etc. are timing out with tsaddisk command is running message. Also during the run of the mmdf, mmrepquota, etc. filesystem becomes completely unresponsive. This command was run on ESS running version 5.2.0. Any help is much appreciated. Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jun 22 23:11:45 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 22 Jun 2018 18:11:45 -0400 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: <128279.1529705505@turing-police.cc.vt.edu> On Fri, 22 Jun 2018 13:28:02 -0000, "Oesterlin, Robert" said: > [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 > Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). > Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. > > 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com > 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 
> 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com > 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. That's an.... "interesting".. definition of "successful".... :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 25 16:56:31 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 25 Jun 2018 15:56:31 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> Message-ID: Hallo All, here the requirement for enhancement of mmbackup. http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=121687 Please vote. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathan Buzzard Gesendet: Donnerstag, 21. Juni 2018 09:33 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] mmbackup issue On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. 
> I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file that you can keep, for both active and inactive (aka deleted). You can also define how long these version are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing it's own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Mon Jun 25 20:43:49 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 25 Jun 2018 15:43:49 -0400 Subject: [gpfsug-discuss] mmapplypolicy on nested filesets ... In-Reply-To: References: <20180418115445.8603670sy6ee6fk5@support.scinet.utoronto.ca> Message-ID: <20180625154349.47520gasb6cvevhx@support.scinet.utoronto.ca> It took a while before I could get back to this issue, but I want to confirm that Marc's suggestions worked line a charm, and did exactly what I hoped for: * remove any FOR FILESET(...) specifications * mmapplypolicy /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan ... --scope inodespace -P your-policy-rules-file ... I didn't have to do anything else, but exclude a few filesets from the scan. Thanks Jaime Quoting "Marc A Kaplan" : > I suggest you remove any FOR FILESET(...) specifications from your rules > and then run > > mmapplypolicy > /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan > ... --scope inodespace -P your-policy-rules-file ... > > See also the (RTFineM) for the --scope option and the Directory argument > of the mmapplypolicy command. > > That is the best, most efficient way to scan all the files that are in a > particular inode-space. Also, you must have all filesets of interest > "linked" and the file system must be mounted. > > Notice that "independent" means that the fileset name is used to denote > both a fileset and an inode-space, where said inode-space contains the > fileset of that name and possibly other "dependent" filesets... > > IF one wished to search the entire file system for files within several > different filesets, one could use rules with > > FOR FILESET('fileset1','fileset2','and-so-on') > > Or even more flexibly > > WHERE FILESET_NAME LIKE 'sql-like-pattern-with-%s-and-maybe-_s' > > Or even more powerfully > > WHERE regex(FILESET_NAME, 'extended-regular-.*-expression') > > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 04/18/2018 01:00 PM > Subject: [gpfsug-discuss] mmapplypolicy on nested filesets ... > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > A few months ago I asked about limits and dynamics of traversing > depended .vs independent filesets on this forum. I used the > information provided to make decisions and setup our new DSS based > gpfs storage system. Now I have a problem I couldn't' yet figure out > how to make it work: > > 'project' and 'scratch' are top *independent* filesets of the same > file system. 
> > 'proj1', 'proj2' are dependent filesets nested under 'project' > 'scra1', 'scra2' are dependent filesets nested under 'scratch' > > I would like to run a purging policy on all contents under 'scratch' > (which includes 'scra1', 'scra2'), and TSM backup policies on all > contents under 'project' (which includes 'proj1', 'proj2'). > > HOWEVER: > When I run the purging policy on the whole gpfs device (with both > 'project' and 'scratch' filesets) > > * if I use FOR FILESET('scratch') on the list rules, the 'scra1' and > 'scra2' filesets under scratch are excluded (totally unexpected) > > * if I use FOR FILESET('scra1') I get error that scra1 is dependent > fileset (Ok, that is expected) > > * if I use /*FOR FILESET('scratch')*/, all contents under 'project', > 'proj1', 'proj2' are traversed as well, and I don't want that (it > takes too much time) > > * if I use /*FOR FILESET('scratch')*/, and instead of the whole device > I apply the policy to the /scratch mount point only, the policy still > traverses all the content of 'project', 'proj1', 'proj2', which I > don't want. (again, totally unexpected) > > QUESTION: > > How can I craft the syntax of the mmapplypolicy in combination with > the RULE filters, so that I can traverse all the contents under the > 'scratch' independent fileset, including the nested dependent filesets > 'scra1','scra2', and NOT traverse the other independent filesets at > all (since this takes too much time)? > > Thanks > Jaime > > > PS: FOR FILESET('scra*') does not work. > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE&s=IpwHlr0YNr7rgV7gI8Y2sxIELLIwA15KK4nBnv9BYWk&e= > > ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE&s=aff0vMJkKd-Z3pw3-jckmI3ejqXh8aSr8rxkKf3OGdk&e= > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From erich at uw.edu Tue Jun 26 00:20:35 2018 From: erich at uw.edu (Eric Horst) Date: Mon, 25 Jun 2018 16:20:35 -0700 Subject: [gpfsug-discuss] mmchconfig subnets Message-ID: Hi, I'm hoping somebody has insights into how the subnets option actually works. 
I've read the docs a dozen times and I want to make sure I understand before I take my production cluster down to make the changes.

On the current cluster the daemon addresses are on a GPFS private network and the admin addresses are on a public network. I'm changing this so that both the daemon and admin addresses are public, and the subnets option is used to utilize the private network. This is to facilitate remote mounts to an independent cluster.

The confusing factor in my case, not covered in the docs, is that the GPFS private network is subnetted and static routes are used to reach the subnets. That is, there are three private networks, one for each datacenter, and the cluster nodes' daemon interfaces are spread between the three.

172.16.141.32/27
172.16.141.24/29
172.16.141.128/27

A router connects these three networks, but they are otherwise 100% private.

For my mmchconfig subnets command should I use this?

mmchconfig subnets="172.16.141.24 172.16.141.32 172.16.141.128"

Where I get confused is that I'm trying to reason through how Spectrum Scale is utilizing the subnets setting to decide if this will have the desired result on my cluster. If I change the node addresses to their public addresses, i.e. the private addresses are not explicitly configured in Scale, then how are the private addresses discovered? Does each node use the subnets option to identify that it has a private address and then dynamically share that with the cluster?

Thanks in advance for your clarifying comments.

-Eric

--
Eric Horst
University of Washington
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jam at ucar.edu Tue Jun 26 01:58:53 2018
From: jam at ucar.edu (Joseph Mendoza)
Date: Mon, 25 Jun 2018 18:58:53 -0600
Subject: [gpfsug-discuss] subblock sanity check in 5.0
Message-ID:

Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size, but my guess is that also specifying a metadata-block-size is interfering with it (by being too small). This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool; any best practices for 5.0? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools.

fs1 created with:
# mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1

# mmlsfs fs1
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    131072                   Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -B                 524288                   Block size (system pool)
                    8388608                  Block size (other pools)
 -V                 19.01 (5.0.1.0)          File system version
 --subblocks-per-full-block 64               Number of subblocks per full block
 -P                 system;DATA              Disk storage pools in file system

Thanks!
--Joey Mendoza NCAR From knop at us.ibm.com Tue Jun 26 04:36:43 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 25 Jun 2018 23:36:43 -0400 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: Joey, The subblocks-per-full-block value cannot be specified when the file system is created, but is rather computed automatically by GPFS. In file systems with format older than 5.0, the value is fixed at 32. For file systems with format 5.0.0 or later, the value is computed based on the block size. See manpage for mmcrfs, in table where the -B BlockSize option is explained. (Table 1. Block sizes and subblock sizes) . Say, for the default (in 5.0+) 4MB block size, the subblock size is 8KB. The minimum "practical" subblock size is 4KB, to keep 4KB-alignment to accommodate 4KN devices. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Joseph Mendoza To: gpfsug main discussion list Date: 06/25/2018 08:59 PM Subject: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small).? This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0?? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with: # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 # mmlsfs fs1 flag??????????????? value??????????????????? description ------------------- ------------------------ ----------------------------------- ?-f???????????????? 8192???????????????????? Minimum fragment (subblock) size in bytes (system pool) ??????????????????? 131072?????????????????? Minimum fragment (subblock) size in bytes (other pools) ?-i???????????????? 4096???????????????????? Inode size in bytes ?-I???????????????? 32768??????????????????? Indirect block size in bytes ?-B???????????????? 524288?????????????????? Block size (system pool) ??????????????????? 8388608????????????????? Block size (other pools) ?-V???????????????? 19.01 (5.0.1.0)????????? File system version ?--subblocks-per-full-block 64?????????????? Number of subblocks per full block ?-P???????????????? system;DATA????????????? Disk storage pools in file system Thanks! --Joey Mendoza NCAR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
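Working that table through for the file system above (my arithmetic, so worth double-checking against Table 1 in the 5.0.1 mmcrfs man page): the subblock size and the subblocks-per-full-block count are derived from the smallest block size in the file system, and that single count then applies to every pool.

512 KiB metadata block, 8 KiB subblock        ->  524288 / 8192  = 64 subblocks per full block
8 MiB data pool at 64 subblocks per block     ->  8388608 / 64   = 131072 bytes (128 KiB) per fragment
4 MiB metadata block, 8 KiB subblock          ->  4194304 / 8192 = 512 subblocks per full block
8 MiB data pool at 512 subblocks per block    ->  8388608 / 512  = 16384 bytes (16 KiB) per fragment

That matches the 8192 / 131072 pair reported by mmlsfs above, and it suggests that raising --metadata-block-size to 4M would indeed give 512 subblocks per full block, with 16 KiB fragments in the 8 MiB data pool.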
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Tue Jun 26 07:21:26 2018 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 26 Jun 2018 08:21:26 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: Joseph, the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb. is this setup for a traditional NSD Setup or for GNR as the recommendations would be different. sven On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small). This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0? I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes (system pool) > 131072 Minimum fragment (subblock) > size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > > -B 524288 Block size (system pool) > 8388608 Block size (other pools) > > -V 19.01 (5.0.1.0) File system version > > --subblocks-per-full-block 64 Number of subblocks per > full block > -P system;DATA Disk storage pools in file > system > > > Thanks! > --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Tue Jun 26 16:18:01 2018 From: jam at ucar.edu (Joseph Mendoza) Date: Tue, 26 Jun 2018 09:18:01 -0600 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Hi, it's for a traditional NSD setup. --Joey On 6/26/18 12:21 AM, Sven Oehme wrote: > Joseph, > > the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block > size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb.? > is this setup for a traditional NSD Setup or for GNR as the recommendations would be different.? > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza > wrote: > > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem?? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small).? This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0?? 
I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag??????????????? value??????????????????? description > ------------------- ------------------------ > ----------------------------------- > ?-f???????????????? 8192???????????????????? Minimum fragment (subblock) > size in bytes (system pool) > ??????????????????? 131072?????????????????? Minimum fragment (subblock) > size in bytes (other pools) > ?-i???????????????? 4096???????????????????? Inode size in bytes > ?-I???????????????? 32768??????????????????? Indirect block size in bytes > > ?-B???????????????? 524288?????????????????? Block size (system pool) > ??????????????????? 8388608????????????????? Block size (other pools) > > ?-V???????????????? 19.01 (5.0.1.0)????????? File system version > > ?--subblocks-per-full-block 64?????????????? Number of subblocks per > full block > ?-P???????????????? system;DATA????????????? Disk storage pools in file > system > > > Thanks! > --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Tue Jun 26 16:32:55 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 26 Jun 2018 15:32:55 +0000 Subject: [gpfsug-discuss] mmchconfig subnets In-Reply-To: References: Message-ID: <20180626153255.d4sftfljwusa6yrg@utumno.gs.washington.edu> My understanding is that GPFS uses the network configuration on each node to determine netmask. The subnets option can be applied to specific nodes or groups of nodes with "mmchconfig subnets=... -N ", so what you're doing is specificy the preferred subnets for GPFS node communication, just for that list of nodes. For instance, we have four GPFS clusters, with three subnets: * eichler-cluster, eichler-cluster2 (10.130.0.0/16) * grc-cluster (10.200.0.0/16) * gs-cluster (10.110.0.0/16) And one data transfer system weasel that is a member of gs-cluster, but provides transfer services to all the clusters, and has an IP address on each subnet to avoid a bunch of network cross-talk. Its subnets setting looks like this: [weasel] subnets 10.130.0.0/eichler-cluster*.grid.gs.washington.edu 10.200.0.0/grc-cluster.grid.gs.washington.edu 10.110.0.0/gs-cluster.grid.gs.washington.edu Of course, there's some policy routing too to keep replies on the right interface as well, but that's the extent of the GPFS configuration. On Mon, Jun 25, 2018 at 04:20:35PM -0700, Eric Horst wrote: > Hi, I'm hoping somebody has insights into how the subnets option actually > works. I've read the docs a dozen times and I want to make sure I > understand before I take my production cluster down to make the changes. > > On the current cluster the daemon addresses are on a gpfs private network > and the admin addresses are on a public network. I'm changing so both > daemon and admin are public and the subnets option is used to utilize the > private network. 
This is to facilitate remote mounts to an independent > cluster. > > The confusing factor in my case, not covered in the docs, is that the gpfs > private network is subnetted and static routes are used to reach them. That > is, there are three private networks, one for each datacenter and the > cluster nodes daemon interfaces are spread between the three. > > 172.16.141.32/27 > 172.16.141.24/29 > 172.16.141.128/27 > > A router connects these three networks but are otherwise 100% private. > > For my mmchconfig subnets command should I use this? > > mmchconfig subnets="172.16.141.24 172.16.141.32 172.16.141.128" > > Where I get confused is that I'm trying to reason through how Spectrum > Scale is utilizing the subnets setting to decide if this will have the > desired result on my cluster. If I change the node addresses to their > public addresses, ie the private addresses are not explicitly configured in > Scale, then how are the private addresses discovered? Does each node use > the subnets option to identify that it has a private address and then > dynamically shares that with the cluster? > > Thanks in advance for your clarifying comments. > > -Eric > > -- > > Eric Horst > University of Washington > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From r.sobey at imperial.ac.uk Wed Jun 27 11:47:02 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 10:47:02 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed Message-ID: Hi all, I'm getting the following error in the GUI, running 5.0.1: "The following GUI refresh task(s) failed: PM_MONITOR". As yet, this is the only node I've upgraded to 5.0.1 - the rest are running (healthily, according to the GUI) 4.2.3.7. I'm not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I've completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 27 12:29:19 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 27 Jun 2018 11:29:19 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: Message-ID: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. 
J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ingo.altenburger at id.ethz.ch Wed Jun 27 12:45:29 2018 From: ingo.altenburger at id.ethz.ch (Altenburger Ingo (ID SD)) Date: Wed, 27 Jun 2018 11:45:29 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments Message-ID: Hi all, our (Windows) users are familiared with the 'previous versions' self-recover feature. We honor this by creating regular snapshots with the default @GMT prefix (non- at -heading prefixes are not visible in 'previous versions'). Unfortunately, MacOS clients having the same share mounted via smb or cifs cannot benefit from such configured snapshots, i.e. they are not visible in Finder window. Any non- at -heading prefix is visible in Finder as long as hidden .snapshots directory can be seen. Using a Terminal command line is also not feasible for end user purposes. Since the two case seem to be mutually exclusive, has anybody found a solution other than creating two snapshots, one with and one without the @-heading prefix? Thanks for any hint, Ingo Altenburger -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jun 27 13:28:50 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 12:28:50 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> References: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, No, it all runs over the same network. 
Thanks, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 27 June 2018 12:29 To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' > Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Wed Jun 27 13:49:38 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Wed, 27 Jun 2018 12:49:38 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: , <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jun 27 14:14:59 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 13:14:59 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: , <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: Hi Andreas, Output of the debug log ? no clue, but maybe you can interpret it better ? 
[root at icgpfsq1 ~]# /usr/lpp/mmfs/gui/cli/runtask pm_monitor --debug debug: locale=en_US debug: Raising event: gui_pmcollector_connection_ok, for node: localhost.localdomain err: com.ibm.fscc.common.exceptions.FsccException: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:280) at com.ibm.fscc.common.tasks.ZiMONMonitorTask.run(ZiMONMonitorTask.java:144) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:221) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:193) at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:369) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:65) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) Caused by: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:328) at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:278) ... 9 more err: com.ibm.fscc.common.exceptions.FsccException: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:280) at com.ibm.fscc.common.tasks.ZiMONMonitorTask.run(ZiMONMonitorTask.java:144) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:221) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:193) at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:369) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:65) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) Caused by: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:328) at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:278) ... 9 more debug: Will not raise the following event using 'mmsysmonc' since it already exists in the database: reportingNode = 'icgpfsq1', eventName = 'gui_refresh_task_failed', entityId = '11', arguments = 'PM_MONITOR', identifier = 'null' err: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain err: com.ibm.fscc.cli.CommandException: EFSSG1150C Running specified task was unsuccessful. 
at com.ibm.fscc.cli.CommandException.createCommandException(CommandException.java:117) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:69) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) EFSSG1150C Running specified task was unsuccessful. Thanks Richard From: Andreas Koeninger [mailto:andreas.koeninger at de.ibm.com] Sent: 27 June 2018 13:50 To: Sobey, Richard A Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hi Richard, if you double-click the event there should be some additional help available. The steps under "User Action" will hopefully help to identify the root cause: 1.) Check if there is additional information available by executing '/usr/lpp/mmfs/gui/cli/lstasklog [taskname]'. 2.) Run the specified task manually on the CLI by executing '/usr/lpp/mmfs/gui/cli/runtask [taskname] --debug'. ... Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Date: Wed, Jun 27, 2018 2:29 PM Hi Renar, No, it all runs over the same network. Thanks, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 27 June 2018 12:29 To: 'gpfsug main discussion list' > Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' > Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jun 27 18:53:39 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 27 Jun 2018 17:53:39 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOSenvironments In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Wed Jun 27 19:09:40 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 27 Jun 2018 11:09:40 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Message-ID: Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? 
Thanks for any advice, Renata Dart SLAC National Accelerator Lb From S.J.Thompson at bham.ac.uk Wed Jun 27 19:33:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 27 Jun 2018 18:33:28 +0000 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] Sent: 27 June 2018 19:09 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? Thanks for any advice, Renata Dart SLAC National Accelerator Lb _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cabrillo at ifca.unican.es Wed Jun 27 20:24:28 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 27 Jun 2018 21:24:28 +0200 (CEST) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Message-ID: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> An HTML attachment was scrubbed... URL: From renata at SLAC.STANFORD.EDU Wed Jun 27 19:54:47 2018 From: renata at SLAC.STANFORD.EDU (Renata Maria Dart) Date: Wed, 27 Jun 2018 11:54:47 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi Simon, yes I ran mmsdrrestore -p and that helped to create the /var/mmfs/ccr directory which was missing. But it didn't create a ccr.nodes file, so I ended up scp'ng that over by hand which I hope was the right thing to do. The one host that is no longer in service is still in that ccr.nodes file and when I try to mmdelnode it I get: root at ocio-gpu03 renata]# mmdelnode -N dhcp-os-129-164.slac.stanford.edu mmdelnode: Unable to obtain the GPFS configuration file lock. mmdelnode: GPFS was unable to obtain a lock from node dhcp-os-129-164.slac.stanford.edu. mmdelnode: Command failed. 
Examine previous error messages to determine cause. despite the fact that it doesn't respond to ping. The mmstartup on the newly reinstalled node fails as in my initial email. I should mention that the two "working" nodes are running 4.2.3.4. The person who reinstalled the node that won't start up put on 4.2.3.8. I didn't think that was the cause of this problem though and thought I would try to get the cluster talking again before upgrading the rest of the nodes or degrading the reinstalled one. Thanks, Renata On Wed, 27 Jun 2018, Simon Thompson wrote: >Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] >Sent: 27 June 2018 19:09 >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From renata at slac.stanford.edu Wed Jun 27 20:30:33 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 27 Jun 2018 12:30:33 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> References: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> Message-ID: Hi, any gpfs commands fail with: root at ocio-gpu01 ~]# mmlsmgr get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsmgr: Command failed. Examine previous error messages to determine cause. The two "working" nodes are arbitrating. Also, they are using ccr, so doesn't that mean the primary/secondary setup for a client cluster doesn't apply? Renata On Wed, 27 Jun 2018, Iban Cabrillo wrote: >Hi,? ? 
Have you check if there is any manager node available?? >#mmlsmgr > >If not could you try to asig a new cluster/gpfs_fs manager. > >Mmchmgr? ? gpfs_fs. Manager_node >Mmchmgr.? ?-c.? Cluster_manager_node > >Cheers.? > > From scale at us.ibm.com Wed Jun 27 22:14:23 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 27 Jun 2018 17:14:23 -0400 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi Renata, You may want to reduce the set of quorum nodes. If your version supports the --force option, you can run mmchnode --noquorum -N --force It is a good idea to configure tiebreaker disks in a cluster that has only 2 quorum nodes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Renata Maria Dart To: gpfsug-discuss at spectrumscale.org Date: 06/27/2018 02:21 PM Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? Thanks for any advice, Renata Dart SLAC National Accelerator Lb _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kevindjo at us.ibm.com Wed Jun 27 22:20:41 2018 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 27 Jun 2018 21:20:41 +0000 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB082ADFE7DE038f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From spectrumscale at kiranghag.com Thu Jun 28 04:14:30 2018 From: spectrumscale at kiranghag.com (KG) Date: Thu, 28 Jun 2018 08:44:30 +0530 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Can you also check the time differences between nodes? We had a situation recently where the server time mismatch caused failures. On Thu, Jun 28, 2018 at 2:50 AM, Kevin D Johnson wrote: > You can also try to convert to the old primary/secondary model to back it > away from the default CCR configuration. > > mmchcluster --ccr-disable -p servername > > Then, temporarily go with only one quorum node and add more once the > cluster comes back up. Once the cluster is back up and has at least two > quorum nodes, do a --ccr-enable with the mmchcluster command. > > Kevin D. Johnson > Spectrum Computing, Senior Managing Consultant > MBA, MAcc, MS Global Technology and Development > IBM Certified Technical Specialist Level 2 Expert > > [image: IBM Certified Technical Specialist Level 2 Expert] > > Certified Deployment Professional - Spectrum Scale > Certified Solution Advisor - Spectrum Computing > Certified Solution Architect - Spectrum Storage Solutions > > > 720.349.6199 - kevindjo at us.ibm.com > > "To think is to achieve." - Thomas J. Watson, Sr. > > > > > ----- Original message ----- > From: "IBM Spectrum Scale" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: renata at slac.stanford.edu, gpfsug main discussion list < > gpfsug-discuss at spectrumscale.org> > Cc: > Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > Date: Wed, Jun 27, 2018 5:15 PM > > > Hi Renata, > > You may want to reduce the set of quorum nodes. If your version supports > the --force option, you can run > > mmchnode --noquorum -N --force > > It is a good idea to configure tiebreaker disks in a cluster that has only > 2 quorum nodes. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > [image: Inactive hide details for Renata Maria Dart ---06/27/2018 02:21:52 > PM---Hi, we have a client cluster of 4 nodes with 3 quorum n]Renata Maria > Dart ---06/27/2018 02:21:52 PM---Hi, we have a client cluster of 4 nodes > with 3 quorum nodes. 
One of the quorum nodes is no longer i > > From: Renata Maria Dart > To: gpfsug-discuss at spectrumscale.org > Date: 06/27/2018 02:21 PM > Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the > quorum nodes is no longer in service and the other was reinstalled with > a newer OS, both without informing the gpfs admins. Gpfs is still > "working" on the two remaining nodes, that is, they continue to have access > to the gpfs data on the remote clusters. But, I can no longer get > any gpfs commands to work. On one of the 2 nodes that are still serving > data, > > root at ocio-gpu01 ~]# mmlscluster > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmlscluster: Command failed. Examine previous error messages to determine > cause. > > > On the reinstalled node, this fails in the same way: > > [root at ocio-gpu02 ccr]# mmstartup > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine > cause. > > > I have looked through the users group interchanges but didn't find anything > that seems to fit this scenario. > > Is there a way to salvage this cluster? Can it be done without > shutting gpfs down on the 2 nodes that continue to work? > > Thanks for any advice, > > Renata Dart > SLAC National Accelerator Lb > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB082ADFE7DE038f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From ingo.altenburger at id.ethz.ch Thu Jun 28 07:37:48 2018 From: ingo.altenburger at id.ethz.ch (Altenburger Ingo (ID SD)) Date: Thu, 28 Jun 2018 06:37:48 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments In-Reply-To: References: Message-ID: I have to note that we use the from-SONAS-imported snapshot scheduler as part of the gui to create (and keep/delete) the snapshots. When performing mmcrsnapshot @2018-06-27-14-01 -j then this snapshot is visible in MacOS Finder but not in Windows 'previous versions'. Thus, the issue might be related to the way the scheduler is creating snapshots. Since having hundreds of filesets we need snapshots for, doing the scheduling by ourselves is not trivial and a preferred option. Regards Ingo Altenburger From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Altenburger Ingo (ID SD) Sent: Mittwoch, 27. 
Juni 2018 13:45 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments Hi all, our (Windows) users are familiared with the 'previous versions' self-recover feature. We honor this by creating regular snapshots with the default @GMT prefix (non- at -heading prefixes are not visible in 'previous versions'). Unfortunately, MacOS clients having the same share mounted via smb or cifs cannot benefit from such configured snapshots, i.e. they are not visible in Finder window. Any non- at -heading prefix is visible in Finder as long as hidden .snapshots directory can be seen. Using a Terminal command line is also not feasible for end user purposes. Since the two case seem to be mutually exclusive, has anybody found a solution other than creating two snapshots, one with and one without the @-heading prefix? Thanks for any hint, Ingo Altenburger -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 28 08:44:16 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 28 Jun 2018 09:44:16 +0200 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Just some ideas what to try. when you attempted mmdelnode, was that node still active with the IP address known in the cluster? If so, shut it down and try again. Mind the restrictions of mmdelnode though (can't delete NSD servers). Try to fake one of the currently missing cluster nodes, or restore the old system backup to the reinstalled server, if available, or temporarily install gpfs SW there and copy over the GPFS config stuff from a node still active (/var/mmfs/), configure the admin and daemon IFs of the faked node on that machine, then try to start the cluster and see if it comes up with quorum, if it does then go ahead and cleanly de-configure what's needed to remove that node from the cluster gracefully. Once you reach quorum with the remaining nodes you are in safe area. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Renata Maria Dart To: Simon Thompson Cc: gpfsug main discussion list Date: 27/06/2018 21:30 Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Simon, yes I ran mmsdrrestore -p and that helped to create the /var/mmfs/ccr directory which was missing. But it didn't create a ccr.nodes file, so I ended up scp'ng that over by hand which I hope was the right thing to do. The one host that is no longer in service is still in that ccr.nodes file and when I try to mmdelnode it I get: root at ocio-gpu03 renata]# mmdelnode -N dhcp-os-129-164.slac.stanford.edu mmdelnode: Unable to obtain the GPFS configuration file lock. 
mmdelnode: GPFS was unable to obtain a lock from node dhcp-os-129-164.slac.stanford.edu. mmdelnode: Command failed. Examine previous error messages to determine cause. despite the fact that it doesn't respond to ping. The mmstartup on the newly reinstalled node fails as in my initial email. I should mention that the two "working" nodes are running 4.2.3.4. The person who reinstalled the node that won't start up put on 4.2.3.8. I didn't think that was the cause of this problem though and thought I would try to get the cluster talking again before upgrading the rest of the nodes or degrading the reinstalled one. Thanks, Renata On Wed, 27 Jun 2018, Simon Thompson wrote: >Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] >Sent: 27 June 2018 19:09 >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From alvise.dorigo at psi.ch Thu Jun 28 09:02:07 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 28 Jun 2018 08:02:07 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Message-ID: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. 
ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Jun 28 09:15:46 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 28 Jun 2018 08:15:46 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jun 28 09:26:41 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 28 Jun 2018 09:26:41 +0100 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOSenvironments In-Reply-To: References: Message-ID: <1530174401.26036.55.camel@strath.ac.uk> On Wed, 2018-06-27 at 17:53 +0000, Christof Schmitt wrote: > Hi, > ? > we currently support the SMB protocol method of quering snapshots, > which is used by the Windows "Previous versions" dialog. Mac clients > unfortunately do not implement these explicit queries. Browsing the > snapshot directories with the @GMT names through SMB currently is not > supported. > ? > Could you open a RFE to request snapshot browsing from Mac clients? > An official request would be helpful in prioritizing the development > and test work required to support this. > ? Surely the lack of previous versions in the Mac Finder is an issue for Apple to fix??As such an RFE with IBM is not going to help and good look getting Apple to lift a finger. Similar for the various Linux file manager programs, though in this case being open source at least IBM could contribute code to fix the issue. However it occurs to me that a solution might be to run the Windows Explorer under Wine on the above platforms. 
Obviously licensing issues may well make that problematic, but perhaps the ReactOS clone of Windows Explorer supports the "Previous versions" feature, and if not it could be expanded to do so. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From alvise.dorigo at psi.ch Thu Jun 28 10:39:35 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 28 Jun 2018 09:39:35 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE804526727D32@MBX114.d.ethz.ch> Hi Andrew, thanks for the naswer. No, the port #2 (on all the nodes) is not cabled. Regards, Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Andrew Beattie [abeattie at au1.ibm.com] Sent: Thursday, June 28, 2018 10:15 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? 
This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
From dancasali at us.ibm.com Thu Jun 28 21:14:51 2018 From: dancasali at us.ibm.com (Daniel De souza casali) Date: Thu, 28 Jun 2018 16:14:51 -0400 Subject: [gpfsug-discuss] Sending logs to Logstash Message-ID: Good Afternoon! Does anyone here in the community send mmfs.log to Logstash? If so what do you use? Thank you! Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL:
From alastair.smith at ucl.ac.uk Fri Jun 29 16:26:51 2018 From: alastair.smith at ucl.ac.uk (Smith, Alastair) Date: Fri, 29 Jun 2018 15:26:51 +0000 Subject: [gpfsug-discuss] Job vacancy - Senior Research Data Storage Technologist, UCL Message-ID: Dear all, University College London are looking to appoint a Senior Research Data Storage Technologist to join their Research Data Services Team in central London. The role will involve the design and deployment of storage technologies to support research, as well as providing guidance on service development and advising research projects. The Research Data Services Group provides petabyte-scale data storage for active research projects, and is currently developing a new institutional data repository for long-term curation and preservation. Over the coming years, the Group will be building an integrated suite of services to support data management from planning to re-use, and the successful candidate will play an important role in the creation and operation of these services. For further particulars and the application form, please visit https://www.interquestgroup.com/p/join-a-world-class-workforce-at-ucl The application process will be closing shortly: deadline is 1st July 2018. Kind regards Alastair -|-|-|-|-|-|-|-|-|-|-|-|-|- Dr Alastair Smith Senior research data systems engineer Research Data Services RITS, UCL -------------- next part -------------- An HTML attachment was scrubbed... URL:
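On Daniel's Logstash question above: there is nothing Scale-specific about shipping mmfs.log, so one common approach is to tail the log with a lightweight shipper and point it at a Logstash beats input. A minimal, untested Filebeat sketch follows; /var/adm/ras/mmfs.log.latest is the usual log location, the Logstash host and port are placeholders, and older Filebeat releases call the first section filebeat.prospectors instead of filebeat.inputs.

# filebeat.yml - ship the GPFS log to Logstash
filebeat.inputs:
- type: log
  paths:
    - /var/adm/ras/mmfs.log.latest
  fields:
    service: gpfs

output.logstash:
  hosts: ["logstash.example.com:5044"]

An agentless alternative is to forward the same file through rsyslog's imfile module, at the cost of doing more of the parsing on the Logstash side.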
From p.childs at qmul.ac.uk Mon Jun 4 22:26:25 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 4 Jun 2018 21:26:25 +0000 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: , , Message-ID: <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> We have 2 power 9 nodes. The rest of our cluster is running centos 7.4 and spectrum scale 4.2.3-8 (x86 based). The power 9 nodes are running spectrum scale 5.0.0-0 currently as we couldn't get the gplbin for 4.2.3 to compile, whereas spectrum scale 5 worked on power 9 out of the box. They are running rhel7.5 but on an old kernel I guess.
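A side note on the gplbin build problem mentioned above: on current 4.2.x and 5.0.x levels the portability layer can usually be rebuilt in place with mmbuildgpl, as long as a compiler and kernel headers matching the running kernel are installed; whether the resulting kernel/Scale combination is supported is a separate question, which is what the FAQ discussion in this thread is about. A rough sketch, assuming RHEL package names:

# toolchain plus headers for the running kernel
yum install -y gcc gcc-c++ make kernel-devel-$(uname -r) kernel-headers-$(uname -r)

# build and install the GPFS portability layer against the running kernel
/usr/lpp/mmfs/bin/mmbuildgpl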
I'm not sure that 4.2.3 works on power 9 we've asked the IBM power 9 out reach team but heard nothing back. If we can get 4.2.3 running on the power 9 nodes it would put us in a more consistent setup. Of course our current plan b is to upgrade everything to 5.0.1, but we can't do that as our storage appliance doesn't (officially) support spectrum scale 5 yet. These are my experiences of what works and nothing whatsoever to do with what's supported, except I want to keep us as close to a supported setup as possible given what we've found to actually work. (now that's an interesting spin on a disclaimer) Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 when the x86 7.5 release is also made? Simon * Insert standard IBM disclaimer about the meaning of intent etc etc ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of knop at us.ibm.com [knop at us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on s]"Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). 
Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 4 22:48:45 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 4 Jun 2018 17:48:45 -0400 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> References: , , <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> Message-ID: Peter, Simon, While I believe Power9 / RHEL 7.5 will be supported with the upcoming PTFs on 4.2.3 and 5.0.1 later in June, I'm working on getting confirmation for that. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Peter Childs To: gpfsug main discussion list Date: 06/04/2018 05:26 PM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org We have 2 power 9 nodes, The rest of our cluster is running centos 7.4 and spectrum scale 4.2.3-8 (x86 based) The power 9 nodes are running spectrum scale 5.0.0-0 currently as we couldn't get the gplbin for 4.2.3 to compile, where as spectrum scale 5 worked on power 9 our of the box. They are running rhel7.5 but on an old kernel I guess. I'm not sure that 4.2.3 works on power 9 we've asked the IBM power 9 out reach team but heard nothing back. If we can get 4.2.3 running on the power 9 nodes it would put us in a more consistent setup. Of course our current plan b is to upgrade everything to 5.0.1, but we can't do that as our storage appliance doesn't (officially) support spectrum scale 5 yet. These are my experiences of what works and nothing whatsoever to do with what's supported, except I want to keep us as close to a supported setup as possible given what we've found to actually work. (now that's an interesting spin on a disclaimer) Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 when the x86 7.5 release is also made? Simon * Insert standard IBM disclaimer about the meaning of intent etc etc ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of knop at us.ibm.com [knop at us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on s]"Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on support. 
We?ve just ordered some Power 9 nodes, now my understanding From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel ( https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm ) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Tue Jun 5 12:39:08 2018 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Tue, 5 Jun 2018 11:39:08 +0000 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Message-ID: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so alarm is correct. However, I would like to define different limits. Is possible to increase it? 'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. 
Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Wed Jun 6 08:40:07 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 6 Jun 2018 09:40:07 +0200 Subject: [gpfsug-discuss] recommendations for gpfs 5.x GUI and perf/health monitoring collector nodes In-Reply-To: References: Message-ID: Hi, when it comes to clusters of this size then 150 nodes per collector rule of thumb is a good way to start. So 3-4 collector nodes should be OK for your setup. The GUI(s) can also be installed on those nodes as well. Collector nodes mainly need a good amount of RAM as all 'current' incoming sensor data is kept there. Local disk is typically not stressed heavily, plain HDD or simple onboard RAID is sufficient, plan for 20-50 GB disc space on each node. For network no special requirements are needed, default should be whatever is used in the cluster anyway. Mit freundlichen Gr??en / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: David Johnson To: gpfsug main discussion list Date: 31/05/2018 20:22 Subject: [gpfsug-discuss] recommendations for gpfs 5.x GUI and perf/health monitoring collector nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org We are planning to bring up the new ZIMon tools on our 450+ node cluster, and need to purchase new nodes to run the collector federation and GUI function on. What would you choose as a platform for this? ? memory size? ? local disk space ? SSD? shared? ? net attach ? 10Gig? 25Gig? IB? ? CPU horse power ? single or dual socket? I think I remember somebody in Cambridge UG meeting saying 150 nodes per collector as a rule of thumb, so we?re guessing a federation of 4 nodes would do it. Does this include the GUI host(s) or are those separate? Finally, we?re still using client/server based licensing model, do these nodes count as clients? Thanks, ? ddj Dave Johnson Brown University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NSCHULD at de.ibm.com Wed Jun 6 09:00:06 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 6 Jun 2018 10:00:06 +0200 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds In-Reply-To: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> References: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> Message-ID: Hi, assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Mit freundlichen Gr??en / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 05/06/2018 13:45 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so alarm is correct. However, I would like to definmmhealth different limits. Is possible to increase it? 
'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Wed Jun 6 10:37:02 2018 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Wed, 6 Jun 2018 09:37:02 +0000 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds In-Reply-To: References: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch>, Message-ID: <0081EB235765E14395278B9AE1DF34180A65067E@MBX214.d.ethz.ch> Hi Norbert, thanks a lot, it worked. I tried the same before for the same rules, but it did not work. Now I realized that this was because remaining disk space and metadata was even smaller than when I checked first time, so nothing changed. Thanks a lot for your help, Marc _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Norbert Schuld [NSCHULD at de.ibm.com] Sent: Wednesday, June 06, 2018 10:00 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Hi, assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 
MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Mit freundlichen Gr??en / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 [Inactive hide details for "Caubet Serrabou Marc (PSI)" ---05/06/2018 13:45:35---Dear all, we have a small cluster which is repo]"Caubet Serrabou Marc (PSI)" ---05/06/2018 13:45:35---Dear all, we have a small cluster which is reporting the following alarm: From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 05/06/2018 13:45 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so alarm is correct. However, I would like to definmmhealth different limits. Is possible to increase it? 'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 15:16:43 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 14:16:43 +0000 Subject: [gpfsug-discuss] Capacity pool filling Message-ID: Hi All, First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: 1. 
There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I?m not thinking of? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Jun 7 15:51:49 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 7 Jun 2018 10:51:49 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: <6A8A18B6-8578-4C00-A8AC-8A04EF93361F@ulmer.org> > On Jun 7, 2018, at 10:16 AM, Buterbaugh, Kevin L wrote: > > Hi All, > > First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? > > We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. > > However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: > > 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) Any files that have been opened in that pool will have a recent atime (you?re moving them there because they have a not-recent atime, so this should be an anomaly). Further, they should have an mtime that is older than 90 days, too. You could ask the policy engine which ones have been open/written in the last day-ish and maybe see a pattern? > 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. 
We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? > If you are restoring them (as opposed to recalling them), they are different files that happen to have similar contents to some other files. > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 7 16:08:15 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 7 Jun 2018 17:08:15 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: Hm, RULE 'list_updated_in_capacity_pool' LIST 'updated_in_capacity_pool' FROM POOL 'gpfs23capacity' WHERE CURRENT_TIMESTAMP -MODIFICATION_TIME To: gpfsug main discussion list Date: 07/06/2018 16:25 Subject: [gpfsug-discuss] Capacity pool filling Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? 
to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I?m not thinking of? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 16:56:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 15:56:34 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: Hi All, So in trying to prove Jaime wrong I proved him half right ? the cron job is stopped: #13 22 * * 5 /root/bin/gpfs_migration.sh However, I took a look in one of the restore directories under /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! So that explains why the capacity pool is filling, but mmlspolicy says: Policy for file system '/dev/gpfs23': Installed by root at gpfsmgr on Wed Jan 25 10:17:01 2017. First line of policy 'gpfs23.policy' is: RULE 'DEFAULT' SET POOL 'gpfs23data' So ? I don?t think GPFS is doing this but the next thing I am going to do is follow up with our tape software vendor ? I bet they preserve the pool attribute on files and - like Jaime said - old stuff is therefore hitting the gpfs23capacity pool. Thanks Jaime and everyone else who has responded so far? Kevin > On Jun 7, 2018, at 9:53 AM, Jaime Pinto wrote: > > I think the restore is is bringing back a lot of material with atime > 90, so it is passing-trough gpfs23data and going directly to gpfs23capacity. > > I also think you may not have stopped the crontab script as you believe you did. > > Jaime > > Quoting "Buterbaugh, Kevin L" : > >> Hi All, >> >> First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? >> >> We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. >> >> However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: >> >> 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? 
but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) >> >> 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? >> >> Is there a third explanation I?m not thinking of? >> >> Thanks... >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.scinethpc.ca%2Ftestimonials&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C9154807425ab4316f58f08d5cc866774%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636639799990107084&sdata=VUOqjEJ%2FWt8VI%2BWolWbpa1snbLx85XFJvc0sZPuI86Q%3D&reserved=0 > ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > https://na01.safelinks.protection.outlook.com/?url=www.scinet.utoronto.ca&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C9154807425ab4316f58f08d5cc866774%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636639799990107084&sdata=3PxI2hAdhUOJZp5d%2BjxOu1N0BoQr8X5K8xZG%2BcONjEU%3D&reserved=0 - https://na01.safelinks.protection.outlook.com/?url=www.computecanada.ca&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C9154807425ab4316f58f08d5cc866774%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636639799990107084&sdata=JxtEYIN5%2FYiDf3GKa5ZBP3JiC27%2F%2FGiDaRbX5PnWEGU%3D&reserved=0 > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > From pinto at scinet.utoronto.ca Thu Jun 7 15:53:16 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 07 Jun 2018 10:53:16 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> I think the restore is is bringing back a lot of material with atime > 90, so it is passing-trough gpfs23data and going directly to gpfs23capacity. I also think you may not have stopped the crontab script as you believe you did. Jaime Quoting "Buterbaugh, Kevin L" : > Hi All, > > First off, I?m on day 8 of dealing with two different > mini-catastrophes at work and am therefore very sleep deprived and > possibly missing something obvious ? with that disclaimer out of the > way? 
> > We have a filesystem with 3 pools: 1) system (metadata only), 2) > gpfs23data (the default pool if I run mmlspolicy), and 3) > gpfs23capacity (where files with an atime - yes atime - of more than > 90 days get migrated to by a script that runs out of cron each > weekend. > > However ? this morning the free space in the gpfs23capacity pool is > dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot > figure out why. The migration script is NOT running ? in fact, it?s > currently disabled. So I can only think of two possible > explanations for this: > > 1. There are one or more files already in the gpfs23capacity pool > that someone has started updating. Is there a way to check for that > ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but > restricted to only files in the gpfs23capacity pool. Marc Kaplan - > can mmfind do that?? ;-) > > 2. We are doing a large volume of restores right now because one of > the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) > down due to a issue with the storage array. We?re working with the > vendor to try to resolve that but are not optimistic so we have > started doing restores in case they come back and tell us it?s not > recoverable. We did run ?mmfileid? to identify the files that have > one or more blocks on the down NSD, but there are so many that what > we?re doing is actually restoring all the files to an alternate path > (easier for out tape system), then replacing the corrupted files, > then deleting any restores we don?t need. But shouldn?t all of that > be going to the gpfs23data pool? I.e. even if we?re restoring > files that are in the gpfs23capacity pool shouldn?t the fact that > we?re restoring to an alternate path (i.e. not overwriting files > with the tape restores) and the default pool is the gpfs23data pool > mean that nothing is being restored to the gpfs23capacity pool??? > > Is there a third explanation I?m not thinking of? > > Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 15:45:52 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 14:45:52 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <68EC0249928AAD56.9D6058B5-0CA1-4A01-BAB3-FF615745B845@mail.outlook.com> References: <68EC0249928AAD56.9D6058B5-0CA1-4A01-BAB3-FF615745B845@mail.outlook.com> Message-ID: <065F97AD-9C82-4B13-A519-E090CD175305@vanderbilt.edu> Hi again all, I received a direct response and am not sure whether that means the sender did not want to be identified, but they asked good questions that I wanted to answer on list? No, we do not use snapshots on this filesystem. No, we?re not using HSM ? our tape backup system is a traditional backup system not named TSM. We?ve created a top level directory in the filesystem called ?RESTORE? 
and are restoring everything under that ? then doing our moves / deletes of what we?ve restored ? so I *think* that means all of that should be written to the gpfs23data pool?!? On the ?plus? side, I may figure this out myself soon when someone / something starts getting I/O errors! :-O In the meantime, other ideas are much appreciated! Kevin Do you have a job that?s creating snapshots? That?s an easy one to overlook. Not sure if you are using an HSM. Any new file that gets generated should follow the default rule in ILM unless if meets a placement condition. It would only be if you?re using an HSM that files would be placed in a non-placement location pool but that is purely because the the file location has already been updated to the capacity pool. On Thu, Jun 7, 2018 at 8:17 AM -0600, "Buterbaugh, Kevin L" > wrote: Hi All, First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I?m not thinking of? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
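A quick way to see where the restored copies are actually landing, sketched on the assumption that everything is being written under /gpfs23/RESTORE as described above (the exact path and the sampling loop are illustrative only):

# show the pool of a single restored file
mmlsattr -L /gpfs23/RESTORE/path/to/file | grep -i 'storage pool'

# or sample a restored tree and count files per pool
find /gpfs23/RESTORE/some/fileset -type f | head -1000 | while read f
do
  mmlsattr -L "$f" | awk -F: '/storage pool name/ {print $2}'
done | sort | uniq -c

If everything comes back as gpfs23data the default placement rule is being honoured; any gpfs23capacity hits would mean the restore is re-applying the old pool assignment, which is where the rest of the thread ends up.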
URL: From UWEFALKE at de.ibm.com Thu Jun 7 19:34:16 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 7 Jun 2018 20:34:16 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: > However, I took a look in one of the restore directories under > /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! > So ? I don?t think GPFS is doing this but the next thing I am > going to do is follow up with our tape software vendor ? I bet > they preserve the pool attribute on files and - like Jaime said - > old stuff is therefore hitting the gpfs23capacity pool. Hm, then the backup/restore must be doing very funny things. Usually, GPFS should rule the placement of new files, and I assume that a restore of a file, in particular under a different name, creates a new file. So, if your backup tool does override that GPFS placement, it must be very intimate with Scale :-). I'd do some list scans of the capacity pool just to see what the files appearing there from tape have in common. If it's really that these files' data were on the capacity pool at the last backup, they should not be affected by your dead NSD and a restore is in vain anyway. If that doesn't help or give no clue, then, if the data pool has some more free space, you might try to run an upward/backward migration from capacity to data . And, yeah, as GPFS tends to stripe over all NSDs, all files in data large enough plus some smaller ones would have data on your broken NSD. That's the drawback of parallelization. Maybe you'd ask the storage vendor whether they supply some more storage for the fault of their (redundant?) device to alleviate your current storage shortage ? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 20:36:59 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 19:36:59 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Hi Uwe, Thanks for your response. So our restore software lays down the metadata first, then the data. While it has no specific knowledge of the extended attributes, it does back them up and restore them. So the only explanation that makes sense to me is that since the inode for the file says that the file should be in the gpfs23capacity pool, the data gets written there. Right now I don?t have time to do an analysis of the ?live? version of a fileset and the ?restored? version of that same fileset to see if the placement of the files matches up. My quick and dirty checks seem to show files getting written to all 3 pools. 
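For the kind of list scan Uwe suggests, a minimal sketch (his rule earlier in the thread was truncated in transit; the 7-day window and the file names below are only examples to adapt):

/* cap_scan.pol -- list files in gpfs23capacity created or modified in the last 7 days */
RULE EXTERNAL LIST 'recent_in_capacity' EXEC ''
RULE 'scan_capacity' LIST 'recent_in_capacity'
  FROM POOL 'gpfs23capacity'
  SHOW(VARCHAR(FILE_SIZE))
  WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '7' DAYS
     OR (CURRENT_TIMESTAMP - CREATION_TIME) < INTERVAL '7' DAYS

# run it without executing any external script; the matches should land in
# a file named along the lines of /tmp/capscan.list.recent_in_capacity
mmapplypolicy /gpfs23 -P cap_scan.pol -I defer -f /tmp/capscan

This is essentially the "find /gpfs23 -mtime -7 -ls, but only for one pool" listing asked about at the top of the thread; swap in ACCESS_TIME if reads matter as well.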
Unfortunately, we have no way to tell our tape software to ignore files from the gpfs23capacity pool (and we?re aware that we won?t need those files). We?ve also determined that it is actually quicker to tell our tape system to restore all files from a fileset than to take the time to tell it to selectively restore only certain files ? and the same amount of tape would have to be read in either case. Our SysAdmin who is primary on tape backup and restore was going on vacation the latter part of the week, so he decided to be helpful and just queue up all the restores to run one right after the other. We didn?t realize that, so we are solving our disk space issues by slowing down the restores until we can run more instances of the script that replaces the corrupted files and deletes the unneeded restored files. Thanks again? Kevin > On Jun 7, 2018, at 1:34 PM, Uwe Falke wrote: > >> However, I took a look in one of the restore directories under >> /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! > > >> So ? I don?t think GPFS is doing this but the next thing I am >> going to do is follow up with our tape software vendor ? I bet >> they preserve the pool attribute on files and - like Jaime said - >> old stuff is therefore hitting the gpfs23capacity pool. > > Hm, then the backup/restore must be doing very funny things. Usually, GPFS > should rule the > placement of new files, and I assume that a restore of a file, in > particular under a different name, > creates a new file. So, if your backup tool does override that GPFS > placement, it must be very > intimate with Scale :-). > I'd do some list scans of the capacity pool just to see what the files > appearing there from tape have in common. > If it's really that these files' data were on the capacity pool at the > last backup, they should not be affected by your dead NSD and a restore is > in vain anyway. > > If that doesn't help or give no clue, then, if the data pool has some more > free space, you might try to run an upward/backward migration from > capacity to data . > > And, yeah, as GPFS tends to stripe over all NSDs, all files in data large > enough plus some smaller ones would have data on your broken NSD. That's > the drawback of parallelization. > Maybe you'd ask the storage vendor whether they supply some more storage > for the fault of their (redundant?) device to alleviate your current > storage shortage ? > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? 
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cacad30699025407bc67b08d5cca54bca%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636639932669887596&sdata=vywTFbG4O0lquAIAVfa0csdC0HtpvfhY8%2FOjqm98fxI%3D&reserved=0 From makaplan at us.ibm.com Thu Jun 7 21:53:36 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 7 Jun 2018 16:53:36 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Message-ID: If your restore software uses the gpfs_fputattrs() or gpfs_fputattrswithpathname methods, notice there are some options to control the pool. AND there is also the possibility of using the little known "RESTORE" policy rule to algorithmically control the pool selection by different criteria than the SET POOL rule. When all else fails ... Read The Fine Manual ;-) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/07/2018 03:37 PM Subject: Re: [gpfsug-discuss] Capacity pool filling Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe, Thanks for your response. So our restore software lays down the metadata first, then the data. While it has no specific knowledge of the extended attributes, it does back them up and restore them. So the only explanation that makes sense to me is that since the inode for the file says that the file should be in the gpfs23capacity pool, the data gets written there. Right now I don?t have time to do an analysis of the ?live? version of a fileset and the ?restored? version of that same fileset to see if the placement of the files matches up. My quick and dirty checks seem to show files getting written to all 3 pools. Unfortunately, we have no way to tell our tape software to ignore files from the gpfs23capacity pool (and we?re aware that we won?t need those files). We?ve also determined that it is actually quicker to tell our tape system to restore all files from a fileset than to take the time to tell it to selectively restore only certain files ? and the same amount of tape would have to be read in either case. Our SysAdmin who is primary on tape backup and restore was going on vacation the latter part of the week, so he decided to be helpful and just queue up all the restores to run one right after the other. We didn?t realize that, so we are solving our disk space issues by slowing down the restores until we can run more instances of the script that replaces the corrupted files and deletes the unneeded restored files. Thanks again? Kevin > On Jun 7, 2018, at 1:34 PM, Uwe Falke wrote: > >> However, I took a look in one of the restore directories under >> /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! > > >> So ? I don?t think GPFS is doing this but the next thing I am >> going to do is follow up with our tape software vendor ? I bet >> they preserve the pool attribute on files and - like Jaime said - >> old stuff is therefore hitting the gpfs23capacity pool. > > Hm, then the backup/restore must be doing very funny things. 
Usually, GPFS > should rule the > placement of new files, and I assume that a restore of a file, in > particular under a different name, > creates a new file. So, if your backup tool does override that GPFS > placement, it must be very > intimate with Scale :-). > I'd do some list scans of the capacity pool just to see what the files > appearing there from tape have in common. > If it's really that these files' data were on the capacity pool at the > last backup, they should not be affected by your dead NSD and a restore is > in vain anyway. > > If that doesn't help or give no clue, then, if the data pool has some more > free space, you might try to run an upward/backward migration from > capacity to data . > > And, yeah, as GPFS tends to stripe over all NSDs, all files in data large > enough plus some smaller ones would have data on your broken NSD. That's > the drawback of parallelization. > Maybe you'd ask the storage vendor whether they supply some more storage > for the fault of their (redundant?) device to alleviate your current > storage shortage ? > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cacad30699025407bc67b08d5cca54bca%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636639932669887596&sdata=vywTFbG4O0lquAIAVfa0csdC0HtpvfhY8%2FOjqm98fxI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Jun 8 09:23:18 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 8 Jun 2018 10:23:18 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Message-ID: Hi Kevin, gpfsug-discuss-bounces at spectrumscale.org wrote on 07/06/2018 21:36:59: > From: "Buterbaugh, Kevin L" > So our restore software lays down the metadata first, then the data. > While it has no specific knowledge of the extended attributes, it > does back them up and restore them. So the only explanation that > makes sense to me is that since the inode for the file says that the > file should be in the gpfs23capacity pool, the data gets written there. Hm, fair enough. 
So it seems to extract and revise information from the inodes of backed-up files (since it cannot reuse the inode number, it must do so ...). Then, you could ask your SW vendor to include a functionality like "restore using GPFS placement" (ignoring pool info from inode), "restore data to pool XY" (all data restored,, but all to pool XY) or "restore only data from pool XY" (only data originally backed up from XY, and restored to XY), and LBNL "restore only data from pool XY to pool ZZ". The tapes could still do streaming reads, but all files not matching the condition would be ignored. Is a bit more sophisticated than just copying the inode content except some fields such as inode number. OTOH, how often are restores really needed ... so it might be over the top ... > > We?ve also determined that it is actually quicker to tell > our tape system to restore all files from a fileset than to take the > time to tell it to selectively restore only certain files ? and the > same amount of tape would have to be read in either case. Given that you know where the restored files are going to in the file system, you can also craft a policy that deletes all files which are in pool Capacity and have a path into the restore area. Running that every hour should keep your capacity pool from filling up. Just the tapes need to read more, but because they do it in streaming mode, it is probably not more expensive than shoe-shining. And that could also be applied to the third data pool which also retrieves files. But maybe your script is also sufficient Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From secretary at gpfsug.org Fri Jun 8 09:53:43 2018 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Fri, 08 Jun 2018 09:53:43 +0100 Subject: [gpfsug-discuss] Committee change Message-ID: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. 
Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Fri Jun 8 11:42:55 2018 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Fri, 08 Jun 2018 11:42:55 +0100 Subject: [gpfsug-discuss] Committee change In-Reply-To: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> References: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Message-ID: On behalf of the group, I?d like to thank Claire for her support of the group over the past 8 years and wish her well in the new role! Its grown from a few people round a table to a worldwide group with hundreds of members. I spoke with Claire yesterday, and she said the 1 key thing she has learnt about Spectrum Scale is that any issues are likely your network ? Simon Group Chair From: on behalf of "secretary at gpfsug.org" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 8 June 2018 at 09:53 To: gpfsug main discussion list Cc: "secretary at spectrumscaleug.org" , Chair Subject: [gpfsug-discuss] Committee change Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From colinb at mellanox.com Fri Jun 8 12:18:25 2018 From: colinb at mellanox.com (Colin Bridger) Date: Fri, 8 Jun 2018 11:18:25 +0000 Subject: [gpfsug-discuss] Committee change In-Reply-To: References: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Message-ID: I?d also like to wish Claire all the best as well. As a sponsor for a large number of the events, she has been so organized and easy to work with ?and arranged some great after events? so thank you! Tongue firmly in cheek, I?d also like to agree with Claire on the 1 key thing she has learnt and point her towards the Chair of Spectrum-Scale UG for his solution ? All the best Claire! Colin Colin Bridger Mellanox Technologies Mobile: +44 7917 017737 Email: colinb at mellanox.com From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Spectrum Scale User Group Chair) Sent: Friday, June 8, 2018 11:43 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Committee change On behalf of the group, I?d like to thank Claire for her support of the group over the past 8 years and wish her well in the new role! Its grown from a few people round a table to a worldwide group with hundreds of members. 
I spoke with Claire yesterday, and she said the 1 key thing she has learnt about Spectrum Scale is that any issues are likely your network ? Simon Group Chair From: > on behalf of "secretary at gpfsug.org" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 8 June 2018 at 09:53 To: gpfsug main discussion list > Cc: "secretary at spectrumscaleug.org" >, Chair > Subject: [gpfsug-discuss] Committee change Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Jun 11 11:46:26 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 10:46:26 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 11 11:49:46 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 11 Jun 2018 10:49:46 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Jun 11 11:59:11 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 10:59:11 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. 
I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jun 11 12:52:25 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 11 Jun 2018 07:52:25 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
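For anyone wanting to avoid the same surprise, a minimal sketch of the pinning workflow being discussed (RHEL 7 with subscription-manager; the kernel version string is only an example):

subscription-manager release --set=7.4
yum clean all                 # drop metadata cached before the release was pinned
rm -rf /var/cache/yum         # also drop any already-downloaded 7.5 packages
subscription-manager release  # confirm it now reports Release: 7.4
yum check-update kernel       # should only offer 7.4 errata kernels from here on

# optionally lock the kernel packages themselves (needs yum-plugin-versionlock)
yum install -y yum-plugin-versionlock
yum versionlock add 'kernel-3.10.0-693*'

As noted later in the thread, it is the stale yum cache that lets a host jump to 7.5 even after the release has been set to 7.4.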
URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jun 11 12:56:43 2018 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Mon, 11 Jun 2018 13:56:43 +0200 Subject: [gpfsug-discuss] GPFS-GUI and Collector Message-ID: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> Hello, we have GPFS-GUI and Clients at 4.2.3.7 and my clients to not show any performance data in the gui. All clients are running pmsensor and the gui is running pmcollector. I can see in tcpdump that the server receives data but i can not see in the the gui. " Performance collector did not return any data. " Do you have any idea how i can debug it further?? Kind regards ?Philipp Rehs -------------- next part -------------- A non-text attachment was scrubbed... Name: pEpkey.asc Type: application/pgp-keys Size: 1786 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jun 11 13:17:04 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 12:17:04 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Thanks Fred. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 11 June 2018 12:52 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Mon Jun 11 13:46:10 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Mon, 11 Jun 2018 14:46:10 +0200 Subject: [gpfsug-discuss] GPFS-GUI and Collector In-Reply-To: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> References: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> Message-ID: Hello, there could be several reasons why data is not shown in the GUI. There are some knobs in the performance data collection that could prevent it. Some common things to check: 1 Are you getting data at all? Some nodes missing? Check with the CLI and expect data: mmperfmon query compareNodes cpu_user -b 3600 -n 2 Legend: ?1:???? cache-11.novalocal|CPU|cpu_user ?2:???? cache-12.novalocal|CPU|cpu_user ?3:???? cache-13.novalocal|CPU|cpu_user Row?????????? Timestamp cache-11 cache-12 cache-13 ? 1 2018-06-11-14:00:00 1.260611 9.447619 4.134019 ? 2 2018-06-11-15:00:00 1.306165 9.026577 4.062405 2. Are specific nodes missing? Check communications between sensors and collectors. 3. Is specific data missing? For Capacity date see here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guicapacityinfoissue.htm 4. How does the sensor config look like? Call mmperfmon config show Can all sensors talk to the collector registered as colCandidates? colCandidates = "cache-11.novalocal" colRedundancy = 1 You can also contact me by PN. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Philipp Helo Rehs To: gpfsug-discuss at spectrumscale.org Date: 11.06.2018 14:05 Subject: [gpfsug-discuss] GPFS-GUI and Collector Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, we have GPFS-GUI and Clients at 4.2.3.7 and my clients to not show any performance data in the gui. All clients are running pmsensor and the gui is running pmcollector. I can see in tcpdump that the server receives data but i can not see in the the gui. " Performance collector did not return any data. " Do you have any idea how i can debug it further? Kind regards ?Philipp Rehs (See attached file: pEpkey.asc) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19393134.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
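To go with step 2 in Markus's list, a couple of quick checks, sketched for a standard pmsensors/pmcollector install (file paths and grep patterns are assumptions to adapt):

# on a client that shows no data in the GUI
systemctl status pmsensors
grep -A 4 'collectors' /opt/IBM/zimon/ZIMonSensors.cfg   # should name the collector/GUI node

# on the collector (GUI) node
systemctl status pmcollector
mmperfmon config show | grep -E 'colCandidates|colRedundancy'
mmperfmon query compareNodes cpu_user -b 60 -n 10        # every healthy client should show up as a column

If a client runs pmsensors but never appears in the query output, the usual suspects are a collectors stanza pointing at the wrong host or a firewall between sensor and collector.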
Name: pEpkey.asc Type: application/octet-stream Size: 1817 bytes Desc: not available URL: From ulmer at ulmer.org Mon Jun 11 13:47:58 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 11 Jun 2018 08:47:58 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: <1EF433B1-14DD-48AB-B1B4-07EF88E48EDF@ulmer.org> So is it better to pin with the subscription manager, or in our case to pin the kernel version with yum (because you always have something to do when the kernel changes)? What is the consensus? -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Jun 11, 2018, at 6:59 AM, Sobey, Richard A wrote: > > Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: > > [root@ ~]# subscription-manager release > Release: 7.4 > [root@ ~]# cat /etc/redhat-release > Red Hat Enterprise Linux Server release 7.5 (Maipo) > > Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. > > Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! > > Cheers > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) > Sent: 11 June 2018 11:50 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 > > We have on our DSS-G ? > > Have you looked at: > https://access.redhat.com/solutions/238533 > > ? > > Simon > > From: on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 > To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 > > Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? > > Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 11 14:52:16 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 11 Jun 2018 09:52:16 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Fred, Correct. The FAQ should be updated shortly. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Frederick Stock" To: gpfsug main discussion list Date: 06/11/2018 07:52 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? 
This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From JRLang at uwyo.edu Mon Jun 11 16:01:48 2018 From: JRLang at uwyo.edu (Jeffrey R. Lang) Date: Mon, 11 Jun 2018 15:01:48 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Yes, I recently had this happen. It was determined that the caches had been updated to the 7.5 packages, before I set the release to 7.4/ Since I didn't clear and delete the cache it used what it had and did the update to 7.5. So always clear and remove the cache before an update. Jeff From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sobey, Richard A Sent: Monday, June 11, 2018 5:46 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jun 12 11:42:32 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 12 Jun 2018 10:42:32 +0000 Subject: [gpfsug-discuss] Lroc on NVME Message-ID: <687c534347c7e02365cb3c5de4532a60f8a296fb.camel@qmul.ac.uk> We have a new computer, which has an nvme drive that is appearing as /dev/nvme0 and we'd like to put lroc on /dev/nvme0p1p1. which is a partition on the drive. After doing the standard mmcrnsd to set it up Spectrum Scale fails to see it. I've added a script /var/mmfs/etc/nsddevices so that gpfs scans them, and it does work now. What "type" should I set the nvme drives too? 
I've currently set it to "generic". I want to do some tidying of my script, but has anyone else tried to get LROC running on NVMe, and how well does it work? We're running CentOS 7.4 and Spectrum Scale 4.2.3-8 currently. Thanks in advance. -- Peter Childs ITS Research Storage Queen Mary, University of London 
From truongv at us.ibm.com Tue Jun 12 14:53:15 2018 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 12 Jun 2018 09:53:15 -0400 Subject: [gpfsug-discuss] Lroc on NVME In-Reply-To: References: Message-ID: Yes, older versions of GPFS don't recognize /dev/nvme*. So you would need the /var/mmfs/etc/nsddevices user exit. On newer GPFS versions, the nvme devices are also generic. So, it is good that you are using the same NSD sub-type. Cheers, Tru.
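Along the lines Tru describes, a minimal /var/mmfs/etc/nsddevices user exit might look like the sketch below. The NVMe device-name pattern is an assumption (adjust it to whatever your nodes actually show, e.g. nvme0n1 or nvme0n1p1), and the output format and return-code convention should be checked against the sample shipped in /usr/lpp/mmfs/samples/nsddevices.sample on your release:

#!/bin/ksh
# /var/mmfs/etc/nsddevices -- user exit invoked by GPFS device discovery
# (mmdevdiscover). It emits "deviceName deviceType" pairs for devices GPFS
# should consider; "generic" matches the NSD sub-type discussed above.
osName=$(/bin/uname -s)
if [[ $osName = Linux ]]
then
    # Assumed naming: NVMe namespaces and their partitions under /dev.
    for dev in $(/bin/ls /dev 2>/dev/null | /bin/egrep '^nvme[0-9]+n[0-9]+(p[0-9]+)?$')
    do
        echo "$dev generic"
    done
fi
# Whether the built-in discovery still runs after this exit depends on the
# return code -- see the comments in nsddevices.sample on your system.
return 0

With the exit in place, device discovery (and therefore mmcrnsd) should be able to see the NVMe partition without further changes.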
From kums at us.ibm.com Tue Jun 12 23:25:53 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Tue, 12 Jun 2018 22:25:53 +0000 Subject: [gpfsug-discuss] Lroc on NVME In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: 
From xhejtman at ics.muni.cz Wed Jun 13 10:10:28 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 13 Jun 2018 11:10:28 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> Hello, did anyone encounter an error with RHEL 7.5 kernel 3.10.0-862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? I'm getting random errors: Unknown error 521. It means EBADHANDLE. Not sure whether it is due to the kernel or GPFS. -- Lukáš Hejtmánek Linux Administrator only because Full Time Multitasking Ninja is not an official job title 
From jonathan.buzzard at strath.ac.uk Wed Jun 13 10:32:44 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 13 Jun 2018 10:32:44 +0100 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> Message-ID: <1528882364.26036.3.camel@strath.ac.uk> On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encounter an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to the kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS is not supported either: it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard  Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG 
From r.sobey at imperial.ac.uk Wed Jun 13 10:33:49 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 13 Jun 2018 09:33:49 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882364.26036.3.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet however. 
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Jun 13 10:37:56 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 13 Jun 2018 10:37:56 +0100 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: <1528882676.26036.4.camel@strath.ac.uk> On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From TOMP at il.ibm.com Wed Jun 13 10:48:14 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 12:48:14 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882676.26036.4.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). knfs and cNFS can't coexist with CES in the same environment. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jonathan Buzzard To: gpfsug main discussion list Date: 13/06/2018 12:38 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Wed Jun 13 11:07:52 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 13 Jun 2018 06:07:52 -0400 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? -- ddj Dave Johnson > On Jun 13, 2018, at 5:48 AM, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm). > > knfs and cNFS can't coexist with CES in the same environment. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 13/06/2018 12:38 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > > however. > > > > Then we are down to kernel NFS not been supported then? > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Jun 13 11:11:26 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 13 Jun 2018 12:11:26 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From TOMP at il.ibm.com Wed Jun 13 11:32:28 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 13:32:28 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk><1528882676.26036.4.camel@strath.ac.uk> <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Message-ID: Hi, :-) I explicitly used the term "same environment". The simple answer would be NO, but: While the code will only enforce not configuring CES and CNFS on the same cluster - it wouldn't know to do that between clusters - so I don't believe anything will prevent you from configuring it. That said, there might be implications on recovery that might lead to data corruption ( imagine two systems that don't know about the other locks for the reclaim process). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: david_johnson at brown.edu To: gpfsug main discussion list Date: 13/06/2018 13:13 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? -- ddj Dave Johnson On Jun 13, 2018, at 5:48 AM, Tomer Perry wrote: knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). knfs and cNFS can't coexist with CES in the same environment. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jonathan Buzzard To: gpfsug main discussion list Date: 13/06/2018 12:38 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Wed Jun 13 11:38:37 2018 From: david_johnson at brown.edu (David D Johnson) Date: Wed, 13 Jun 2018 06:38:37 -0400 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Message-ID: So first, apologies for hijacking the thread, but this is a hot issue as we are planning 4.2.x to 5.x.y upgrade in the unspecified future, and are currently running CNFS and clustered CIFS. Those exporter nodes are in need of replacement, and I am unsure of the future status of CNFS and CIFS (are they even in 5.x?). Is there a way to roll out protocols while still offering CNFS/Clustered CIFS, and cut over when it's ready for prime time? > On Jun 13, 2018, at 6:32 AM, Tomer Perry wrote: > > Hi, > > :-) I explicitly used the term "same environment". > > The simple answer would be NO, but: > While the code will only enforce not configuring CES and CNFS on the same cluster - it wouldn't know to do that between clusters - so I don't believe anything will prevent you from configuring it. > That said, there might be implications on recovery that might lead to data corruption ( imagine two systems that don't know about the other locks for the reclaim process). > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: david_johnson at brown.edu > To: gpfsug main discussion list > Date: 13/06/2018 13:13 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? > > -- ddj > Dave Johnson > > On Jun 13, 2018, at 5:48 AM, Tomer Perry > wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). > > knfs and cNFS can't coexist with CES in the same environment. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Jonathan Buzzard > > To: gpfsug main discussion list > > Date: 13/06/2018 12:38 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > > however. > > > > Then we are down to kernel NFS not been supported then? > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Jun 13 15:45:44 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 17:45:44 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk><1528882676.26036.4.camel@strath.ac.uk> <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> Message-ID: Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Wed Jun 13 16:14:53 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 13 Jun 2018 15:14:53 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882364.26036.3.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? 
> > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pprandive at rediffmail.com Thu Jun 14 15:22:09 2018 From: pprandive at rediffmail.com (Prafulla) Date: 14 Jun 2018 14:22:09 -0000 Subject: [gpfsug-discuss] =?utf-8?q?GPFS_support_for_latest_stable_release?= =?utf-8?q?_of_OpenStack_=28called_Queens_https=3A//www=2Eopenstack?= =?utf-8?q?=2Eorg/software/queens/=29?= Message-ID: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Hello Guys,Greetings!Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens?I have few queries around that,1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)?2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose?Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance!Regards,pR -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jun 14 15:56:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 14 Jun 2018 14:56:28 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: <7282ECEB-75F0-45AF-A36C-57D3B5930CBA@bham.ac.uk> That probably depends on your definition of support? Object as part of the CES stack is currently on Pike (AFAIK). If you wanted to run swift and Queens then I don?t think that would be supported as part of CES. I believe that cinder/manilla/glance integration is written by IBM developers, but I?m not sure if there was ever a formal support statement from IBM about this, (in the sense of a guaranteed support with a PMR). Simon From: on behalf of "pprandive at rediffmail.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 14 June 2018 at 15:49 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? 
Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Thu Jun 14 16:04:00 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Thu, 14 Jun 2018 11:04:00 -0400 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: Brian is probably best able to answer this question. Lyle From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.b.mills at nasa.gov Thu Jun 14 16:09:57 2018 From: jonathan.b.mills at nasa.gov (Mills, Jonathan B. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 14 Jun 2018 15:09:57 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: I can?t speak for the GUI integration with Horizon, but I use GPFS 4.2.3.8 just fine with OpenStack Pike (for Glance, Cinder, and Nova). I?d be surprised if it worked any differently in Queens. From: on behalf of Lyle Gayne Reply-To: gpfsug main discussion list Date: Thursday, June 14, 2018 at 11:05 AM To: gpfsug main discussion list Cc: Brian Nelson Subject: Re: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Brian is probably best able to answer this question. 
Lyle [Inactive hide details for "Prafulla" ---06/14/2018 11:01:19 AM---Hello Guys,Greetings!Could you please help me figure out the l]"Prafulla" ---06/14/2018 11:01:19 AM---Hello Guys,Greetings!Could you please help me figure out the level of GPFS's support for latest From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From brnelson at us.ibm.com Fri Jun 15 04:36:19 2018 From: brnelson at us.ibm.com (Brian Nelson) Date: Thu, 14 Jun 2018 22:36:19 -0500 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: The only OpenStack component that GPFS explicitly ships is Swift, which is used for the Object protocol of the Protocols Support capability. The latest version included is Swift at the Pike release. That was first made available in the GPFS 5.0.1.0 release. The other way that GPFS can be used is as the backing store for many OpenStack components, as you can see in this table: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1ins_openstackusecase.htm The GPFS drivers for those components were written in the Liberty/Mitaka timeframe. We generally do not certify every OpenStack release against GPFS. However, we have not had any compatibility issues with later releases, and I would expect Queens to also work fine with GPFS storage. -Brian =================================== Brian Nelson 512-286-7735 (T/L) 363-7735 IBM Spectrum Scale brnelson at us.ibm.com From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? 
Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From cabrillo at ifca.unican.es Fri Jun 15 13:01:07 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Fri, 15 Jun 2018 14:01:07 +0200 (CEST) Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Message-ID: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... [root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 0 this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 
00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From anobre at br.ibm.com Fri Jun 15 15:49:14 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 15 Jun 2018 14:49:14 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Message-ID: An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Fri Jun 15 16:16:18 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Fri, 15 Jun 2018 17:16:18 +0200 (CEST) Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Message-ID: <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> Hi Anderson, Comments are in line From: "Anderson Ferreira Nobre" To: "gpfsug-discuss" Cc: "gpfsug-discuss" Sent: Friday, 15 June, 2018 16:49:14 Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Hi Iban, I think it's necessary more information to be able to help you. Here they are: - Redhat version: Which is 7.2, 7.3 or 7.4? CentOS Linux release 7.5.1804 (Core) - Redhat kernel version: In the FAQ of GPFS has the recommended kernel levels - Platform: Is it x86_64? Yes it is - Is there a reason for you stay in 4.2.3-6? Could you update to 4.2.3-9 or 5.0.1? No, that wasthe default version we get from our costumer we could upgrade to 4.2.3-9 with time... - How is the name resolution? Can you do test ping from one node to another and it's reverse? yes resolution works fine in both directions (there is no firewall or icmp filter) using ethernet private network (not IB) - TCP/IP tuning: What is the TCP/IP parameters you are using? 
I have used for 7.4 the following: [root at XXXX sysctl.d]# cat 99-ibmscale.conf net.core.somaxconn = 10000 net.core.netdev_max_backlog = 250000 net.ipv4.ip_local_port_range = 2000 65535 net.ipv4.tcp_rfc1337 = 1 net.ipv4.tcp_max_tw_buckets = 1440000 net.ipv4.tcp_mtu_probing = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 10 net.core.rmem_default = 4194304 net.core.rmem_max = 4194304 net.core.wmem_default = 4194304 net.core.wmem_max = 4194304 net.core.optmem_max = 4194304 net.ipv4.tcp_rmem=4096 87380 16777216 net.ipv4.tcp_wmem=4096 65536 16777216 vm.min_free_kbytes = 512000 kernel.panic_on_oops = 0 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 vm.swappiness = 0 vm.dirty_ratio = 10 That is mine: net.ipv4.conf.default.accept_source_route = 0 net.core.somaxconn = 8192 net.ipv4.tcp_fin_timeout = 30 kernel.sysrq = 1 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 13491064832 kernel.shmall = 4294967296 net.ipv4.neigh.default.gc_stale_time = 120 net.ipv4.tcp_synack_retries = 10 net.ipv4.tcp_sack = 0 net.ipv4.icmp_echo_ignore_broadcasts = 1 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1 net.core.netdev_max_backlog = 250000 net.core.rmem_default = 16777216 net.core.wmem_default = 16777216 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_mem = 16777216 16777216 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 87380 16777216 net.ipv4.tcp_adv_win_scale = 2 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_reordering = 3 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_max_syn_backlog = 8192 net.ipv4.neigh.default.gc_thresh1 = 30000 net.ipv4.neigh.default.gc_thresh2 = 32000 net.ipv4.neigh.default.gc_thresh3 = 32768 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.ib0.mcast_solicit = 18 vm.oom_dump_tasks = 1 vm.min_free_kbytes = 524288 Since we disabled ipv6, we had to rebuild the kernel image with the following command: [root at XXXX ~]# dracut -f -v I did that on Wns but no on GPFS servers... - GPFS tuning parameters: Can you list them? - Spectrum Scale status: Can you send the following outputs: mmgetstate -a -L mmlscluster [root at gpfs01 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: gpfsgui.ifca.es GPFS cluster id: 8574383285738337182 GPFS UID domain: gpfsgui.ifca.es Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon 9 cloudprv-02-9.ifca.es 10.10.140.26 cloudprv-02-9.ifca.es 10 cloudprv-02-8.ifca.es 10.10.140.25 cloudprv-02-8.ifca.es 13 node1.ifca.es 10.10.151.3 node3.ifca.es ...... 44 node24.ifca.es 10.10.151.24 node24.ifca.es ..... 
mmhealth cluster show (It was shoutdown by hand) [root at gpfs01 ~]# mmhealth cluster show --verbose Error: The monitoring service is down and does not respond, please restart it. mmhealth cluster show --verbose mmhealth node eventlog 2018-06-12 23:31:31.487471 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-12 23:31:52.856082 CET ccr_local_server_ok INFO The local GPFS CCR server is reachable PC_LOCAL_SERVER 2018-06-12 23:33:06.397125 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-12 23:33:06.400622 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-12 23:33:06.787556 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-12 23:33:22.670023 CET quorum_up INFO Quorum achieved 2018-06-13 14:01:51.376885 CET service_removed INFO On the node gpfs01.ifca.es the threshold monitor was removed 2018-06-13 14:01:51.385115 CET service_removed INFO On the node gpfs01.ifca.es the perfmon monitor was removed 2018-06-13 18:41:55.846893 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-13 18:42:39.217545 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-13 18:42:39.221455 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-13 18:42:39.653778 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-13 18:42:55.956125 CET quorum_up INFO Quorum achieved 2018-06-13 18:43:17.448980 CET service_running INFO The service perfmon is running on node gpfs01.ifca.es 2018-06-13 18:51:14.157351 CET service_running INFO The service threshold is running on node gpfs01.ifca.es 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized 2018-06-14 08:04:30.216689 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-14 08:05:10.836900 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-14 08:05:27.135275 CET quorum_up INFO Quorum achieved 2018-06-14 08:05:40.446601 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-14 08:05:40.881064 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-14 08:08:56.455851 CET ib_rdma_nic_recognized INFO IB RDMA NIC mlx5_0/1 was recognized 2018-06-14 12:29:58.772033 CET ccr_quorum_nodes_warn WARNING At least one quorum node is not reachable Item=PC_QUORUM_NODES,ErrMsg='Ping CCR quorum nodes failed',Failed='10.10.0.112' 2018-06-14 15:41:57.860925 CET ccr_quorum_nodes_ok INFO All quorum nodes are reachable PC_QUORUM_NODES 2018-06-15 13:04:41.403505 CET pmcollector_down ERROR pmcollector service should be started and is stopped 2018-06-15 15:23:00.121760 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-15 15:23:43.616075 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-15 15:23:43.619593 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-15 15:23:44.053493 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-15 15:24:00.219003 CET quorum_up INFO Quorum achieved [root at gpfs02 ~]# mmhealth node eventlog Error: The monitoring service is down and does not respond, please restart it. mmlsnode -L -N waiters non default parameters: [root at gpfs01 ~]# mmdiag --config | grep ! ! ccrEnabled 1 ! cipherList AUTHONLY ! clusterId 8574383285738337182 ! clusterName gpfsgui.ifca.es ! dmapiFileHandleSize 32 ! idleSocketTimeout 0 ! 
ignorePrefetchLUNCount 1 ! maxblocksize 16777216 ! maxFilesToCache 4000 ! maxInodeDeallocPrefetch 64 ! maxMBpS 6000 ! maxStatCache 512 ! minReleaseLevel 1700 ! myNodeConfigNumber 1 ! pagepool 17179869184 ! socketMaxListenConnections 512 ! socketRcvBufferSize 131072 ! socketSndBufferSize 65536 ! verbsPorts mlx5_0/1 ! verbsRdma enable ! worker1Threads 256 Regards, I Abra?os / Regards / Saludos, Anderson Nobre AIX & Power Consultant Master Certified IT Specialist IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services Phone: 55-19-2132-4317 E-mail: anobre at br.ibm.com ----- Original message ----- From: Iban Cabrillo Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Date: Fri, Jun 15, 2018 9:12 AM Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... [root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 0 this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 
00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Fri Jun 15 16:23:43 2018 From: chrisjscott at gmail.com (Chris Scott) Date: Fri, 15 Jun 2018 16:23:43 +0100 Subject: [gpfsug-discuss] Employment vacancy: Research Computing Specialist at University of Dundee, Scotland Message-ID: Hi All This is an employment opportunity to work with Spectrum Scale and its integration features with Spectrum Protect. Please see or forward along the following link to an employment vacancy in my team for a Research Computing Specialist here at the University of Dundee: https://www.jobs.dundee.ac.uk/fe/tpl_uod01.asp?s=4A515F4E5A565B1A&jobid=102157,4132345688&key=135360005&c=54715623342377&pagestamp=sepirmfpbecljxwhkl Cheers Chris [image: University of Dundee shield logo] *Chris Scott* Research Computing Manager School of Life Sciences, UoD IT, University of Dundee +44 (0)1382 386250 | C.Y.Scott at dundee.ac.uk [image: University of Dundee Facebook] [image: University of Dundee Twitter] [image: University of Dundee LinkedIn] [image: University of Dundee YouTube] [image: University of Dundee Instagram] [image: University of Dundee Snapchat] *One of the world's top 200 universities* Times Higher Education World University Rankings 2018 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Jun 15 16:25:50 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 15 Jun 2018 11:25:50 -0400 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> Message-ID: Assuming CentOS 7.5 parallels RHEL 7.5 then you would need Spectrum Scale 4.2.3.9 because that is the release version (along with 5.0.1 PTF1) that supports RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Iban Cabrillo To: gpfsug-discuss Date: 06/15/2018 11:16 AM Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Anderson, Comments are in line From: "Anderson Ferreira Nobre" To: "gpfsug-discuss" Cc: "gpfsug-discuss" Sent: Friday, 15 June, 2018 16:49:14 Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Hi Iban, I think it's necessary more information to be able to help you. 
Here they are: - Redhat version: Which is 7.2, 7.3 or 7.4? CentOS Linux release 7.5.1804 (Core) - Redhat kernel version: In the FAQ of GPFS has the recommended kernel levels - Platform: Is it x86_64? Yes it is - Is there a reason for you stay in 4.2.3-6? Could you update to 4.2.3-9 or 5.0.1? No, that wasthe default version we get from our costumer we could upgrade to 4.2.3-9 with time... - How is the name resolution? Can you do test ping from one node to another and it's reverse? yes resolution works fine in both directions (there is no firewall or icmp filter) using ethernet private network (not IB) - TCP/IP tuning: What is the TCP/IP parameters you are using? I have used for 7.4 the following: [root at XXXX sysctl.d]# cat 99-ibmscale.conf net.core.somaxconn = 10000 net.core.netdev_max_backlog = 250000 net.ipv4.ip_local_port_range = 2000 65535 net.ipv4.tcp_rfc1337 = 1 net.ipv4.tcp_max_tw_buckets = 1440000 net.ipv4.tcp_mtu_probing = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 10 net.core.rmem_default = 4194304 net.core.rmem_max = 4194304 net.core.wmem_default = 4194304 net.core.wmem_max = 4194304 net.core.optmem_max = 4194304 net.ipv4.tcp_rmem=4096 87380 16777216 net.ipv4.tcp_wmem=4096 65536 16777216 vm.min_free_kbytes = 512000 kernel.panic_on_oops = 0 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 vm.swappiness = 0 vm.dirty_ratio = 10 That is mine: net.ipv4.conf.default.accept_source_route = 0 net.core.somaxconn = 8192 net.ipv4.tcp_fin_timeout = 30 kernel.sysrq = 1 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 13491064832 kernel.shmall = 4294967296 net.ipv4.neigh.default.gc_stale_time = 120 net.ipv4.tcp_synack_retries = 10 net.ipv4.tcp_sack = 0 net.ipv4.icmp_echo_ignore_broadcasts = 1 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1 net.core.netdev_max_backlog = 250000 net.core.rmem_default = 16777216 net.core.wmem_default = 16777216 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_mem = 16777216 16777216 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 87380 16777216 net.ipv4.tcp_adv_win_scale = 2 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_reordering = 3 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_max_syn_backlog = 8192 net.ipv4.neigh.default.gc_thresh1 = 30000 net.ipv4.neigh.default.gc_thresh2 = 32000 net.ipv4.neigh.default.gc_thresh3 = 32768 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.ib0.mcast_solicit = 18 vm.oom_dump_tasks = 1 vm.min_free_kbytes = 524288 Since we disabled ipv6, we had to rebuild the kernel image with the following command: [root at XXXX ~]# dracut -f -v I did that on Wns but no on GPFS servers... - GPFS tuning parameters: Can you list them? 
- Spectrum Scale status: Can you send the following outputs: mmgetstate -a -L mmlscluster [root at gpfs01 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: gpfsgui.ifca.es GPFS cluster id: 8574383285738337182 GPFS UID domain: gpfsgui.ifca.es Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon 9 cloudprv-02-9.ifca.es 10.10.140.26 cloudprv-02-9.ifca.es 10 cloudprv-02-8.ifca.es 10.10.140.25 cloudprv-02-8.ifca.es 13 node1.ifca.es 10.10.151.3 node3.ifca.es ...... 44 node24.ifca.es 10.10.151.24 node24.ifca.es ..... mmhealth cluster show (It was shoutdown by hand) [root at gpfs01 ~]# mmhealth cluster show --verbose Error: The monitoring service is down and does not respond, please restart it. mmhealth cluster show --verbose mmhealth node eventlog 2018-06-12 23:31:31.487471 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-12 23:31:52.856082 CET ccr_local_server_ok INFO The local GPFS CCR server is reachable PC_LOCAL_SERVER 2018-06-12 23:33:06.397125 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-12 23:33:06.400622 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-12 23:33:06.787556 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-12 23:33:22.670023 CET quorum_up INFO Quorum achieved 2018-06-13 14:01:51.376885 CET service_removed INFO On the node gpfs01.ifca.es the threshold monitor was removed 2018-06-13 14:01:51.385115 CET service_removed INFO On the node gpfs01.ifca.es the perfmon monitor was removed 2018-06-13 18:41:55.846893 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-13 18:42:39.217545 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-13 18:42:39.221455 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-13 18:42:39.653778 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-13 18:42:55.956125 CET quorum_up INFO Quorum achieved 2018-06-13 18:43:17.448980 CET service_running INFO The service perfmon is running on node gpfs01.ifca.es 2018-06-13 18:51:14.157351 CET service_running INFO The service threshold is running on node gpfs01.ifca.es 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized 2018-06-14 08:04:30.216689 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 
2018-06-14 08:05:10.836900 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-14 08:05:27.135275 CET quorum_up INFO Quorum achieved 2018-06-14 08:05:40.446601 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-14 08:05:40.881064 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-14 08:08:56.455851 CET ib_rdma_nic_recognized INFO IB RDMA NIC mlx5_0/1 was recognized 2018-06-14 12:29:58.772033 CET ccr_quorum_nodes_warn WARNING At least one quorum node is not reachable Item=PC_QUORUM_NODES,ErrMsg='Ping CCR quorum nodes failed',Failed='10.10.0.112' 2018-06-14 15:41:57.860925 CET ccr_quorum_nodes_ok INFO All quorum nodes are reachable PC_QUORUM_NODES 2018-06-15 13:04:41.403505 CET pmcollector_down ERROR pmcollector service should be started and is stopped 2018-06-15 15:23:00.121760 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-15 15:23:43.616075 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-15 15:23:43.619593 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-15 15:23:44.053493 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-15 15:24:00.219003 CET quorum_up INFO Quorum achieved [root at gpfs02 ~]# mmhealth node eventlog Error: The monitoring service is down and does not respond, please restart it. mmlsnode -L -N waiters non default parameters: [root at gpfs01 ~]# mmdiag --config | grep ! ! ccrEnabled 1 ! cipherList AUTHONLY ! clusterId 8574383285738337182 ! clusterName gpfsgui.ifca.es ! dmapiFileHandleSize 32 ! idleSocketTimeout 0 ! ignorePrefetchLUNCount 1 ! maxblocksize 16777216 ! maxFilesToCache 4000 ! maxInodeDeallocPrefetch 64 ! maxMBpS 6000 ! maxStatCache 512 ! minReleaseLevel 1700 ! myNodeConfigNumber 1 ! pagepool 17179869184 ! socketMaxListenConnections 512 ! socketRcvBufferSize 131072 ! socketSndBufferSize 65536 ! verbsPorts mlx5_0/1 ! verbsRdma enable ! worker1Threads 256 Regards, I Abra?os / Regards / Saludos, Anderson Nobre AIX & Power Consultant Master Certified IT Specialist IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services Phone: 55-19-2132-4317 E-mail: anobre at br.ibm.com ----- Original message ----- From: Iban Cabrillo Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Date: Fri, Jun 15, 2018 9:12 AM Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... 
[root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 5698 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Jun 15 17:17:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 15 Jun 2018 16:17:48 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Message-ID: <4D6C04F4-266A-47AC-BC9A-C0CA9AA2B123@bham.ac.uk> This: ?2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized? 
Looks like you are telling GPFS to use an MLX card that doesn't exist on the node; this is set with verbsPorts. It's probably not your issue here, but you are better off using nodeclasses and assigning the config option to the nodeclasses that actually have the card installed. (I'd also encourage you to use a fabric number; we do this even if there is only 1 fabric currently in the cluster, as we've added other fabrics over time or across multiple locations.) Have you tried using mmnetverify at all? It's been getting better in the newer releases and will give you a good indication of whether you have a comms issue due to something like name resolution, in addition to testing between nodes. Simon From: on behalf of "cabrillo at ifca.unican.es" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 15 June 2018 at 16:16 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Mon Jun 18 11:43:38 2018 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 18 Jun 2018 12:43:38 +0200 Subject: [gpfsug-discuss] Fw: User Group Meeting at ISC2018 Frankfurt - Agenda Update Message-ID: 
To attend please register here so that we can have an accurate count of attendees: https://www-01.ibm.com/events/wwe/grp/grp308.nsf/Registration.xsp?openform&seminar=AA4A99ES We are still looking for two customers to talk about their experience with Spectrum Scale and/or Spectrum LSF. Please send me a personal mail, if you are interested to talk. Monday June 25th, 2018 - 14:00-17:30 - Conference Room Applaus 14:00-14:15 Welcome Gabor Samu (IBM) / Ulf Troppens (IBM) 14:15-14:45 What is new in Spectrum Scale? Mathias Dietz (IBM) 14:45-15:00 News from Lenovo Storage Michael Hennicke (Lenovo) 15:00-15:15 What is new in ESS? Christopher Maestas (IBM) 15:15-15:35 Customer talk 1 TBD 15:35-15:55 Customer talk 2 TBD 15:55-16:25 What is new in Spectrum Computing? Bill McMillan (IBM) 16:25-16:55 Field Update Olaf Weiser (IBM) 16:55-17:25 Spectrum Scale enhancements for CORAL Sven Oehme (IBM) 17:25-17:30 Wrap-up Gabor Samu (IBM) / Ulf Troppens (IBM) Looking forward to see some of you there. Best, Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From PPOD at de.ibm.com Mon Jun 18 14:59:16 2018 From: PPOD at de.ibm.com (Przemyslaw Podfigurny1) Date: Mon, 18 Jun 2018 13:59:16 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043380.png Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043381.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043382.png Type: image/png Size: 1167 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Jun 18 16:53:51 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jun 2018 15:53:51 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Message-ID: Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From neil.wilson at metoffice.gov.uk Mon Jun 18 17:05:35 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Mon, 18 Jun 2018 16:05:35 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution In-Reply-To: References: Message-ID: I think it?s caused by the ID mapping not being configured properly. Found this on the redhat knowledge base. Environment * Red Hat Enterprise Linux 5 * Red Hat Enterprise Linux 6 * Red Hat Enterprise Linux 7 * NFSv4 share being exported from an NFSv4 capable NFS server Issue * From the client, the mounted NFSv4 share has ownership for all files and directories listed as nobody:nobody instead of the actual user that owns them on the NFSv4 server, or who created the new file and directory. * Seeing nobody:nobody permissions on nfsv4 shares on the nfs client. Also seeing the following error in /var/log/messages: * How to configure Idmapping for NFSv4 Raw nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Resolution * Modify the /etc/idmapd.conf with the proper domain (FQDN), on both the client and server. In this example, the proper domain is "example.com" so the "Domain =" directive within /etc/idmapd.conf should be modified to read: Raw Domain = example.com * Note: * If using a NetApp Filer, the NFS.V4.ID.DOMAIN parameter must be set to match the "Domain =" parameter on the client. * If using a Solaris machine as the NFS server, the NFSMAPID_DOMAIN value in /etc/default/nfs must match the RHEL clients Domain. * On Red Hat Enterprise Linux 6.2 and older, to put the changes into effect restart the rpcidmapd service and remount the NFSv4 filesystem : Raw # service rpcidmapd restart # mount -o remount /nfs/mnt/point NOTE: It is only necessary to restart rpc.idmapd service on systems where rpc.idmapd is actually performing the id mapping. On RHEL 6.3 and newer NFS CLIENTS, the maps are stored in the kernel keyring and the id mapping itself is performed by the /sbin/nfsidmap program. On older NFS CLIENTS (RHEL 6.2 and older) as well as on all NFS SERVERS running RHEL, the id mapping is performed by rpc.idmapd. * Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's. * On Red Hat Enterprise Linux 6.3 and higher, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required: Raw # nfsidmap -c NOTE: The above command is only necessary on systems that use the keyring-based id mapper, i.e. NFS CLIENTS running RHEL 6.3 and higher. On RHEL 6.2 and older NFS CLIENTS as well as all NFS SERVERS running RHEL, the cache should be cleared out when rpc.idmapd is restarted. * Another check, see if the passwd:, shadow: and group: settings are set correctly in the /etc/nsswitch.conf file on both Server and Client. Disabling idmapping NOTE: In order to properly disable idmapping, it must be disabled on both the NFS client and NFS server. 
- By default, RHEL6.3 and newer NFS clients and servers disable idmapping when utilizing the AUTH_SYS/UNIX authentication flavor by enabling the following booleans: Raw NFS client # echo 'Y' > /sys/module/nfs/parameters/nfs4_disable_idmapping NFS server # echo 'Y' > /sys/module/nfsd/parameters/nfs4_disable_idmapping * If using a NetApp filer, the options nfs.v4.id.allow_numerics on command can be used to disable idmapping. More information can be found here. * With this boolean enabled, NFS clients will instead send numeric UID/GID numbers in outgoing attribute calls and NFS servers will send numeric UID/GID numbers in outgoing attribute replies. ? If NFS clients sending numeric UID/GID values in a SETATTR call receive an NFS4ERR_BADOWNER reply from the NFS server clients will re-enable idmapping and send user at domain strings for that specific mount from that point forward. ? We can make the option nfs4_disable_idmapping persistent across reboot. ? After the above value has been changed, for the setting to take effect for any NFS server export mounted on the NFS client, you must unmount all NFS mount points for the given NFS server, and then re-mount them. If you have auto mounts stop all processes accessing the mounts and allow the automount daemon to unmount them. Once all NFS mount points are gone to the desired NFS server, remount the NFS mount points and the new setting should be in place. If this is too problematic, you may want to schedule a reboot of the NFS client. ? To verify the setting has been changed properly, you can look inside the /proc/self/mountstats file 'caps' line, which contains a hex value of 2 bytes (16 bits). This is the line that shows the NFS server's "capabilities", and the most significant bit #15 is the one which represents whether idmapping is disabled or not (the NFS_CAP_UIDGID_NOMAP bit - see the Root Cause section) Raw # cat /sys/module/nfs/parameters/nfs4_disable_idmapping Y # umount /mnt # mount rhel6u6-node2:/exports/nfs4 /mnt # grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0xffff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ * Example of nfs4_disable_idmapping = 'N' Raw [root at rhel6u3-node1 ~]# echo N > /sys/module/nfs/parameters/nfs4_disable_idmapping [root at rhel6u3-node1 ~]# umount /mnt [root at rhel6u3-node1 ~]# mount rhel6u6-node2:/exports/nfs4 /mnt [root at rhel6u3-node1 ~]# grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0x7fff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ NOTE: To force ONLY numeric IDs to be used on the client, add RPCIDMAPDARGS="-C" to the etc/sysconfig/nfs file and restart the rpcidmapd service. See man rpc.idmapd for more information. NOTE: This option can only be used with AUTH_SYS/UNIX authentication flavors, if you wish to use something like Kerberos, idmapping must be used. Root Cause * NFSv4 utilizes ID mapping to ensure permissions are set properly on exported shares, if the domains of the client and server do not match then the permissions are mapped to nobody:nobody. NFS_CAP_UIDGID_NOMAP bit * The nfs4_disable_idmapping is a module parameter which is read only one time, at the point at which the kernel sets up the data structure that represents an NFS server. Once it is read, a flag is set in the nfs_server structure NFS_CAP_UIDGID_NOMAP. 
Raw #define NFS_CAP_UIDGID_NOMAP (1U << 15) static int nfs4_init_server(struct nfs_server *server, const struct nfs_parsed_mount_data *data) { struct rpc_timeout timeparms; int error; dprintk("--> nfs4_init_server()\n"); nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, data->timeo, data->retrans); /* Initialise the client representation from the mount data */ server->flags = data->flags; server->caps |= NFS_CAP_ATOMIC_OPEN|NFS_CAP_CHANGE_ATTR|NFS_CAP_POSIX_LOCK; if (!(data->flags & NFS_MOUNT_NORDIRPLUS)) server->caps |= NFS_CAP_READDIRPLUS; server->options = data->options; /* Get a client record */ error = nfs4_set_client(server, data->nfs_server.hostname, (const struct sockaddr *)&data->nfs_server.address, data->nfs_server.addrlen, data->client_address, data->auth_flavors[0], data->nfs_server.protocol, &timeparms, data->minorversion); if (error < 0) goto error; /* * Don't use NFS uid/gid mapping if we're using AUTH_SYS or lower * authentication. */ if (nfs4_disable_idmapping && data->auth_flavors[0] == RPC_AUTH_UNIX) <--- set a flag based on the module parameter server->caps |= NFS_CAP_UIDGID_NOMAP; <-------------------------- flag set if (data->rsize) server->rsize = nfs_block_size(data->rsize, NULL); if (data->wsize) server->wsize = nfs_block_size(data->wsize, NULL); server->acregmin = data->acregmin * HZ; server->acregmax = data->acregmax * HZ; server->acdirmin = data->acdirmin * HZ; server->acdirmax = data->acdirmax * HZ; server->port = data->nfs_server.port; error = nfs_init_server_rpcclient(server, &timeparms, data->auth_flavors[0]); error: /* Done */ dprintk("<-- nfs4_init_server() = %d\n", error); return error; } * This flag is later checked when deciding whether to use numeric uid or gids or to use idmapping. Raw int nfs_map_uid_to_name(const struct nfs_server *server, __u32 uid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(uid, "user", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap->idmap_user_hash, uid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(uid, buf, buflen); return ret; } int nfs_map_gid_to_group(const struct nfs_server *server, __u32 gid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(gid, "group", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap->idmap_group_hash, gid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(gid, buf, buflen); return ret; } "fs/nfs/idmap.c" 872L, 21804C * For more information on NFSv4 ID mapping in Red Hat Enterprise Linux, see https://access.redhat.com/articles/2252881 Diagnostic Steps * Debugging/verbosity can be enabled by editing /etc/sysconfig/nfs: Raw RPCIDMAPDARGS="-vvv" * The following output is shown in /var/log/messages when the mount has been completed and the system shows nobody:nobody as user and group permissions on directories and files: Raw Jun 3 20:22:08 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Jun 3 20:25:44 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' * Collect a tcpdump of the mount attempt: Raw # tcpdump -s0 -i {INTERFACE} host {NFS.SERVER.IP} -w /tmp/{casenumber}-$(hostname)-$(date 
+"%Y-%m-%d-%H-%M-%S").pcap & * If a TCP packet capture has been obtained, check for a nfs.nfsstat4 packet that has returned a non-zero response equivalent to 10039 (NFSV4ERR_BADOWNER). * From the NFSv4 RFC: Raw NFS4ERR_BADOWNER = 10039,/* owner translation bad */ NFS4ERR_BADOWNER An owner, owner_group, or ACL attribute value can not be translated to local representation. Hope this helps. Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: 18 June 2018 16:54 To: gpfsug main discussion list Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From chetkulk at in.ibm.com Mon Jun 18 17:20:29 2018 From: chetkulk at in.ibm.com (Chetan R Kulkarni) Date: Mon, 18 Jun 2018 21:50:29 +0530 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution In-Reply-To: References: Message-ID: Please make sure NFSv4 ID Mapping value matches on client and server (e.g. test.com; may vary on your setup). server: mmnfs config change IDMAPD_DOMAIN=test.com client: e.g. RHEL NFS client; set Domain attribute in /etc/idmapd.conf file and restart idmap service. # egrep ^Domain /etc/idmapd.conf Domain = test.com # service nfs-idmap restart reference Link: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/b1ladm_authconsidfornfsv4access.htm Thanks, Chetan. From: "Wilson, Neil" To: gpfsug main discussion list Date: 06/18/2018 09:35 PM Subject: Re: [gpfsug-discuss] CES-NFS: UID and GID resolution Sent by: gpfsug-discuss-bounces at spectrumscale.org I think it?s caused by the ID mapping not being configured properly. Found this on the redhat knowledge base. Environment Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterprise Linux 7 NFSv4 share being exported from an NFSv4 capable NFS server Issue From the client, the mounted NFSv4 share has ownership for all files and directories listed as nobody:nobody instead of the actual user that owns them on the NFSv4 server, or who created the new file and directory. Seeing nobody:nobody permissions on nfsv4 shares on the nfs client. Also seeing the following error in /var/log/messages: How to configure Idmapping for NFSv4 Raw nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Resolution Modify the /etc/idmapd.conf with the proper domain (FQDN), on both the client and server. In this example, the proper domain is "example.com" so the "Domain =" directive within /etc/idmapd.conf should be modified to read: Raw Domain = example.com Note: If using a NetApp Filer, the NFS.V4.ID.DOMAIN parameter must be set to match the "Domain =" parameter on the client. 
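Once both sides have been changed, a quick sanity check might look like the following (a sketch only; the export path /mnt/gpfs_export is a placeholder, and the grep simply narrows the config listing to the domain line):

# on a CES protocol node: confirm the domain the NFS/idmapd layer is using
mmnfs config list | grep -i idmapd
# on the RHEL client: clear any cached id mappings and remount the export
nfsidmap -c
mount -o remount /mnt/gpfs_export
# owners should now resolve to names instead of nobody:nobody,
# and ls -ln should show the same numeric IDs the server reports
ls -l /mnt/gpfs_export
ls -ln /mnt/gpfs_export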
If using a Solaris machine as the NFS server, the NFSMAPID_DOMAIN value in /etc/default/nfs must match the RHEL clients Domain. On Red Hat Enterprise Linux 6.2 and older, to put the changes into effect restart the rpcidmapd service and remount the NFSv4 filesystem : Raw # service rpcidmapd restart # mount -o remount /nfs/mnt/point NOTE: It is only necessary to restart rpc.idmapd service on systems where rpc.idmapd is actually performing the id mapping. On RHEL 6.3 and newer NFS CLIENTS, the maps are stored in the kernel keyring and the id mapping itself is performed by the /sbin/nfsidmap program. On older NFS CLIENTS (RHEL 6.2 and older) as well as on all NFS SERVERS running RHEL, the id mapping is performed by rpc.idmapd. Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's. On Red Hat Enterprise Linux 6.3 and higher, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required: Raw # nfsidmap -c NOTE: The above command is only necessary on systems that use the keyring-based id mapper, i.e. NFS CLIENTS running RHEL 6.3 and higher. On RHEL 6.2 and older NFS CLIENTS as well as all NFS SERVERS running RHEL, the cache should be cleared out when rpc.idmapd is restarted. Another check, see if the passwd:, shadow: and group: settings are set correctly in the /etc/nsswitch.conf file on both Server and Client. Disabling idmapping NOTE: In order to properly disable idmapping, it must be disabled on both the NFS client and NFS server. - By default, RHEL6.3 and newer NFS clients and servers disable idmapping when utilizing the AUTH_SYS/UNIX authentication flavor by enabling the following booleans: Raw NFS client # echo 'Y' > /sys/module/nfs/parameters/nfs4_disable_idmapping NFS server # echo 'Y' > /sys/module/nfsd/parameters/nfs4_disable_idmapping If using a NetApp filer, the options nfs.v4.id.allow_numerics on command can be used to disable idmapping. More information can be found here. With this boolean enabled, NFS clients will instead send numeric UID/GID numbers in outgoing attribute calls and NFS servers will send numeric UID/GID numbers in outgoing attribute replies. ? If NFS clients sending numeric UID/GID values in a SETATTR call receive an NFS4ERR_BADOWNER reply from the NFS server clients will re-enable idmapping and send user at domain strings for that specific mount from that point forward. ? We can make the option nfs4_disable_idmapping persistent across reboot. ? After the above value has been changed, for the setting to take effect for any NFS server export mounted on the NFS client, you must unmount all NFS mount points for the given NFS server, and then re-mount them. If you have auto mounts stop all processes accessing the mounts and allow the automount daemon to unmount them. Once all NFS mount points are gone to the desired NFS server, remount the NFS mount points and the new setting should be in place. If this is too problematic, you may want to schedule a reboot of the NFS client. ? To verify the setting has been changed properly, you can look inside the /proc/self/mountstats file 'caps' line, which contains a hex value of 2 bytes (16 bits). 
This is the line that shows the NFS server's "capabilities", and the most significant bit #15 is the one which represents whether idmapping is disabled or not (the NFS_CAP_UIDGID_NOMAP bit - see the Root Cause section) Raw # cat /sys/module/nfs/parameters/nfs4_disable_idmapping Y # umount /mnt # mount rhel6u6-node2:/exports/nfs4 /mnt # grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2| caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0xffff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ Example of nfs4_disable_idmapping = 'N' Raw [root at rhel6u3-node1 ~]# echo N > /sys/module/nfs/parameters/nfs4_disable_idmapping [root at rhel6u3-node1 ~]# umount /mnt [root at rhel6u3-node1 ~]# mount rhel6u6-node2:/exports/nfs4 /mnt [root at rhel6u3-node1 ~]# grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0x7fff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ NOTE: To force ONLY numeric IDs to be used on the client, add RPCIDMAPDARGS="-C" to the etc/sysconfig/nfs file and restart the rpcidmapd service. See man rpc.idmapd for more information. NOTE: This option can only be used with AUTH_SYS/UNIX authentication flavors, if you wish to use something like Kerberos, idmapping must be used. Root Cause NFSv4 utilizes ID mapping to ensure permissions are set properly on exported shares, if the domains of the client and server do not match then the permissions are mapped to nobody:nobody. NFS_CAP_UIDGID_NOMAP bit The nfs4_disable_idmapping is a module parameter which is read only one time, at the point at which the kernel sets up the data structure that represents an NFS server. Once it is read, a flag is set in the nfs_server structure NFS_CAP_UIDGID_NOMAP. Raw #define NFS_CAP_UIDGID_NOMAP (1U << 15) static int nfs4_init_server(struct nfs_server *server, const struct nfs_parsed_mount_data *data) { struct rpc_timeout timeparms; int error; dprintk("--> nfs4_init_server()\n"); nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, data->timeo, data->retrans); /* Initialise the client representation from the mount data */ server->flags = data->flags; server->caps |= NFS_CAP_ATOMIC_OPEN|NFS_CAP_CHANGE_ATTR| NFS_CAP_POSIX_LOCK; if (!(data->flags & NFS_MOUNT_NORDIRPLUS)) server->caps |= NFS_CAP_READDIRPLUS; server->options = data->options; /* Get a client record */ error = nfs4_set_client(server, data->nfs_server.hostname, (const struct sockaddr *)&data->nfs_server.address, data->nfs_server.addrlen, data->client_address, data->auth_flavors[0], data->nfs_server.protocol, &timeparms, data->minorversion); if (error < 0) goto error; /* * Don't use NFS uid/gid mapping if we're using AUTH_SYS or lower * authentication. 
*/ if (nfs4_disable_idmapping && data->auth_flavors[0] == RPC_AUTH_UNIX) <--- set a flag based on the module parameter server->caps |= NFS_CAP_UIDGID_NOMAP; <-------------------------- flag set if (data->rsize) server->rsize = nfs_block_size(data->rsize, NULL); if (data->wsize) server->wsize = nfs_block_size(data->wsize, NULL); server->acregmin = data->acregmin * HZ; server->acregmax = data->acregmax * HZ; server->acdirmin = data->acdirmin * HZ; server->acdirmax = data->acdirmax * HZ; server->port = data->nfs_server.port; error = nfs_init_server_rpcclient(server, &timeparms, data-> auth_flavors[0]); error: /* Done */ dprintk("<-- nfs4_init_server() = %d\n", error); return error; } This flag is later checked when deciding whether to use numeric uid or gids or to use idmapping. Raw int nfs_map_uid_to_name(const struct nfs_server *server, __u32 uid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(uid, "user", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap-> idmap_user_hash, uid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(uid, buf, buflen); return ret; } int nfs_map_gid_to_group(const struct nfs_server *server, __u32 gid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(gid, "group", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap-> idmap_group_hash, gid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(gid, buf, buflen); return ret; } "fs/nfs/idmap.c" 872L, 21804C For more information on NFSv4 ID mapping in Red Hat Enterprise Linux, see https://access.redhat.com/articles/2252881 Diagnostic Steps Debugging/verbosity can be enabled by editing /etc/sysconfig/nfs: Raw RPCIDMAPDARGS="-vvv" The following output is shown in /var/log/messages when the mount has been completed and the system shows nobody:nobody as user and group permissions on directories and files: Raw Jun 3 20:22:08 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Jun 3 20:25:44 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Collect a tcpdump of the mount attempt: Raw # tcpdump -s0 -i {INTERFACE} host {NFS.SERVER.IP} -w /tmp/{casenumber}-$ (hostname)-$(date +"%Y-%m-%d-%H-%M-%S").pcap & If a TCP packet capture has been obtained, check for a nfs.nfsstat4 packet that has returned a non-zero response equivalent to 10039 (NFSV4ERR_BADOWNER). From the NFSv4 RFC: Raw NFS4ERR_BADOWNER = 10039,/* owner translation bad */ NFS4ERR_BADOWNER An owner, owner_group, or ACL attribute value can not be translated to local representation. Hope this helps. Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: 18 June 2018 16:54 To: gpfsug main discussion list Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. 
On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Jun 18 17:56:55 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jun 2018 16:56:55 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Message-ID: <8B8EB415-1221-454B-A08C-5B029C4F8BF8@nuance.com> That was it, thanks! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Chetan R Kulkarni Reply-To: gpfsug main discussion list Date: Monday, June 18, 2018 at 11:21 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] CES-NFS: UID and GID resolution Please make sure NFSv4 ID Mapping value matches on client and server (e.g. test.com; may vary on your setup). server: mmnfs config change IDMAPD_DOMAIN=test.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jun 18 23:21:30 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 18 Jun 2018 15:21:30 -0700 Subject: [gpfsug-discuss] Save the Date September 19-20 2018 GPFS/SS Users Group Meeting at ORNL Message-ID: <5670B56F-AF19-4A90-8BDF-24B865231EC1@lbl.gov> Hello all, There is an event being planned for the week of September 16, 2018 at Oak Ridge National Laboratory (ORNL). This GPFS/Spectrum Scale UG meeting will be in conjunction with the HPCXXL User Group. We have done events like this in the past, typically in NYC, however, with the announcement of Summit (https://www.ornl.gov/news/ornl-launches-summit-supercomputer ) and it?s 250 PB, 2.5 TB/s GPFS installaion it is an exciting time to have ORNL as the venue. Per usual, the GPFS day will be free, however, this time the event will be split across two days, Wednesday (19th) afternoon and Thursday (20th) morning This way, if you want to travel out Wednesday morning and back Thursday afternoon it?s very do-able. If you want to stay around Thursday afternoon there will be a data center tour available. There will be some additional approval processes to attend at ORNL and we?ll share those details and more in the coming weeks. If you are interested in presenting something your site is working on, please let us know. User talks are always well received. Save a space on your calendar and hope to see you there. Best, Kristy PS - We will, as usual, also have an event at SC18, more on that soon as well. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 20 15:08:09 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 20 Jun 2018 14:08:09 +0000 Subject: [gpfsug-discuss] mmbackup issue Message-ID: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> Hallo All, we have been working for two weeks (or more) on a PMR about mmbackup having problems with the management class in TSM. We have defined "versions data exists" = 5, but with each run the policy engine generates an expire list (in which the affected files are already selected), and in the end we see only 2 backup versions of each file, every time. We are at: GPFS 5.0.1.1, TSM server 8.1.1.0, TSM client 7.1.6.2. After some testing we found the reason: our mmbackup test is performed with vi, changing a file's content and then rerunning mmbackup. With the vi default (set backupcopy=no - careful: with "no" the changed file is written out as a brand-new file), the file gets a new inode number after every change. This behavior is the reason the shadow file (or the policy engine) thinks the old file no longer exists and generates a delete request in the expire policy files for dsmc (correct me if I am wrong here). OK, vi itself is not the problem, but we also have applications that handle their data sets the same way (for example SAS): SAS updates the data file by writing an xx.data.new file and, after the close, renaming xx.data.new back to the original name xx.data. The misinterpretation of the different inodes then happens again. The question now: is there code in mmbackup, or in GPFS for the shadow file, to detect or ignore the inode change for the same file name? Regards Renar Renar Grunenberg Abteilung Informatik – Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.holliday at crick.ac.uk Wed Jun 20 15:19:13 2018 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 20 Jun 2018 14:19:13 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount Message-ID: Hi All, We've been trying to get the Windows system to mount GPFS. 
We've set the drive letter on the file system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system, the system just sits and does nothing - GPFS shows no errors or issues and there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS Windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Jun 20 15:45:23 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 20 Jun 2018 10:45:23 -0400 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> Message-ID: <9471.1529505923@turing-police.cc.vt.edu> On Wed, 20 Jun 2018 14:08:09 -0000, "Grunenberg, Renar" said: > There are after each test (change of the content) the file became every time > a new inode number. This behavior is the reason why the shadowfile think(or the > policyengine) the old file is never existent That's because as far as the system is concerned, this is a new file that happens to have the same name. > At SAS the data file will updated with a xx.data.new file and after the close > the xx.data.new will be renamed to the original name xx.data again. And the > miss interpretation of different inodes happen again. Note that all the interesting information about a file is contained in the inode (the size, the owner/group, the permissions, creation time, disk blocks allocated, and so on). The *name* of the file is pretty much the only thing about a file that isn't in the inode - and that's because it's not a unique value for the file (there can be more than one link to a file). The name(s) of the file are stored in the parent directory as inode/name pairs. So here's what happens. You have the original file xx.data. It has an inode number 9934 or whatever. In the parent directory, there's an entry "name xx.data -> inode 9934". SAS creates a new file xx.data.new with inode number 83425 or whatever. Different file - the creation time, blocks allocated on disk, etc are all different than the file described by inode 9934. The directory now has "name xx.data -> 9934" "name xx.data.new -> inode 83425". SAS then renames xx.data.new - and rename is defined as "change the name entry for this inode, removing any old mappings for the same name". So... 0) 'rename xx.data.new xx.data'. 1) Find 'xx.data.new' in this directory. "xx.data.new -> 83425". So we're working with that inode. 2) Check for occurrences of the new name. Aha. There's 'xx.data -> 9934'. Remove it. (2a: This may or may not actually make the file go away, as there may be other links and/or open file references to it.) 3) The directory now only has 'xx.data.new -> 83425'. 4) We now change the name. The directory now has 'xx.data -> 83425'.
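To make the effect concrete, here is a minimal shell illustration of the rename sequence above (the file names just follow the example; stat -c %i prints a file's inode number):

$ echo one > xx.data
$ stat -c %i xx.data        # e.g. 9934
$ echo two > xx.data.new
$ stat -c %i xx.data.new    # e.g. 83425 - a different inode
$ mv xx.data.new xx.data    # rename over the existing name
$ stat -c %i xx.data        # now 83425; the original inode is gone
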
And your backup program quite rightly concludes that this is a new file by a name that was previously used - because it *is* a new file. Created at a different time, different blocks on disk, and so on. The only time that writing a "new" file keeps the same inode number is if the program actually opens the old file for writing and overwrites the old contents. However, this isn't actually done by many programs (including vi and SAS, as you've noticed) because if writing out the file encounters an error, you now have lost the contents - the old version has been overwritten, and the new version isn't complete and correct. So many programs write to a truly new file and then rename, because if writing the new file fails, the old version is still available on disk.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From anobre at br.ibm.com Wed Jun 20 16:11:03 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Wed, 20 Jun 2018 15:11:03 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jun 20 15:52:09 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 20 Jun 2018 10:52:09 -0400 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: <638e2070-2e99-e6dc-b843-1fd368c21bc0@nasa.gov> We've used the Windows client here @ NASA (I think we have in the neighborhood of between 15 and 20 clients). I'm guessing when you say GPFS shows no errors you've dumped waiters and grabbed dump tscomm output and that's clean? -Aaron On 6/20/18 10:19 AM, Michael Holliday wrote: > Hi All, > > We?ve being trying to get the windows system to mount GPFS.? We?ve set > the drive letter on the files system, and we can get the system added to > the GPFS cluster and showing as active. > > When we try to mount the file system ?the system just sits and does > nothing ? GPFS shows no errors or issues, there are no problems in the > log files. The firewalls are stopped and as far as we can tell it should > work. > > Does anyone have any experience with the GPFS windows client that may > help us? > > Michael > > Michael Holliday RITTech MBCS > > Senior HPC & Research Data Systems Engineer | eMedLab Operations Team > > Scientific Computing | IT&S | The Francis Crick Institute > > 1, Midland Road| London | NW1 1AT| United Kingdom > > Tel: 0203 796 3167 > > The Francis Crick Institute Limited is a registered charity in England > and Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From YARD at il.ibm.com Wed Jun 20 16:30:37 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 20 Jun 2018 18:30:37 +0300 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From YARD at il.ibm.com Wed Jun 20 16:35:57 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 20 Jun 2018 18:35:57 +0300 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Also what does mmdiag --network + mmgetstate -a show ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Yaron Daniel" To: gpfsug main discussion list Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 20 17:00:03 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 20 Jun 2018 16:00:03 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <9471.1529505923@turing-police.cc.vt.edu> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> Message-ID: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Hallo Valdis, first, thanks for the explanation - we understand that. The problem, though, is that this leaves only 2 versions at the TSM server for the same file in the same directory. It means that mmbackup and the .shadow... file have no way to keep more than 2 backup versions of the same file in the same directory in TSM. The native backup-archive client manages this (there, too, the different inode numbers already exist): on the TSM server side the files selected by 'ba incr' are merged into the right filespace and bound to the management class, so more than 2 versions exist. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ======================================================================= This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Ursprüngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von valdis.kletnieks at vt.edu Gesendet: Mittwoch, 20. Juni 2018 16:45 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmbackup issue On Wed, 20 Jun 2018 14:08:09 -0000, "Grunenberg, Renar" said: > There are after each test (change of the content) the file became every time > a new inode number. This behavior is the reason why the shadowfile think(or the > policyengine) the old file is never existent That's because as far as the system is concerned, this is a new file that happens to have the same name.
> At SAS the data file will updated with a xx.data.new file and after the close > the xx.data.new will be renamed to the original name xx.data again. And the > miss interpretation of different inodes happen again. Note that all the interesting information about a file is contained in the inode (the size, the owner/group, the permissions, creation time, disk blocks allocated, and so on). The *name* of the file is pretty much the only thing about a file that isn't in the inode - and that's because it's not a unique value for the file (there can be more than one link to a file). The name(s) of the file are stored in the parent directory as inode/name pairs. So here's what happens. You have the original file xx.data. It has an inode number 9934 or whatever. In the parent directory, there's an entry "name xx.data -> inode 9934". SAS creates a new file xx.data.new with inode number 83425 or whatever. Different file - the creation time, blocks allocated on disk, etc are all different than the file described by inode 9934. The directory now has "name xx.data -> 9934" "name xx.data.new -> inode 83425". SAS then renames xx.data.new - and rename is defined as "change the name entry for this inode, removing any old mappings for the same name" . So... 0) 'rename xx.data.new xx.data'. 1) Find 'xx.data.new' in this directory. "xx.data.new -> 83425" . So we're working with that inode. 2) Check for occurrences of the new name. Aha. There's 'xxx.data -> 9934'. Remove it. (2a) This may or may not actually make the file go away, as there may be other links and/or open file references to it.) 3) The directory now only has '83425 xx.data.new -> 83425'. 4) We now change the name. The directory now has 'xx.data -> 83425'. And your backup program quite rightly concludes that this is a new file by a name that was previously used - because it *is* a new file. Created at a different time, different blocks on disk, and so on. The only time that writing a "new" file keeps the same inode number is if the program actually opens the old file for writing and overwrites the old contents. However, this isn't actually done by many programs (including vi and SAS, as you've noticed) because if writing out the file encounters an error, you now have lost the contents - the old version has been overwritten, and the new version isn't complete and correct. So many programs write to a truly new file and then rename, because if writing the new file fails, the old version is still available on disk.... From olaf.weiser at de.ibm.com Wed Jun 20 17:06:56 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 20 Jun 2018 18:06:56 +0200 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de><9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Jun 21 08:32:39 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 21 Jun 2018 08:32:39 +0100 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Message-ID: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. > I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file you can keep, for both active and inactive (aka deleted). You can also define how long these versions are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing its own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Thu Jun 21 10:18:29 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 21 Jun 2018 09:18:29 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> Message-ID: <41b590c74c314bf38111c8cc17fde764@SMXRF105.msg.hukrf.de> Hallo JAB, the main problem here is that the inode changes for the same file in the same directory. mmbackup first generates and executes the expire list entry for the file under its old inode number, and afterwards the selective backup of the same file under its new inode number. We now want to test increasing the "versions deleted" parameter here. In contrast, 'ba incr' on a local file system does these steps in one pass and handles this case. My hope is that the mmbackup people can enhance this: compare the file names in the selection list with the file names already on the expire list, and skip those files from the expire list before it is executed. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R.
Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathan Buzzard Gesendet: Donnerstag, 21. Juni 2018 09:33 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] mmbackup issue On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. > I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file that you can keep, for both active and inactive (aka deleted). You can also define how long these version are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing it's own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Isom.Crawford at ibm.com Thu Jun 21 15:48:02 2018 From: Isom.Crawford at ibm.com (Isom Crawford) Date: Thu, 21 Jun 2018 09:48:02 -0500 Subject: [gpfsug-discuss] GPFS Windows Mount Message-ID: Hi Michael, It's been a while, but I've run into similar issues with Scale on Windows. One possible issue is the GPFS administrative account configuration using the following steps: ---- 1. Create a domain user with the logon name root. 2. Add user root to the Domain Admins group or to the local Administrators group on each Windows node. 3. In root Properties/Profile/Home/LocalPath, define a HOME directory such as C:\Users\root\home that does not include spaces in the path name and is not the same as the profile path. 4. Give root the right to log on as a service as described in ?Allowing the GPFS administrative account to run as a service.? 
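One quick way to sanity-check steps 2 and 3 (a sketch only; the path is the example from step 3, and the default Cygwin /cygdrive mapping is assumed) is to open a Cygwin bash shell as the root account on the Windows node and confirm the home directory it will use:

$ echo $HOME                      # should match step 3, e.g. /cygdrive/c/Users/root/home
$ case "$HOME" in *" "*) echo "WARNING: home path contains spaces";; esac
$ ls -ld "$HOME"                  # the directory must exist and be writable
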
Step 3 including consistent use of the HOME directory you define, is required for the Cygwin environment ---- I have botched step 3 before with the result being very similar to your experience. Carefule re-configuration of the cygwin root *home* directory fixed some of the problems. Hope this helps. Another tangle you may run into is disabling IPv6. I had to completely disable IPv6 on the Windows client by not only deselecting it on the network interface properties list, but also disabling it system-wide. The symptoms vary, but utilities like mmaddnode or mmchnode may fail due to invalid interface. Check the output of /usr/lpp/mmfs/bin/mmcmi host to be sure it's the host that Scale expects. (In my case, it returned ::1 until I completely disabled IPv6). My notes follow: This KB article tells us about a setting that affects what Windows prefers, emphasized in bold: In Registry Editor, locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6 \Parameters Double-click DisabledComponents to modify the DisabledComponents entry. Note: If the DisabledComponents entry is unavailable, you must create it. To do this, follow these steps: In the Edit menu, point to New, and then click DWORD (32-bit) Value. Type DisabledComponents, and then press ENTER. Double-click DisabledComponents. Type any one of the following values in the Value data: field to configure the IPv6 protocol to the desired state, and then click OK: Type 0 to enable all IPv6 components. (Windows default setting) Type 0xffffffff to disable all IPv6 components, except the IPv6 loopback interface. This value also configures Windows to prefer using Internet Protocol version 4 (IPv4) over IPv6 by modifying entries in the prefix policy table. For more information, see Source and Destination Address Selection. Type 0x20 to prefer IPv4 over IPv6 by modifying entries in the prefix policy table. Type 0x10 to disable IPv6 on all nontunnel interfaces (on both LAN and Point-to-Point Protocol [PPP] interfaces). Type 0x01 to disable IPv6 on all tunnel interfaces. These include Intra-Site Automatic Tunnel Addressing Protocol (ISATAP), 6to4, and Teredo. Type 0x11 to disable all IPv6 interfaces except for the IPv6 loopback interface. Restart the computer for this setting to take effect. Kind Regards, Isom L. Crawford Jr., PhD. NA SDI SME Team Software Defined Infrastructure 2700 Redwood Street Royse City, TX 75189 United States Phone: 214-707-4611 E-mail: isom.crawford at ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Thu Jun 21 22:42:30 2018 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 21 Jun 2018 21:42:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale GUI password Message-ID: I have a test cluster I setup months ago and then did nothing with. Now I need it again but for the life of me I can't remember the admin password to the GUI. Is there an easy way to reset it under the covers? I would hate to uninstall everything and start over. I can certainly admin everything from the cli but I use it to show others some things from time to time and it doesn't make sense to do that always from the command line. Thoughts? Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kevindjo at us.ibm.com Fri Jun 22 03:26:55 2018 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Fri, 22 Jun 2018 02:26:55 +0000 Subject: [gpfsug-discuss] Spectrum Scale GUI password In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jun 22 14:13:43 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 22 Jun 2018 13:13:43 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node Message-ID: Any idea why I can?t force the file system manager off this node? I turned off the manager on the node (mmchnode --client) and used mmchmgr to move the other file systems off, but I can?t move this one. There are 6 other good choices for file system managers. I?ve never seen this message before. [root at nrg1-gpfs01 ~]# mmchmgr dataeng The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 22 14:19:18 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 22 Jun 2018 13:19:18 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: References: Message-ID: <5C6312EE-A958-4CBF-9AAC-F342CE87DB70@vanderbilt.edu> Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? Kevin On Jun 22, 2018, at 8:13 AM, Oesterlin, Robert > wrote: Any idea why I can?t force the file system manager off this node? I turned off the manager on the node (mmchnode --client) and used mmchmgr to move the other file systems off, but I can?t move this one. There are 6 other good choices for file system managers. I?ve never seen this message before. [root at nrg1-gpfs01 ~]# mmchmgr dataeng The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C46935624ea7048a9471608d5d841feb5%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636652700325626997&sdata=Az9GZeDDG76lDLi02NSKYXsXK9EHy%2FT3vLAtaMrnpew%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jun 22 14:28:02 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 22 Jun 2018 13:28:02 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node Message-ID: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Yep. And nrg1-gpfs13 isn?t even a manager node anymore! [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. 
2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Friday, June 22, 2018 at 8:21 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] File system manager - won't change to new node Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Fri Jun 22 15:10:29 2018 From: salut4tions at gmail.com (Jordan Robertson) Date: Fri, 22 Jun 2018 10:10:29 -0400 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: Two thoughts: 1) Has your config data update fully propagated after the mmchnode? We've (rarely) seen some weird stuff happen when that process isn't complete yet, or if a node in question simply didn't get the update (try md5sum'ing the mmsdrfs file on nrg1-gpfs13 and compare to the cluster manager's md5sum, make sure the push process isn't still running, etc.). If you see discrepancies, you could try an mmsdrrestore to get that node back into spec. 2) If everything looks fine; what are the chances you could simply try restarting GPFS on nrg1-gpfs13? Might be particularly interesting to see what the cluster tries to do with the filesystem once that node is down. -Jordan On Fri, Jun 22, 2018 at 9:28 AM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Yep. And nrg1-gpfs13 isn?t even a manager node anymore! > > > > [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 > > Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). > > Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > > Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. > > > > 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng > nrg1-gpfs05.nrg1.us.grid.nuance.com > > 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned > as manager for dataeng. > > 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) > appointed as manager for dataeng. > > 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng > nrg1-gpfs05.nrg1.us.grid.nuance.com > > 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) > completed take over for dataeng. > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > > *From: * on behalf of > "Buterbaugh, Kevin L" > *Reply-To: *gpfsug main discussion list > *Date: *Friday, June 22, 2018 at 8:21 AM > *To: *gpfsug main discussion list > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] File system manager - won't > change to new node > > > > Hi Bob, > > > > Have you tried explicitly moving it to a specific manager node? That?s > what I always do ? I personally never let GPFS pick when I?m moving the > management functions for some reason. Thanks? 
> > > > Kevin > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Jun 22 15:38:05 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 22 Jun 2018 14:38:05 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: <78d4f2d963134e87af9b123891da2c47@jumptrading.com> Hi Bob, Also tracing waiters on the cluster can help you understand if there is something that is blocking this kind of operation. Beyond the command output, which is usually too terse to understand what is actually happening, do the logs on the nodes in the cluster give you any further details about the operation? Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jordan Robertson Sent: Friday, June 22, 2018 9:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] File system manager - won't change to new node Note: External Email ________________________________ Two thoughts: 1) Has your config data update fully propagated after the mmchnode? We've (rarely) seen some weird stuff happen when that process isn't complete yet, or if a node in question simply didn't get the update (try md5sum'ing the mmsdrfs file on nrg1-gpfs13 and compare to the cluster manager's md5sum, make sure the push process isn't still running, etc.). If you see discrepancies, you could try an mmsdrrestore to get that node back into spec. 2) If everything looks fine; what are the chances you could simply try restarting GPFS on nrg1-gpfs13? Might be particularly interesting to see what the cluster tries to do with the filesystem once that node is down. -Jordan On Fri, Jun 22, 2018 at 9:28 AM, Oesterlin, Robert > wrote: Yep. And nrg1-gpfs13 isn?t even a manager node anymore! [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Friday, June 22, 2018 at 8:21 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] File system manager - won't change to new node Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? 
Kevin _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Fri Jun 22 20:03:52 2018 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 22 Jun 2018 14:03:52 -0500 Subject: [gpfsug-discuss] mmfsadddisk command interrupted Message-ID: We were adding disks to one of our larger filesystems today. During the "checking allocation map for storage pool system" we had to interrupt the command since it was causing slow downs on our filesystem. Now commands like mmrepquota, mmdf, etc. are timing out with tsaddisk command is running message. Also during the run of the mmdf, mmrepquota, etc. filesystem becomes completely unresponsive. This command was run on ESS running version 5.2.0. Any help is much appreciated. Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jun 22 23:11:45 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 22 Jun 2018 18:11:45 -0400 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: <128279.1529705505@turing-police.cc.vt.edu> On Fri, 22 Jun 2018 13:28:02 -0000, "Oesterlin, Robert" said: > [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 > Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). > Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. > > 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com > 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 
> 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com > 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. That's an.... "interesting".. definition of "successful".... :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 25 16:56:31 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 25 Jun 2018 15:56:31 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> Message-ID: Hallo All, here the requirement for enhancement of mmbackup. http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=121687 Please vote. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathan Buzzard Gesendet: Donnerstag, 21. Juni 2018 09:33 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] mmbackup issue On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. 
> I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file that you can keep, for both active and inactive (aka deleted). You can also define how long these version are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing it's own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Mon Jun 25 20:43:49 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 25 Jun 2018 15:43:49 -0400 Subject: [gpfsug-discuss] mmapplypolicy on nested filesets ... In-Reply-To: References: <20180418115445.8603670sy6ee6fk5@support.scinet.utoronto.ca> Message-ID: <20180625154349.47520gasb6cvevhx@support.scinet.utoronto.ca> It took a while before I could get back to this issue, but I want to confirm that Marc's suggestions worked line a charm, and did exactly what I hoped for: * remove any FOR FILESET(...) specifications * mmapplypolicy /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan ... --scope inodespace -P your-policy-rules-file ... I didn't have to do anything else, but exclude a few filesets from the scan. Thanks Jaime Quoting "Marc A Kaplan" : > I suggest you remove any FOR FILESET(...) specifications from your rules > and then run > > mmapplypolicy > /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan > ... --scope inodespace -P your-policy-rules-file ... > > See also the (RTFineM) for the --scope option and the Directory argument > of the mmapplypolicy command. > > That is the best, most efficient way to scan all the files that are in a > particular inode-space. Also, you must have all filesets of interest > "linked" and the file system must be mounted. > > Notice that "independent" means that the fileset name is used to denote > both a fileset and an inode-space, where said inode-space contains the > fileset of that name and possibly other "dependent" filesets... > > IF one wished to search the entire file system for files within several > different filesets, one could use rules with > > FOR FILESET('fileset1','fileset2','and-so-on') > > Or even more flexibly > > WHERE FILESET_NAME LIKE 'sql-like-pattern-with-%s-and-maybe-_s' > > Or even more powerfully > > WHERE regex(FILESET_NAME, 'extended-regular-.*-expression') > > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 04/18/2018 01:00 PM > Subject: [gpfsug-discuss] mmapplypolicy on nested filesets ... > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > A few months ago I asked about limits and dynamics of traversing > depended .vs independent filesets on this forum. I used the > information provided to make decisions and setup our new DSS based > gpfs storage system. Now I have a problem I couldn't' yet figure out > how to make it work: > > 'project' and 'scratch' are top *independent* filesets of the same > file system. 
> > 'proj1', 'proj2' are dependent filesets nested under 'project' > 'scra1', 'scra2' are dependent filesets nested under 'scratch' > > I would like to run a purging policy on all contents under 'scratch' > (which includes 'scra1', 'scra2'), and TSM backup policies on all > contents under 'project' (which includes 'proj1', 'proj2'). > > HOWEVER: > When I run the purging policy on the whole gpfs device (with both > 'project' and 'scratch' filesets) > > * if I use FOR FILESET('scratch') on the list rules, the 'scra1' and > 'scra2' filesets under scratch are excluded (totally unexpected) > > * if I use FOR FILESET('scra1') I get error that scra1 is dependent > fileset (Ok, that is expected) > > * if I use /*FOR FILESET('scratch')*/, all contents under 'project', > 'proj1', 'proj2' are traversed as well, and I don't want that (it > takes too much time) > > * if I use /*FOR FILESET('scratch')*/, and instead of the whole device > I apply the policy to the /scratch mount point only, the policy still > traverses all the content of 'project', 'proj1', 'proj2', which I > don't want. (again, totally unexpected) > > QUESTION: > > How can I craft the syntax of the mmapplypolicy in combination with > the RULE filters, so that I can traverse all the contents under the > 'scratch' independent fileset, including the nested dependent filesets > 'scra1','scra2', and NOT traverse the other independent filesets at > all (since this takes too much time)? > > Thanks > Jaime > > > PS: FOR FILESET('scra*') does not work. > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE&s=IpwHlr0YNr7rgV7gI8Y2sxIELLIwA15KK4nBnv9BYWk&e= > > ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE&s=aff0vMJkKd-Z3pw3-jckmI3ejqXh8aSr8rxkKf3OGdk&e= > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From erich at uw.edu Tue Jun 26 00:20:35 2018 From: erich at uw.edu (Eric Horst) Date: Mon, 25 Jun 2018 16:20:35 -0700 Subject: [gpfsug-discuss] mmchconfig subnets Message-ID: Hi, I'm hoping somebody has insights into how the subnets option actually works. 
I've read the docs a dozen times and I want to make sure I understand before I take my production cluster down to make the changes. On the current cluster the daemon addresses are on a gpfs private network and the admin addresses are on a public network. I'm changing so both daemon and admin are public and the subnets option is used to utilize the private network. This is to facilitate remote mounts to an independent cluster. The confusing factor in my case, not covered in the docs, is that the gpfs private network is subnetted and static routes are used to reach them. That is, there are three private networks, one for each datacenter and the cluster nodes daemon interfaces are spread between the three. 172.16.141.32/27 172.16.141.24/29 172.16.141.128/27 A router connects these three networks but are otherwise 100% private. For my mmchconfig subnets command should I use this? mmchconfig subnets="172.16.141.24 172.16.141.32 172.16.141.128" Where I get confused is that I'm trying to reason through how Spectrum Scale is utilizing the subnets setting to decide if this will have the desired result on my cluster. If I change the node addresses to their public addresses, ie the private addresses are not explicitly configured in Scale, then how are the private addresses discovered? Does each node use the subnets option to identify that it has a private address and then dynamically shares that with the cluster? Thanks in advance for your clarifying comments. -Eric -- Eric Horst University of Washington -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Tue Jun 26 01:58:53 2018 From: jam at ucar.edu (Joseph Mendoza) Date: Mon, 25 Jun 2018 18:58:53 -0600 Subject: [gpfsug-discuss] subblock sanity check in 5.0 Message-ID: Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem?? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small).? This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0?? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with: # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 # mmlsfs fs1 flag??????????????? value??????????????????? description ------------------- ------------------------ ----------------------------------- ?-f???????????????? 8192???????????????????? Minimum fragment (subblock) size in bytes (system pool) ??????????????????? 131072?????????????????? Minimum fragment (subblock) size in bytes (other pools) ?-i???????????????? 4096???????????????????? Inode size in bytes ?-I???????????????? 32768??????????????????? Indirect block size in bytes ?-B???????????????? 524288?????????????????? Block size (system pool) ??????????????????? 8388608????????????????? Block size (other pools) ?-V???????????????? 19.01 (5.0.1.0)????????? File system version ?--subblocks-per-full-block 64?????????????? Number of subblocks per full block ?-P???????????????? system;DATA????????????? Disk storage pools in file system Thanks! 
--Joey Mendoza NCAR From knop at us.ibm.com Tue Jun 26 04:36:43 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 25 Jun 2018 23:36:43 -0400 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: Joey, The subblocks-per-full-block value cannot be specified when the file system is created, but is rather computed automatically by GPFS. In file systems with format older than 5.0, the value is fixed at 32. For file systems with format 5.0.0 or later, the value is computed based on the block size. See manpage for mmcrfs, in table where the -B BlockSize option is explained. (Table 1. Block sizes and subblock sizes) . Say, for the default (in 5.0+) 4MB block size, the subblock size is 8KB. The minimum "practical" subblock size is 4KB, to keep 4KB-alignment to accommodate 4KN devices. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Joseph Mendoza To: gpfsug main discussion list Date: 06/25/2018 08:59 PM Subject: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small).? This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0?? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with: # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 # mmlsfs fs1 flag??????????????? value??????????????????? description ------------------- ------------------------ ----------------------------------- ?-f???????????????? 8192???????????????????? Minimum fragment (subblock) size in bytes (system pool) ??????????????????? 131072?????????????????? Minimum fragment (subblock) size in bytes (other pools) ?-i???????????????? 4096???????????????????? Inode size in bytes ?-I???????????????? 32768??????????????????? Indirect block size in bytes ?-B???????????????? 524288?????????????????? Block size (system pool) ??????????????????? 8388608????????????????? Block size (other pools) ?-V???????????????? 19.01 (5.0.1.0)????????? File system version ?--subblocks-per-full-block 64?????????????? Number of subblocks per full block ?-P???????????????? system;DATA????????????? Disk storage pools in file system Thanks! --Joey Mendoza NCAR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Tue Jun 26 07:21:26 2018 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 26 Jun 2018 08:21:26 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: Joseph, the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb. is this setup for a traditional NSD Setup or for GNR as the recommendations would be different. sven On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small). This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0? I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes (system pool) > 131072 Minimum fragment (subblock) > size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > > -B 524288 Block size (system pool) > 8388608 Block size (other pools) > > -V 19.01 (5.0.1.0) File system version > > --subblocks-per-full-block 64 Number of subblocks per > full block > -P system;DATA Disk storage pools in file > system > > > Thanks! > --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Tue Jun 26 16:18:01 2018 From: jam at ucar.edu (Joseph Mendoza) Date: Tue, 26 Jun 2018 09:18:01 -0600 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Hi, it's for a traditional NSD setup. --Joey On 6/26/18 12:21 AM, Sven Oehme wrote: > Joseph, > > the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block > size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb.? > is this setup for a traditional NSD Setup or for GNR as the recommendations would be different.? > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza > wrote: > > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem?? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small).? This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0?? 
I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag??????????????? value??????????????????? description > ------------------- ------------------------ > ----------------------------------- > ?-f???????????????? 8192???????????????????? Minimum fragment (subblock) > size in bytes (system pool) > ??????????????????? 131072?????????????????? Minimum fragment (subblock) > size in bytes (other pools) > ?-i???????????????? 4096???????????????????? Inode size in bytes > ?-I???????????????? 32768??????????????????? Indirect block size in bytes > > ?-B???????????????? 524288?????????????????? Block size (system pool) > ??????????????????? 8388608????????????????? Block size (other pools) > > ?-V???????????????? 19.01 (5.0.1.0)????????? File system version > > ?--subblocks-per-full-block 64?????????????? Number of subblocks per > full block > ?-P???????????????? system;DATA????????????? Disk storage pools in file > system > > > Thanks! > --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Tue Jun 26 16:32:55 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 26 Jun 2018 15:32:55 +0000 Subject: [gpfsug-discuss] mmchconfig subnets In-Reply-To: References: Message-ID: <20180626153255.d4sftfljwusa6yrg@utumno.gs.washington.edu> My understanding is that GPFS uses the network configuration on each node to determine netmask. The subnets option can be applied to specific nodes or groups of nodes with "mmchconfig subnets=... -N ", so what you're doing is specificy the preferred subnets for GPFS node communication, just for that list of nodes. For instance, we have four GPFS clusters, with three subnets: * eichler-cluster, eichler-cluster2 (10.130.0.0/16) * grc-cluster (10.200.0.0/16) * gs-cluster (10.110.0.0/16) And one data transfer system weasel that is a member of gs-cluster, but provides transfer services to all the clusters, and has an IP address on each subnet to avoid a bunch of network cross-talk. Its subnets setting looks like this: [weasel] subnets 10.130.0.0/eichler-cluster*.grid.gs.washington.edu 10.200.0.0/grc-cluster.grid.gs.washington.edu 10.110.0.0/gs-cluster.grid.gs.washington.edu Of course, there's some policy routing too to keep replies on the right interface as well, but that's the extent of the GPFS configuration. On Mon, Jun 25, 2018 at 04:20:35PM -0700, Eric Horst wrote: > Hi, I'm hoping somebody has insights into how the subnets option actually > works. I've read the docs a dozen times and I want to make sure I > understand before I take my production cluster down to make the changes. > > On the current cluster the daemon addresses are on a gpfs private network > and the admin addresses are on a public network. I'm changing so both > daemon and admin are public and the subnets option is used to utilize the > private network. 
This is to facilitate remote mounts to an independent > cluster. > > The confusing factor in my case, not covered in the docs, is that the gpfs > private network is subnetted and static routes are used to reach them. That > is, there are three private networks, one for each datacenter and the > cluster nodes daemon interfaces are spread between the three. > > 172.16.141.32/27 > 172.16.141.24/29 > 172.16.141.128/27 > > A router connects these three networks but are otherwise 100% private. > > For my mmchconfig subnets command should I use this? > > mmchconfig subnets="172.16.141.24 172.16.141.32 172.16.141.128" > > Where I get confused is that I'm trying to reason through how Spectrum > Scale is utilizing the subnets setting to decide if this will have the > desired result on my cluster. If I change the node addresses to their > public addresses, ie the private addresses are not explicitly configured in > Scale, then how are the private addresses discovered? Does each node use > the subnets option to identify that it has a private address and then > dynamically shares that with the cluster? > > Thanks in advance for your clarifying comments. > > -Eric > > -- > > Eric Horst > University of Washington > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From r.sobey at imperial.ac.uk Wed Jun 27 11:47:02 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 10:47:02 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed Message-ID: Hi all, I'm getting the following error in the GUI, running 5.0.1: "The following GUI refresh task(s) failed: PM_MONITOR". As yet, this is the only node I've upgraded to 5.0.1 - the rest are running (healthily, according to the GUI) 4.2.3.7. I'm not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I've completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 27 12:29:19 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 27 Jun 2018 11:29:19 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: Message-ID: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. 
J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ingo.altenburger at id.ethz.ch Wed Jun 27 12:45:29 2018 From: ingo.altenburger at id.ethz.ch (Altenburger Ingo (ID SD)) Date: Wed, 27 Jun 2018 11:45:29 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments Message-ID: Hi all, our (Windows) users are familiared with the 'previous versions' self-recover feature. We honor this by creating regular snapshots with the default @GMT prefix (non- at -heading prefixes are not visible in 'previous versions'). Unfortunately, MacOS clients having the same share mounted via smb or cifs cannot benefit from such configured snapshots, i.e. they are not visible in Finder window. Any non- at -heading prefix is visible in Finder as long as hidden .snapshots directory can be seen. Using a Terminal command line is also not feasible for end user purposes. Since the two case seem to be mutually exclusive, has anybody found a solution other than creating two snapshots, one with and one without the @-heading prefix? Thanks for any hint, Ingo Altenburger -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jun 27 13:28:50 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 12:28:50 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> References: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, No, it all runs over the same network. 
Thanks, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 27 June 2018 12:29 To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' > Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Wed Jun 27 13:49:38 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Wed, 27 Jun 2018 12:49:38 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: , <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jun 27 14:14:59 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 13:14:59 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: , <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: Hi Andreas, Output of the debug log ? no clue, but maybe you can interpret it better ? 
[root at icgpfsq1 ~]# /usr/lpp/mmfs/gui/cli/runtask pm_monitor --debug debug: locale=en_US debug: Raising event: gui_pmcollector_connection_ok, for node: localhost.localdomain err: com.ibm.fscc.common.exceptions.FsccException: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:280) at com.ibm.fscc.common.tasks.ZiMONMonitorTask.run(ZiMONMonitorTask.java:144) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:221) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:193) at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:369) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:65) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) Caused by: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:328) at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:278) ... 9 more err: com.ibm.fscc.common.exceptions.FsccException: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:280) at com.ibm.fscc.common.tasks.ZiMONMonitorTask.run(ZiMONMonitorTask.java:144) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:221) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:193) at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:369) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:65) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) Caused by: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:328) at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:278) ... 9 more debug: Will not raise the following event using 'mmsysmonc' since it already exists in the database: reportingNode = 'icgpfsq1', eventName = 'gui_refresh_task_failed', entityId = '11', arguments = 'PM_MONITOR', identifier = 'null' err: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain err: com.ibm.fscc.cli.CommandException: EFSSG1150C Running specified task was unsuccessful. 
at com.ibm.fscc.cli.CommandException.createCommandException(CommandException.java:117) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:69) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) EFSSG1150C Running specified task was unsuccessful. Thanks Richard From: Andreas Koeninger [mailto:andreas.koeninger at de.ibm.com] Sent: 27 June 2018 13:50 To: Sobey, Richard A Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hi Richard, if you double-click the event there should be some additional help available. The steps under "User Action" will hopefully help to identify the root cause: 1.) Check if there is additional information available by executing '/usr/lpp/mmfs/gui/cli/lstasklog [taskname]'. 2.) Run the specified task manually on the CLI by executing '/usr/lpp/mmfs/gui/cli/runtask [taskname] --debug'. ... Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Date: Wed, Jun 27, 2018 2:29 PM Hi Renar, No, it all runs over the same network. Thanks, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 27 June 2018 12:29 To: 'gpfsug main discussion list' > Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' > Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jun 27 18:53:39 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 27 Jun 2018 17:53:39 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOSenvironments In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Wed Jun 27 19:09:40 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 27 Jun 2018 11:09:40 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Message-ID: Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? 
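[For readers skimming this thread: the follow-ups below converge on re-seeding the reinstalled node's configuration from a healthy node and then shrinking the quorum set. A rough sketch of that sequence -- the node names are the ones appearing in the thread, the NSD names are placeholders, and nothing here was verified against the cluster in question:

# on the reinstalled node, pull the cluster configuration back from a healthy quorum node
mmsdrrestore -p ocio-gpu01 -R /usr/bin/scp

# if the decommissioned quorum node cannot be revived, force it out of the quorum set
mmchnode --noquorum -N dhcp-os-129-164.slac.stanford.edu --force

# with only two quorum nodes left, tiebreaker disks are strongly advisable
mmchconfig tiebreakerDisks="nsd_a;nsd_b;nsd_c"

See the replies from IBM and others below for the caveats around CCR and mmdelnode.]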
Thanks for any advice, Renata Dart SLAC National Accelerator Lb From S.J.Thompson at bham.ac.uk Wed Jun 27 19:33:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 27 Jun 2018 18:33:28 +0000 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] Sent: 27 June 2018 19:09 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? Thanks for any advice, Renata Dart SLAC National Accelerator Lb _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cabrillo at ifca.unican.es Wed Jun 27 20:24:28 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 27 Jun 2018 21:24:28 +0200 (CEST) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Message-ID: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> An HTML attachment was scrubbed... URL: From renata at SLAC.STANFORD.EDU Wed Jun 27 19:54:47 2018 From: renata at SLAC.STANFORD.EDU (Renata Maria Dart) Date: Wed, 27 Jun 2018 11:54:47 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi Simon, yes I ran mmsdrrestore -p and that helped to create the /var/mmfs/ccr directory which was missing. But it didn't create a ccr.nodes file, so I ended up scp'ng that over by hand which I hope was the right thing to do. The one host that is no longer in service is still in that ccr.nodes file and when I try to mmdelnode it I get: root at ocio-gpu03 renata]# mmdelnode -N dhcp-os-129-164.slac.stanford.edu mmdelnode: Unable to obtain the GPFS configuration file lock. mmdelnode: GPFS was unable to obtain a lock from node dhcp-os-129-164.slac.stanford.edu. mmdelnode: Command failed. 
Examine previous error messages to determine cause. despite the fact that it doesn't respond to ping. The mmstartup on the newly reinstalled node fails as in my initial email. I should mention that the two "working" nodes are running 4.2.3.4. The person who reinstalled the node that won't start up put on 4.2.3.8. I didn't think that was the cause of this problem though and thought I would try to get the cluster talking again before upgrading the rest of the nodes or degrading the reinstalled one. Thanks, Renata On Wed, 27 Jun 2018, Simon Thompson wrote: >Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] >Sent: 27 June 2018 19:09 >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From renata at slac.stanford.edu Wed Jun 27 20:30:33 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 27 Jun 2018 12:30:33 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> References: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> Message-ID: Hi, any gpfs commands fail with: root at ocio-gpu01 ~]# mmlsmgr get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsmgr: Command failed. Examine previous error messages to determine cause. The two "working" nodes are arbitrating. Also, they are using ccr, so doesn't that mean the primary/secondary setup for a client cluster doesn't apply? Renata On Wed, 27 Jun 2018, Iban Cabrillo wrote: >Hi,? ? 
Have you check if there is any manager node available?? >#mmlsmgr > >If not could you try to asig a new cluster/gpfs_fs manager. > >Mmchmgr? ? gpfs_fs. Manager_node >Mmchmgr.? ?-c.? Cluster_manager_node > >Cheers.? > > From scale at us.ibm.com Wed Jun 27 22:14:23 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 27 Jun 2018 17:14:23 -0400 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi Renata, You may want to reduce the set of quorum nodes. If your version supports the --force option, you can run mmchnode --noquorum -N --force It is a good idea to configure tiebreaker disks in a cluster that has only 2 quorum nodes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Renata Maria Dart To: gpfsug-discuss at spectrumscale.org Date: 06/27/2018 02:21 PM Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? Thanks for any advice, Renata Dart SLAC National Accelerator Lb _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kevindjo at us.ibm.com Wed Jun 27 22:20:41 2018 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 27 Jun 2018 21:20:41 +0000 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB082ADFE7DE038f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From spectrumscale at kiranghag.com Thu Jun 28 04:14:30 2018 From: spectrumscale at kiranghag.com (KG) Date: Thu, 28 Jun 2018 08:44:30 +0530 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Can you also check the time differences between nodes? We had a situation recently where the server time mismatch caused failures. On Thu, Jun 28, 2018 at 2:50 AM, Kevin D Johnson wrote: > You can also try to convert to the old primary/secondary model to back it > away from the default CCR configuration. > > mmchcluster --ccr-disable -p servername > > Then, temporarily go with only one quorum node and add more once the > cluster comes back up. Once the cluster is back up and has at least two > quorum nodes, do a --ccr-enable with the mmchcluster command. > > Kevin D. Johnson > Spectrum Computing, Senior Managing Consultant > MBA, MAcc, MS Global Technology and Development > IBM Certified Technical Specialist Level 2 Expert > > [image: IBM Certified Technical Specialist Level 2 Expert] > > Certified Deployment Professional - Spectrum Scale > Certified Solution Advisor - Spectrum Computing > Certified Solution Architect - Spectrum Storage Solutions > > > 720.349.6199 - kevindjo at us.ibm.com > > "To think is to achieve." - Thomas J. Watson, Sr. > > > > > ----- Original message ----- > From: "IBM Spectrum Scale" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: renata at slac.stanford.edu, gpfsug main discussion list < > gpfsug-discuss at spectrumscale.org> > Cc: > Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > Date: Wed, Jun 27, 2018 5:15 PM > > > Hi Renata, > > You may want to reduce the set of quorum nodes. If your version supports > the --force option, you can run > > mmchnode --noquorum -N --force > > It is a good idea to configure tiebreaker disks in a cluster that has only > 2 quorum nodes. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > [image: Inactive hide details for Renata Maria Dart ---06/27/2018 02:21:52 > PM---Hi, we have a client cluster of 4 nodes with 3 quorum n]Renata Maria > Dart ---06/27/2018 02:21:52 PM---Hi, we have a client cluster of 4 nodes > with 3 quorum nodes. 
One of the quorum nodes is no longer i > > From: Renata Maria Dart > To: gpfsug-discuss at spectrumscale.org > Date: 06/27/2018 02:21 PM > Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the > quorum nodes is no longer in service and the other was reinstalled with > a newer OS, both without informing the gpfs admins. Gpfs is still > "working" on the two remaining nodes, that is, they continue to have access > to the gpfs data on the remote clusters. But, I can no longer get > any gpfs commands to work. On one of the 2 nodes that are still serving > data, > > root at ocio-gpu01 ~]# mmlscluster > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmlscluster: Command failed. Examine previous error messages to determine > cause. > > > On the reinstalled node, this fails in the same way: > > [root at ocio-gpu02 ccr]# mmstartup > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine > cause. > > > I have looked through the users group interchanges but didn't find anything > that seems to fit this scenario. > > Is there a way to salvage this cluster? Can it be done without > shutting gpfs down on the 2 nodes that continue to work? > > Thanks for any advice, > > Renata Dart > SLAC National Accelerator Lb > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB082ADFE7DE038f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From ingo.altenburger at id.ethz.ch Thu Jun 28 07:37:48 2018 From: ingo.altenburger at id.ethz.ch (Altenburger Ingo (ID SD)) Date: Thu, 28 Jun 2018 06:37:48 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments In-Reply-To: References: Message-ID: I have to note that we use the from-SONAS-imported snapshot scheduler as part of the gui to create (and keep/delete) the snapshots. When performing mmcrsnapshot @2018-06-27-14-01 -j then this snapshot is visible in MacOS Finder but not in Windows 'previous versions'. Thus, the issue might be related to the way the scheduler is creating snapshots. Since having hundreds of filesets we need snapshots for, doing the scheduling by ourselves is not trivial and a preferred option. Regards Ingo Altenburger From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Altenburger Ingo (ID SD) Sent: Mittwoch, 27. 
Juni 2018 13:45 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments Hi all, our (Windows) users are familiared with the 'previous versions' self-recover feature. We honor this by creating regular snapshots with the default @GMT prefix (non- at -heading prefixes are not visible in 'previous versions'). Unfortunately, MacOS clients having the same share mounted via smb or cifs cannot benefit from such configured snapshots, i.e. they are not visible in Finder window. Any non- at -heading prefix is visible in Finder as long as hidden .snapshots directory can be seen. Using a Terminal command line is also not feasible for end user purposes. Since the two case seem to be mutually exclusive, has anybody found a solution other than creating two snapshots, one with and one without the @-heading prefix? Thanks for any hint, Ingo Altenburger -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 28 08:44:16 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 28 Jun 2018 09:44:16 +0200 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Just some ideas what to try. when you attempted mmdelnode, was that node still active with the IP address known in the cluster? If so, shut it down and try again. Mind the restrictions of mmdelnode though (can't delete NSD servers). Try to fake one of the currently missing cluster nodes, or restore the old system backup to the reinstalled server, if available, or temporarily install gpfs SW there and copy over the GPFS config stuff from a node still active (/var/mmfs/), configure the admin and daemon IFs of the faked node on that machine, then try to start the cluster and see if it comes up with quorum, if it does then go ahead and cleanly de-configure what's needed to remove that node from the cluster gracefully. Once you reach quorum with the remaining nodes you are in safe area. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Renata Maria Dart To: Simon Thompson Cc: gpfsug main discussion list Date: 27/06/2018 21:30 Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Simon, yes I ran mmsdrrestore -p and that helped to create the /var/mmfs/ccr directory which was missing. But it didn't create a ccr.nodes file, so I ended up scp'ng that over by hand which I hope was the right thing to do. The one host that is no longer in service is still in that ccr.nodes file and when I try to mmdelnode it I get: root at ocio-gpu03 renata]# mmdelnode -N dhcp-os-129-164.slac.stanford.edu mmdelnode: Unable to obtain the GPFS configuration file lock. 
mmdelnode: GPFS was unable to obtain a lock from node dhcp-os-129-164.slac.stanford.edu. mmdelnode: Command failed. Examine previous error messages to determine cause. despite the fact that it doesn't respond to ping. The mmstartup on the newly reinstalled node fails as in my initial email. I should mention that the two "working" nodes are running 4.2.3.4. The person who reinstalled the node that won't start up put on 4.2.3.8. I didn't think that was the cause of this problem though and thought I would try to get the cluster talking again before upgrading the rest of the nodes or degrading the reinstalled one. Thanks, Renata On Wed, 27 Jun 2018, Simon Thompson wrote: >Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] >Sent: 27 June 2018 19:09 >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From alvise.dorigo at psi.ch Thu Jun 28 09:02:07 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 28 Jun 2018 08:02:07 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Message-ID: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. 
ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Jun 28 09:15:46 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 28 Jun 2018 08:15:46 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jun 28 09:26:41 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 28 Jun 2018 09:26:41 +0100 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOSenvironments In-Reply-To: References: Message-ID: <1530174401.26036.55.camel@strath.ac.uk> On Wed, 2018-06-27 at 17:53 +0000, Christof Schmitt wrote: > Hi, > ? > we currently support the SMB protocol method of quering snapshots, > which is used by the Windows "Previous versions" dialog. Mac clients > unfortunately do not implement these explicit queries. Browsing the > snapshot directories with the @GMT names through SMB currently is not > supported. > ? > Could you open a RFE to request snapshot browsing from Mac clients? > An official request would be helpful in prioritizing the development > and test work required to support this. > ? Surely the lack of previous versions in the Mac Finder is an issue for Apple to fix??As such an RFE with IBM is not going to help and good look getting Apple to lift a finger. Similar for the various Linux file manager programs, though in this case being open source at least IBM could contribute code to fix the issue. However it occurs to me that a solution might be to run the Windows Explorer under Wine on the above platforms. 
Obviously licensing issues may well make that problematic, but perhaps the ReactOS clone of Windows Explorer supports the "Previous versions" feature, and if not it could be expanded to do so. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From alvise.dorigo at psi.ch Thu Jun 28 10:39:35 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 28 Jun 2018 09:39:35 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE804526727D32@MBX114.d.ethz.ch> Hi Andrew, thanks for the naswer. No, the port #2 (on all the nodes) is not cabled. Regards, Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Andrew Beattie [abeattie at au1.ibm.com] Sent: Thursday, June 28, 2018 10:15 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? 
This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dancasali at us.ibm.com Thu Jun 28 21:14:51 2018 From: dancasali at us.ibm.com (Daniel De souza casali) Date: Thu, 28 Jun 2018 16:14:51 -0400 Subject: [gpfsug-discuss] Sending logs to Logstash Message-ID: Good Afternoon! Does anyone here in the community send mmfs.log to Logstash? If so what do you use? Thank you! Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From alastair.smith at ucl.ac.uk Fri Jun 29 16:26:51 2018 From: alastair.smith at ucl.ac.uk (Smith, Alastair) Date: Fri, 29 Jun 2018 15:26:51 +0000 Subject: [gpfsug-discuss] Job vacancy - Senior Research Data Storage Technologist, UCL Message-ID: Dear all, University College London are looking to appoint a Senior Research Data Storage Technologist to join their Research Data Services Team in central London. The role will involve the design and deployment of storage technologies to support research, as well as providing guidance on service development and advising research projects. The Research Data Services Group provides petabyte-scale data storage for active research projects, and is currently developing a new institutional data repository for long-term curation and preservation. Over the coming years, the Group will be building an integrated suite of services to support data management from planning to re-use, and the successful candidate will play an important role in the creation and operation of these services. For further particulars and the application form, please visit https://www.interquestgroup.com/p/join-a-world-class-workforce-at-ucl The application process will be closing shortly: deadline is 1st July 2018. Kind regards Alastair -|-|-|-|-|-|-|-|-|-|-|-|-|- Dr Alastair Smith Senior research data systems engineer Research Data Services RITS, UCL -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 4 12:21:36 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 4 Jun 2018 11:21:36 +0000 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: Message-ID: So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . 
The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 4 16:47:01 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 4 Jun 2018 11:47:01 -0400 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: Message-ID: Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel ( https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm ) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jun 4 16:59:47 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 4 Jun 2018 15:59:47 +0000 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: , Message-ID: Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 when the x86 7.5 release is also made? 
Simon * Insert standard IBM disclaimer about the meaning of intent etc etc ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of knop at us.ibm.com [knop at us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on s]"Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From p.childs at qmul.ac.uk Mon Jun 4 22:26:25 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 4 Jun 2018 21:26:25 +0000 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: References: , , Message-ID: <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> We have 2 power 9 nodes, The rest of our cluster is running centos 7.4 and spectrum scale 4.2.3-8 (x86 based) The power 9 nodes are running spectrum scale 5.0.0-0 currently as we couldn't get the gplbin for 4.2.3 to compile, where as spectrum scale 5 worked on power 9 our of the box. They are running rhel7.5 but on an old kernel I guess. 
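For what it's worth, a minimal sketch of how the portability-layer rebuild is usually attempted on such a node (package names are indicative for RHEL and may differ on the -ALT stream; this is not a claim about why the 4.2.3 build failed):

# confirm which kernel is actually booted
uname -r
# toolchain plus kernel headers matching the running kernel (package names are assumptions)
yum install gcc cpp make kernel-devel-$(uname -r)
# rebuild and install the GPFS portability layer against that kernel
/usr/lpp/mmfs/bin/mmbuildgpl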
I'm not sure that 4.2.3 works on power 9 we've asked the IBM power 9 out reach team but heard nothing back. If we can get 4.2.3 running on the power 9 nodes it would put us in a more consistent setup. Of course our current plan b is to upgrade everything to 5.0.1, but we can't do that as our storage appliance doesn't (officially) support spectrum scale 5 yet. These are my experiences of what works and nothing whatsoever to do with what's supported, except I want to keep us as close to a supported setup as possible given what we've found to actually work. (now that's an interesting spin on a disclaimer) Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 when the x86 7.5 release is also made? Simon * Insert standard IBM disclaimer about the meaning of intent etc etc ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of knop at us.ibm.com [knop at us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on s]"Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). 
Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 4 22:48:45 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 4 Jun 2018 17:48:45 -0400 Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 In-Reply-To: <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> References: , , <5ks9e4reev75kl2f3t6fa2om.1528147569742@email.android.com> Message-ID: Peter, Simon, While I believe Power9 / RHEL 7.5 will be supported with the upcoming PTFs on 4.2.3 and 5.0.1 later in June, I'm working on getting confirmation for that. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Peter Childs To: gpfsug main discussion list Date: 06/04/2018 05:26 PM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org We have 2 power 9 nodes, The rest of our cluster is running centos 7.4 and spectrum scale 4.2.3-8 (x86 based) The power 9 nodes are running spectrum scale 5.0.0-0 currently as we couldn't get the gplbin for 4.2.3 to compile, where as spectrum scale 5 worked on power 9 our of the box. They are running rhel7.5 but on an old kernel I guess. I'm not sure that 4.2.3 works on power 9 we've asked the IBM power 9 out reach team but heard nothing back. If we can get 4.2.3 running on the power 9 nodes it would put us in a more consistent setup. Of course our current plan b is to upgrade everything to 5.0.1, but we can't do that as our storage appliance doesn't (officially) support spectrum scale 5 yet. These are my experiences of what works and nothing whatsoever to do with what's supported, except I want to keep us as close to a supported setup as possible given what we've found to actually work. (now that's an interesting spin on a disclaimer) Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Simon Thompson (IT Research Support) wrote ---- Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 when the x86 7.5 release is also made? Simon * Insert standard IBM disclaimer about the meaning of intent etc etc ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of knop at us.ibm.com [knop at us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on s]"Simon Thompson (IT Research Support)" ---06/04/2018 07:21:56 AM---So ? I have another question on support. 
We?ve just ordered some Power 9 nodes, now my understanding From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So ? I have another question on support. We?ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel ( https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm ) which is 4.x based. I don?t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s? Thanks Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Tue Jun 5 12:39:08 2018 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Tue, 5 Jun 2018 11:39:08 +0000 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Message-ID: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so alarm is correct. However, I would like to define different limits. Is possible to increase it? 'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. 
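In case it helps, a quick way to double-check what the monitor is actually seeing before changing any rules (a sketch; <device> stands in for the real file system name):

# pool fill levels that the capUtil thresholds are evaluated against
mmdf <device>
# threshold rules currently active, with their warn/error levels
mmhealth thresholds list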
Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Wed Jun 6 08:40:07 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 6 Jun 2018 09:40:07 +0200 Subject: [gpfsug-discuss] recommendations for gpfs 5.x GUI and perf/health monitoring collector nodes In-Reply-To: References: Message-ID: Hi, when it comes to clusters of this size then 150 nodes per collector rule of thumb is a good way to start. So 3-4 collector nodes should be OK for your setup. The GUI(s) can also be installed on those nodes as well. Collector nodes mainly need a good amount of RAM as all 'current' incoming sensor data is kept there. Local disk is typically not stressed heavily, plain HDD or simple onboard RAID is sufficient, plan for 20-50 GB disc space on each node. For network no special requirements are needed, default should be whatever is used in the cluster anyway. Mit freundlichen Grüßen / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: David Johnson To: gpfsug main discussion list Date: 31/05/2018 20:22 Subject: [gpfsug-discuss] recommendations for gpfs 5.x GUI and perf/health monitoring collector nodes Sent by: gpfsug-discuss-bounces at spectrumscale.org We are planning to bring up the new ZIMon tools on our 450+ node cluster, and need to purchase new nodes to run the collector federation and GUI function on. What would you choose as a platform for this? - memory size? - local disk space - SSD? shared? - net attach - 10Gig? 25Gig? IB? - CPU horse power - single or dual socket? I think I remember somebody in Cambridge UG meeting saying 150 nodes per collector as a rule of thumb, so we're guessing a federation of 4 nodes would do it. Does this include the GUI host(s) or are those separate? Finally, we're still using client/server based licensing model, do these nodes count as clients? Thanks, - ddj Dave Johnson Brown University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NSCHULD at de.ibm.com Wed Jun 6 09:00:06 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 6 Jun 2018 10:00:06 +0200 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds In-Reply-To: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> References: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch> Message-ID: Hi, assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Mit freundlichen Gr??en / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 05/06/2018 13:45 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so alarm is correct. However, I would like to definmmhealth different limits. Is possible to increase it? 
'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Wed Jun 6 10:37:02 2018 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Wed, 6 Jun 2018 09:37:02 +0000 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds In-Reply-To: References: <0081EB235765E14395278B9AE1DF34180A650438@MBX214.d.ethz.ch>, Message-ID: <0081EB235765E14395278B9AE1DF34180A65067E@MBX214.d.ethz.ch> Hi Norbert, thanks a lot, it worked. I tried the same before for the same rules, but it did not work. Now I realized that this was because remaining disk space and metadata was even smaller than when I checked first time, so nothing changed. Thanks a lot for your help, Marc _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Norbert Schuld [NSCHULD at de.ibm.com] Sent: Wednesday, June 06, 2018 10:00 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Hi, assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 
MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Mit freundlichen Gr??en / Kind regards Norbert Schuld IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 [Inactive hide details for "Caubet Serrabou Marc (PSI)" ---05/06/2018 13:45:35---Dear all, we have a small cluster which is repo]"Caubet Serrabou Marc (PSI)" ---05/06/2018 13:45:35---Dear all, we have a small cluster which is reporting the following alarm: From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 05/06/2018 13:45 Subject: [gpfsug-discuss] Change pool-[meta]data_high_warn thresholds Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear all, we have a small cluster which is reporting the following alarm: # mmhealth event show pool-metadata_high_warn Event Name: pool-metadata_high_warn Event ID: 999719 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED # mmhealth event show pool-data_high_warn Event Name: pool-data_high_warn Event ID: 999722 Description: The pool reached a warning level. Cause: The pool reached a warning level. User Action: Add more capacity to pool or move data to different pool or delete data and/or snapshots. Severity: WARNING State: DEGRADED Warning threshold for both alarms is 80%, we are at 81%, so alarm is correct. However, I would like to definmmhealth different limits. Is possible to increase it? 'mmhealth thresholds' did not help as these are not supported metrics (unless I am doing something wrong). Another way is to hide this alarm, but I would like to avoid it. Thanks a lot and best regards, _________________________________________ Paul Scherrer Institut High Performance Computing Marc Caubet Serrabou WHGA/019A 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 15:16:43 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 14:16:43 +0000 Subject: [gpfsug-discuss] Capacity pool filling Message-ID: Hi All, First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: 1. 
There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I?m not thinking of? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Jun 7 15:51:49 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 7 Jun 2018 10:51:49 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: <6A8A18B6-8578-4C00-A8AC-8A04EF93361F@ulmer.org> > On Jun 7, 2018, at 10:16 AM, Buterbaugh, Kevin L wrote: > > Hi All, > > First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? > > We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. > > However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: > > 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) Any files that have been opened in that pool will have a recent atime (you?re moving them there because they have a not-recent atime, so this should be an anomaly). Further, they should have an mtime that is older than 90 days, too. You could ask the policy engine which ones have been open/written in the last day-ish and maybe see a pattern? > 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. 
We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? > If you are restoring them (as opposed to recalling them), they are different files that happen to have similar contents to some other files. > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 7 16:08:15 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 7 Jun 2018 17:08:15 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: Hm, RULE 'list_updated_in_capacity_pool' LIST 'updated_in_capacity_pool' FROM POOL 'gpfs23capacity' WHERE CURRENT_TIMESTAMP -MODIFICATION_TIME To: gpfsug main discussion list Date: 07/06/2018 16:25 Subject: [gpfsug-discuss] Capacity pool filling Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? 
to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I?m not thinking of? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 16:56:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 15:56:34 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: Hi All, So in trying to prove Jaime wrong I proved him half right ? the cron job is stopped: #13 22 * * 5 /root/bin/gpfs_migration.sh However, I took a look in one of the restore directories under /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! So that explains why the capacity pool is filling, but mmlspolicy says: Policy for file system '/dev/gpfs23': Installed by root at gpfsmgr on Wed Jan 25 10:17:01 2017. First line of policy 'gpfs23.policy' is: RULE 'DEFAULT' SET POOL 'gpfs23data' So ? I don?t think GPFS is doing this but the next thing I am going to do is follow up with our tape software vendor ? I bet they preserve the pool attribute on files and - like Jaime said - old stuff is therefore hitting the gpfs23capacity pool. Thanks Jaime and everyone else who has responded so far? Kevin > On Jun 7, 2018, at 9:53 AM, Jaime Pinto wrote: > > I think the restore is is bringing back a lot of material with atime > 90, so it is passing-trough gpfs23data and going directly to gpfs23capacity. > > I also think you may not have stopped the crontab script as you believe you did. > > Jaime > > Quoting "Buterbaugh, Kevin L" : > >> Hi All, >> >> First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? >> >> We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. >> >> However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: >> >> 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? 
but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) >> >> 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? >> >> Is there a third explanation I?m not thinking of? >> >> Thanks... >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.scinethpc.ca%2Ftestimonials&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C9154807425ab4316f58f08d5cc866774%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636639799990107084&sdata=VUOqjEJ%2FWt8VI%2BWolWbpa1snbLx85XFJvc0sZPuI86Q%3D&reserved=0 > ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > https://na01.safelinks.protection.outlook.com/?url=www.scinet.utoronto.ca&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C9154807425ab4316f58f08d5cc866774%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636639799990107084&sdata=3PxI2hAdhUOJZp5d%2BjxOu1N0BoQr8X5K8xZG%2BcONjEU%3D&reserved=0 - https://na01.safelinks.protection.outlook.com/?url=www.computecanada.ca&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C9154807425ab4316f58f08d5cc866774%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636639799990107084&sdata=JxtEYIN5%2FYiDf3GKa5ZBP3JiC27%2F%2FGiDaRbX5PnWEGU%3D&reserved=0 > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > From pinto at scinet.utoronto.ca Thu Jun 7 15:53:16 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 07 Jun 2018 10:53:16 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: Message-ID: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> I think the restore is is bringing back a lot of material with atime > 90, so it is passing-trough gpfs23data and going directly to gpfs23capacity. I also think you may not have stopped the crontab script as you believe you did. Jaime Quoting "Buterbaugh, Kevin L" : > Hi All, > > First off, I?m on day 8 of dealing with two different > mini-catastrophes at work and am therefore very sleep deprived and > possibly missing something obvious ? with that disclaimer out of the > way? 
> > We have a filesystem with 3 pools: 1) system (metadata only), 2) > gpfs23data (the default pool if I run mmlspolicy), and 3) > gpfs23capacity (where files with an atime - yes atime - of more than > 90 days get migrated to by a script that runs out of cron each > weekend. > > However ? this morning the free space in the gpfs23capacity pool is > dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot > figure out why. The migration script is NOT running ? in fact, it?s > currently disabled. So I can only think of two possible > explanations for this: > > 1. There are one or more files already in the gpfs23capacity pool > that someone has started updating. Is there a way to check for that > ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but > restricted to only files in the gpfs23capacity pool. Marc Kaplan - > can mmfind do that?? ;-) > > 2. We are doing a large volume of restores right now because one of > the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) > down due to a issue with the storage array. We?re working with the > vendor to try to resolve that but are not optimistic so we have > started doing restores in case they come back and tell us it?s not > recoverable. We did run ?mmfileid? to identify the files that have > one or more blocks on the down NSD, but there are so many that what > we?re doing is actually restoring all the files to an alternate path > (easier for out tape system), then replacing the corrupted files, > then deleting any restores we don?t need. But shouldn?t all of that > be going to the gpfs23data pool? I.e. even if we?re restoring > files that are in the gpfs23capacity pool shouldn?t the fact that > we?re restoring to an alternate path (i.e. not overwriting files > with the tape restores) and the default pool is the gpfs23data pool > mean that nothing is being restored to the gpfs23capacity pool??? > > Is there a third explanation I?m not thinking of? > > Thanks... > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 15:45:52 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 14:45:52 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <68EC0249928AAD56.9D6058B5-0CA1-4A01-BAB3-FF615745B845@mail.outlook.com> References: <68EC0249928AAD56.9D6058B5-0CA1-4A01-BAB3-FF615745B845@mail.outlook.com> Message-ID: <065F97AD-9C82-4B13-A519-E090CD175305@vanderbilt.edu> Hi again all, I received a direct response and am not sure whether that means the sender did not want to be identified, but they asked good questions that I wanted to answer on list? No, we do not use snapshots on this filesystem. No, we?re not using HSM ? our tape backup system is a traditional backup system not named TSM. We?ve created a top level directory in the filesystem called ?RESTORE? 
and are restoring everything under that ? then doing our moves / deletes of what we?ve restored ? so I *think* that means all of that should be written to the gpfs23data pool?!? On the ?plus? side, I may figure this out myself soon when someone / something starts getting I/O errors! :-O In the meantime, other ideas are much appreciated! Kevin Do you have a job that?s creating snapshots? That?s an easy one to overlook. Not sure if you are using an HSM. Any new file that gets generated should follow the default rule in ILM unless if meets a placement condition. It would only be if you?re using an HSM that files would be placed in a non-placement location pool but that is purely because the the file location has already been updated to the capacity pool. On Thu, Jun 7, 2018 at 8:17 AM -0600, "Buterbaugh, Kevin L" > wrote: Hi All, First off, I?m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ? with that disclaimer out of the way? We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend. However ? this morning the free space in the gpfs23capacity pool is dropping ? I?m down to 0.5 TB free in a 582 TB pool ? and I cannot figure out why. The migration script is NOT running ? in fact, it?s currently disabled. So I can only think of two possible explanations for this: 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ? i.e. a way to run something like ?find /gpfs23 -mtime -7 -ls? but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-) 2. We are doing a large volume of restores right now because one of the mini-catastrophes I?m dealing with is one NSD (gpfs23data pool) down due to a issue with the storage array. We?re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it?s not recoverable. We did run ?mmfileid? to identify the files that have one or more blocks on the down NSD, but there are so many that what we?re doing is actually restoring all the files to an alternate path (easier for out tape system), then replacing the corrupted files, then deleting any restores we don?t need. But shouldn?t all of that be going to the gpfs23data pool? I.e. even if we?re restoring files that are in the gpfs23capacity pool shouldn?t the fact that we?re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool??? Is there a third explanation I?m not thinking of? Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From UWEFALKE at de.ibm.com Thu Jun 7 19:34:16 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 7 Jun 2018 20:34:16 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: > However, I took a look in one of the restore directories under > /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! > So ? I don?t think GPFS is doing this but the next thing I am > going to do is follow up with our tape software vendor ? I bet > they preserve the pool attribute on files and - like Jaime said - > old stuff is therefore hitting the gpfs23capacity pool. Hm, then the backup/restore must be doing very funny things. Usually, GPFS should rule the placement of new files, and I assume that a restore of a file, in particular under a different name, creates a new file. So, if your backup tool does override that GPFS placement, it must be very intimate with Scale :-). I'd do some list scans of the capacity pool just to see what the files appearing there from tape have in common. If it's really that these files' data were on the capacity pool at the last backup, they should not be affected by your dead NSD and a restore is in vain anyway. If that doesn't help or give no clue, then, if the data pool has some more free space, you might try to run an upward/backward migration from capacity to data . And, yeah, as GPFS tends to stripe over all NSDs, all files in data large enough plus some smaller ones would have data on your broken NSD. That's the drawback of parallelization. Maybe you'd ask the storage vendor whether they supply some more storage for the fault of their (redundant?) device to alleviate your current storage shortage ? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 7 20:36:59 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 7 Jun 2018 19:36:59 +0000 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> Message-ID: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Hi Uwe, Thanks for your response. So our restore software lays down the metadata first, then the data. While it has no specific knowledge of the extended attributes, it does back them up and restore them. So the only explanation that makes sense to me is that since the inode for the file says that the file should be in the gpfs23capacity pool, the data gets written there. Right now I don?t have time to do an analysis of the ?live? version of a fileset and the ?restored? version of that same fileset to see if the placement of the files matches up. My quick and dirty checks seem to show files getting written to all 3 pools. 
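When I do get time for that analysis, the list scan I have in mind is just a policy rule along these lines - typed from memory and untested, and the list name / file names are only placeholders, so please sanity-check it before running anything:

RULE EXTERNAL LIST 'capfiles' EXEC ''
RULE 'listcap' LIST 'capfiles'
  FROM POOL 'gpfs23capacity'
  WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '7' DAYS

Driven with something like "mmapplypolicy gpfs23 -P listcap.pol -I defer -f /tmp/capscan", that should just write the matching pathnames out to /tmp/capscan.list.capfiles without moving any data, which would at least show what has been landing in the capacity pool recently.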
Unfortunately, we have no way to tell our tape software to ignore files from the gpfs23capacity pool (and we?re aware that we won?t need those files). We?ve also determined that it is actually quicker to tell our tape system to restore all files from a fileset than to take the time to tell it to selectively restore only certain files ? and the same amount of tape would have to be read in either case. Our SysAdmin who is primary on tape backup and restore was going on vacation the latter part of the week, so he decided to be helpful and just queue up all the restores to run one right after the other. We didn?t realize that, so we are solving our disk space issues by slowing down the restores until we can run more instances of the script that replaces the corrupted files and deletes the unneeded restored files. Thanks again? Kevin > On Jun 7, 2018, at 1:34 PM, Uwe Falke wrote: > >> However, I took a look in one of the restore directories under >> /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! > > >> So ? I don?t think GPFS is doing this but the next thing I am >> going to do is follow up with our tape software vendor ? I bet >> they preserve the pool attribute on files and - like Jaime said - >> old stuff is therefore hitting the gpfs23capacity pool. > > Hm, then the backup/restore must be doing very funny things. Usually, GPFS > should rule the > placement of new files, and I assume that a restore of a file, in > particular under a different name, > creates a new file. So, if your backup tool does override that GPFS > placement, it must be very > intimate with Scale :-). > I'd do some list scans of the capacity pool just to see what the files > appearing there from tape have in common. > If it's really that these files' data were on the capacity pool at the > last backup, they should not be affected by your dead NSD and a restore is > in vain anyway. > > If that doesn't help or give no clue, then, if the data pool has some more > free space, you might try to run an upward/backward migration from > capacity to data . > > And, yeah, as GPFS tends to stripe over all NSDs, all files in data large > enough plus some smaller ones would have data on your broken NSD. That's > the drawback of parallelization. > Maybe you'd ask the storage vendor whether they supply some more storage > for the fault of their (redundant?) device to alleviate your current > storage shortage ? > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? 
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cacad30699025407bc67b08d5cca54bca%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636639932669887596&sdata=vywTFbG4O0lquAIAVfa0csdC0HtpvfhY8%2FOjqm98fxI%3D&reserved=0 From makaplan at us.ibm.com Thu Jun 7 21:53:36 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 7 Jun 2018 16:53:36 -0400 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Message-ID: If your restore software uses the gpfs_fputattrs() or gpfs_fputattrswithpathname methods, notice there are some options to control the pool. AND there is also the possibility of using the little known "RESTORE" policy rule to algorithmically control the pool selection by different criteria than the SET POOL rule. When all else fails ... Read The Fine Manual ;-) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/07/2018 03:37 PM Subject: Re: [gpfsug-discuss] Capacity pool filling Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Uwe, Thanks for your response. So our restore software lays down the metadata first, then the data. While it has no specific knowledge of the extended attributes, it does back them up and restore them. So the only explanation that makes sense to me is that since the inode for the file says that the file should be in the gpfs23capacity pool, the data gets written there. Right now I don?t have time to do an analysis of the ?live? version of a fileset and the ?restored? version of that same fileset to see if the placement of the files matches up. My quick and dirty checks seem to show files getting written to all 3 pools. Unfortunately, we have no way to tell our tape software to ignore files from the gpfs23capacity pool (and we?re aware that we won?t need those files). We?ve also determined that it is actually quicker to tell our tape system to restore all files from a fileset than to take the time to tell it to selectively restore only certain files ? and the same amount of tape would have to be read in either case. Our SysAdmin who is primary on tape backup and restore was going on vacation the latter part of the week, so he decided to be helpful and just queue up all the restores to run one right after the other. We didn?t realize that, so we are solving our disk space issues by slowing down the restores until we can run more instances of the script that replaces the corrupted files and deletes the unneeded restored files. Thanks again? Kevin > On Jun 7, 2018, at 1:34 PM, Uwe Falke wrote: > >> However, I took a look in one of the restore directories under >> /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! > > >> So ? I don?t think GPFS is doing this but the next thing I am >> going to do is follow up with our tape software vendor ? I bet >> they preserve the pool attribute on files and - like Jaime said - >> old stuff is therefore hitting the gpfs23capacity pool. > > Hm, then the backup/restore must be doing very funny things. 
Usually, GPFS > should rule the > placement of new files, and I assume that a restore of a file, in > particular under a different name, > creates a new file. So, if your backup tool does override that GPFS > placement, it must be very > intimate with Scale :-). > I'd do some list scans of the capacity pool just to see what the files > appearing there from tape have in common. > If it's really that these files' data were on the capacity pool at the > last backup, they should not be affected by your dead NSD and a restore is > in vain anyway. > > If that doesn't help or give no clue, then, if the data pool has some more > free space, you might try to run an upward/backward migration from > capacity to data . > > And, yeah, as GPFS tends to stripe over all NSDs, all files in data large > enough plus some smaller ones would have data on your broken NSD. That's > the drawback of parallelization. > Maybe you'd ask the storage vendor whether they supply some more storage > for the fault of their (redundant?) device to alleviate your current > storage shortage ? > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cacad30699025407bc67b08d5cca54bca%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636639932669887596&sdata=vywTFbG4O0lquAIAVfa0csdC0HtpvfhY8%2FOjqm98fxI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Fri Jun 8 09:23:18 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Fri, 8 Jun 2018 10:23:18 +0200 Subject: [gpfsug-discuss] Capacity pool filling In-Reply-To: <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> References: <20180607105316.1773228fvnllyr4s@support.scinet.utoronto.ca> <0EA304DC-D318-4BFA-8FF3-36FC9B2A8E44@vanderbilt.edu> Message-ID: Hi Kevin, gpfsug-discuss-bounces at spectrumscale.org wrote on 07/06/2018 21:36:59: > From: "Buterbaugh, Kevin L" > So our restore software lays down the metadata first, then the data. > While it has no specific knowledge of the extended attributes, it > does back them up and restore them. So the only explanation that > makes sense to me is that since the inode for the file says that the > file should be in the gpfs23capacity pool, the data gets written there. Hm, fair enough. 
So it seems to extract and revise information from the inodes of backed-up files (since it cannot reuse the inode number, it must do so ...). Then, you could ask your SW vendor to include a functionality like "restore using GPFS placement" (ignoring pool info from inode), "restore data to pool XY" (all data restored,, but all to pool XY) or "restore only data from pool XY" (only data originally backed up from XY, and restored to XY), and LBNL "restore only data from pool XY to pool ZZ". The tapes could still do streaming reads, but all files not matching the condition would be ignored. Is a bit more sophisticated than just copying the inode content except some fields such as inode number. OTOH, how often are restores really needed ... so it might be over the top ... > > We?ve also determined that it is actually quicker to tell > our tape system to restore all files from a fileset than to take the > time to tell it to selectively restore only certain files ? and the > same amount of tape would have to be read in either case. Given that you know where the restored files are going to in the file system, you can also craft a policy that deletes all files which are in pool Capacity and have a path into the restore area. Running that every hour should keep your capacity pool from filling up. Just the tapes need to read more, but because they do it in streaming mode, it is probably not more expensive than shoe-shining. And that could also be applied to the third data pool which also retrieves files. But maybe your script is also sufficient Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From secretary at gpfsug.org Fri Jun 8 09:53:43 2018 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Fri, 08 Jun 2018 09:53:43 +0100 Subject: [gpfsug-discuss] Committee change Message-ID: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. 
Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Fri Jun 8 11:42:55 2018 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Fri, 08 Jun 2018 11:42:55 +0100 Subject: [gpfsug-discuss] Committee change In-Reply-To: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> References: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Message-ID: On behalf of the group, I?d like to thank Claire for her support of the group over the past 8 years and wish her well in the new role! Its grown from a few people round a table to a worldwide group with hundreds of members. I spoke with Claire yesterday, and she said the 1 key thing she has learnt about Spectrum Scale is that any issues are likely your network ? Simon Group Chair From: on behalf of "secretary at gpfsug.org" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 8 June 2018 at 09:53 To: gpfsug main discussion list Cc: "secretary at spectrumscaleug.org" , Chair Subject: [gpfsug-discuss] Committee change Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From colinb at mellanox.com Fri Jun 8 12:18:25 2018 From: colinb at mellanox.com (Colin Bridger) Date: Fri, 8 Jun 2018 11:18:25 +0000 Subject: [gpfsug-discuss] Committee change In-Reply-To: References: <29156faf86b6a83e9a43d0859ab68713@webmail.gpfsug.org> Message-ID: I?d also like to wish Claire all the best as well. As a sponsor for a large number of the events, she has been so organized and easy to work with ?and arranged some great after events? so thank you! Tongue firmly in cheek, I?d also like to agree with Claire on the 1 key thing she has learnt and point her towards the Chair of Spectrum-Scale UG for his solution ? All the best Claire! Colin Colin Bridger Mellanox Technologies Mobile: +44 7917 017737 Email: colinb at mellanox.com From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Spectrum Scale User Group Chair) Sent: Friday, June 8, 2018 11:43 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Committee change On behalf of the group, I?d like to thank Claire for her support of the group over the past 8 years and wish her well in the new role! Its grown from a few people round a table to a worldwide group with hundreds of members. 
I spoke with Claire yesterday, and she said the 1 key thing she has learnt about Spectrum Scale is that any issues are likely your network ? Simon Group Chair From: > on behalf of "secretary at gpfsug.org" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 8 June 2018 at 09:53 To: gpfsug main discussion list > Cc: "secretary at spectrumscaleug.org" >, Chair > Subject: [gpfsug-discuss] Committee change Dear members, We have a change to the committee that we'd like to share with you. After almost 8 years as Secretary of the group, I am leaving my 'day job' at OCF to join a new company. With that, I am passing over the SSUG secretary role to Sharee Pickles. Things will continue as usual with Sharee picking up responsibility for communicating to the group about meetings and helping to organise and run them alongside Simon the group Chairperson. This will still be through the secretary at spectrumscaleug.org email address. I'd like to thank you all for making my time working with the group so enjoyable and I look forward to seeing more progress in the future. I'd also like to thank the other committee members Simon, Bob and Kristy for their continuing support of the group and I'm sure you will all welcome Sharee into the role. Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Jun 11 11:46:26 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 10:46:26 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 11 11:49:46 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 11 Jun 2018 10:49:46 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Jun 11 11:59:11 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 10:59:11 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. 
I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jun 11 12:52:25 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 11 Jun 2018 07:52:25 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Philipp.Rehs at uni-duesseldorf.de Mon Jun 11 12:56:43 2018 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Mon, 11 Jun 2018 13:56:43 +0200 Subject: [gpfsug-discuss] GPFS-GUI and Collector Message-ID: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> Hello, we have GPFS-GUI and Clients at 4.2.3.7 and my clients to not show any performance data in the gui. All clients are running pmsensor and the gui is running pmcollector. I can see in tcpdump that the server receives data but i can not see in the the gui. " Performance collector did not return any data. " Do you have any idea how i can debug it further?? Kind regards ?Philipp Rehs -------------- next part -------------- A non-text attachment was scrubbed... Name: pEpkey.asc Type: application/pgp-keys Size: 1786 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jun 11 13:17:04 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 11 Jun 2018 12:17:04 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Thanks Fred. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 11 June 2018 12:52 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Mon Jun 11 13:46:10 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Mon, 11 Jun 2018 14:46:10 +0200 Subject: [gpfsug-discuss] GPFS-GUI and Collector In-Reply-To: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> References: <72927db9-8cc2-52e4-75ff-702febb0b5a8@uni-duesseldorf.de> Message-ID: Hello, there could be several reasons why data is not shown in the GUI. There are some knobs in the performance data collection that could prevent it. Some common things to check: 1 Are you getting data at all? Some nodes missing? Check with the CLI and expect data: mmperfmon query compareNodes cpu_user -b 3600 -n 2 Legend: ?1:???? cache-11.novalocal|CPU|cpu_user ?2:???? cache-12.novalocal|CPU|cpu_user ?3:???? cache-13.novalocal|CPU|cpu_user Row?????????? Timestamp cache-11 cache-12 cache-13 ? 1 2018-06-11-14:00:00 1.260611 9.447619 4.134019 ? 2 2018-06-11-15:00:00 1.306165 9.026577 4.062405 2. Are specific nodes missing? Check communications between sensors and collectors. 3. Is specific data missing? For Capacity date see here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guicapacityinfoissue.htm 4. How does the sensor config look like? Call mmperfmon config show Can all sensors talk to the collector registered as colCandidates? colCandidates = "cache-11.novalocal" colRedundancy = 1 You can also contact me by PN. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Philipp Helo Rehs To: gpfsug-discuss at spectrumscale.org Date: 11.06.2018 14:05 Subject: [gpfsug-discuss] GPFS-GUI and Collector Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, we have GPFS-GUI and Clients at 4.2.3.7 and my clients to not show any performance data in the gui. All clients are running pmsensor and the gui is running pmcollector. I can see in tcpdump that the server receives data but i can not see in the the gui. " Performance collector did not return any data. " Do you have any idea how i can debug it further? Kind regards ?Philipp Rehs (See attached file: pEpkey.asc) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19393134.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: pEpkey.asc Type: application/octet-stream Size: 1817 bytes Desc: not available URL: From ulmer at ulmer.org Mon Jun 11 13:47:58 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 11 Jun 2018 08:47:58 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: <1EF433B1-14DD-48AB-B1B4-07EF88E48EDF@ulmer.org> So is it better to pin with the subscription manager, or in our case to pin the kernel version with yum (because you always have something to do when the kernel changes)? What is the consensus? -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Jun 11, 2018, at 6:59 AM, Sobey, Richard A wrote: > > Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: > > [root@ ~]# subscription-manager release > Release: 7.4 > [root@ ~]# cat /etc/redhat-release > Red Hat Enterprise Linux Server release 7.5 (Maipo) > > Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. > > Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! > > Cheers > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) > Sent: 11 June 2018 11:50 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 > > We have on our DSS-G ? > > Have you looked at: > https://access.redhat.com/solutions/238533 > > ? > > Simon > > From: on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 11 June 2018 at 11:46 > To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 > > Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? > > Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Jun 11 14:52:16 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 11 Jun 2018 09:52:16 -0400 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Fred, Correct. The FAQ should be updated shortly. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Frederick Stock" To: gpfsug main discussion list Date: 06/11/2018 07:52 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? 
This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From JRLang at uwyo.edu Mon Jun 11 16:01:48 2018 From: JRLang at uwyo.edu (Jeffrey R. Lang) Date: Mon, 11 Jun 2018 15:01:48 +0000 Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 In-Reply-To: References: Message-ID: Yes, I recently had this happen. It was determined that the caches had been updated to the 7.5 packages, before I set the release to 7.4/ Since I didn't clear and delete the cache it used what it had and did the update to 7.5. So always clear and remove the cache before an update. Jeff From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sobey, Richard A Sent: Monday, June 11, 2018 5:46 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jun 12 11:42:32 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 12 Jun 2018 10:42:32 +0000 Subject: [gpfsug-discuss] Lroc on NVME Message-ID: <687c534347c7e02365cb3c5de4532a60f8a296fb.camel@qmul.ac.uk> We have a new computer, which has an nvme drive that is appearing as /dev/nvme0 and we'd like to put lroc on /dev/nvme0p1p1. which is a partition on the drive. After doing the standard mmcrnsd to set it up Spectrum Scale fails to see it. I've added a script /var/mmfs/etc/nsddevices so that gpfs scans them, and it does work now. What "type" should I set the nvme drives too? 
I've currently set it to "generic" I want to do some tidying of my script, but has anyone else tried to get lroc running on nvme and how well does it work. We're running CentOs 7.4 and Spectrum Scale 4.2.3-8 currently. Thanks in advance. -- Peter Childs ITS Research Storage Queen Mary, University of London From truongv at us.ibm.com Tue Jun 12 14:53:15 2018 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 12 Jun 2018 09:53:15 -0400 Subject: [gpfsug-discuss] Lroc on NVME In-Reply-To: References: Message-ID: Yes, older versions of GPFS don't recognize /dev/nvme*. So you would need /var/mmfs/etc/nsddevices user exit. On newer GPFS versions, the nvme devices are also generic. So, it is good that you are using the same NSD sub-type. Cheers, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 06/12/2018 06:47 AM Subject: gpfsug-discuss Digest, Vol 77, Issue 15 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: RHEL updated to 7.5 instead of 7.4 (Felipe Knop) 2. Re: RHEL updated to 7.5 instead of 7.4 (Jeffrey R. Lang) 3. Lroc on NVME (Peter Childs) ---------------------------------------------------------------------- Message: 1 Date: Mon, 11 Jun 2018 09:52:16 -0400 From: "Felipe Knop" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: Content-Type: text/plain; charset="utf-8" Fred, Correct. The FAQ should be updated shortly. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Frederick Stock" To: gpfsug main discussion list Date: 06/11/2018 07:52 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Spectrum Scale 4.2.3.9 does support RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Sobey, Richard A" To: gpfsug main discussion list Date: 06/11/2018 06:59 AM Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Simon. Do you mean you pinned the minor release to 7.X but yum upgraded you to 7.Y? This has just happened to me: [root@ ~]# subscription-manager release Release: 7.4 [root@ ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) Granted I didn?t issue a yum clean all after changing the release however I?ve never seen this happen before. Anyway, I need to either downgrade back to 7.4 or upgrade GPFS, whichever will be the best supported. I need to learn to pay attention to what kernel version I?m being updated to in future! Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson (IT Research Support) Sent: 11 June 2018 11:50 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 We have on our DSS-G ? 
Have you looked at: https://access.redhat.com/solutions/238533 ? Simon From: on behalf of "Sobey, Richard A" Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Monday, 11 June 2018 at 11:46 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180611/d13470c2/attachment-0001.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180611/d13470c2/attachment-0001.gif > ------------------------------ Message: 2 Date: Mon, 11 Jun 2018 15:01:48 +0000 From: "Jeffrey R. Lang" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Message-ID: Content-Type: text/plain; charset="us-ascii" Yes, I recently had this happen. It was determined that the caches had been updated to the 7.5 packages, before I set the release to 7.4/ Since I didn't clear and delete the cache it used what it had and did the update to 7.5. So always clear and remove the cache before an update. Jeff From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sobey, Richard A Sent: Monday, June 11, 2018 5:46 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4 Has anyone ever used subscription-manager to set a release to 7.4 only for the system to upgrade to 7.5 anyway? Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on downgrading back to 7.4? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180611/f085e78e/attachment-0001.html > ------------------------------ Message: 3 Date: Tue, 12 Jun 2018 10:42:32 +0000 From: Peter Childs To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Lroc on NVME Message-ID: <687c534347c7e02365cb3c5de4532a60f8a296fb.camel at qmul.ac.uk> Content-Type: text/plain; charset="utf-8" We have a new computer, which has an nvme drive that is appearing as /dev/nvme0 and we'd like to put lroc on /dev/nvme0p1p1. which is a partition on the drive. After doing the standard mmcrnsd to set it up Spectrum Scale fails to see it. I've added a script /var/mmfs/etc/nsddevices so that gpfs scans them, and it does work now. What "type" should I set the nvme drives too? I've currently set it to "generic" I want to do some tidying of my script, but has anyone else tried to get lroc running on nvme and how well does it work. We're running CentOs 7.4 and Spectrum Scale 4.2.3-8 currently. Thanks in advance. 
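In case it helps anyone, the /var/mmfs/etc/nsddevices I'm using is basically the shipped sample with an nvme stanza added - roughly the following, posted as a sketch rather than a tested script, so compare it with /usr/lpp/mmfs/samples/nsddevices.sample before lifting it:

#!/bin/ksh
# user exit so GPFS device discovery also offers the nvme block devices
osName=$(/bin/uname -s)
if [[ $osName = Linux ]]
then
  # emit "deviceName deviceType" pairs, names relative to /dev
  for dev in $(awk '$4 ~ /^nvme/ {print $4}' /proc/partitions)
  do
    echo $dev generic
  done
fi
# the return code controls whether the built-in discovery also runs;
# check the comments in the sample script for the value you want here
return 0

The "generic" there is the same device type I mentioned setting above.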
-- Peter Childs ITS Research Storage Queen Mary, University of London ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 15 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kums at us.ibm.com Tue Jun 12 23:25:53 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Tue, 12 Jun 2018 22:25:53 +0000 Subject: [gpfsug-discuss] Lroc on NVME In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB0839DFD827D68f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB0839DFD827D68f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Jun 13 10:10:28 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 13 Jun 2018 11:10:28 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> Hello, did anyone encountered an error with RHEL 7.5 kernel 3.10.0-862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? I'm getting random errors: Unknown error 521. It means EBADHANDLE. Not sure whether it is due to kernel or GPFS. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jonathan.buzzard at strath.ac.uk Wed Jun 13 10:32:44 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 13 Jun 2018 10:32:44 +0100 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> Message-ID: <1528882364.26036.3.camel@strath.ac.uk> On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From r.sobey at imperial.ac.uk Wed Jun 13 10:33:49 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 13 Jun 2018 09:33:49 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882364.26036.3.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet however. 
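(One more thing worth saying while I'm here: whichever release a node ends up on, the portability layer has to be rebuilt against the new kernel before GPFS will come back up - from memory something like

[root@ ~]# /usr/lpp/mmfs/bin/mmbuildgpl
[root@ ~]# mmstartup

on each node, but treat that as a prompt to check your own build procedure rather than a verified recipe.)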
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Jun 13 10:37:56 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 13 Jun 2018 10:37:56 +0100 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: <1528882676.26036.4.camel@strath.ac.uk> On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From TOMP at il.ibm.com Wed Jun 13 10:48:14 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 12:48:14 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882676.26036.4.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). knfs and cNFS can't coexist with CES in the same environment. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jonathan Buzzard To: gpfsug main discussion list Date: 13/06/2018 12:38 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Wed Jun 13 11:07:52 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 13 Jun 2018 06:07:52 -0400 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? -- ddj Dave Johnson > On Jun 13, 2018, at 5:48 AM, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm). > > knfs and cNFS can't coexist with CES in the same environment. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Jonathan Buzzard > To: gpfsug main discussion list > Date: 13/06/2018 12:38 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > > however. > > > > Then we are down to kernel NFS not been supported then? > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Jun 13 11:11:26 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 13 Jun 2018 12:11:26 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> Message-ID: <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From TOMP at il.ibm.com Wed Jun 13 11:32:28 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 13:32:28 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk><1528882676.26036.4.camel@strath.ac.uk> <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Message-ID: Hi, :-) I explicitly used the term "same environment". The simple answer would be NO, but: While the code will only enforce not configuring CES and CNFS on the same cluster - it wouldn't know to do that between clusters - so I don't believe anything will prevent you from configuring it. That said, there might be implications on recovery that might lead to data corruption ( imagine two systems that don't know about the other locks for the reclaim process). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: david_johnson at brown.edu To: gpfsug main discussion list Date: 13/06/2018 13:13 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? -- ddj Dave Johnson On Jun 13, 2018, at 5:48 AM, Tomer Perry wrote: knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). knfs and cNFS can't coexist with CES in the same environment. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jonathan Buzzard To: gpfsug main discussion list Date: 13/06/2018 12:38 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > however. > Then we are down to kernel NFS not been supported then? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
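To make the cNFS / CES distinction above concrete, here is a minimal sketch of how each stack is enabled. The shared-root paths, node names and IP addresses are placeholders, not values taken from this thread.

# Classic cNFS: an HA layer over the kernel NFS server (knfsd)
mmchconfig cnfsSharedRoot=/gpfs/fs1/.cnfs
mmchnode --cnfs-interface=10.0.0.51 -N nsd01     # repeat for each cNFS server node

# CES protocols: Ganesha-based NFS (plus SMB/Object) on dedicated protocol nodes
mmchconfig cesSharedRoot=/gpfs/fs1/.ces
mmchnode --ces-enable -N proto01,proto02
mmces service enable NFS
mmces address add --ces-ip 10.0.0.61,10.0.0.62

# The two stacks cannot be configured in the same cluster;
# mmlscluster --cnfs and mmlscluster --ces show which (if either) is in use.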
URL: From david_johnson at brown.edu Wed Jun 13 11:38:37 2018 From: david_johnson at brown.edu (David D Johnson) Date: Wed, 13 Jun 2018 06:38:37 -0400 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> <1528882676.26036.4.camel@strath.ac.uk> <68073099-AAFF-40AC-BAF8-6B3265E94008@brown.edu> Message-ID: So first, apologies for hijacking the thread, but this is a hot issue as we are planning 4.2.x to 5.x.y upgrade in the unspecified future, and are currently running CNFS and clustered CIFS. Those exporter nodes are in need of replacement, and I am unsure of the future status of CNFS and CIFS (are they even in 5.x?). Is there a way to roll out protocols while still offering CNFS/Clustered CIFS, and cut over when it's ready for prime time? > On Jun 13, 2018, at 6:32 AM, Tomer Perry wrote: > > Hi, > > :-) I explicitly used the term "same environment". > > The simple answer would be NO, but: > While the code will only enforce not configuring CES and CNFS on the same cluster - it wouldn't know to do that between clusters - so I don't believe anything will prevent you from configuring it. > That said, there might be implications on recovery that might lead to data corruption ( imagine two systems that don't know about the other locks for the reclaim process). > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: david_johnson at brown.edu > To: gpfsug main discussion list > Date: 13/06/2018 13:13 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Could you expand on what is meant by ?same environment?? Can I export the same fs in one cluster from cnfs nodes (ie NSD cluster with no root squash) and also export the same fs in another cluster (client cluster with root squash) using Ganesha? > > -- ddj > Dave Johnson > > On Jun 13, 2018, at 5:48 AM, Tomer Perry > wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add HA to NFS on top of GPFS - https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm ). > > knfs and cNFS can't coexist with CES in the same environment. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Jonathan Buzzard > > To: gpfsug main discussion list > > Date: 13/06/2018 12:38 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, 2018-06-13 at 09:33 +0000, Sobey, Richard A wrote: > > 4.2.3.9 is supported on RHEL 7.5, the FAQ has not been updated yet > > however. > > > > Then we are down to kernel NFS not been supported then? > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Jun 13 15:45:44 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 13 Jun 2018 17:45:44 +0300 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz><1528882364.26036.3.camel@strath.ac.uk><1528882676.26036.4.camel@strath.ac.uk> <20180613101126.ngk2kkcpedwewghl@ics.muni.cz> Message-ID: Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Wed Jun 13 16:14:53 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 13 Jun 2018 15:14:53 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 In-Reply-To: <1528882364.26036.3.camel@strath.ac.uk> References: <20180613091028.35sd7t5zrjws6nov@ics.muni.cz> <1528882364.26036.3.camel@strath.ac.uk> Message-ID: We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? 
> > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pprandive at rediffmail.com Thu Jun 14 15:22:09 2018 From: pprandive at rediffmail.com (Prafulla) Date: 14 Jun 2018 14:22:09 -0000 Subject: [gpfsug-discuss] =?utf-8?q?GPFS_support_for_latest_stable_release?= =?utf-8?q?_of_OpenStack_=28called_Queens_https=3A//www=2Eopenstack?= =?utf-8?q?=2Eorg/software/queens/=29?= Message-ID: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Hello Guys,Greetings!Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens?I have few queries around that,1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)?2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose?Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance!Regards,pR -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jun 14 15:56:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 14 Jun 2018 14:56:28 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: <7282ECEB-75F0-45AF-A36C-57D3B5930CBA@bham.ac.uk> That probably depends on your definition of support? Object as part of the CES stack is currently on Pike (AFAIK). If you wanted to run swift and Queens then I don?t think that would be supported as part of CES. I believe that cinder/manilla/glance integration is written by IBM developers, but I?m not sure if there was ever a formal support statement from IBM about this, (in the sense of a guaranteed support with a PMR). Simon From: on behalf of "pprandive at rediffmail.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 14 June 2018 at 15:49 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? 
Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Thu Jun 14 16:04:00 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Thu, 14 Jun 2018 11:04:00 -0400 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: Brian is probably best able to answer this question. Lyle From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.b.mills at nasa.gov Thu Jun 14 16:09:57 2018 From: jonathan.b.mills at nasa.gov (Mills, Jonathan B. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 14 Jun 2018 15:09:57 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: I can?t speak for the GUI integration with Horizon, but I use GPFS 4.2.3.8 just fine with OpenStack Pike (for Glance, Cinder, and Nova). I?d be surprised if it worked any differently in Queens. From: on behalf of Lyle Gayne Reply-To: gpfsug main discussion list Date: Thursday, June 14, 2018 at 11:05 AM To: gpfsug main discussion list Cc: Brian Nelson Subject: Re: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Brian is probably best able to answer this question. 
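As an illustration of the "GPFS as backing store" case being discussed here, a Cinder backend stanza for the GPFS driver looks roughly like the lines below. This is a sketch only: the section name, paths and backend name are invented, and the options should be checked against the driver documentation for your OpenStack release.

cat >> /etc/cinder/cinder.conf <<'EOF'
[gpfs]
volume_driver = cinder.volume.drivers.ibm.gpfs.GPFSDriver
volume_backend_name = GPFS
gpfs_mount_point_base = /gpfs/fs1/openstack/cinder/volumes
gpfs_sparse_volumes = True
EOF
# add "enabled_backends = gpfs" under [DEFAULT], then restart the volume service
systemctl restart openstack-cinder-volume

If Glance images live in the same file system, the driver can also clone images into volumes; see the gpfs_images_dir and gpfs_images_share_mode options in the driver documentation.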
Lyle [Inactive hide details for "Prafulla" ---06/14/2018 11:01:19 AM---Hello Guys,Greetings!Could you please help me figure out the l]"Prafulla" ---06/14/2018 11:01:19 AM---Hello Guys,Greetings!Could you please help me figure out the level of GPFS's support for latest From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From brnelson at us.ibm.com Fri Jun 15 04:36:19 2018 From: brnelson at us.ibm.com (Brian Nelson) Date: Thu, 14 Jun 2018 22:36:19 -0500 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: The only OpenStack component that GPFS explicitly ships is Swift, which is used for the Object protocol of the Protocols Support capability. The latest version included is Swift at the Pike release. That was first made available in the GPFS 5.0.1.0 release. The other way that GPFS can be used is as the backing store for many OpenStack components, as you can see in this table: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1ins_openstackusecase.htm The GPFS drivers for those components were written in the Liberty/Mitaka timeframe. We generally do not certify every OpenStack release against GPFS. However, we have not had any compatibility issues with later releases, and I would expect Queens to also work fine with GPFS storage. -Brian =================================== Brian Nelson 512-286-7735 (T/L) 363-7735 IBM Spectrum Scale brnelson at us.ibm.com From: "Prafulla" To: Date: 06/14/2018 11:01 AM Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Guys, Greetings! Could you please help me figure out the level of GPFS's support for latest stable release of OpenStack which is called as Queens? I have few queries around that, 1. Does GPFS support Queens release for OpenStack Cinder, Glance, Manilla and Swift services? If yes, which GPFS release(s)? 2. If I would need to integrate GPFS GUI with OpenStack Horizon (the dashboard service), can it be done? If yes, how can I do that? 
Does GPFS GUI provides any such APIs which could be used for this integration purpose? Guys, request you kindly help me find answer to these queries. Your help will much appreciated. Thanks in advance! Regards, pR_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From cabrillo at ifca.unican.es Fri Jun 15 13:01:07 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Fri, 15 Jun 2018 14:01:07 +0200 (CEST) Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Message-ID: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... [root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 0 this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 
00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From anobre at br.ibm.com Fri Jun 15 15:49:14 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 15 Jun 2018 14:49:14 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Message-ID: An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Fri Jun 15 16:16:18 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Fri, 15 Jun 2018 17:16:18 +0200 (CEST) Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> Message-ID: <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> Hi Anderson, Comments are in line From: "Anderson Ferreira Nobre" To: "gpfsug-discuss" Cc: "gpfsug-discuss" Sent: Friday, 15 June, 2018 16:49:14 Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Hi Iban, I think it's necessary more information to be able to help you. Here they are: - Redhat version: Which is 7.2, 7.3 or 7.4? CentOS Linux release 7.5.1804 (Core) - Redhat kernel version: In the FAQ of GPFS has the recommended kernel levels - Platform: Is it x86_64? Yes it is - Is there a reason for you stay in 4.2.3-6? Could you update to 4.2.3-9 or 5.0.1? No, that wasthe default version we get from our costumer we could upgrade to 4.2.3-9 with time... - How is the name resolution? Can you do test ping from one node to another and it's reverse? yes resolution works fine in both directions (there is no firewall or icmp filter) using ethernet private network (not IB) - TCP/IP tuning: What is the TCP/IP parameters you are using? 
I have used for 7.4 the following: [root at XXXX sysctl.d]# cat 99-ibmscale.conf net.core.somaxconn = 10000 net.core.netdev_max_backlog = 250000 net.ipv4.ip_local_port_range = 2000 65535 net.ipv4.tcp_rfc1337 = 1 net.ipv4.tcp_max_tw_buckets = 1440000 net.ipv4.tcp_mtu_probing = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 10 net.core.rmem_default = 4194304 net.core.rmem_max = 4194304 net.core.wmem_default = 4194304 net.core.wmem_max = 4194304 net.core.optmem_max = 4194304 net.ipv4.tcp_rmem=4096 87380 16777216 net.ipv4.tcp_wmem=4096 65536 16777216 vm.min_free_kbytes = 512000 kernel.panic_on_oops = 0 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 vm.swappiness = 0 vm.dirty_ratio = 10 That is mine: net.ipv4.conf.default.accept_source_route = 0 net.core.somaxconn = 8192 net.ipv4.tcp_fin_timeout = 30 kernel.sysrq = 1 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 13491064832 kernel.shmall = 4294967296 net.ipv4.neigh.default.gc_stale_time = 120 net.ipv4.tcp_synack_retries = 10 net.ipv4.tcp_sack = 0 net.ipv4.icmp_echo_ignore_broadcasts = 1 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1 net.core.netdev_max_backlog = 250000 net.core.rmem_default = 16777216 net.core.wmem_default = 16777216 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_mem = 16777216 16777216 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 87380 16777216 net.ipv4.tcp_adv_win_scale = 2 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_reordering = 3 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_max_syn_backlog = 8192 net.ipv4.neigh.default.gc_thresh1 = 30000 net.ipv4.neigh.default.gc_thresh2 = 32000 net.ipv4.neigh.default.gc_thresh3 = 32768 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.ib0.mcast_solicit = 18 vm.oom_dump_tasks = 1 vm.min_free_kbytes = 524288 Since we disabled ipv6, we had to rebuild the kernel image with the following command: [root at XXXX ~]# dracut -f -v I did that on Wns but no on GPFS servers... - GPFS tuning parameters: Can you list them? - Spectrum Scale status: Can you send the following outputs: mmgetstate -a -L mmlscluster [root at gpfs01 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: gpfsgui.ifca.es GPFS cluster id: 8574383285738337182 GPFS UID domain: gpfsgui.ifca.es Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon 9 cloudprv-02-9.ifca.es 10.10.140.26 cloudprv-02-9.ifca.es 10 cloudprv-02-8.ifca.es 10.10.140.25 cloudprv-02-8.ifca.es 13 node1.ifca.es 10.10.151.3 node3.ifca.es ...... 44 node24.ifca.es 10.10.151.24 node24.ifca.es ..... 
mmhealth cluster show (It was shoutdown by hand) [root at gpfs01 ~]# mmhealth cluster show --verbose Error: The monitoring service is down and does not respond, please restart it. mmhealth cluster show --verbose mmhealth node eventlog 2018-06-12 23:31:31.487471 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-12 23:31:52.856082 CET ccr_local_server_ok INFO The local GPFS CCR server is reachable PC_LOCAL_SERVER 2018-06-12 23:33:06.397125 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-12 23:33:06.400622 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-12 23:33:06.787556 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-12 23:33:22.670023 CET quorum_up INFO Quorum achieved 2018-06-13 14:01:51.376885 CET service_removed INFO On the node gpfs01.ifca.es the threshold monitor was removed 2018-06-13 14:01:51.385115 CET service_removed INFO On the node gpfs01.ifca.es the perfmon monitor was removed 2018-06-13 18:41:55.846893 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-13 18:42:39.217545 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-13 18:42:39.221455 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-13 18:42:39.653778 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-13 18:42:55.956125 CET quorum_up INFO Quorum achieved 2018-06-13 18:43:17.448980 CET service_running INFO The service perfmon is running on node gpfs01.ifca.es 2018-06-13 18:51:14.157351 CET service_running INFO The service threshold is running on node gpfs01.ifca.es 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized 2018-06-14 08:04:30.216689 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-14 08:05:10.836900 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-14 08:05:27.135275 CET quorum_up INFO Quorum achieved 2018-06-14 08:05:40.446601 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-14 08:05:40.881064 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-14 08:08:56.455851 CET ib_rdma_nic_recognized INFO IB RDMA NIC mlx5_0/1 was recognized 2018-06-14 12:29:58.772033 CET ccr_quorum_nodes_warn WARNING At least one quorum node is not reachable Item=PC_QUORUM_NODES,ErrMsg='Ping CCR quorum nodes failed',Failed='10.10.0.112' 2018-06-14 15:41:57.860925 CET ccr_quorum_nodes_ok INFO All quorum nodes are reachable PC_QUORUM_NODES 2018-06-15 13:04:41.403505 CET pmcollector_down ERROR pmcollector service should be started and is stopped 2018-06-15 15:23:00.121760 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-15 15:23:43.616075 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-15 15:23:43.619593 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-15 15:23:44.053493 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-15 15:24:00.219003 CET quorum_up INFO Quorum achieved [root at gpfs02 ~]# mmhealth node eventlog Error: The monitoring service is down and does not respond, please restart it. mmlsnode -L -N waiters non default parameters: [root at gpfs01 ~]# mmdiag --config | grep ! ! ccrEnabled 1 ! cipherList AUTHONLY ! clusterId 8574383285738337182 ! clusterName gpfsgui.ifca.es ! dmapiFileHandleSize 32 ! idleSocketTimeout 0 ! 
ignorePrefetchLUNCount 1 ! maxblocksize 16777216 ! maxFilesToCache 4000 ! maxInodeDeallocPrefetch 64 ! maxMBpS 6000 ! maxStatCache 512 ! minReleaseLevel 1700 ! myNodeConfigNumber 1 ! pagepool 17179869184 ! socketMaxListenConnections 512 ! socketRcvBufferSize 131072 ! socketSndBufferSize 65536 ! verbsPorts mlx5_0/1 ! verbsRdma enable ! worker1Threads 256 Regards, I Abra?os / Regards / Saludos, Anderson Nobre AIX & Power Consultant Master Certified IT Specialist IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services Phone: 55-19-2132-4317 E-mail: anobre at br.ibm.com ----- Original message ----- From: Iban Cabrillo Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Date: Fri, Jun 15, 2018 9:12 AM Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... [root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 0 this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 
00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Fri Jun 15 16:23:43 2018 From: chrisjscott at gmail.com (Chris Scott) Date: Fri, 15 Jun 2018 16:23:43 +0100 Subject: [gpfsug-discuss] Employment vacancy: Research Computing Specialist at University of Dundee, Scotland Message-ID: Hi All This is an employment opportunity to work with Spectrum Scale and its integration features with Spectrum Protect. Please see or forward along the following link to an employment vacancy in my team for a Research Computing Specialist here at the University of Dundee: https://www.jobs.dundee.ac.uk/fe/tpl_uod01.asp?s=4A515F4E5A565B1A&jobid=102157,4132345688&key=135360005&c=54715623342377&pagestamp=sepirmfpbecljxwhkl Cheers Chris [image: University of Dundee shield logo] *Chris Scott* Research Computing Manager School of Life Sciences, UoD IT, University of Dundee +44 (0)1382 386250 | C.Y.Scott at dundee.ac.uk [image: University of Dundee Facebook] [image: University of Dundee Twitter] [image: University of Dundee LinkedIn] [image: University of Dundee YouTube] [image: University of Dundee Instagram] [image: University of Dundee Snapchat] *One of the world's top 200 universities* Times Higher Education World University Rankings 2018 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Jun 15 16:25:50 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 15 Jun 2018 11:25:50 -0400 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections In-Reply-To: <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> References: <1546731929.925865.1529064067033.JavaMail.zimbra@ifca.unican.es> <1657031698.932123.1529075778659.JavaMail.zimbra@ifca.unican.es> Message-ID: Assuming CentOS 7.5 parallels RHEL 7.5 then you would need Spectrum Scale 4.2.3.9 because that is the release version (along with 5.0.1 PTF1) that supports RHEL 7.5. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Iban Cabrillo To: gpfsug-discuss Date: 06/15/2018 11:16 AM Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Anderson, Comments are in line From: "Anderson Ferreira Nobre" To: "gpfsug-discuss" Cc: "gpfsug-discuss" Sent: Friday, 15 June, 2018 16:49:14 Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Hi Iban, I think it's necessary more information to be able to help you. 
Here they are: - Redhat version: Which is 7.2, 7.3 or 7.4? CentOS Linux release 7.5.1804 (Core) - Redhat kernel version: In the FAQ of GPFS has the recommended kernel levels - Platform: Is it x86_64? Yes it is - Is there a reason for you stay in 4.2.3-6? Could you update to 4.2.3-9 or 5.0.1? No, that wasthe default version we get from our costumer we could upgrade to 4.2.3-9 with time... - How is the name resolution? Can you do test ping from one node to another and it's reverse? yes resolution works fine in both directions (there is no firewall or icmp filter) using ethernet private network (not IB) - TCP/IP tuning: What is the TCP/IP parameters you are using? I have used for 7.4 the following: [root at XXXX sysctl.d]# cat 99-ibmscale.conf net.core.somaxconn = 10000 net.core.netdev_max_backlog = 250000 net.ipv4.ip_local_port_range = 2000 65535 net.ipv4.tcp_rfc1337 = 1 net.ipv4.tcp_max_tw_buckets = 1440000 net.ipv4.tcp_mtu_probing = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 10 net.core.rmem_default = 4194304 net.core.rmem_max = 4194304 net.core.wmem_default = 4194304 net.core.wmem_max = 4194304 net.core.optmem_max = 4194304 net.ipv4.tcp_rmem=4096 87380 16777216 net.ipv4.tcp_wmem=4096 65536 16777216 vm.min_free_kbytes = 512000 kernel.panic_on_oops = 0 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 vm.swappiness = 0 vm.dirty_ratio = 10 That is mine: net.ipv4.conf.default.accept_source_route = 0 net.core.somaxconn = 8192 net.ipv4.tcp_fin_timeout = 30 kernel.sysrq = 1 kernel.core_uses_pid = 1 net.ipv4.tcp_syncookies = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.shmmax = 13491064832 kernel.shmall = 4294967296 net.ipv4.neigh.default.gc_stale_time = 120 net.ipv4.tcp_synack_retries = 10 net.ipv4.tcp_sack = 0 net.ipv4.icmp_echo_ignore_broadcasts = 1 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1 net.core.netdev_max_backlog = 250000 net.core.rmem_default = 16777216 net.core.wmem_default = 16777216 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_mem = 16777216 16777216 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 87380 16777216 net.ipv4.tcp_adv_win_scale = 2 net.ipv4.tcp_low_latency = 1 net.ipv4.tcp_reordering = 3 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_max_syn_backlog = 8192 net.ipv4.neigh.default.gc_thresh1 = 30000 net.ipv4.neigh.default.gc_thresh2 = 32000 net.ipv4.neigh.default.gc_thresh3 = 32768 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.ucast_solicit = 9 net.ipv6.neigh.enp3s0.mcast_solicit = 9 net.ipv4.neigh.ib0.mcast_solicit = 18 vm.oom_dump_tasks = 1 vm.min_free_kbytes = 524288 Since we disabled ipv6, we had to rebuild the kernel image with the following command: [root at XXXX ~]# dracut -f -v I did that on Wns but no on GPFS servers... - GPFS tuning parameters: Can you list them? 
- Spectrum Scale status: Can you send the following outputs: mmgetstate -a -L mmlscluster [root at gpfs01 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: gpfsgui.ifca.es GPFS cluster id: 8574383285738337182 GPFS UID domain: gpfsgui.ifca.es Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon 9 cloudprv-02-9.ifca.es 10.10.140.26 cloudprv-02-9.ifca.es 10 cloudprv-02-8.ifca.es 10.10.140.25 cloudprv-02-8.ifca.es 13 node1.ifca.es 10.10.151.3 node3.ifca.es ...... 44 node24.ifca.es 10.10.151.24 node24.ifca.es ..... mmhealth cluster show (It was shoutdown by hand) [root at gpfs01 ~]# mmhealth cluster show --verbose Error: The monitoring service is down and does not respond, please restart it. mmhealth cluster show --verbose mmhealth node eventlog 2018-06-12 23:31:31.487471 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-12 23:31:52.856082 CET ccr_local_server_ok INFO The local GPFS CCR server is reachable PC_LOCAL_SERVER 2018-06-12 23:33:06.397125 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-12 23:33:06.400622 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-12 23:33:06.787556 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-12 23:33:22.670023 CET quorum_up INFO Quorum achieved 2018-06-13 14:01:51.376885 CET service_removed INFO On the node gpfs01.ifca.es the threshold monitor was removed 2018-06-13 14:01:51.385115 CET service_removed INFO On the node gpfs01.ifca.es the perfmon monitor was removed 2018-06-13 18:41:55.846893 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-13 18:42:39.217545 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-13 18:42:39.221455 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-13 18:42:39.653778 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-13 18:42:55.956125 CET quorum_up INFO Quorum achieved 2018-06-13 18:43:17.448980 CET service_running INFO The service perfmon is running on node gpfs01.ifca.es 2018-06-13 18:51:14.157351 CET service_running INFO The service threshold is running on node gpfs01.ifca.es 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized 2018-06-14 08:04:30.216689 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 
2018-06-14 08:05:10.836900 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-14 08:05:27.135275 CET quorum_up INFO Quorum achieved 2018-06-14 08:05:40.446601 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-14 08:05:40.881064 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-14 08:08:56.455851 CET ib_rdma_nic_recognized INFO IB RDMA NIC mlx5_0/1 was recognized 2018-06-14 12:29:58.772033 CET ccr_quorum_nodes_warn WARNING At least one quorum node is not reachable Item=PC_QUORUM_NODES,ErrMsg='Ping CCR quorum nodes failed',Failed='10.10.0.112' 2018-06-14 15:41:57.860925 CET ccr_quorum_nodes_ok INFO All quorum nodes are reachable PC_QUORUM_NODES 2018-06-15 13:04:41.403505 CET pmcollector_down ERROR pmcollector service should be started and is stopped 2018-06-15 15:23:00.121760 CET quorum_down ERROR The node is not able to form a quorum with the other available nodes. 2018-06-15 15:23:43.616075 CET fs_remount_mount INFO The filesystem gpfs was mounted internal 2018-06-15 15:23:43.619593 CET fs_remount_mount INFO The filesystem gpfs was mounted remount 2018-06-15 15:23:44.053493 CET mounted_fs_check INFO The filesystem gpfs is mounted 2018-06-15 15:24:00.219003 CET quorum_up INFO Quorum achieved [root at gpfs02 ~]# mmhealth node eventlog Error: The monitoring service is down and does not respond, please restart it. mmlsnode -L -N waiters non default parameters: [root at gpfs01 ~]# mmdiag --config | grep ! ! ccrEnabled 1 ! cipherList AUTHONLY ! clusterId 8574383285738337182 ! clusterName gpfsgui.ifca.es ! dmapiFileHandleSize 32 ! idleSocketTimeout 0 ! ignorePrefetchLUNCount 1 ! maxblocksize 16777216 ! maxFilesToCache 4000 ! maxInodeDeallocPrefetch 64 ! maxMBpS 6000 ! maxStatCache 512 ! minReleaseLevel 1700 ! myNodeConfigNumber 1 ! pagepool 17179869184 ! socketMaxListenConnections 512 ! socketRcvBufferSize 131072 ! socketSndBufferSize 65536 ! verbsPorts mlx5_0/1 ! verbsRdma enable ! worker1Threads 256 Regards, I Abra?os / Regards / Saludos, Anderson Nobre AIX & Power Consultant Master Certified IT Specialist IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services Phone: 55-19-2132-4317 E-mail: anobre at br.ibm.com ----- Original message ----- From: Iban Cabrillo Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Date: Fri, Jun 15, 2018 9:12 AM Dear, We have reinstall recently from gpfs 3.5 to SpectrumScale 4.2.3-6 version redhat 7. We are running two nsd servers and a a gui, there is no firewall on gpfs network, and selinux is disable, I have checked changing the manager and cluster manager node between server with the same result, server 01 always increase the CLOSE_WAIT : Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon 2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon 3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon ....... Installation and configuration works fine, but now we see that one of the servers do not close the mmfsd connections and this growing for ever while the othe nsd servers is always in the same range: [root at gpfs01 ~]# netstat -putana | grep 1191 | wc -l 19701 [root at gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l 19528 .... 
[root at gpfs02 ~]# netstat -putana | grep 1191 | wc -l 215 [root at gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT| wc -l this is causing that gpfs01 do not answer to cluster commands NSD are balance between server (same size): [root at gpfs02 ~]# mmlsnsd File system Disk name NSD servers --------------------------------------------------------------------------- gpfs nsd1 gpfs01,gpfs02 gpfs nsd2 gpfs01,gpfs02 gpfs nsd3 gpfs02,gpfs01 gpfs nsd4 gpfs02,gpfs01 ..... proccess seems to be similar in both servers, only mmccr is running on server 1 and not in 2 gpfs01 ####### root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641 root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd root 19701 2 0 jun14 ? 00:00:00 [mmkproc] root 19702 2 0 jun14 ? 00:00:00 [mmkproc] root 19703 2 0 jun14 ? 00:00:00 [mmkproc] root 26468 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs [root at gpfs02 ~]# ps -feA | grep mm root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd root 5255 2 0 jun14 ? 00:00:00 [mmkproc] root 5256 2 0 jun14 ? 00:00:00 [mmkproc] root 5257 2 0 jun14 ? 00:00:00 [mmkproc] root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs Any idea will be appreciated. Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 5698 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Jun 15 17:17:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 15 Jun 2018 16:17:48 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections Message-ID: <4D6C04F4-266A-47AC-BC9A-C0CA9AA2B123@bham.ac.uk> This: ?2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized? 
Looks like you are telling GPFS to use an MLX card that doesn?t exist on the node, this is set with verbsPorts, it?s probably not your issue here, but you are better using nodeclasses and assigning the config option to those nodeclasses that have the correct card installed (I?d also encourage you to use a fabric number, we do this even if there is only 1 fabric currently in the cluster as we?ve added other fabrics over time or over multiple locations). Have you tried using mmnetverify at all? It?s been getting better in the newer releases and will give you a good indication if you have a comms issue due to something like name resolution in addition to testing between nodes? Simon From: on behalf of "cabrillo at ifca.unican.es" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 15 June 2018 at 16:16 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Thousands of CLOSE_WAIT connections 2018-06-14 08:04:06.341564 CET ib_rdma_nic_unrecognized ERROR IB RDMA NIC mlx5_0/1 was not recognized -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Mon Jun 18 11:43:38 2018 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 18 Jun 2018 12:43:38 +0200 Subject: [gpfsug-discuss] Fw: User Group Meeting at ISC2018 Frankfurt - Agenda Update Message-ID: Greetings: Here is the refined agenda for the joint "IBM Spectrum Scale and IBM Spectrum LSF User Group Meeting" at ISC in Frankfurt, Germany. If not yet done - please register here to attend so that we can have an accurate count of attendees: https://www-01.ibm.com/events/wwe/grp/grp308.nsf/Registration.xsp?openform&seminar=AA4A99ES Looking forward to see you there, Ulf Monday June 25th, 2018 - 14:00-17:30 - Conference Room Applaus 14:00-14:15 Welcome Gabor Samu (IBM) / Ulf Troppens (IBM) 14:15-14:45 What is new in Spectrum Scale? Mathias Dietz (IBM) 14:45-15:00 What is new in ESS? Christopher Maestas (IBM) 15:00-15:15 High Capacity File Storage Oliver Kill (pro-com) 15:15-15:35 Site Report: CSCS Stefano Gorini (CSCS) 15:35-15:55 Site Report: University of Birmingham Simon Thompson (University of Birmingham) 15:55-16:25 What is new in Spectrum Computing? Bill McMillan (IBM) 16:25-16:55 Deep Dive on one Spectrum Scale Feature Olaf Weiser (IBM) 16:55-17:25 Spectrum Scale enhancements for CORAL Sven Oehme (IBM) 17:25-17:30 Wrap-up Gabor Samu (IBM) / Ulf Troppens (IBM) -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 18.06.2018 12:33 ----- From: Ulf Troppens/Germany/IBM To: gpfsug-discuss at spectrumscale.org Date: 28.05.2018 09:59 Subject: User Group Meeting at ISC2018 Frankfurt Greetings: IBM is happy to announce the agenda for the joint "IBM Spectrum Scale and IBM Spectrum LSF User Group Meeting" at ISC in Frankfurt, Germany. We will finish on time to attend the opening reception. As with other user group meetings, the agenda includes user stories, updates on IBM Spectrum Scale & IBM Spectrum LSF, and access to IBM experts and your peers. Please join us! 
To attend please register here so that we can have an accurate count of attendees: https://www-01.ibm.com/events/wwe/grp/grp308.nsf/Registration.xsp?openform&seminar=AA4A99ES We are still looking for two customers to talk about their experience with Spectrum Scale and/or Spectrum LSF. Please send me a personal mail, if you are interested to talk. Monday June 25th, 2018 - 14:00-17:30 - Conference Room Applaus 14:00-14:15 Welcome Gabor Samu (IBM) / Ulf Troppens (IBM) 14:15-14:45 What is new in Spectrum Scale? Mathias Dietz (IBM) 14:45-15:00 News from Lenovo Storage Michael Hennicke (Lenovo) 15:00-15:15 What is new in ESS? Christopher Maestas (IBM) 15:15-15:35 Customer talk 1 TBD 15:35-15:55 Customer talk 2 TBD 15:55-16:25 What is new in Spectrum Computing? Bill McMillan (IBM) 16:25-16:55 Field Update Olaf Weiser (IBM) 16:55-17:25 Spectrum Scale enhancements for CORAL Sven Oehme (IBM) 17:25-17:30 Wrap-up Gabor Samu (IBM) / Ulf Troppens (IBM) Looking forward to see some of you there. Best, Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From PPOD at de.ibm.com Mon Jun 18 14:59:16 2018 From: PPOD at de.ibm.com (Przemyslaw Podfigurny1) Date: Mon, 18 Jun 2018 13:59:16 +0000 Subject: [gpfsug-discuss] GPFS support for latest stable release of OpenStack (called Queens https://www.openstack.org/software/queens/) In-Reply-To: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> References: <20180614142209.21301.qmail@f4mail-235-200.rediffmail.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043380.png Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043381.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15293152043382.png Type: image/png Size: 1167 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Jun 18 16:53:51 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jun 2018 15:53:51 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Message-ID: Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From neil.wilson at metoffice.gov.uk Mon Jun 18 17:05:35 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Mon, 18 Jun 2018 16:05:35 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution In-Reply-To: References: Message-ID: I think it?s caused by the ID mapping not being configured properly. Found this on the redhat knowledge base. Environment * Red Hat Enterprise Linux 5 * Red Hat Enterprise Linux 6 * Red Hat Enterprise Linux 7 * NFSv4 share being exported from an NFSv4 capable NFS server Issue * From the client, the mounted NFSv4 share has ownership for all files and directories listed as nobody:nobody instead of the actual user that owns them on the NFSv4 server, or who created the new file and directory. * Seeing nobody:nobody permissions on nfsv4 shares on the nfs client. Also seeing the following error in /var/log/messages: * How to configure Idmapping for NFSv4 Raw nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Resolution * Modify the /etc/idmapd.conf with the proper domain (FQDN), on both the client and server. In this example, the proper domain is "example.com" so the "Domain =" directive within /etc/idmapd.conf should be modified to read: Raw Domain = example.com * Note: * If using a NetApp Filer, the NFS.V4.ID.DOMAIN parameter must be set to match the "Domain =" parameter on the client. * If using a Solaris machine as the NFS server, the NFSMAPID_DOMAIN value in /etc/default/nfs must match the RHEL clients Domain. * On Red Hat Enterprise Linux 6.2 and older, to put the changes into effect restart the rpcidmapd service and remount the NFSv4 filesystem : Raw # service rpcidmapd restart # mount -o remount /nfs/mnt/point NOTE: It is only necessary to restart rpc.idmapd service on systems where rpc.idmapd is actually performing the id mapping. On RHEL 6.3 and newer NFS CLIENTS, the maps are stored in the kernel keyring and the id mapping itself is performed by the /sbin/nfsidmap program. On older NFS CLIENTS (RHEL 6.2 and older) as well as on all NFS SERVERS running RHEL, the id mapping is performed by rpc.idmapd. * Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's. * On Red Hat Enterprise Linux 6.3 and higher, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required: Raw # nfsidmap -c NOTE: The above command is only necessary on systems that use the keyring-based id mapper, i.e. NFS CLIENTS running RHEL 6.3 and higher. On RHEL 6.2 and older NFS CLIENTS as well as all NFS SERVERS running RHEL, the cache should be cleared out when rpc.idmapd is restarted. * Another check, see if the passwd:, shadow: and group: settings are set correctly in the /etc/nsswitch.conf file on both Server and Client. Disabling idmapping NOTE: In order to properly disable idmapping, it must be disabled on both the NFS client and NFS server. 
- By default, RHEL6.3 and newer NFS clients and servers disable idmapping when utilizing the AUTH_SYS/UNIX authentication flavor by enabling the following booleans: Raw NFS client # echo 'Y' > /sys/module/nfs/parameters/nfs4_disable_idmapping NFS server # echo 'Y' > /sys/module/nfsd/parameters/nfs4_disable_idmapping * If using a NetApp filer, the options nfs.v4.id.allow_numerics on command can be used to disable idmapping. More information can be found here. * With this boolean enabled, NFS clients will instead send numeric UID/GID numbers in outgoing attribute calls and NFS servers will send numeric UID/GID numbers in outgoing attribute replies. ? If NFS clients sending numeric UID/GID values in a SETATTR call receive an NFS4ERR_BADOWNER reply from the NFS server clients will re-enable idmapping and send user at domain strings for that specific mount from that point forward. ? We can make the option nfs4_disable_idmapping persistent across reboot. ? After the above value has been changed, for the setting to take effect for any NFS server export mounted on the NFS client, you must unmount all NFS mount points for the given NFS server, and then re-mount them. If you have auto mounts stop all processes accessing the mounts and allow the automount daemon to unmount them. Once all NFS mount points are gone to the desired NFS server, remount the NFS mount points and the new setting should be in place. If this is too problematic, you may want to schedule a reboot of the NFS client. ? To verify the setting has been changed properly, you can look inside the /proc/self/mountstats file 'caps' line, which contains a hex value of 2 bytes (16 bits). This is the line that shows the NFS server's "capabilities", and the most significant bit #15 is the one which represents whether idmapping is disabled or not (the NFS_CAP_UIDGID_NOMAP bit - see the Root Cause section) Raw # cat /sys/module/nfs/parameters/nfs4_disable_idmapping Y # umount /mnt # mount rhel6u6-node2:/exports/nfs4 /mnt # grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0xffff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ * Example of nfs4_disable_idmapping = 'N' Raw [root at rhel6u3-node1 ~]# echo N > /sys/module/nfs/parameters/nfs4_disable_idmapping [root at rhel6u3-node1 ~]# umount /mnt [root at rhel6u3-node1 ~]# mount rhel6u6-node2:/exports/nfs4 /mnt [root at rhel6u3-node1 ~]# grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0x7fff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ NOTE: To force ONLY numeric IDs to be used on the client, add RPCIDMAPDARGS="-C" to the etc/sysconfig/nfs file and restart the rpcidmapd service. See man rpc.idmapd for more information. NOTE: This option can only be used with AUTH_SYS/UNIX authentication flavors, if you wish to use something like Kerberos, idmapping must be used. Root Cause * NFSv4 utilizes ID mapping to ensure permissions are set properly on exported shares, if the domains of the client and server do not match then the permissions are mapped to nobody:nobody. NFS_CAP_UIDGID_NOMAP bit * The nfs4_disable_idmapping is a module parameter which is read only one time, at the point at which the kernel sets up the data structure that represents an NFS server. Once it is read, a flag is set in the nfs_server structure NFS_CAP_UIDGID_NOMAP. 
Raw #define NFS_CAP_UIDGID_NOMAP (1U << 15) static int nfs4_init_server(struct nfs_server *server, const struct nfs_parsed_mount_data *data) { struct rpc_timeout timeparms; int error; dprintk("--> nfs4_init_server()\n"); nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, data->timeo, data->retrans); /* Initialise the client representation from the mount data */ server->flags = data->flags; server->caps |= NFS_CAP_ATOMIC_OPEN|NFS_CAP_CHANGE_ATTR|NFS_CAP_POSIX_LOCK; if (!(data->flags & NFS_MOUNT_NORDIRPLUS)) server->caps |= NFS_CAP_READDIRPLUS; server->options = data->options; /* Get a client record */ error = nfs4_set_client(server, data->nfs_server.hostname, (const struct sockaddr *)&data->nfs_server.address, data->nfs_server.addrlen, data->client_address, data->auth_flavors[0], data->nfs_server.protocol, &timeparms, data->minorversion); if (error < 0) goto error; /* * Don't use NFS uid/gid mapping if we're using AUTH_SYS or lower * authentication. */ if (nfs4_disable_idmapping && data->auth_flavors[0] == RPC_AUTH_UNIX) <--- set a flag based on the module parameter server->caps |= NFS_CAP_UIDGID_NOMAP; <-------------------------- flag set if (data->rsize) server->rsize = nfs_block_size(data->rsize, NULL); if (data->wsize) server->wsize = nfs_block_size(data->wsize, NULL); server->acregmin = data->acregmin * HZ; server->acregmax = data->acregmax * HZ; server->acdirmin = data->acdirmin * HZ; server->acdirmax = data->acdirmax * HZ; server->port = data->nfs_server.port; error = nfs_init_server_rpcclient(server, &timeparms, data->auth_flavors[0]); error: /* Done */ dprintk("<-- nfs4_init_server() = %d\n", error); return error; } * This flag is later checked when deciding whether to use numeric uid or gids or to use idmapping. Raw int nfs_map_uid_to_name(const struct nfs_server *server, __u32 uid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(uid, "user", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap->idmap_user_hash, uid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(uid, buf, buflen); return ret; } int nfs_map_gid_to_group(const struct nfs_server *server, __u32 gid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(gid, "group", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap->idmap_group_hash, gid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(gid, buf, buflen); return ret; } "fs/nfs/idmap.c" 872L, 21804C * For more information on NFSv4 ID mapping in Red Hat Enterprise Linux, see https://access.redhat.com/articles/2252881 Diagnostic Steps * Debugging/verbosity can be enabled by editing /etc/sysconfig/nfs: Raw RPCIDMAPDARGS="-vvv" * The following output is shown in /var/log/messages when the mount has been completed and the system shows nobody:nobody as user and group permissions on directories and files: Raw Jun 3 20:22:08 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Jun 3 20:25:44 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' * Collect a tcpdump of the mount attempt: Raw # tcpdump -s0 -i {INTERFACE} host {NFS.SERVER.IP} -w /tmp/{casenumber}-$(hostname)-$(date 
+"%Y-%m-%d-%H-%M-%S").pcap & * If a TCP packet capture has been obtained, check for a nfs.nfsstat4 packet that has returned a non-zero response equivalent to 10039 (NFSV4ERR_BADOWNER). * From the NFSv4 RFC: Raw NFS4ERR_BADOWNER = 10039,/* owner translation bad */ NFS4ERR_BADOWNER An owner, owner_group, or ACL attribute value can not be translated to local representation. Hope this helps. Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: 18 June 2018 16:54 To: gpfsug main discussion list Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From chetkulk at in.ibm.com Mon Jun 18 17:20:29 2018 From: chetkulk at in.ibm.com (Chetan R Kulkarni) Date: Mon, 18 Jun 2018 21:50:29 +0530 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution In-Reply-To: References: Message-ID: Please make sure NFSv4 ID Mapping value matches on client and server (e.g. test.com; may vary on your setup). server: mmnfs config change IDMAPD_DOMAIN=test.com client: e.g. RHEL NFS client; set Domain attribute in /etc/idmapd.conf file and restart idmap service. # egrep ^Domain /etc/idmapd.conf Domain = test.com # service nfs-idmap restart reference Link: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/b1ladm_authconsidfornfsv4access.htm Thanks, Chetan. From: "Wilson, Neil" To: gpfsug main discussion list Date: 06/18/2018 09:35 PM Subject: Re: [gpfsug-discuss] CES-NFS: UID and GID resolution Sent by: gpfsug-discuss-bounces at spectrumscale.org I think it?s caused by the ID mapping not being configured properly. Found this on the redhat knowledge base. Environment Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterprise Linux 7 NFSv4 share being exported from an NFSv4 capable NFS server Issue From the client, the mounted NFSv4 share has ownership for all files and directories listed as nobody:nobody instead of the actual user that owns them on the NFSv4 server, or who created the new file and directory. Seeing nobody:nobody permissions on nfsv4 shares on the nfs client. Also seeing the following error in /var/log/messages: How to configure Idmapping for NFSv4 Raw nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Resolution Modify the /etc/idmapd.conf with the proper domain (FQDN), on both the client and server. In this example, the proper domain is "example.com" so the "Domain =" directive within /etc/idmapd.conf should be modified to read: Raw Domain = example.com Note: If using a NetApp Filer, the NFS.V4.ID.DOMAIN parameter must be set to match the "Domain =" parameter on the client. 
If using a Solaris machine as the NFS server, the NFSMAPID_DOMAIN value in /etc/default/nfs must match the RHEL clients Domain. On Red Hat Enterprise Linux 6.2 and older, to put the changes into effect restart the rpcidmapd service and remount the NFSv4 filesystem : Raw # service rpcidmapd restart # mount -o remount /nfs/mnt/point NOTE: It is only necessary to restart rpc.idmapd service on systems where rpc.idmapd is actually performing the id mapping. On RHEL 6.3 and newer NFS CLIENTS, the maps are stored in the kernel keyring and the id mapping itself is performed by the /sbin/nfsidmap program. On older NFS CLIENTS (RHEL 6.2 and older) as well as on all NFS SERVERS running RHEL, the id mapping is performed by rpc.idmapd. Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's. On Red Hat Enterprise Linux 6.3 and higher, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required: Raw # nfsidmap -c NOTE: The above command is only necessary on systems that use the keyring-based id mapper, i.e. NFS CLIENTS running RHEL 6.3 and higher. On RHEL 6.2 and older NFS CLIENTS as well as all NFS SERVERS running RHEL, the cache should be cleared out when rpc.idmapd is restarted. Another check, see if the passwd:, shadow: and group: settings are set correctly in the /etc/nsswitch.conf file on both Server and Client. Disabling idmapping NOTE: In order to properly disable idmapping, it must be disabled on both the NFS client and NFS server. - By default, RHEL6.3 and newer NFS clients and servers disable idmapping when utilizing the AUTH_SYS/UNIX authentication flavor by enabling the following booleans: Raw NFS client # echo 'Y' > /sys/module/nfs/parameters/nfs4_disable_idmapping NFS server # echo 'Y' > /sys/module/nfsd/parameters/nfs4_disable_idmapping If using a NetApp filer, the options nfs.v4.id.allow_numerics on command can be used to disable idmapping. More information can be found here. With this boolean enabled, NFS clients will instead send numeric UID/GID numbers in outgoing attribute calls and NFS servers will send numeric UID/GID numbers in outgoing attribute replies. ? If NFS clients sending numeric UID/GID values in a SETATTR call receive an NFS4ERR_BADOWNER reply from the NFS server clients will re-enable idmapping and send user at domain strings for that specific mount from that point forward. ? We can make the option nfs4_disable_idmapping persistent across reboot. ? After the above value has been changed, for the setting to take effect for any NFS server export mounted on the NFS client, you must unmount all NFS mount points for the given NFS server, and then re-mount them. If you have auto mounts stop all processes accessing the mounts and allow the automount daemon to unmount them. Once all NFS mount points are gone to the desired NFS server, remount the NFS mount points and the new setting should be in place. If this is too problematic, you may want to schedule a reboot of the NFS client. ? To verify the setting has been changed properly, you can look inside the /proc/self/mountstats file 'caps' line, which contains a hex value of 2 bytes (16 bits). 
This is the line that shows the NFS server's "capabilities", and the most significant bit #15 is the one which represents whether idmapping is disabled or not (the NFS_CAP_UIDGID_NOMAP bit - see the Root Cause section) Raw # cat /sys/module/nfs/parameters/nfs4_disable_idmapping Y # umount /mnt # mount rhel6u6-node2:/exports/nfs4 /mnt # grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2| caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0xffff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ Example of nfs4_disable_idmapping = 'N' Raw [root at rhel6u3-node1 ~]# echo N > /sys/module/nfs/parameters/nfs4_disable_idmapping [root at rhel6u3-node1 ~]# umount /mnt [root at rhel6u3-node1 ~]# mount rhel6u6-node2:/exports/nfs4 /mnt [root at rhel6u3-node1 ~]# grep -A 5 rhel6u6-node2 /proc/self/mountstats | egrep '(rhel6u6-node2|caps:)' device rhel6u6-node2:/exports/nfs4 mounted on /mnt with fstype nfs4 statvers=1.0 caps: caps=0x7fff,wtmult=512,dtsize=32768,bsize=0,namlen=255 ^ NOTE: To force ONLY numeric IDs to be used on the client, add RPCIDMAPDARGS="-C" to the etc/sysconfig/nfs file and restart the rpcidmapd service. See man rpc.idmapd for more information. NOTE: This option can only be used with AUTH_SYS/UNIX authentication flavors, if you wish to use something like Kerberos, idmapping must be used. Root Cause NFSv4 utilizes ID mapping to ensure permissions are set properly on exported shares, if the domains of the client and server do not match then the permissions are mapped to nobody:nobody. NFS_CAP_UIDGID_NOMAP bit The nfs4_disable_idmapping is a module parameter which is read only one time, at the point at which the kernel sets up the data structure that represents an NFS server. Once it is read, a flag is set in the nfs_server structure NFS_CAP_UIDGID_NOMAP. Raw #define NFS_CAP_UIDGID_NOMAP (1U << 15) static int nfs4_init_server(struct nfs_server *server, const struct nfs_parsed_mount_data *data) { struct rpc_timeout timeparms; int error; dprintk("--> nfs4_init_server()\n"); nfs_init_timeout_values(&timeparms, data->nfs_server.protocol, data->timeo, data->retrans); /* Initialise the client representation from the mount data */ server->flags = data->flags; server->caps |= NFS_CAP_ATOMIC_OPEN|NFS_CAP_CHANGE_ATTR| NFS_CAP_POSIX_LOCK; if (!(data->flags & NFS_MOUNT_NORDIRPLUS)) server->caps |= NFS_CAP_READDIRPLUS; server->options = data->options; /* Get a client record */ error = nfs4_set_client(server, data->nfs_server.hostname, (const struct sockaddr *)&data->nfs_server.address, data->nfs_server.addrlen, data->client_address, data->auth_flavors[0], data->nfs_server.protocol, &timeparms, data->minorversion); if (error < 0) goto error; /* * Don't use NFS uid/gid mapping if we're using AUTH_SYS or lower * authentication. 
*/ if (nfs4_disable_idmapping && data->auth_flavors[0] == RPC_AUTH_UNIX) <--- set a flag based on the module parameter server->caps |= NFS_CAP_UIDGID_NOMAP; <-------------------------- flag set if (data->rsize) server->rsize = nfs_block_size(data->rsize, NULL); if (data->wsize) server->wsize = nfs_block_size(data->wsize, NULL); server->acregmin = data->acregmin * HZ; server->acregmax = data->acregmax * HZ; server->acdirmin = data->acdirmin * HZ; server->acdirmax = data->acdirmax * HZ; server->port = data->nfs_server.port; error = nfs_init_server_rpcclient(server, &timeparms, data-> auth_flavors[0]); error: /* Done */ dprintk("<-- nfs4_init_server() = %d\n", error); return error; } This flag is later checked when deciding whether to use numeric uid or gids or to use idmapping. Raw int nfs_map_uid_to_name(const struct nfs_server *server, __u32 uid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(uid, "user", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap-> idmap_user_hash, uid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(uid, buf, buflen); return ret; } int nfs_map_gid_to_group(const struct nfs_server *server, __u32 gid, char *buf, size_t buflen) { struct idmap *idmap = server->nfs_client->cl_idmap; int ret = -EINVAL; if (!(server->caps & NFS_CAP_UIDGID_NOMAP)) { <------------ CHECK FLAG, DECIDE whether to call idmapper ret = nfs_idmap_lookup_name(gid, "group", buf, buflen); if (ret < 0) ret = nfs_idmap_name(idmap, &idmap-> idmap_group_hash, gid, buf); } if (ret < 0) ret = nfs_map_numeric_to_string(gid, buf, buflen); return ret; } "fs/nfs/idmap.c" 872L, 21804C For more information on NFSv4 ID mapping in Red Hat Enterprise Linux, see https://access.redhat.com/articles/2252881 Diagnostic Steps Debugging/verbosity can be enabled by editing /etc/sysconfig/nfs: Raw RPCIDMAPDARGS="-vvv" The following output is shown in /var/log/messages when the mount has been completed and the system shows nobody:nobody as user and group permissions on directories and files: Raw Jun 3 20:22:08 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Jun 3 20:25:44 node1 rpc.idmapd[1874]: nss_getpwnam: name 'root at example.com' does not map into domain 'localdomain' Collect a tcpdump of the mount attempt: Raw # tcpdump -s0 -i {INTERFACE} host {NFS.SERVER.IP} -w /tmp/{casenumber}-$ (hostname)-$(date +"%Y-%m-%d-%H-%M-%S").pcap & If a TCP packet capture has been obtained, check for a nfs.nfsstat4 packet that has returned a non-zero response equivalent to 10039 (NFSV4ERR_BADOWNER). From the NFSv4 RFC: Raw NFS4ERR_BADOWNER = 10039,/* owner translation bad */ NFS4ERR_BADOWNER An owner, owner_group, or ACL attribute value can not be translated to local representation. Hope this helps. Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: 18 June 2018 16:54 To: gpfsug main discussion list Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Can anyone tell me why I?m not seeing the correct UID/GID resolution from CES? Configured with LDAP authentication, and this appears to work correctly. 
On my CES cluster (V4): [robert_oesterlin at unv-oester robert_oesterlin]$ ls -l total 3 -rw-r--r-- 1 nobody nobody 15 Jun 18 11:40 junk1 -rw-r--r-- 1 nobody nobody 4 Oct 9 2016 junk.file -rw-r--r-- 1 nobody nobody 1 May 24 10:44 test1 On my CNFS cluster (V3) [root at unv-oester2 robert_oesterlin]# ls -l total 0 -rw-r--r-- 1 robert_oesterlin users 15 Jun 18 11:40 junk1 -rw-r--r-- 1 robert_oesterlin users 4 Oct 9 2016 junk.file -rw-r--r-- 1 robert_oesterlin users 1 May 24 10:44 test1 Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Jun 18 17:56:55 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jun 2018 16:56:55 +0000 Subject: [gpfsug-discuss] CES-NFS: UID and GID resolution Message-ID: <8B8EB415-1221-454B-A08C-5B029C4F8BF8@nuance.com> That was it, thanks! Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Chetan R Kulkarni Reply-To: gpfsug main discussion list Date: Monday, June 18, 2018 at 11:21 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] CES-NFS: UID and GID resolution Please make sure NFSv4 ID Mapping value matches on client and server (e.g. test.com; may vary on your setup). server: mmnfs config change IDMAPD_DOMAIN=test.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jun 18 23:21:30 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 18 Jun 2018 15:21:30 -0700 Subject: [gpfsug-discuss] Save the Date September 19-20 2018 GPFS/SS Users Group Meeting at ORNL Message-ID: <5670B56F-AF19-4A90-8BDF-24B865231EC1@lbl.gov> Hello all, There is an event being planned for the week of September 16, 2018 at Oak Ridge National Laboratory (ORNL). This GPFS/Spectrum Scale UG meeting will be in conjunction with the HPCXXL User Group. We have done events like this in the past, typically in NYC, however, with the announcement of Summit (https://www.ornl.gov/news/ornl-launches-summit-supercomputer ) and it?s 250 PB, 2.5 TB/s GPFS installaion it is an exciting time to have ORNL as the venue. Per usual, the GPFS day will be free, however, this time the event will be split across two days, Wednesday (19th) afternoon and Thursday (20th) morning This way, if you want to travel out Wednesday morning and back Thursday afternoon it?s very do-able. If you want to stay around Thursday afternoon there will be a data center tour available. There will be some additional approval processes to attend at ORNL and we?ll share those details and more in the coming weeks. If you are interested in presenting something your site is working on, please let us know. User talks are always well received. Save a space on your calendar and hope to see you there. Best, Kristy PS - We will, as usual, also have an event at SC18, more on that soon as well. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 20 15:08:09 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 20 Jun 2018 14:08:09 +0000 Subject: [gpfsug-discuss] mmbackup issue Message-ID: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> Hallo All, we have been working for two weeks (or more) on a PMR because mmbackup has problems with the management class in TSM. We have defined "versions exist" of 5, but with each run the policy engine generates an expire list (in which the files in question are already selected), and at the end we see only 2 backup versions of a file in every case. We are at: GPFS 5.0.1.1, TSM server 8.1.1.0, TSM client 7.1.6.2. After some testing we found the reason: our mmbackup test is performed with vi, changing a file's content and starting the next mmbackup test cycle. The problem we found lies with the defaults in vi (set backupcopy=no; note that with this setting vi writes a new file and renames it over the original): after each test (change of the content) the file gets a new inode number. This behaviour is the reason why the shadow file (or the policy engine) thinks the old file no longer exists and generates a delete request in the expire policy files for dsmc (correct me if I am wrong here). OK, vi is not the real problem, but we also have applications that handle their data sets the same way (e.g. SAS). With SAS the data file is updated via a xx.data.new file, and after the close the xx.data.new is renamed back to the original name xx.data. The misinterpretation of different inodes then happens again. The question now: is there code in mmbackup or in GPFS for the shadow file that can check for, or ignore, the inode change for the same file? Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.holliday at crick.ac.uk Wed Jun 20 15:19:13 2018 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 20 Jun 2018 14:19:13 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount Message-ID: Hi All, We've been trying to get the Windows system to mount GPFS.
We've set the drive letter on the file system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system, the system just sits and does nothing - GPFS shows no errors or issues, and there are no problems in the log files. The firewalls are stopped, and as far as we can tell it should work. Does anyone have any experience with the GPFS Windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Jun 20 15:45:23 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 20 Jun 2018 10:45:23 -0400 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> Message-ID: <9471.1529505923@turing-police.cc.vt.edu> On Wed, 20 Jun 2018 14:08:09 -0000, "Grunenberg, Renar" said: > There are after each test (change of the content) the file became every time > a new inode number. This behavior is the reason why the shadowfile think(or the > policyengine) the old file is never existent That's because as far as the system is concerned, this is a new file that happens to have the same name. > At SAS the data file will updated with a xx.data.new file and after the close > the xx.data.new will be renamed to the original name xx.data again. And the > miss interpretation of different inodes happen again. Note that all the interesting information about a file is contained in the inode (the size, the owner/group, the permissions, creation time, disk blocks allocated, and so on). The *name* of the file is pretty much the only thing about a file that isn't in the inode - and that's because it's not a unique value for the file (there can be more than one link to a file). The name(s) of the file are stored in the parent directory as inode/name pairs. So here's what happens. You have the original file xx.data. It has an inode number 9934 or whatever. In the parent directory, there's an entry "name xx.data -> inode 9934". SAS creates a new file xx.data.new with inode number 83425 or whatever. Different file - the creation time, blocks allocated on disk, etc are all different than the file described by inode 9934. The directory now has "name xx.data -> 9934" "name xx.data.new -> inode 83425". SAS then renames xx.data.new - and rename is defined as "change the name entry for this inode, removing any old mappings for the same name". So... 0) 'rename xx.data.new xx.data'. 1) Find 'xx.data.new' in this directory. "xx.data.new -> 83425". So we're working with that inode. 2) Check for occurrences of the new name. Aha. There's 'xx.data -> 9934'. Remove it. (2a) This may or may not actually make the file go away, as there may be other links and/or open file references to it.) 3) The directory now only has 'xx.data.new -> 83425'. 4) We now change the name. The directory now has 'xx.data -> 83425'. (A minimal shell illustration of this inode change follows below.)
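For anyone who wants to see this behaviour directly, here is a minimal shell sketch. The file names and inode numbers are just the illustrative ones from the example above; the actual numbers will differ on your system:

# create the original file and note its inode number
$ echo "version 1" > xx.data
$ ls -i xx.data
9934 xx.data

# write the replacement to a new file, then rename it over the original
$ echo "version 2" > xx.data.new
$ mv xx.data.new xx.data
$ ls -i xx.data
83425 xx.data      # same name, different inode

# by contrast, rewriting the existing file in place keeps the inode
$ echo "version 3" > xx.data
$ ls -i xx.data
83425 xx.data

This is presumably why mmbackup's shadow database sees the renamed file as a brand-new object, while an in-place overwrite does not trigger the same expire/backup pair.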
And your backup program quite rightly concludes that this is a new file by a name that was previously used - because it *is* a new file. Created at a different time, different blocks on disk, and so on. The only time that writing a "new" file keeps the same inode number is if the program actually opens the old file for writing and overwrites the old contents. However, this isn't actually done by many programs (including vi and SAS, as you've noticed) because if writing out the file encounters an error, you now have lost the contents - the old version has been overwritten, and the new version isn't complete and correct. So many programs write to a truly new file and then rename, because if writing the new file fails, the old version is still available on disk.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From anobre at br.ibm.com Wed Jun 20 16:11:03 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Wed, 20 Jun 2018 15:11:03 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Jun 20 15:52:09 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 20 Jun 2018 10:52:09 -0400 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: <638e2070-2e99-e6dc-b843-1fd368c21bc0@nasa.gov> We've used the Windows client here @ NASA (I think we have in the neighborhood of between 15 and 20 clients). I'm guessing when you say GPFS shows no errors you've dumped waiters and grabbed dump tscomm output and that's clean? -Aaron On 6/20/18 10:19 AM, Michael Holliday wrote: > Hi All, > > We?ve being trying to get the windows system to mount GPFS.? We?ve set > the drive letter on the files system, and we can get the system added to > the GPFS cluster and showing as active. > > When we try to mount the file system ?the system just sits and does > nothing ? GPFS shows no errors or issues, there are no problems in the > log files. The firewalls are stopped and as far as we can tell it should > work. > > Does anyone have any experience with the GPFS windows client that may > help us? > > Michael > > Michael Holliday RITTech MBCS > > Senior HPC & Research Data Systems Engineer | eMedLab Operations Team > > Scientific Computing | IT&S | The Francis Crick Institute > > 1, Midland Road| London | NW1 1AT| United Kingdom > > Tel: 0203 796 3167 > > The Francis Crick Institute Limited is a registered charity in England > and Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 1 Midland Road London NW1 1AT > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From YARD at il.ibm.com Wed Jun 20 16:30:37 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 20 Jun 2018 18:30:37 +0300 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From YARD at il.ibm.com Wed Jun 20 16:35:57 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 20 Jun 2018 18:35:57 +0300 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Also what does mmdiag --network + mmgetstate -a show ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Yaron Daniel" To: gpfsug main discussion list Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 20 17:00:03 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 20 Jun 2018 16:00:03 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <9471.1529505923@turing-police.cc.vt.edu> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> Message-ID: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Hallo Valdis, first of all thanks for the explanation - we understand that. But the problem is that this leaves only 2 versions of the same file, in the same directory, on the TSM server. In other words, with mmbackup and the .shadow... file there is no way to keep more than 2 backup versions of the same file in the same directory in TSM. The native BA client manages this (there, too, the inode numbers already differ), because on the TSM server side the files selected by 'ba incr' are merged into the right filespace and bound to the management class with "versions exist" > 2. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ======================================================================= Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Ursprüngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von valdis.kletnieks at vt.edu Gesendet: Mittwoch, 20. Juni 2018 16:45 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmbackup issue On Wed, 20 Jun 2018 14:08:09 -0000, "Grunenberg, Renar" said: > There are after each test (change of the content) the file became every time > a new inode number. This behavior is the reason why the shadowfile think(or the > policyengine) the old file is never existent That's because as far as the system is concerned, this is a new file that happens to have the same name.
> At SAS the data file will updated with a xx.data.new file and after the close > the xx.data.new will be renamed to the original name xx.data again. And the > miss interpretation of different inodes happen again. Note that all the interesting information about a file is contained in the inode (the size, the owner/group, the permissions, creation time, disk blocks allocated, and so on). The *name* of the file is pretty much the only thing about a file that isn't in the inode - and that's because it's not a unique value for the file (there can be more than one link to a file). The name(s) of the file are stored in the parent directory as inode/name pairs. So here's what happens. You have the original file xx.data. It has an inode number 9934 or whatever. In the parent directory, there's an entry "name xx.data -> inode 9934". SAS creates a new file xx.data.new with inode number 83425 or whatever. Different file - the creation time, blocks allocated on disk, etc are all different than the file described by inode 9934. The directory now has "name xx.data -> 9934" "name xx.data.new -> inode 83425". SAS then renames xx.data.new - and rename is defined as "change the name entry for this inode, removing any old mappings for the same name" . So... 0) 'rename xx.data.new xx.data'. 1) Find 'xx.data.new' in this directory. "xx.data.new -> 83425" . So we're working with that inode. 2) Check for occurrences of the new name. Aha. There's 'xxx.data -> 9934'. Remove it. (2a) This may or may not actually make the file go away, as there may be other links and/or open file references to it.) 3) The directory now only has '83425 xx.data.new -> 83425'. 4) We now change the name. The directory now has 'xx.data -> 83425'. And your backup program quite rightly concludes that this is a new file by a name that was previously used - because it *is* a new file. Created at a different time, different blocks on disk, and so on. The only time that writing a "new" file keeps the same inode number is if the program actually opens the old file for writing and overwrites the old contents. However, this isn't actually done by many programs (including vi and SAS, as you've noticed) because if writing out the file encounters an error, you now have lost the contents - the old version has been overwritten, and the new version isn't complete and correct. So many programs write to a truly new file and then rename, because if writing the new file fails, the old version is still available on disk.... From olaf.weiser at de.ibm.com Wed Jun 20 17:06:56 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 20 Jun 2018 18:06:56 +0200 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de><9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Jun 21 08:32:39 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 21 Jun 2018 08:32:39 +0100 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> Message-ID: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. > I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file you keep, for both active and inactive (aka deleted) files. You can also define how long these versions are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing its own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Thu Jun 21 10:18:29 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 21 Jun 2018 09:18:29 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> Message-ID: <41b590c74c314bf38111c8cc17fde764@SMXRF105.msg.hukrf.de> Hallo JAB, the main problem here is that the inode changes for the same file in the same directory. mmbackup first generates and executes the expire list entry for the file with the old inode number, and afterwards the selective backup for the same file with the new inode number. We now want to test increasing the "versions deleted" parameter here. In contrast, with 'ba incr' on a local fs, TSM does these steps in one pass and handles the issue. My hope now is that the mmbackup people can enhance this: compare the file names in the selection list with the file names already in the expire list, and skip those files from the expire list before it is executed. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R.
Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathan Buzzard Gesendet: Donnerstag, 21. Juni 2018 09:33 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] mmbackup issue On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. > I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file that you can keep, for both active and inactive (aka deleted). You can also define how long these version are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing it's own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Isom.Crawford at ibm.com Thu Jun 21 15:48:02 2018 From: Isom.Crawford at ibm.com (Isom Crawford) Date: Thu, 21 Jun 2018 09:48:02 -0500 Subject: [gpfsug-discuss] GPFS Windows Mount Message-ID: Hi Michael, It's been a while, but I've run into similar issues with Scale on Windows. One possible issue is the GPFS administrative account configuration using the following steps: ---- 1. Create a domain user with the logon name root. 2. Add user root to the Domain Admins group or to the local Administrators group on each Windows node. 3. In root Properties/Profile/Home/LocalPath, define a HOME directory such as C:\Users\root\home that does not include spaces in the path name and is not the same as the profile path. 4. Give root the right to log on as a service as described in ?Allowing the GPFS administrative account to run as a service.? 
Step 3 including consistent use of the HOME directory you define, is required for the Cygwin environment ---- I have botched step 3 before with the result being very similar to your experience. Carefule re-configuration of the cygwin root *home* directory fixed some of the problems. Hope this helps. Another tangle you may run into is disabling IPv6. I had to completely disable IPv6 on the Windows client by not only deselecting it on the network interface properties list, but also disabling it system-wide. The symptoms vary, but utilities like mmaddnode or mmchnode may fail due to invalid interface. Check the output of /usr/lpp/mmfs/bin/mmcmi host to be sure it's the host that Scale expects. (In my case, it returned ::1 until I completely disabled IPv6). My notes follow: This KB article tells us about a setting that affects what Windows prefers, emphasized in bold: In Registry Editor, locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6 \Parameters Double-click DisabledComponents to modify the DisabledComponents entry. Note: If the DisabledComponents entry is unavailable, you must create it. To do this, follow these steps: In the Edit menu, point to New, and then click DWORD (32-bit) Value. Type DisabledComponents, and then press ENTER. Double-click DisabledComponents. Type any one of the following values in the Value data: field to configure the IPv6 protocol to the desired state, and then click OK: Type 0 to enable all IPv6 components. (Windows default setting) Type 0xffffffff to disable all IPv6 components, except the IPv6 loopback interface. This value also configures Windows to prefer using Internet Protocol version 4 (IPv4) over IPv6 by modifying entries in the prefix policy table. For more information, see Source and Destination Address Selection. Type 0x20 to prefer IPv4 over IPv6 by modifying entries in the prefix policy table. Type 0x10 to disable IPv6 on all nontunnel interfaces (on both LAN and Point-to-Point Protocol [PPP] interfaces). Type 0x01 to disable IPv6 on all tunnel interfaces. These include Intra-Site Automatic Tunnel Addressing Protocol (ISATAP), 6to4, and Teredo. Type 0x11 to disable all IPv6 interfaces except for the IPv6 loopback interface. Restart the computer for this setting to take effect. Kind Regards, Isom L. Crawford Jr., PhD. NA SDI SME Team Software Defined Infrastructure 2700 Redwood Street Royse City, TX 75189 United States Phone: 214-707-4611 E-mail: isom.crawford at ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Thu Jun 21 22:42:30 2018 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 21 Jun 2018 21:42:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale GUI password Message-ID: I have a test cluster I setup months ago and then did nothing with. Now I need it again but for the life of me I can't remember the admin password to the GUI. Is there an easy way to reset it under the covers? I would hate to uninstall everything and start over. I can certainly admin everything from the cli but I use it to show others some things from time to time and it doesn't make sense to do that always from the command line. Thoughts? Mark -------------- next part -------------- An HTML attachment was scrubbed... 
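The reply from Kevin that follows survives only as a scrubbed HTML attachment, so the suggested fix is not preserved in the archive. For reference, a commonly suggested recovery path is sketched here as an assumption rather than a confirmed answer: the Scale GUI ships its own CLI under /usr/lpp/mmfs/gui/cli, and chuser is normally used to set a GUI user's password; verify the exact options against the documentation for the installed release before using them.
/usr/lpp/mmfs/gui/cli/lsuser                          # list the GUI users that exist
/usr/lpp/mmfs/gui/cli/chuser admin -p 'N3wPassw0rd'   # assumed syntax; the password here is only a placeholder
This avoids reinstalling anything; if the admin user itself is missing, the mkuser command in the same directory can recreate it.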
URL: From kevindjo at us.ibm.com Fri Jun 22 03:26:55 2018 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Fri, 22 Jun 2018 02:26:55 +0000 Subject: [gpfsug-discuss] Spectrum Scale GUI password In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jun 22 14:13:43 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 22 Jun 2018 13:13:43 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node Message-ID: Any idea why I can?t force the file system manager off this node? I turned off the manager on the node (mmchnode --client) and used mmchmgr to move the other file systems off, but I can?t move this one. There are 6 other good choices for file system managers. I?ve never seen this message before. [root at nrg1-gpfs01 ~]# mmchmgr dataeng The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 22 14:19:18 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 22 Jun 2018 13:19:18 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: References: Message-ID: <5C6312EE-A958-4CBF-9AAC-F342CE87DB70@vanderbilt.edu> Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? Kevin On Jun 22, 2018, at 8:13 AM, Oesterlin, Robert > wrote: Any idea why I can?t force the file system manager off this node? I turned off the manager on the node (mmchnode --client) and used mmchmgr to move the other file systems off, but I can?t move this one. There are 6 other good choices for file system managers. I?ve never seen this message before. [root at nrg1-gpfs01 ~]# mmchmgr dataeng The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C46935624ea7048a9471608d5d841feb5%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636652700325626997&sdata=Az9GZeDDG76lDLi02NSKYXsXK9EHy%2FT3vLAtaMrnpew%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jun 22 14:28:02 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 22 Jun 2018 13:28:02 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node Message-ID: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Yep. And nrg1-gpfs13 isn?t even a manager node anymore! [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. 
2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Friday, June 22, 2018 at 8:21 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] File system manager - won't change to new node Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Fri Jun 22 15:10:29 2018 From: salut4tions at gmail.com (Jordan Robertson) Date: Fri, 22 Jun 2018 10:10:29 -0400 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: Two thoughts: 1) Has your config data update fully propagated after the mmchnode? We've (rarely) seen some weird stuff happen when that process isn't complete yet, or if a node in question simply didn't get the update (try md5sum'ing the mmsdrfs file on nrg1-gpfs13 and compare to the cluster manager's md5sum, make sure the push process isn't still running, etc.). If you see discrepancies, you could try an mmsdrrestore to get that node back into spec. 2) If everything looks fine; what are the chances you could simply try restarting GPFS on nrg1-gpfs13? Might be particularly interesting to see what the cluster tries to do with the filesystem once that node is down. -Jordan On Fri, Jun 22, 2018 at 9:28 AM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Yep. And nrg1-gpfs13 isn?t even a manager node anymore! > > > > [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 > > Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). > > Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > > Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. > > > > 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng > nrg1-gpfs05.nrg1.us.grid.nuance.com > > 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned > as manager for dataeng. > > 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) > appointed as manager for dataeng. > > 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng > nrg1-gpfs05.nrg1.us.grid.nuance.com > > 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) > completed take over for dataeng. > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > > *From: * on behalf of > "Buterbaugh, Kevin L" > *Reply-To: *gpfsug main discussion list > *Date: *Friday, June 22, 2018 at 8:21 AM > *To: *gpfsug main discussion list > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] File system manager - won't > change to new node > > > > Hi Bob, > > > > Have you tried explicitly moving it to a specific manager node? That?s > what I always do ? I personally never let GPFS pick when I?m moving the > management functions for some reason. Thanks? 
> > > > Kevin > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Jun 22 15:38:05 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 22 Jun 2018 14:38:05 +0000 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: <78d4f2d963134e87af9b123891da2c47@jumptrading.com> Hi Bob, Also tracing waiters on the cluster can help you understand if there is something that is blocking this kind of operation. Beyond the command output, which is usually too terse to understand what is actually happening, do the logs on the nodes in the cluster give you any further details about the operation? Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jordan Robertson Sent: Friday, June 22, 2018 9:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] File system manager - won't change to new node Note: External Email ________________________________ Two thoughts: 1) Has your config data update fully propagated after the mmchnode? We've (rarely) seen some weird stuff happen when that process isn't complete yet, or if a node in question simply didn't get the update (try md5sum'ing the mmsdrfs file on nrg1-gpfs13 and compare to the cluster manager's md5sum, make sure the push process isn't still running, etc.). If you see discrepancies, you could try an mmsdrrestore to get that node back into spec. 2) If everything looks fine; what are the chances you could simply try restarting GPFS on nrg1-gpfs13? Might be particularly interesting to see what the cluster tries to do with the filesystem once that node is down. -Jordan On Fri, Jun 22, 2018 at 9:28 AM, Oesterlin, Robert > wrote: Yep. And nrg1-gpfs13 isn?t even a manager node anymore! [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Friday, June 22, 2018 at 8:21 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] File system manager - won't change to new node Hi Bob, Have you tried explicitly moving it to a specific manager node? That?s what I always do ? I personally never let GPFS pick when I?m moving the management functions for some reason. Thanks? 
Kevin _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Fri Jun 22 20:03:52 2018 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 22 Jun 2018 14:03:52 -0500 Subject: [gpfsug-discuss] mmfsadddisk command interrupted Message-ID: We were adding disks to one of our larger filesystems today. During the "checking allocation map for storage pool system" we had to interrupt the command since it was causing slow downs on our filesystem. Now commands like mmrepquota, mmdf, etc. are timing out with tsaddisk command is running message. Also during the run of the mmdf, mmrepquota, etc. filesystem becomes completely unresponsive. This command was run on ESS running version 5.2.0. Any help is much appreciated. Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Fri Jun 22 23:11:45 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Fri, 22 Jun 2018 18:11:45 -0400 Subject: [gpfsug-discuss] File system manager - won't change to new node In-Reply-To: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> References: <61D43577-4C09-47D1-A077-FD53A750E0B9@nuance.com> Message-ID: <128279.1529705505@turing-police.cc.vt.edu> On Fri, 22 Jun 2018 13:28:02 -0000, "Oesterlin, Robert" said: > [root at nrg1-gpfs01 ~]# mmchmgr dataeng nrg1-gpfs05 > Sending migrate request to current manager node 10.30.43.136 (nrg1-gpfs13). > Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. > > 2018-06-22_09:26:08.305-0400: [I] Command: mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com > 2018-06-22_09:26:09.178-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) resigned as manager for dataeng. > 2018-06-22_09:26:09.179-0400: [N] Node 10.30.43.136 (nrg1-gpfs13) appointed as manager for dataeng. 
> 2018-06-22_09:26:09.179-0400: [I] Command: successful mmchmgr /dev/dataeng nrg1-gpfs05.nrg1.us.grid.nuance.com > 2018-06-22_09:26:10.116-0400: [I] Node 10.30.43.136 (nrg1-gpfs13) completed take over for dataeng. That's an.... "interesting".. definition of "successful".... :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Mon Jun 25 16:56:31 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 25 Jun 2018 15:56:31 +0000 Subject: [gpfsug-discuss] mmbackup issue In-Reply-To: <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> References: <2a31eb7bac60472d8f24c0b3fa659520@SMXRF105.msg.hukrf.de> <9471.1529505923@turing-police.cc.vt.edu> <8c3dc284822b4daf80fe51de376a1979@SMXRF105.msg.hukrf.de> <3f3f0e9e-8912-b8bc-6bc4-6bb9a992c20e@strath.ac.uk> Message-ID: Hallo All, here the requirement for enhancement of mmbackup. http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=121687 Please vote. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Jonathan Buzzard Gesendet: Donnerstag, 21. Juni 2018 09:33 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] mmbackup issue On 20/06/18 17:00, Grunenberg, Renar wrote: > Hallo Valdis, first thanks for the explanation we understand that, > but this problem generate only 2 Version at tsm server for the same > file, in the same directory. This mean that mmbackup and the > .shadow... has no possibility to have for the same file in the same > directory more then 2 backup versions with tsm. The native ba-client > manage this. (Here are there already different inode numbers > existent.) But at TSM-Server side the file that are selected at 'ba > incr' are merged to the right filespace and will be binded to the > mcclass >2 Version exist. 
> I think what you are saying is that mmbackup is only keeping two versions of the file in the backup, the current version and a single previous version. Normally in TSM you can control how many previous versions of the file that you can keep, for both active and inactive (aka deleted). You can also define how long these version are kept for. It sounds like you are saying that mmbackup is ignoring the policy that you have set for this in TSM (q copy) and doing it's own thing? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Mon Jun 25 20:43:49 2018 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 25 Jun 2018 15:43:49 -0400 Subject: [gpfsug-discuss] mmapplypolicy on nested filesets ... In-Reply-To: References: <20180418115445.8603670sy6ee6fk5@support.scinet.utoronto.ca> Message-ID: <20180625154349.47520gasb6cvevhx@support.scinet.utoronto.ca> It took a while before I could get back to this issue, but I want to confirm that Marc's suggestions worked line a charm, and did exactly what I hoped for: * remove any FOR FILESET(...) specifications * mmapplypolicy /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan ... --scope inodespace -P your-policy-rules-file ... I didn't have to do anything else, but exclude a few filesets from the scan. Thanks Jaime Quoting "Marc A Kaplan" : > I suggest you remove any FOR FILESET(...) specifications from your rules > and then run > > mmapplypolicy > /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan > ... --scope inodespace -P your-policy-rules-file ... > > See also the (RTFineM) for the --scope option and the Directory argument > of the mmapplypolicy command. > > That is the best, most efficient way to scan all the files that are in a > particular inode-space. Also, you must have all filesets of interest > "linked" and the file system must be mounted. > > Notice that "independent" means that the fileset name is used to denote > both a fileset and an inode-space, where said inode-space contains the > fileset of that name and possibly other "dependent" filesets... > > IF one wished to search the entire file system for files within several > different filesets, one could use rules with > > FOR FILESET('fileset1','fileset2','and-so-on') > > Or even more flexibly > > WHERE FILESET_NAME LIKE 'sql-like-pattern-with-%s-and-maybe-_s' > > Or even more powerfully > > WHERE regex(FILESET_NAME, 'extended-regular-.*-expression') > > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 04/18/2018 01:00 PM > Subject: [gpfsug-discuss] mmapplypolicy on nested filesets ... > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > A few months ago I asked about limits and dynamics of traversing > depended .vs independent filesets on this forum. I used the > information provided to make decisions and setup our new DSS based > gpfs storage system. Now I have a problem I couldn't' yet figure out > how to make it work: > > 'project' and 'scratch' are top *independent* filesets of the same > file system. 
> > 'proj1', 'proj2' are dependent filesets nested under 'project' > 'scra1', 'scra2' are dependent filesets nested under 'scratch' > > I would like to run a purging policy on all contents under 'scratch' > (which includes 'scra1', 'scra2'), and TSM backup policies on all > contents under 'project' (which includes 'proj1', 'proj2'). > > HOWEVER: > When I run the purging policy on the whole gpfs device (with both > 'project' and 'scratch' filesets) > > * if I use FOR FILESET('scratch') on the list rules, the 'scra1' and > 'scra2' filesets under scratch are excluded (totally unexpected) > > * if I use FOR FILESET('scra1') I get error that scra1 is dependent > fileset (Ok, that is expected) > > * if I use /*FOR FILESET('scratch')*/, all contents under 'project', > 'proj1', 'proj2' are traversed as well, and I don't want that (it > takes too much time) > > * if I use /*FOR FILESET('scratch')*/, and instead of the whole device > I apply the policy to the /scratch mount point only, the policy still > traverses all the content of 'project', 'proj1', 'proj2', which I > don't want. (again, totally unexpected) > > QUESTION: > > How can I craft the syntax of the mmapplypolicy in combination with > the RULE filters, so that I can traverse all the contents under the > 'scratch' independent fileset, including the nested dependent filesets > 'scra1','scra2', and NOT traverse the other independent filesets at > all (since this takes too much time)? > > Thanks > Jaime > > > PS: FOR FILESET('scra*') does not work. > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE&s=IpwHlr0YNr7rgV7gI8Y2sxIELLIwA15KK4nBnv9BYWk&e= > > ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE&s=aff0vMJkKd-Z3pw3-jckmI3ejqXh8aSr8rxkKf3OGdk&e= > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From erich at uw.edu Tue Jun 26 00:20:35 2018 From: erich at uw.edu (Eric Horst) Date: Mon, 25 Jun 2018 16:20:35 -0700 Subject: [gpfsug-discuss] mmchconfig subnets Message-ID: Hi, I'm hoping somebody has insights into how the subnets option actually works. 
I've read the docs a dozen times and I want to make sure I understand before I take my production cluster down to make the changes. On the current cluster the daemon addresses are on a gpfs private network and the admin addresses are on a public network. I'm changing so both daemon and admin are public and the subnets option is used to utilize the private network. This is to facilitate remote mounts to an independent cluster. The confusing factor in my case, not covered in the docs, is that the gpfs private network is subnetted and static routes are used to reach them. That is, there are three private networks, one for each datacenter, and the cluster nodes' daemon interfaces are spread between the three. 172.16.141.32/27 172.16.141.24/29 172.16.141.128/27 A router connects these three networks, but they are otherwise 100% private. For my mmchconfig subnets command, should I use this? mmchconfig subnets="172.16.141.24 172.16.141.32 172.16.141.128" Where I get confused is that I'm trying to reason through how Spectrum Scale is utilizing the subnets setting to decide if this will have the desired result on my cluster. If I change the node addresses to their public addresses, i.e. the private addresses are not explicitly configured in Scale, then how are the private addresses discovered? Does each node use the subnets option to identify that it has a private address and then dynamically share that with the cluster? Thanks in advance for your clarifying comments. -Eric -- Eric Horst University of Washington -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Tue Jun 26 01:58:53 2018 From: jam at ucar.edu (Joseph Mendoza) Date: Mon, 25 Jun 2018 18:58:53 -0600 Subject: [gpfsug-discuss] subblock sanity check in 5.0 Message-ID: Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small). This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with:
# mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1
# mmlsfs fs1
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    131072                   Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -B                 524288                   Block size (system pool)
                    8388608                  Block size (other pools)
 -V                 19.01 (5.0.1.0)          File system version
 --subblocks-per-full-block 64               Number of subblocks per full block
 -P                 system;DATA              Disk storage pools in file system
Thanks!
--Joey Mendoza NCAR From knop at us.ibm.com Tue Jun 26 04:36:43 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 25 Jun 2018 23:36:43 -0400 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: Joey, The subblocks-per-full-block value cannot be specified when the file system is created, but is rather computed automatically by GPFS. In file systems with format older than 5.0, the value is fixed at 32. For file systems with format 5.0.0 or later, the value is computed based on the block size. See manpage for mmcrfs, in table where the -B BlockSize option is explained. (Table 1. Block sizes and subblock sizes) . Say, for the default (in 5.0+) 4MB block size, the subblock size is 8KB. The minimum "practical" subblock size is 4KB, to keep 4KB-alignment to accommodate 4KN devices. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Joseph Mendoza To: gpfsug main discussion list Date: 06/25/2018 08:59 PM Subject: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small).? This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0?? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with: # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 # mmlsfs fs1 flag??????????????? value??????????????????? description ------------------- ------------------------ ----------------------------------- ?-f???????????????? 8192???????????????????? Minimum fragment (subblock) size in bytes (system pool) ??????????????????? 131072?????????????????? Minimum fragment (subblock) size in bytes (other pools) ?-i???????????????? 4096???????????????????? Inode size in bytes ?-I???????????????? 32768??????????????????? Indirect block size in bytes ?-B???????????????? 524288?????????????????? Block size (system pool) ??????????????????? 8388608????????????????? Block size (other pools) ?-V???????????????? 19.01 (5.0.1.0)????????? File system version ?--subblocks-per-full-block 64?????????????? Number of subblocks per full block ?-P???????????????? system;DATA????????????? Disk storage pools in file system Thanks! --Joey Mendoza NCAR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
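To make Felipe's point concrete with the numbers from Joey's mmlsfs output above, the lines below are only illustrative shell arithmetic, not the output of any GPFS command: the smallest block size in the file system, here the 512K metadata block, fixes the number of subblocks per full block, and the data pool's subblock (fragment) size then follows from that count.
echo $((512 * 1024 / 8192))      # 64  -> subblocks per full block for a 512K block with an 8K subblock
echo $((8 * 1024 * 1024 / 64))   # 131072 -> resulting fragment size for the 8M data pool, as shown by mmlsfs
echo $((4 * 1024 * 1024 / 8192)) # 512 -> what a 4M or larger metadata block size would allow, matching Joey's guess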
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Tue Jun 26 07:21:26 2018 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 26 Jun 2018 08:21:26 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: Joseph, the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb. is this setup for a traditional NSD Setup or for GNR as the recommendations would be different. sven On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small). This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0? I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes (system pool) > 131072 Minimum fragment (subblock) > size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > > -B 524288 Block size (system pool) > 8388608 Block size (other pools) > > -V 19.01 (5.0.1.0) File system version > > --subblocks-per-full-block 64 Number of subblocks per > full block > -P system;DATA Disk storage pools in file > system > > > Thanks! > --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Tue Jun 26 16:18:01 2018 From: jam at ucar.edu (Joseph Mendoza) Date: Tue, 26 Jun 2018 09:18:01 -0600 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: Message-ID: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Hi, it's for a traditional NSD setup. --Joey On 6/26/18 12:21 AM, Sven Oehme wrote: > Joseph, > > the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block > size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb.? > is this setup for a traditional NSD Setup or for GNR as the recommendations would be different.? > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza > wrote: > > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem?? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small).? This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0?? 
I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag??????????????? value??????????????????? description > ------------------- ------------------------ > ----------------------------------- > ?-f???????????????? 8192???????????????????? Minimum fragment (subblock) > size in bytes (system pool) > ??????????????????? 131072?????????????????? Minimum fragment (subblock) > size in bytes (other pools) > ?-i???????????????? 4096???????????????????? Inode size in bytes > ?-I???????????????? 32768??????????????????? Indirect block size in bytes > > ?-B???????????????? 524288?????????????????? Block size (system pool) > ??????????????????? 8388608????????????????? Block size (other pools) > > ?-V???????????????? 19.01 (5.0.1.0)????????? File system version > > ?--subblocks-per-full-block 64?????????????? Number of subblocks per > full block > ?-P???????????????? system;DATA????????????? Disk storage pools in file > system > > > Thanks! > --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Tue Jun 26 16:32:55 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 26 Jun 2018 15:32:55 +0000 Subject: [gpfsug-discuss] mmchconfig subnets In-Reply-To: References: Message-ID: <20180626153255.d4sftfljwusa6yrg@utumno.gs.washington.edu> My understanding is that GPFS uses the network configuration on each node to determine netmask. The subnets option can be applied to specific nodes or groups of nodes with "mmchconfig subnets=... -N ", so what you're doing is specificy the preferred subnets for GPFS node communication, just for that list of nodes. For instance, we have four GPFS clusters, with three subnets: * eichler-cluster, eichler-cluster2 (10.130.0.0/16) * grc-cluster (10.200.0.0/16) * gs-cluster (10.110.0.0/16) And one data transfer system weasel that is a member of gs-cluster, but provides transfer services to all the clusters, and has an IP address on each subnet to avoid a bunch of network cross-talk. Its subnets setting looks like this: [weasel] subnets 10.130.0.0/eichler-cluster*.grid.gs.washington.edu 10.200.0.0/grc-cluster.grid.gs.washington.edu 10.110.0.0/gs-cluster.grid.gs.washington.edu Of course, there's some policy routing too to keep replies on the right interface as well, but that's the extent of the GPFS configuration. On Mon, Jun 25, 2018 at 04:20:35PM -0700, Eric Horst wrote: > Hi, I'm hoping somebody has insights into how the subnets option actually > works. I've read the docs a dozen times and I want to make sure I > understand before I take my production cluster down to make the changes. > > On the current cluster the daemon addresses are on a gpfs private network > and the admin addresses are on a public network. I'm changing so both > daemon and admin are public and the subnets option is used to utilize the > private network. 
This is to facilitate remote mounts to an independent > cluster. > > The confusing factor in my case, not covered in the docs, is that the gpfs > private network is subnetted and static routes are used to reach them. That > is, there are three private networks, one for each datacenter and the > cluster nodes daemon interfaces are spread between the three. > > 172.16.141.32/27 > 172.16.141.24/29 > 172.16.141.128/27 > > A router connects these three networks but are otherwise 100% private. > > For my mmchconfig subnets command should I use this? > > mmchconfig subnets="172.16.141.24 172.16.141.32 172.16.141.128" > > Where I get confused is that I'm trying to reason through how Spectrum > Scale is utilizing the subnets setting to decide if this will have the > desired result on my cluster. If I change the node addresses to their > public addresses, ie the private addresses are not explicitly configured in > Scale, then how are the private addresses discovered? Does each node use > the subnets option to identify that it has a private address and then > dynamically shares that with the cluster? > > Thanks in advance for your clarifying comments. > > -Eric > > -- > > Eric Horst > University of Washington > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From r.sobey at imperial.ac.uk Wed Jun 27 11:47:02 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 10:47:02 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed Message-ID: Hi all, I'm getting the following error in the GUI, running 5.0.1: "The following GUI refresh task(s) failed: PM_MONITOR". As yet, this is the only node I've upgraded to 5.0.1 - the rest are running (healthily, according to the GUI) 4.2.3.7. I'm not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I've completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Wed Jun 27 12:29:19 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 27 Jun 2018 11:29:19 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: Message-ID: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. 
J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ingo.altenburger at id.ethz.ch Wed Jun 27 12:45:29 2018 From: ingo.altenburger at id.ethz.ch (Altenburger Ingo (ID SD)) Date: Wed, 27 Jun 2018 11:45:29 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments Message-ID: Hi all, our (Windows) users are familiared with the 'previous versions' self-recover feature. We honor this by creating regular snapshots with the default @GMT prefix (non- at -heading prefixes are not visible in 'previous versions'). Unfortunately, MacOS clients having the same share mounted via smb or cifs cannot benefit from such configured snapshots, i.e. they are not visible in Finder window. Any non- at -heading prefix is visible in Finder as long as hidden .snapshots directory can be seen. Using a Terminal command line is also not feasible for end user purposes. Since the two case seem to be mutually exclusive, has anybody found a solution other than creating two snapshots, one with and one without the @-heading prefix? Thanks for any hint, Ingo Altenburger -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jun 27 13:28:50 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 12:28:50 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> References: <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: Hi Renar, No, it all runs over the same network. 
Thanks, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 27 June 2018 12:29 To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' > Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Wed Jun 27 13:49:38 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Wed, 27 Jun 2018 12:49:38 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: , <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jun 27 14:14:59 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Jun 2018 13:14:59 +0000 Subject: [gpfsug-discuss] PM_MONITOR refresh task failed In-Reply-To: References: , <3ecf3c07620940598cdbea444f8be157@SMXRF105.msg.hukrf.de> Message-ID: Hi Andreas, Output of the debug log ? no clue, but maybe you can interpret it better ? 
[root at icgpfsq1 ~]# /usr/lpp/mmfs/gui/cli/runtask pm_monitor --debug debug: locale=en_US debug: Raising event: gui_pmcollector_connection_ok, for node: localhost.localdomain err: com.ibm.fscc.common.exceptions.FsccException: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:280) at com.ibm.fscc.common.tasks.ZiMONMonitorTask.run(ZiMONMonitorTask.java:144) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:221) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:193) at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:369) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:65) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) Caused by: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:328) at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:278) ... 9 more err: com.ibm.fscc.common.exceptions.FsccException: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:280) at com.ibm.fscc.common.tasks.ZiMONMonitorTask.run(ZiMONMonitorTask.java:144) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:221) at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:193) at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:369) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:65) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) Caused by: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:328) at com.ibm.fscc.db.object.health.HealthEventRaiser.raiseEvent(HealthEventRaiser.java:278) ... 9 more debug: Will not raise the following event using 'mmsysmonc' since it already exists in the database: reportingNode = 'icgpfsq1', eventName = 'gui_refresh_task_failed', entityId = '11', arguments = 'PM_MONITOR', identifier = 'null' err: com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain err: com.ibm.fscc.cli.CommandException: EFSSG1150C Running specified task was unsuccessful. 
at com.ibm.fscc.cli.CommandException.createCommandException(CommandException.java:117) at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:69) at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156) at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:426) at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:412) at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:93) EFSSG1150C Running specified task was unsuccessful. Thanks Richard From: Andreas Koeninger [mailto:andreas.koeninger at de.ibm.com] Sent: 27 June 2018 13:50 To: Sobey, Richard A Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hi Richard, if you double-click the event there should be some additional help available. The steps under "User Action" will hopefully help to identify the root cause: 1.) Check if there is additional information available by executing '/usr/lpp/mmfs/gui/cli/lstasklog [taskname]'. 2.) Run the specified task manually on the CLI by executing '/usr/lpp/mmfs/gui/cli/runtask [taskname] --debug'. ... Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Date: Wed, Jun 27, 2018 2:29 PM Hi Renar, No, it all runs over the same network. Thanks, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 27 June 2018 12:29 To: 'gpfsug main discussion list' > Subject: Re: [gpfsug-discuss] PM_MONITOR refresh task failed Hallo Richard, do have a private admin-interface-lan in your cluster if yes than the logic of query the collector-node, and the representing ccr value are wrong. Can you ?mmperfmon query cpu?? If not then you hit a problem that I had yesterday. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Sobey, Richard A Gesendet: Mittwoch, 27. Juni 2018 12:47 An: 'gpfsug-discuss at spectrumscale.org' > Betreff: [gpfsug-discuss] PM_MONITOR refresh task failed Hi all, I?m getting the following error in the GUI, running 5.0.1: ?The following GUI refresh task(s) failed: PM_MONITOR?. As yet, this is the only node I?ve upgraded to 5.0.1 ? the rest are running (healthily, according to the GUI) 4.2.3.7. I?m not sure if this version mismatch is relevant to reporting this particular error. All the usual steps of restarting gpfsgui / pmcollector / pmsensors have been done. Will the error go away when I?ve completed the cluster upgrade, or is there some other foul play at work here? Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jun 27 18:53:39 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 27 Jun 2018 17:53:39 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOSenvironments In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Wed Jun 27 19:09:40 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 27 Jun 2018 11:09:40 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Message-ID: Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? 
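In case it is useful, this is what I was planning to look at next on one of the two surviving quorum nodes before changing anything. It is only a read-only sketch based on where I believe the CCR and cluster configuration files live by default (/var/mmfs/ccr and /var/mmfs/gen/mmsdrfs), so please correct me if those paths are wrong for our release:

ls -l /var/mmfs/ccr               # CCR state directory; ccr.nodes and the committed files should be in here
cat /var/mmfs/ccr/ccr.nodes       # the quorum nodes CCR still expects to be able to reach
ls -l /var/mmfs/gen/mmsdrfs       # confirm the local copy of the cluster configuration is still present

The idea is just to confirm that the two nodes that still serve data agree with each other before I try anything more drastic.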
Thanks for any advice, Renata Dart SLAC National Accelerator Lb From S.J.Thompson at bham.ac.uk Wed Jun 27 19:33:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 27 Jun 2018 18:33:28 +0000 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] Sent: 27 June 2018 19:09 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? Thanks for any advice, Renata Dart SLAC National Accelerator Lb _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cabrillo at ifca.unican.es Wed Jun 27 20:24:28 2018 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Wed, 27 Jun 2018 21:24:28 +0200 (CEST) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Message-ID: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> An HTML attachment was scrubbed... URL: From renata at SLAC.STANFORD.EDU Wed Jun 27 19:54:47 2018 From: renata at SLAC.STANFORD.EDU (Renata Maria Dart) Date: Wed, 27 Jun 2018 11:54:47 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi Simon, yes I ran mmsdrrestore -p and that helped to create the /var/mmfs/ccr directory which was missing. But it didn't create a ccr.nodes file, so I ended up scp'ng that over by hand which I hope was the right thing to do. The one host that is no longer in service is still in that ccr.nodes file and when I try to mmdelnode it I get: root at ocio-gpu03 renata]# mmdelnode -N dhcp-os-129-164.slac.stanford.edu mmdelnode: Unable to obtain the GPFS configuration file lock. mmdelnode: GPFS was unable to obtain a lock from node dhcp-os-129-164.slac.stanford.edu. mmdelnode: Command failed. 
Examine previous error messages to determine cause. despite the fact that it doesn't respond to ping. The mmstartup on the newly reinstalled node fails as in my initial email. I should mention that the two "working" nodes are running 4.2.3.4. The person who reinstalled the node that won't start up put on 4.2.3.8. I didn't think that was the cause of this problem though and thought I would try to get the cluster talking again before upgrading the rest of the nodes or degrading the reinstalled one. Thanks, Renata On Wed, 27 Jun 2018, Simon Thompson wrote: >Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? > >https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] >Sent: 27 June 2018 19:09 >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From renata at slac.stanford.edu Wed Jun 27 20:30:33 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 27 Jun 2018 12:30:33 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> References: <5f72489d-10d6-45e5-ab72-8832c62fc3d4@email.android.com> Message-ID: Hi, any gpfs commands fail with: root at ocio-gpu01 ~]# mmlsmgr get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsmgr: Command failed. Examine previous error messages to determine cause. The two "working" nodes are arbitrating. Also, they are using ccr, so doesn't that mean the primary/secondary setup for a client cluster doesn't apply? Renata On Wed, 27 Jun 2018, Iban Cabrillo wrote: >Hi,? ? 
Have you checked if there is any manager node available? >#mmlsmgr > >If not could you try to assign a new cluster/gpfs_fs manager. > >mmchmgr gpfs_fs manager_node >mmchmgr -c cluster_manager_node > >Cheers. > > From scale at us.ibm.com Wed Jun 27 22:14:23 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 27 Jun 2018 17:14:23 -0400 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi Renata, You may want to reduce the set of quorum nodes. If your version supports the --force option, you can run mmchnode --noquorum -N --force It is a good idea to configure tiebreaker disks in a cluster that has only 2 quorum nodes. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Renata Maria Dart To: gpfsug-discuss at spectrumscale.org Date: 06/27/2018 02:21 PM Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the quorum nodes is no longer in service and the other was reinstalled with a newer OS, both without informing the gpfs admins. Gpfs is still "working" on the two remaining nodes, that is, they continue to have access to the gpfs data on the remote clusters. But, I can no longer get any gpfs commands to work. On one of the 2 nodes that are still serving data, root at ocio-gpu01 ~]# mmlscluster get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlscluster: Command failed. Examine previous error messages to determine cause. On the reinstalled node, this fails in the same way: [root at ocio-gpu02 ccr]# mmstartup get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. I have looked through the users group interchanges but didn't find anything that seems to fit this scenario. Is there a way to salvage this cluster? Can it be done without shutting gpfs down on the 2 nodes that continue to work? Thanks for any advice, Renata Dart SLAC National Accelerator Lb _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kevindjo at us.ibm.com Wed Jun 27 22:20:41 2018 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Wed, 27 Jun 2018 21:20:41 +0000 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB082ADFE7DE038f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From spectrumscale at kiranghag.com Thu Jun 28 04:14:30 2018 From: spectrumscale at kiranghag.com (KG) Date: Thu, 28 Jun 2018 08:44:30 +0530 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Can you also check the time differences between nodes? We had a situation recently where the server time mismatch caused failures. On Thu, Jun 28, 2018 at 2:50 AM, Kevin D Johnson wrote: > You can also try to convert to the old primary/secondary model to back it > away from the default CCR configuration. > > mmchcluster --ccr-disable -p servername > > Then, temporarily go with only one quorum node and add more once the > cluster comes back up. Once the cluster is back up and has at least two > quorum nodes, do a --ccr-enable with the mmchcluster command. > > Kevin D. Johnson > Spectrum Computing, Senior Managing Consultant > MBA, MAcc, MS Global Technology and Development > IBM Certified Technical Specialist Level 2 Expert > > [image: IBM Certified Technical Specialist Level 2 Expert] > > Certified Deployment Professional - Spectrum Scale > Certified Solution Advisor - Spectrum Computing > Certified Solution Architect - Spectrum Storage Solutions > > > 720.349.6199 - kevindjo at us.ibm.com > > "To think is to achieve." - Thomas J. Watson, Sr. > > > > > ----- Original message ----- > From: "IBM Spectrum Scale" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: renata at slac.stanford.edu, gpfsug main discussion list < > gpfsug-discuss at spectrumscale.org> > Cc: > Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > Date: Wed, Jun 27, 2018 5:15 PM > > > Hi Renata, > > You may want to reduce the set of quorum nodes. If your version supports > the --force option, you can run > > mmchnode --noquorum -N --force > > It is a good idea to configure tiebreaker disks in a cluster that has only > 2 quorum nodes. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------ > ------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/ > forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > [image: Inactive hide details for Renata Maria Dart ---06/27/2018 02:21:52 > PM---Hi, we have a client cluster of 4 nodes with 3 quorum n]Renata Maria > Dart ---06/27/2018 02:21:52 PM---Hi, we have a client cluster of 4 nodes > with 3 quorum nodes. 
One of the quorum nodes is no longer i > > From: Renata Maria Dart > To: gpfsug-discuss at spectrumscale.org > Date: 06/27/2018 02:21 PM > Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the > quorum nodes is no longer in service and the other was reinstalled with > a newer OS, both without informing the gpfs admins. Gpfs is still > "working" on the two remaining nodes, that is, they continue to have access > to the gpfs data on the remote clusters. But, I can no longer get > any gpfs commands to work. On one of the 2 nodes that are still serving > data, > > root at ocio-gpu01 ~]# mmlscluster > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmlscluster: Command failed. Examine previous error messages to determine > cause. > > > On the reinstalled node, this fails in the same way: > > [root at ocio-gpu02 ccr]# mmstartup > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine > cause. > > > I have looked through the users group interchanges but didn't find anything > that seems to fit this scenario. > > Is there a way to salvage this cluster? Can it be done without > shutting gpfs down on the 2 nodes that continue to work? > > Thanks for any advice, > > Renata Dart > SLAC National Accelerator Lb > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=0ABB082ADFE7DE038f9e8a93df938690918c0AB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From ingo.altenburger at id.ethz.ch Thu Jun 28 07:37:48 2018 From: ingo.altenburger at id.ethz.ch (Altenburger Ingo (ID SD)) Date: Thu, 28 Jun 2018 06:37:48 +0000 Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments In-Reply-To: References: Message-ID: I have to note that we use the from-SONAS-imported snapshot scheduler as part of the gui to create (and keep/delete) the snapshots. When performing mmcrsnapshot @2018-06-27-14-01 -j then this snapshot is visible in MacOS Finder but not in Windows 'previous versions'. Thus, the issue might be related to the way the scheduler is creating snapshots. Since having hundreds of filesets we need snapshots for, doing the scheduling by ourselves is not trivial and a preferred option. Regards Ingo Altenburger From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Altenburger Ingo (ID SD) Sent: Mittwoch, 27. 
Juni 2018 13:45 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Snapshot handling in mixed Windows/MacOS environments Hi all, our (Windows) users are familiared with the 'previous versions' self-recover feature. We honor this by creating regular snapshots with the default @GMT prefix (non- at -heading prefixes are not visible in 'previous versions'). Unfortunately, MacOS clients having the same share mounted via smb or cifs cannot benefit from such configured snapshots, i.e. they are not visible in Finder window. Any non- at -heading prefix is visible in Finder as long as hidden .snapshots directory can be seen. Using a Terminal command line is also not feasible for end user purposes. Since the two case seem to be mutually exclusive, has anybody found a solution other than creating two snapshots, one with and one without the @-heading prefix? Thanks for any hint, Ingo Altenburger -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 28 08:44:16 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 28 Jun 2018 09:44:16 +0200 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Just some ideas what to try. when you attempted mmdelnode, was that node still active with the IP address known in the cluster? If so, shut it down and try again. Mind the restrictions of mmdelnode though (can't delete NSD servers). Try to fake one of the currently missing cluster nodes, or restore the old system backup to the reinstalled server, if available, or temporarily install gpfs SW there and copy over the GPFS config stuff from a node still active (/var/mmfs/), configure the admin and daemon IFs of the faked node on that machine, then try to start the cluster and see if it comes up with quorum, if it does then go ahead and cleanly de-configure what's needed to remove that node from the cluster gracefully. Once you reach quorum with the remaining nodes you are in safe area. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Renata Maria Dart To: Simon Thompson Cc: gpfsug main discussion list Date: 27/06/2018 21:30 Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Simon, yes I ran mmsdrrestore -p and that helped to create the /var/mmfs/ccr directory which was missing. But it didn't create a ccr.nodes file, so I ended up scp'ng that over by hand which I hope was the right thing to do. The one host that is no longer in service is still in that ccr.nodes file and when I try to mmdelnode it I get: root at ocio-gpu03 renata]# mmdelnode -N dhcp-os-129-164.slac.stanford.edu mmdelnode: Unable to obtain the GPFS configuration file lock. 
mmdelnode: GPFS was unable to obtain a lock from node dhcp-os-129-164.slac.stanford.edu. mmdelnode: Command failed. Examine previous error messages to determine cause. despite the fact that it doesn't respond to ping. The mmstartup on the newly reinstalled node fails as in my initial email. I should mention that the two "working" nodes are running 4.2.3.4. The person who reinstalled the node that won't start up put on 4.2.3.8. I didn't think that was the cause of this problem though and thought I would try to get the cluster talking again before upgrading the rest of the nodes or degrading the reinstalled one. Thanks, Renata On Wed, 27 Jun 2018, Simon Thompson wrote: >Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] >Sent: 27 June 2018 19:09 >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From alvise.dorigo at psi.ch Thu Jun 28 09:02:07 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 28 Jun 2018 08:02:07 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Message-ID: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. 
ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event has been there for 145 days and it did not go away after a daemon restart (mmshutdown/mmstartup). My question is: how can I get rid of this event and restore the mmhealth output to HEALTHY?
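In case it is relevant, this is what I was thinking of trying next. It is only a sketch based on my reading of the mmhealth and mmsysmoncontrol documentation, so please correct me if these RDMA events are not manually resolvable or if the syntax is off:

mmhealth node show NETWORK --verbose                # list the individual events together with their identifiers
mmhealth event resolve ib_rdma_link_down mlx5_0/2   # and the same for ib_rdma_nic_down and ib_rdma_nic_unrecognized
mmsysmoncontrol restart                             # restart the system health monitor so it re-evaluates the interfaces

I have held off running these so far because I do not want to hide a genuine problem by mistake.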
Obviously licensing issues may well make that problematic, but perhaps the ReactOS clone of Windows Explorer supports the "Previous versions" feature, and if not it could be expanded to do so. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From alvise.dorigo at psi.ch Thu Jun 28 10:39:35 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 28 Jun 2018 09:39:35 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE804526727D32@MBX114.d.ethz.ch> Hi Andrew, thanks for the naswer. No, the port #2 (on all the nodes) is not cabled. Regards, Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Andrew Beattie [abeattie at au1.ibm.com] Sent: Thursday, June 28, 2018 10:15 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? 
This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dancasali at us.ibm.com Thu Jun 28 21:14:51 2018 From: dancasali at us.ibm.com (Daniel De souza casali) Date: Thu, 28 Jun 2018 16:14:51 -0400 Subject: [gpfsug-discuss] Sending logs to Logstash Message-ID: Good Afternoon! Does anyone here in the community send mmfs.log to Logstash? If so what do you use? Thank you! Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From alastair.smith at ucl.ac.uk Fri Jun 29 16:26:51 2018 From: alastair.smith at ucl.ac.uk (Smith, Alastair) Date: Fri, 29 Jun 2018 15:26:51 +0000 Subject: [gpfsug-discuss] Job vacancy - Senior Research Data Storage Technologist, UCL Message-ID: Dear all, University College London are looking to appoint a Senior Research Data Storage Technologist to join their Research Data Services Team in central London. The role will involve the design and deployment of storage technologies to support research, as well as providing guidance on service development and advising research projects. The Research Data Services Group provides petabyte-scale data storage for active research projects, and is currently developing a new institutional data repository for long-term curation and preservation. Over the coming years, the Group will be building an integrated suite of services to support data management from planning to re-use, and the successful candidate will play an important role in the creation and operation of these services. For further particulars and the application form, please visit https://www.interquestgroup.com/p/join-a-world-class-workforce-at-ucl The application process will be closing shortly: deadline is 1st July 2018. Kind regards Alastair -|-|-|-|-|-|-|-|-|-|-|-|-|- Dr Alastair Smith Senior research data systems engineer Research Data Services RITS, UCL -------------- next part -------------- An HTML attachment was scrubbed... URL: