From alandhae at gmx.de Thu Jun 1 10:35:37 2017
From: alandhae at gmx.de (Andreas Landhäußer)
Date: Thu, 1 Jun 2017 11:35:37 +0200 (CEST)
Subject: [gpfsug-discuss] gpfs filesystem heat reporting, howto setup
Message-ID:

Hello all out there,

a customer wants to receive periodic reports on the filesystem heat (relative age) of files. We have already switched on file heat using mmchconfig:

mmchconfig fileheatlosspercent=10,fileHeatPeriodMinutes=1440

For the reports, I think I need to know the file usage in a given time period. Are there any how-tos for obtaining this information, or example ILM policies to be used as a start?

Any help will be highly appreciated.

Best regards

Andreas

--
Andreas Landhäußer   +49 151 12133027 (mobile)   alandhae at gmx.de

From olaf.weiser at de.ibm.com Thu Jun 1 12:15:57 2017
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Thu, 1 Jun 2017 13:15:57 +0200
Subject: [gpfsug-discuss] gpfs filesystem heat reporting, howto setup
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From makaplan at us.ibm.com Thu Jun 1 14:40:30 2017
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Thu, 1 Jun 2017 09:40:30 -0400
Subject: [gpfsug-discuss] gpfs filesystem heat reporting, howto setup
In-Reply-To:
References:
Message-ID:

To generate a list of files and their file heat values...

define([TS],esyscmd(date +%Y-%m-%d-%H-%M | tr -d '\n'))

RULE 'x1' EXTERNAL LIST 'heat.TS' EXEC ''
RULE 'x2' LIST 'heat.TS' SHOW(FILE_HEAT) WEIGHT(FILE_HEAT)
/* use a WHERE clause to select or exclude files */

mmapplypolicy /gpfs-path-of-interest -I defer -P policy-rules-shown-above -f /path-for-result
... other good options are -N nodes -g /shared-temp

To do it periodically, use crontab. I defined the TS macro so that each time you run it you get a different filename. The WEIGHT clause will cause the "hottest" files to be listed first. Notice that FILE_HEAT "shows" as a floating point number.

--marc

From: "Olaf Weiser"
To: Andreas Landhäußer , gpfsug main discussion list
Date: 06/01/2017 07:17 AM
Subject: Re: [gpfsug-discuss] gpfs filesystem heat reporting, howto setup
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi Andreas, one could use the WEIGHT statement ... a simple policy, for example:

rule 'repack' MIGRATE FROM POOL 'xxxxxx' TO POOL 'xxxx' WEIGHT(FILE_HEAT)

and then -I prepare to see what would be done by the policy.. or you use the LIST function .. or .. and so on ..

From: Andreas Landhäußer
To: gpfsug-discuss at spectrumscale.org
Date: 06/01/2017 11:36 AM
Subject: [gpfsug-discuss] gpfs filesystem heat reporting, howto setup
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hello all out there,

a customer wants to receive periodic reports on the filesystem heat (relative age) of files. We have already switched on file heat using mmchconfig:

mmchconfig fileheatlosspercent=10,fileHeatPeriodMinutes=1440

For the reports, I think I need to know the file usage in a given time period. Are there any how-tos for obtaining this information, or example ILM policies to be used as a start?

Any help will be highly appreciated.
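
(Putting Marc's pieces above together, a minimal periodic-report sketch could look like the following. The script name, output paths and mail address are invented for illustration, and the result-file naming assumes mmapplypolicy's usual prefix.list.listname convention; adjust for your cluster.)

#!/bin/bash
# heat-report.sh -- rough sketch of a weekly file heat report, e.g. run from
# root's crontab on a policy node:   0 6 * * 1  /usr/local/sbin/heat-report.sh
set -e

FS=/gpfs-path-of-interest           # file system (or fileset path) to scan
OUT=/var/reports/heat               # where the result lists are kept
POL=$(mktemp /tmp/heatpol.XXXXXX)   # temporary policy file
mkdir -p "$OUT"

# policy rules exactly as Marc shows; the TS macro gives every run its own list name
cat > "$POL" <<'EOF'
define([TS],esyscmd(date +%Y-%m-%d-%H-%M | tr -d '\n'))
RULE 'x1' EXTERNAL LIST 'heat.TS' EXEC ''
RULE 'x2' LIST 'heat.TS' SHOW(FILE_HEAT) WEIGHT(FILE_HEAT)
/* use a WHERE clause to select or exclude files */
EOF

# -I defer only writes the candidate list; add -N <helper nodes> and
# -g <shared temp dir> for large scans, as Marc suggests
mmapplypolicy "$FS" -P "$POL" -I defer -f "$OUT/report"

# WEIGHT(FILE_HEAT) puts the hottest files first; mail the top of the newest list
# (assumes a local 'mail' command is available)
latest=$(ls -t "$OUT"/report.list.heat.* | head -1)
head -1000 "$latest" | mail -s "GPFS file heat report $(date +%F)" storage-admins@example.com
rm -f "$POL"

Each run leaves its own dated list behind, so comparing two reports gives the "usage in a given time period" Andreas is after.
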
Best regards Andreas -- Andreas Landh?u?er +49 151 12133027 (mobile) alandhae at gmx.de_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Fri Jun 2 04:10:44 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 2 Jun 2017 03:10:44 +0000 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - Space Management (GPFS HSM) Message-ID: An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri Jun 2 10:28:36 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 02 Jun 2017 05:28:36 -0400 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - Space Management (GPFS HSM) In-Reply-To: References: Message-ID: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca> We have that situation. Users don't need to login to NSD's What you need is to add at least one gpfs client to the cluster (or multi-cluster), mount the DMAPI enabled file system, and use that node as a gateway for end-users. They can access the contents on the mount point with their own underprivileged accounts. Whether or not on a schedule, the moment an application or linux command (such as cp, cat, vi, etc) accesses a stub, the file will be staged. Jaime Quoting "Andrew Beattie" : > Quick question, Does anyone have a Scale / GPFS environment (HPC) > where users need the ability to recall data sets after they have been > stubbed, but only System Administrators are permitted to log onto the > NSD servers for security purposes. And if so how do you provide the > ability for the users to schedule their data set recalls? Regards, > Andrew Beattie Software Defined Storage - IT Specialist Phone: > 614-2133-7927 E-mail: abeattie at au1.ibm.com[1] > > > Links: > ------ > [1] mailto:abeattie at au1.ibm.com > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From abeattie at au1.ibm.com Fri Jun 2 10:48:11 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 2 Jun 2017 09:48:11 +0000 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) In-Reply-To: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca> References: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca>, Message-ID: An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Fri Jun 2 16:12:41 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 02 Jun 2017 11:12:41 -0400 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) In-Reply-To: References: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca>, Message-ID: <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> It has been a while since I used HSM with GPFS via TSM, but as far as I can remember, unprivileged users can run dsmmigrate and dsmrecall. Based on the instructions on the link, dsmrecall may now leverage the Recommended Access Order (RAO) available on enterprise drives, however root would have to be the one to invoke that feature. In that case we may have to develop a middleware/wrapper for dsmrecall that will run as root and act on behalf of the user when optimization is requested. Someone here more familiar with the latest version of TSM-HSM may be able to give us some hints on how people are doing this in practice. Jaime Quoting "Andrew Beattie" : > Thanks Jaime, How do you get around Optimised recalls? from what I > can see the optimised recall process needs a root level account to > retrieve a list of files > https://www.ibm.com/support/knowledgecenter/SSSR2R_7.1.1/com.ibm.itsm.hsmul.doc/c_recall_optimized_tape.html[1] > Regards, Andrew Beattie Software Defined Storage - IT Specialist > Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com[2] ----- > Original message ----- > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Andrew Beattie" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - > Space Management (GPFS HSM) > Date: Fri, Jun 2, 2017 7:28 PM > We have that situation. > Users don't need to login to NSD's > > What you need is to add at least one gpfs client to the cluster (or > multi-cluster), mount the DMAPI enabled file system, and use that > node > as a gateway for end-users. They can access the contents on the mount > > point with their own underprivileged accounts. > > Whether or not on a schedule, the moment an application or linux > command (such as cp, cat, vi, etc) accesses a stub, the file will be > > staged. > > Jaime > > Quoting "Andrew Beattie" : > >> Quick question, Does anyone have a Scale / GPFS environment (HPC) >> where users need the ability to recall data sets after they have > been >> stubbed, but only System Administrators are permitted to log onto > the >> NSD servers for security purposes. And if so how do you provide > the >> ability for the users to schedule their data set recalls? > Regards, >> Andrew Beattie Software Defined Storage - IT Specialist Phone: >> 614-2133-7927 E-mail: abeattie at au1.ibm.com[1] >> >> >> Links: >> ------ >> [1] mailto:abeattie at au1.ibm.com[3] >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials[4] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
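
(A very rough sketch of the wrapper idea Jaime describes above: an ordinary user hands in a file list, a small root-owned script sanity-checks it and passes it to dsmrecall. The sudoers entry, paths and the -filelist usage from the page Andrew linked are assumptions to adapt, not a tested recipe.)

#!/bin/bash
# recall-wrapper.sh -- let unprivileged users request bulk (tape-ordered) recalls
# without a login on the HSM/NSD nodes. Installed root-owned, invoked via sudo, e.g.:
#   %hsmusers ALL = (root) NOPASSWD: /usr/local/sbin/recall-wrapper.sh
set -euo pipefail

LIST=${1:?usage: sudo recall-wrapper.sh /path/to/filelist}
CALLER=${SUDO_USER:?must be run through sudo}
SAFE=$(mktemp /tmp/recall.XXXXXX)

# keep only regular files the calling user can read themselves, so the wrapper
# cannot be used to touch other people's data
while IFS= read -r f; do
    [ -f "$f" ] && sudo -u "$CALLER" test -r "$f" && printf '%s\n' "$f" >> "$SAFE"
done < "$LIST"

# hand HSM the whole list at once so it can order the recalls by tape position
# (assumes dsmrecall is on root's PATH and supports -filelist on your level)
dsmrecall -filelist="$SAFE"
rm -f "$SAFE"

Whether an "owner can read it" check is a strong enough policy is obviously site-specific; the point is only that the root-level piece can be kept very small.
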
From r.sobey at imperial.ac.uk Fri Jun 2 16:51:12 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 2 Jun 2017 15:51:12 +0000 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Message-ID: Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I've spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 2 17:40:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 2 Jun 2017 12:40:11 -0400 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS In-Reply-To: References: Message-ID: Upgrading from GPFS 4.2.x to GPFS 4.2.y should not "break" TSM. If it does, someone goofed, that would be a bug. (My opinion) Think of it this way. TSM is an application that uses the OS and the FileSystem(s). TSM can't verify it will work with all future versions of OS and Filesystems, and the releases can't be in lock step. Having said that, 4.2.3 has been "out" for a while, so if there were a TSM incompatibility, someone would have likely hit it or will before July... Trust but verify... From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/02/2017 11:51 AM Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I?ve spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version? Cheers Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 08:51:10 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 08:51:10 +0100 Subject: [gpfsug-discuss] NSD access routes Message-ID: Morning all, Just a quick one about NSD access and read only disks. Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From luis.bolinches at fi.ibm.com Mon Jun 5 08:52:39 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 5 Jun 2017 07:52:39 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: Message-ID: Hi Have you look at LROC instead? Might fit in simpler way to what your are describing. -- Cheers > On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: > > Morning all, > > Just a quick one about NSD access and read only disks. > > Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. > > Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? > > Cheers, > ----------------------------- > Dave Goodbourn > > Head of Systems > Milk Visual Effects > Tel: +44 (0)20 3697 8448 > Mob: +44 (0)7917 411 069 Ellei edell?? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 09:02:16 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 09:02:16 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: Yeah, that was my back up plan but would be more costly in the cloud. Read only is a limitation of most cloud providers not something that I "want". Just trying to move a network bottleneck. Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 > On 5 Jun 2017, at 08:52, Luis Bolinches wrote: > > Hi > > Have you look at LROC instead? Might fit in simpler way to what your are describing. > > -- > Cheers > >> On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: >> >> Morning all, >> >> Just a quick one about NSD access and read only disks. >> >> Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. >> >> Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? >> >> Cheers, >> ----------------------------- >> Dave Goodbourn >> >> Head of Systems >> Milk Visual Effects >> Tel: +44 (0)20 3697 8448 >> Mob: +44 (0)7917 411 069 > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 13:19:47 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 13:19:47 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: OK scrap my first question, I can't do what I wanted to do anyway! I'm testing out the LROC idea. 
All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. Cheers, ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 08:52, Luis Bolinches wrote: > Hi > > Have you look at LROC instead? Might fit in simpler way to what your are > describing. > > -- > Cheers > > On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: > > Morning all, > > Just a quick one about NSD access and read only disks. > > Can you have 2 NSD servers, one with read/write access to a disk and one > with just read only access to the same disk? I know you can write to a disk > over the network via another NSD server but can you mount the disk in read > only mode to increase the read performance? This is all virtual/cloud based. > > Is GPFS clever enough (or can it be configured) to know to read from the > locally attached read only disk but write back via another NSD server over > the GPFS network? > > Cheers, > ----------------------------- > Dave Goodbourn > > Head of Systems > Milk Visual Effects > Tel: +44 (0)20 3697 8448 <+44%2020%203697%208448> > Mob: +44 (0)7917 411 069 <+44%207917%20411069> > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 5 13:24:27 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 5 Jun 2017 12:24:27 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: mmdiag --lroc ? From: > on behalf of "dave at milk-vfx.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 5 June 2017 at 13:19 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NSD access routes OK scrap my first question, I can't do what I wanted to do anyway! I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. Cheers, ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 08:52, Luis Bolinches > wrote: Hi Have you look at LROC instead? Might fit in simpler way to what your are describing. -- Cheers On 5 Jun 2017, at 10.51, Dave Goodbourn > wrote: Morning all, Just a quick one about NSD access and read only disks. Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. 
Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 5 13:48:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Jun 2017 12:48:48 +0000 Subject: [gpfsug-discuss] NSD access routes Message-ID: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Hi Dave I?ve done a large-scale (600 node) LROC deployment here - feel free to reach out if you have questions. mmdiag --lroc is about all there is but it does give you a pretty good idea how the cache is performing but you can?t tell which files are cached. Also, watch out that the LROC cached will steal pagepool memory (1% of the LROC cache size) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Dave Goodbourn Reply-To: gpfsug main discussion list Date: Monday, June 5, 2017 at 7:19 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD access routes I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:10:21 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:10:21 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: Ah yep, thanks a lot. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 13:24, Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > mmdiag --lroc > > ? > > > From: on behalf of " > dave at milk-vfx.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Monday, 5 June 2017 at 13:19 > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NSD access routes > > OK scrap my first question, I can't do what I wanted to do anyway! > > I'm testing out the LROC idea. All seems to be working well, but, is there > anyway to monitor what's cached? How full it might be? The performance etc?? > > I can see some stats in mmfsadm dump lroc but that's about it. > > Cheers, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 08:52, Luis Bolinches wrote: > >> Hi >> >> Have you look at LROC instead? Might fit in simpler way to what your are >> describing. 
>> >> -- >> Cheers >> >> On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: >> >> Morning all, >> >> Just a quick one about NSD access and read only disks. >> >> Can you have 2 NSD servers, one with read/write access to a disk and one >> with just read only access to the same disk? I know you can write to a disk >> over the network via another NSD server but can you mount the disk in read >> only mode to increase the read performance? This is all virtual/cloud based. >> >> Is GPFS clever enough (or can it be configured) to know to read from the >> locally attached read only disk but write back via another NSD server over >> the GPFS network? >> >> Cheers, >> ----------------------------- >> Dave Goodbourn >> >> Head of Systems >> Milk Visual Effects >> Tel: +44 (0)20 3697 8448 <+44%2020%203697%208448> >> Mob: +44 (0)7917 411 069 <+44%207917%20411069> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:49:55 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:49:55 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: Thanks Bob, That pagepool comment has just answered my next question! But it doesn't seem to be working. Here's my mmdiag output: === mmdiag: lroc === LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 1151997 MB, currently in use: 0 MB Statistics from: Mon Jun 5 13:40:50 2017 Total objects stored 0 (0 MB) recalled 0 (0 MB) objects failed to store 0 failed to recall 0 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 0 (0 MB) Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Data objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 agent inserts=0, reads=0 response times (usec): insert min/max/avg=0/0/0 read min/max/avg=0/0/0 ssd writeIOs=0, writePages=0 readIOs=0, readPages=0 response times (usec): write min/max/avg=0/0/0 read min/max/avg=0/0/0 I've restarted GPFS on that node just in case but that didn't seem to help. I have LROC on a node that DOESN'T have direct access to an NSD so will hopefully cache files that get requested over NFS. How often are these stats updated? The Statistics line doesn't seem to update when running the command again. 
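
(For trending rather than a one-off look, the counters can be scraped into a log or a monitoring system. A rough sketch, parsing the human-readable mmdiag output shown in the paste above; this is not a stable interface, so check the field positions on your release:)

#!/bin/bash
# lroc-stats.sh -- sample 'mmdiag --lroc' every INTERVAL seconds and print
# capacity used plus the overall "not found" (miss) percentage
INTERVAL=${1:-60}
while true; do
    out=$(/usr/lpp/mmfs/bin/mmdiag --lroc)
    used=$(echo "$out" | awk '/currently in use/ {print $(NF-1)}')              # MB in use
    max=$(echo "$out"  | awk '/Max capacity/ {print $3}')                       # MB total
    miss=$(echo "$out" | awk '$1=="objects" && $2=="queried" {print $(NF-1)}')  # overall not-found %
    echo "$(date +%FT%T) lroc_used_MB=$used lroc_max_MB=$max lroc_notfound_pct=$miss"
    sleep "$INTERVAL"
done

Fed into collectd/Graphite, or just a flat log, this makes it obvious when the cache actually starts taking data, which, as the rest of the thread shows, only happens once the pagepool comes under pressure.
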
Dave, ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 13:48, Oesterlin, Robert wrote: > Hi Dave > > > > I?ve done a large-scale (600 node) LROC deployment here - feel free to > reach out if you have questions. > > > > mmdiag --lroc is about all there is but it does give you a pretty good > idea how the cache is performing but you can?t tell which files are cached. > Also, watch out that the LROC cached will steal pagepool memory (1% of the > LROC cache size) > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > *From: * on behalf of Dave > Goodbourn > *Reply-To: *gpfsug main discussion list > *Date: *Monday, June 5, 2017 at 7:19 AM > *To: *gpfsug main discussion list > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes > > > > I'm testing out the LROC idea. All seems to be working well, but, is there > anyway to monitor what's cached? How full it might be? The performance etc?? > > > > I can see some stats in mmfsadm dump lroc but that's about it. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:55:22 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:55:22 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: OK slightly ignore that last email. It's still not updating the output but I realise the Stats from line is when they started so probably won't update! :( Still nothing seems to being cached though. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 14:49, Dave Goodbourn wrote: > Thanks Bob, > > That pagepool comment has just answered my next question! > > But it doesn't seem to be working. 
Here's my mmdiag output: > > === mmdiag: lroc === > LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' > status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 > Max capacity: 1151997 MB, currently in use: 0 MB > Statistics from: Mon Jun 5 13:40:50 2017 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Inode objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Directory objects failed to store 0 failed to recall 0 failed to > query 0 failed to inval 0 > > Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Data objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > agent inserts=0, reads=0 > response times (usec): > insert min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > ssd writeIOs=0, writePages=0 > readIOs=0, readPages=0 > response times (usec): > write min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > > I've restarted GPFS on that node just in case but that didn't seem to > help. I have LROC on a node that DOESN'T have direct access to an NSD so > will hopefully cache files that get requested over NFS. > > How often are these stats updated? The Statistics line doesn't seem to > update when running the command again. > > Dave, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: > >> Hi Dave >> >> >> >> I?ve done a large-scale (600 node) LROC deployment here - feel free to >> reach out if you have questions. >> >> >> >> mmdiag --lroc is about all there is but it does give you a pretty good >> idea how the cache is performing but you can?t tell which files are cached. >> Also, watch out that the LROC cached will steal pagepool memory (1% of the >> LROC cache size) >> >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> >> >> >> >> *From: * on behalf of Dave >> Goodbourn >> *Reply-To: *gpfsug main discussion list > > >> *Date: *Monday, June 5, 2017 at 7:19 AM >> *To: *gpfsug main discussion list >> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >> >> >> >> I'm testing out the LROC idea. All seems to be working well, but, is >> there anyway to monitor what's cached? How full it might be? The >> performance etc?? >> >> >> >> I can see some stats in mmfsadm dump lroc but that's about it. >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 5 14:59:07 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 5 Jun 2017 13:59:07 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> , Message-ID: We've seen exactly this behaviour. Removing and readding the lroc nsd device worked for us. 
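
(For anyone hitting the same stuck-counters state, the remove/re-add cycle Simon mentions is roughly the following. Device, NSD and node names are examples; usage=localCache in the stanza is what marks an NSD as an LROC device for the node in servers=, and depending on release you may need to recycle the daemon on that node for the change to take.)

# drop the existing LROC NSD (a localCache NSD belongs to no file system,
# so mmdelnsd should be able to remove it directly)
mmdelnsd lroc_node1_sdb

# /tmp/lroc.stanza -- one %nsd block per local SSD on the client node
%nsd:
  device=/dev/sdb
  nsd=lroc_node1_sdb
  servers=node1
  usage=localCache

# recreate it and check it comes back as Running
mmcrnsd -F /tmp/lroc.stanza
mmdiag --lroc
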
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of dave at milk-vfx.com [dave at milk-vfx.com] Sent: 05 June 2017 14:55 To: Oesterlin, Robert Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NSD access routes OK slightly ignore that last email. It's still not updating the output but I realise the Stats from line is when they started so probably won't update! :( Still nothing seems to being cached though. ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 14:49, Dave Goodbourn > wrote: Thanks Bob, That pagepool comment has just answered my next question! But it doesn't seem to be working. Here's my mmdiag output: === mmdiag: lroc === LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 1151997 MB, currently in use: 0 MB Statistics from: Mon Jun 5 13:40:50 2017 Total objects stored 0 (0 MB) recalled 0 (0 MB) objects failed to store 0 failed to recall 0 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 0 (0 MB) Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Data objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 agent inserts=0, reads=0 response times (usec): insert min/max/avg=0/0/0 read min/max/avg=0/0/0 ssd writeIOs=0, writePages=0 readIOs=0, readPages=0 response times (usec): write min/max/avg=0/0/0 read min/max/avg=0/0/0 I've restarted GPFS on that node just in case but that didn't seem to help. I have LROC on a node that DOESN'T have direct access to an NSD so will hopefully cache files that get requested over NFS. How often are these stats updated? The Statistics line doesn't seem to update when running the command again. Dave, ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: Hi Dave I?ve done a large-scale (600 node) LROC deployment here - feel free to reach out if you have questions. mmdiag --lroc is about all there is but it does give you a pretty good idea how the cache is performing but you can?t tell which files are cached. 
Also, watch out that the LROC cached will steal pagepool memory (1% of the LROC cache size) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Dave Goodbourn > Reply-To: gpfsug main discussion list > Date: Monday, June 5, 2017 at 7:19 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD access routes I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. From oehmes at gmail.com Mon Jun 5 14:59:44 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 05 Jun 2017 13:59:44 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: if you are using O_DIRECT calls they will be ignored by default for LROC, same for encrypted data. how exactly are you testing this? On Mon, Jun 5, 2017 at 6:50 AM Dave Goodbourn wrote: > Thanks Bob, > > That pagepool comment has just answered my next question! > > But it doesn't seem to be working. Here's my mmdiag output: > > === mmdiag: lroc === > LROC Device(s): > '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' > status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 > Max capacity: 1151997 MB, currently in use: 0 MB > Statistics from: Mon Jun 5 13:40:50 2017 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Inode objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Directory objects failed to store 0 failed to recall 0 failed to > query 0 failed to inval 0 > > Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Data objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > agent inserts=0, reads=0 > response times (usec): > insert min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > ssd writeIOs=0, writePages=0 > readIOs=0, readPages=0 > response times (usec): > write min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > > I've restarted GPFS on that node just in case but that didn't seem to > help. I have LROC on a node that DOESN'T have direct access to an NSD so > will hopefully cache files that get requested over NFS. > > How often are these stats updated? The Statistics line doesn't seem to > update when running the command again. > > Dave, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: > >> Hi Dave >> >> >> >> I?ve done a large-scale (600 node) LROC deployment here - feel free to >> reach out if you have questions. >> >> >> >> mmdiag --lroc is about all there is but it does give you a pretty good >> idea how the cache is performing but you can?t tell which files are cached. 
>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >> LROC cache size) >> >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> >> >> >> >> *From: * on behalf of Dave >> Goodbourn >> *Reply-To: *gpfsug main discussion list > > >> *Date: *Monday, June 5, 2017 at 7:19 AM >> *To: *gpfsug main discussion list >> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >> >> >> >> I'm testing out the LROC idea. All seems to be working well, but, is >> there anyway to monitor what's cached? How full it might be? The >> performance etc?? >> >> >> >> I can see some stats in mmfsadm dump lroc but that's about it. >> >> >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 15:00:45 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 15:00:45 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: OK I'm going to hang my head in the corner...RTFM...I've not filled the memory buffer pool yet so I doubt it will have anything in it yet!! :( ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 14:55, Dave Goodbourn wrote: > OK slightly ignore that last email. It's still not updating the output but > I realise the Stats from line is when they started so probably won't > update! :( > > Still nothing seems to being cached though. > > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 14:49, Dave Goodbourn wrote: > >> Thanks Bob, >> >> That pagepool comment has just answered my next question! >> >> But it doesn't seem to be working. 
Here's my mmdiag output: >> >> === mmdiag: lroc === >> LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF >> 0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >> Max capacity: 1151997 MB, currently in use: 0 MB >> Statistics from: Mon Jun 5 13:40:50 2017 >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) >> >> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Inode objects failed to store 0 failed to recall 0 failed to query >> 0 failed to inval 0 >> >> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Directory objects failed to store 0 failed to recall 0 failed to >> query 0 failed to inval 0 >> >> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Data objects failed to store 0 failed to recall 0 failed to query 0 >> failed to inval 0 >> >> agent inserts=0, reads=0 >> response times (usec): >> insert min/max/avg=0/0/0 >> read min/max/avg=0/0/0 >> >> ssd writeIOs=0, writePages=0 >> readIOs=0, readPages=0 >> response times (usec): >> write min/max/avg=0/0/0 >> read min/max/avg=0/0/0 >> >> >> I've restarted GPFS on that node just in case but that didn't seem to >> help. I have LROC on a node that DOESN'T have direct access to an NSD so >> will hopefully cache files that get requested over NFS. >> >> How often are these stats updated? The Statistics line doesn't seem to >> update when running the command again. >> >> Dave, >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 13:48, Oesterlin, Robert >> wrote: >> >>> Hi Dave >>> >>> >>> >>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>> reach out if you have questions. >>> >>> >>> >>> mmdiag --lroc is about all there is but it does give you a pretty good >>> idea how the cache is performing but you can?t tell which files are cached. >>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>> LROC cache size) >>> >>> >>> >>> Bob Oesterlin >>> Sr Principal Storage Engineer, Nuance >>> >>> >>> >>> >>> >>> >>> >>> *From: * on behalf of Dave >>> Goodbourn >>> *Reply-To: *gpfsug main discussion list >> org> >>> *Date: *Monday, June 5, 2017 at 7:19 AM >>> *To: *gpfsug main discussion list >>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>> >>> >>> >>> I'm testing out the LROC idea. All seems to be working well, but, is >>> there anyway to monitor what's cached? How full it might be? The >>> performance etc?? >>> >>> >>> >>> I can see some stats in mmfsadm dump lroc but that's about it. >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Mon Jun 5 15:03:28 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 05 Jun 2017 14:03:28 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: yes as long as you haven't pushed anything to it (means pagepool got under enough pressure to free up space) you won't see anything in the stats :-) sven On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote: > OK I'm going to hang my head in the corner...RTFM...I've not filled the > memory buffer pool yet so I doubt it will have anything in it yet!! :( > > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 14:55, Dave Goodbourn wrote: > >> OK slightly ignore that last email. It's still not updating the output >> but I realise the Stats from line is when they started so probably won't >> update! :( >> >> Still nothing seems to being cached though. >> >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 14:49, Dave Goodbourn wrote: >> >>> Thanks Bob, >>> >>> That pagepool comment has just answered my next question! >>> >>> But it doesn't seem to be working. Here's my mmdiag output: >>> >>> === mmdiag: lroc === >>> LROC Device(s): >>> '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' >>> status Running >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >>> Max capacity: 1151997 MB, currently in use: 0 MB >>> Statistics from: Mon Jun 5 13:40:50 2017 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >>> >>> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Inode objects failed to store 0 failed to recall 0 failed to query >>> 0 failed to inval 0 >>> >>> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Directory objects failed to store 0 failed to recall 0 failed to >>> query 0 failed to inval 0 >>> >>> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Data objects failed to store 0 failed to recall 0 failed to query >>> 0 failed to inval 0 >>> >>> agent inserts=0, reads=0 >>> response times (usec): >>> insert min/max/avg=0/0/0 >>> read min/max/avg=0/0/0 >>> >>> ssd writeIOs=0, writePages=0 >>> readIOs=0, readPages=0 >>> response times (usec): >>> write min/max/avg=0/0/0 >>> read min/max/avg=0/0/0 >>> >>> >>> I've restarted GPFS on that node just in case but that didn't seem to >>> help. I have LROC on a node that DOESN'T have direct access to an NSD so >>> will hopefully cache files that get requested over NFS. >>> >>> How often are these stats updated? The Statistics line doesn't seem to >>> update when running the command again. 
>>> >>> Dave, >>> ---------------------------------------------------- >>> *Dave Goodbourn* >>> Head of Systems >>> *MILK VISUAL EFFECTS* >>> >>> 5th floor, Threeways House, >>> 40-44 Clipstone Street London, W1W 5DW >>> Tel: *+44 (0)20 3697 8448* >>> Mob: *+44 (0)7917 411 069* >>> >>> On 5 June 2017 at 13:48, Oesterlin, Robert >>> wrote: >>> >>>> Hi Dave >>>> >>>> >>>> >>>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>>> reach out if you have questions. >>>> >>>> >>>> >>>> mmdiag --lroc is about all there is but it does give you a pretty good >>>> idea how the cache is performing but you can?t tell which files are cached. >>>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>>> LROC cache size) >>>> >>>> >>>> >>>> Bob Oesterlin >>>> Sr Principal Storage Engineer, Nuance >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *From: * on behalf of Dave >>>> Goodbourn >>>> *Reply-To: *gpfsug main discussion list < >>>> gpfsug-discuss at spectrumscale.org> >>>> *Date: *Monday, June 5, 2017 at 7:19 AM >>>> *To: *gpfsug main discussion list >>>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>>> >>>> >>>> >>>> I'm testing out the LROC idea. All seems to be working well, but, is >>>> there anyway to monitor what's cached? How full it might be? The >>>> performance etc?? >>>> >>>> >>>> >>>> I can see some stats in mmfsadm dump lroc but that's about it. >>>> >>>> >>>> >>>> >>> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 15:15:00 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 15:15:00 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: Ha! A quick shrink of the pagepool and we're in action! Thanks all. Dave. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 15:03, Sven Oehme wrote: > yes as long as you haven't pushed anything to it (means pagepool got under > enough pressure to free up space) you won't see anything in the stats :-) > > sven > > > On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote: > >> OK I'm going to hang my head in the corner...RTFM...I've not filled the >> memory buffer pool yet so I doubt it will have anything in it yet!! :( >> >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 14:55, Dave Goodbourn wrote: >> >>> OK slightly ignore that last email. It's still not updating the output >>> but I realise the Stats from line is when they started so probably won't >>> update! :( >>> >>> Still nothing seems to being cached though. 
>>> >>> ---------------------------------------------------- >>> *Dave Goodbourn* >>> Head of Systems >>> *MILK VISUAL EFFECTS* >>> >>> 5th floor, Threeways House, >>> 40-44 Clipstone Street London, W1W 5DW >>> Tel: *+44 (0)20 3697 8448* >>> Mob: *+44 (0)7917 411 069* >>> >>> On 5 June 2017 at 14:49, Dave Goodbourn wrote: >>> >>>> Thanks Bob, >>>> >>>> That pagepool comment has just answered my next question! >>>> >>>> But it doesn't seem to be working. Here's my mmdiag output: >>>> >>>> === mmdiag: lroc === >>>> LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' >>>> status Running >>>> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >>>> Max capacity: 1151997 MB, currently in use: 0 MB >>>> Statistics from: Mon Jun 5 13:40:50 2017 >>>> >>>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>>> objects failed to store 0 failed to recall 0 failed to inval 0 >>>> objects queried 0 (0 MB) not found 0 = 0.00 % >>>> objects invalidated 0 (0 MB) >>>> >>>> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Inode objects failed to store 0 failed to recall 0 failed to >>>> query 0 failed to inval 0 >>>> >>>> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Directory objects failed to store 0 failed to recall 0 failed to >>>> query 0 failed to inval 0 >>>> >>>> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Data objects failed to store 0 failed to recall 0 failed to query >>>> 0 failed to inval 0 >>>> >>>> agent inserts=0, reads=0 >>>> response times (usec): >>>> insert min/max/avg=0/0/0 >>>> read min/max/avg=0/0/0 >>>> >>>> ssd writeIOs=0, writePages=0 >>>> readIOs=0, readPages=0 >>>> response times (usec): >>>> write min/max/avg=0/0/0 >>>> read min/max/avg=0/0/0 >>>> >>>> >>>> I've restarted GPFS on that node just in case but that didn't seem to >>>> help. I have LROC on a node that DOESN'T have direct access to an NSD so >>>> will hopefully cache files that get requested over NFS. >>>> >>>> How often are these stats updated? The Statistics line doesn't seem to >>>> update when running the command again. >>>> >>>> Dave, >>>> ---------------------------------------------------- >>>> *Dave Goodbourn* >>>> Head of Systems >>>> *MILK VISUAL EFFECTS* >>>> >>>> 5th floor, Threeways House, >>>> 40-44 Clipstone Street London, W1W 5DW >>>> Tel: *+44 (0)20 3697 8448* >>>> Mob: *+44 (0)7917 411 069* >>>> >>>> On 5 June 2017 at 13:48, Oesterlin, Robert >>> > wrote: >>>> >>>>> Hi Dave >>>>> >>>>> >>>>> >>>>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>>>> reach out if you have questions. >>>>> >>>>> >>>>> >>>>> mmdiag --lroc is about all there is but it does give you a pretty good >>>>> idea how the cache is performing but you can?t tell which files are cached. 
>>>>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>>>> LROC cache size) >>>>> >>>>> >>>>> >>>>> Bob Oesterlin >>>>> Sr Principal Storage Engineer, Nuance >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *From: * on behalf of Dave >>>>> Goodbourn >>>>> *Reply-To: *gpfsug main discussion list >>>> org> >>>>> *Date: *Monday, June 5, 2017 at 7:19 AM >>>>> *To: *gpfsug main discussion list >>>>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>>>> >>>>> >>>>> >>>>> I'm testing out the LROC idea. All seems to be working well, but, is >>>>> there anyway to monitor what's cached? How full it might be? The >>>>> performance etc?? >>>>> >>>>> >>>>> >>>>> I can see some stats in mmfsadm dump lroc but that's about it. >>>>> >>>>> >>>>> >>>>> >>>> >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 5 16:54:09 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Jun 2017 15:54:09 +0000 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add Message-ID: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Our node build process re-adds a node to the cluster and then does a ?service gpfs start?, but GPFS doesn?t start. From the build log: + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' + rc=0 + chkconfig gpfs on + service gpfs start The ?service gpfs start? command hangs and never seems to return. If I look at the process tree: [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12292 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$// 21639 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 This is GPFS 4.2.2-1 This seems to occur only on the initial startup after build - if I try to start GPFS again, it works just fine - any ideas on what it?s sitting here waiting? Nothing in mmfslog (does not exist) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Jun 5 20:54:31 2017 From: ewahl at osc.edu (Edward Wahl) Date: Mon, 5 Jun 2017 15:54:31 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Message-ID: <20170605155431.75b42322@osc.edu> Just a thought, as we noticed the EXACT opposite of this, and what I think is new behavior in either mmmount or mmfsfuncs.. 
Does the file system exist in your /etc/fstab (or AIX equiv) yet? Ed On Mon, 5 Jun 2017 15:54:09 +0000 "Oesterlin, Robert" wrote: > Our node build process re-adds a node to the cluster and then does a ?service > gpfs start?, but GPFS doesn?t start. From the build log: > > + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com > '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' > + rc=0 > + chkconfig gpfs on > + service gpfs start > > The ?service gpfs start? command hangs and never seems to return. > > If I look at the process tree: > > [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" > 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post > 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 > 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? > S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S > 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > 12292 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num > 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e > s/\.num$// 21639 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > > This is GPFS 4.2.2-1 > > This seems to occur only on the initial startup after build - if I try to > start GPFS again, it works just fine - any ideas on what it?s sitting here > waiting? Nothing in mmfslog (does not exist) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From ewahl at osc.edu Mon Jun 5 20:56:55 2017 From: ewahl at osc.edu (Edward Wahl) Date: Mon, 5 Jun 2017 15:56:55 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <20170605155431.75b42322@osc.edu> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> <20170605155431.75b42322@osc.edu> Message-ID: <20170605155655.3ce54084@osc.edu> On Mon, 5 Jun 2017 15:54:31 -0400 Edward Wahl wrote: > Just a thought, as we noticed the EXACT opposite of this, and what I think is > new behavior in either mmmount or .. Does the file system exist in > your /etc/fstab (or AIX equiv) yet? Apologies, I meant mmsdrfsdef, not mmfsfuncs. Ed > > Ed > > On Mon, 5 Jun 2017 15:54:09 +0000 > "Oesterlin, Robert" wrote: > > > Our node build process re-adds a node to the cluster and then does a > > ?service gpfs start?, but GPFS doesn?t start. >From the build log: > > > > + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com > > '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' > > + rc=0 > > + chkconfig gpfs on > > + service gpfs start > > > > The ?service gpfs start? command hangs and never seems to return. > > > > If I look at the process tree: > > > > [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" > > 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post > > 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 > > 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? > > S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S > > 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > > 12292 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > > 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num > > 12294 ? 
S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e > > s/\.num$// 21639 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > > > > This is GPFS 4.2.2-1 > > > > This seems to occur only on the initial startup after build - if I try to > > start GPFS again, it works just fine - any ideas on what it?s sitting here > > waiting? Nothing in mmfslog (does not exist) > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > 507-269-0413 > > > > > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From scale at us.ibm.com Mon Jun 5 22:49:23 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 5 Jun 2017 17:49:23 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Message-ID: Looks like a bug in the code. The command hung in grep command. It has missing argument. Please open a PMR to have this fix. Instead of "service gpfs start", can you use mmstartup? You can also try to run mm list command before service gpfs start as a workaround. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/05/2017 11:54 AM Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add Sent by: gpfsug-discuss-bounces at spectrumscale.org Our node build process re-adds a node to the cluster and then does a ?service gpfs start?, but GPFS doesn?t start. From the build log: + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' + rc=0 + chkconfig gpfs on + service gpfs start The ?service gpfs start? command hangs and never seems to return. If I look at the process tree: [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12292 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$// 21639 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 This is GPFS 4.2.2-1 This seems to occur only on the initial startup after build - if I try to start GPFS again, it works just fine - any ideas on what it?s sitting here waiting? 
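(A possible shape for the build-script workaround suggested above, i.e. starting the daemon with mmstartup and waiting for it to report active instead of relying on the init script. The retry count and sleep interval are arbitrary.)

# Start the daemon directly, then poll mmgetstate until it reports
# "active" or we give up after roughly five minutes.
/usr/lpp/mmfs/bin/mmstartup
for i in $(seq 1 30); do
    if /usr/lpp/mmfs/bin/mmgetstate | grep -q active; then
        echo "GPFS daemon is active"
        break
    fi
    sleep 10
done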
Nothing in mmfslog (does not exist) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From stijn.deweirdt at ugent.be Tue Jun 6 08:05:06 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 09:05:06 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging Message-ID: hi all, we have recently been hit by quite a few cases that triggered long waiters. we are aware of the excellent slides http://files.gpfsug.org/presentations/2017/NERSC/GPFS-Troubleshooting-Apr-2017.pdf but we are wondering if and how we can cause those waiters ourself, so we can train ourself in debugging and resolving them (either on test system or in controlled environment on the production clusters). all hints welcome. stijn From Robert.Oesterlin at nuance.com Tue Jun 6 12:44:31 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 6 Jun 2017 11:44:31 +0000 Subject: [gpfsug-discuss] gpfs waiters debugging Message-ID: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> Hi Stijn You need to provide some more details on the type and duration of the waiters before the group can offer some advice. Bob Oesterlin Sr Principal Storage Engineer, Nuance On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: but we are wondering if and how we can cause those waiters ourself, so we can train ourself in debugging and resolving them (either on test system or in controlled environment on the production clusters). all hints welcome. stijn _______________________________________________ From stijn.deweirdt at ugent.be Tue Jun 6 13:29:43 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 14:29:43 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> Message-ID: <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> hi bob, waiters from RPC replies and/or threads waiting on mutex are most "popular". but my question is not how to resolve them, the question is how to create such a waiter so we can train ourself in grep and mmfsadm etc etc we want to recreate the waiters a few times, try out some things and either script or at least put instructions on our internal wiki what to do. the instructions in the slides are clear enough, but there are a lot of slides, and typically when this occurs offshift, you don't want to start with rereading the slides and wondering what to do next; let alone debug scripts ;) thanks, stijn On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: > Hi Stijn > > You need to provide some more details on the type and duration of the waiters before the group can offer some advice. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: > > > but we are wondering if and how we can cause those waiters ourself, so > we can train ourself in debugging and resolving them (either on test > system or in controlled environment on the production clusters). > > all hints welcome. 
> > stijn > _______________________________________________ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From stockf at us.ibm.com Tue Jun 6 13:57:00 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 6 Jun 2017 08:57:00 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: Realize that generally any waiter under 1 second should be ignored. In an active GPFS system there are always waiters and the greater the use of the system likely the more waiters you will see. The point is waiters themselves are not an indication your system is having problems. As for creating them any steady level of activity against the file system should cause waiters to appear, though most should be of a short duration. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Stijn De Weirdt To: gpfsug-discuss at spectrumscale.org Date: 06/06/2017 08:31 AM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org hi bob, waiters from RPC replies and/or threads waiting on mutex are most "popular". but my question is not how to resolve them, the question is how to create such a waiter so we can train ourself in grep and mmfsadm etc etc we want to recreate the waiters a few times, try out some things and either script or at least put instructions on our internal wiki what to do. the instructions in the slides are clear enough, but there are a lot of slides, and typically when this occurs offshift, you don't want to start with rereading the slides and wondering what to do next; let alone debug scripts ;) thanks, stijn On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: > Hi Stijn > > You need to provide some more details on the type and duration of the waiters before the group can offer some advice. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: > > > but we are wondering if and how we can cause those waiters ourself, so > we can train ourself in debugging and resolving them (either on test > system or in controlled environment on the production clusters). > > all hints welcome. > > stijn > _______________________________________________ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Tue Jun 6 14:06:57 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 15:06:57 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: oh sure, i meant waiters that last > 300 seconds or so (something that could trigger deadlock). 
obviously we're not interested in debugging the short ones, it's not that gpfs doesn't work or anything ;) stijn On 06/06/2017 02:57 PM, Frederick Stock wrote: > Realize that generally any waiter under 1 second should be ignored. In an > active GPFS system there are always waiters and the greater the use of the > system likely the more waiters you will see. The point is waiters > themselves are not an indication your system is having problems. > > As for creating them any steady level of activity against the file system > should cause waiters to appear, though most should be of a short duration. > > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: Stijn De Weirdt > To: gpfsug-discuss at spectrumscale.org > Date: 06/06/2017 08:31 AM > Subject: Re: [gpfsug-discuss] gpfs waiters debugging > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > hi bob, > > waiters from RPC replies and/or threads waiting on mutex are most > "popular". > > but my question is not how to resolve them, the question is how to > create such a waiter so we can train ourself in grep and mmfsadm etc etc > > we want to recreate the waiters a few times, try out some things and > either script or at least put instructions on our internal wiki what to > do. > > the instructions in the slides are clear enough, but there are a lot of > slides, and typically when this occurs offshift, you don't want to start > with rereading the slides and wondering what to do next; let alone debug > scripts ;) > > thanks, > > stijn > > On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: >> Hi Stijn >> >> You need to provide some more details on the type and duration of the > waiters before the group can offer some advice. >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Stijn De Weirdt" stijn.deweirdt at ugent.be> wrote: >> >> >> but we are wondering if and how we can cause those waiters ourself, > so >> we can train ourself in debugging and resolving them (either on test >> system or in controlled environment on the production clusters). >> >> all hints welcome. >> >> stijn >> _______________________________________________ >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From valdis.kletnieks at vt.edu Tue Jun 6 17:45:51 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 06 Jun 2017 12:45:51 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: <6873.1496767551@turing-police.cc.vt.edu> On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). 
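(One way to manufacture long waiters for this kind of fire drill - on a disposable test cluster only, and assuming the daemon traffic runs over eth0 - is to inject artificial network delay on one NSD server while a client does I/O, then practise reading the result. This is a suggestion rather than anything from the slides, and not something to run anywhere near production.)

# On one NSD server of a *test* cluster: delay all outbound traffic by
# two seconds so RPCs to this node start to age into long waiters.
tc qdisc add dev eth0 root netem delay 2000ms

# Generate some I/O from a client, then watch the waiters grow from any
# node and practise the usual triage commands.
/usr/lpp/mmfs/bin/mmlsnode -N waiters -L
/usr/lpp/mmfs/bin/mmdiag --waiters

# Remove the delay when the exercise is over.
tc qdisc del dev eth0 root netem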
obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From stockf at us.ibm.com Tue Jun 6 17:54:06 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 6 Jun 2017 12:54:06 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <6873.1496767551@turing-police.cc.vt.edu> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com><3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> <6873.1496767551@turing-police.cc.vt.edu> Message-ID: On recent releases you can accomplish the same with the command, "mmlsnode -N waiters -L". Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 06/06/2017 12:46 PM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... 
[attachment "attltepl.dat" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jun 6 19:05:15 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 6 Jun 2017 14:05:15 -0400 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) In-Reply-To: <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> References: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca>, <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> Message-ID: Hi, Just as Jaime has explained, any GPFS node in the cluster, can induce a recall (as he called "staged") by access to file data. It is not optimized by tape order, and a dynamic file access of any pattern, such as "find" or "cat *" will surely result in an inefficient processing of the data recall if all data lives in physical tape. But if migrated data lives on spinning disk on the TSM server, there is no harm in such a recall pattern because recalls from a disk pool incur no significant overhead or delay for tape loading and positioning. Unprivileged users may not run "dsmcrecall" because creating a DMAPI session as the dsmrecall program must do, requires admin user privilege on that node. You may be able to wrap dsmrecall in a set-uid wrapper if you want to permit users to run that, but of course that comes with the danger that a recall storm could monopolize resources on your cluster. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Jaime Pinto" To: "Andrew Beattie" Cc: gpfsug-discuss at spectrumscale.org Date: 06/02/2017 11:13 AM Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) Sent by: gpfsug-discuss-bounces at spectrumscale.org It has been a while since I used HSM with GPFS via TSM, but as far as I can remember, unprivileged users can run dsmmigrate and dsmrecall. Based on the instructions on the link, dsmrecall may now leverage the Recommended Access Order (RAO) available on enterprise drives, however root would have to be the one to invoke that feature. In that case we may have to develop a middleware/wrapper for dsmrecall that will run as root and act on behalf of the user when optimization is requested. Someone here more familiar with the latest version of TSM-HSM may be able to give us some hints on how people are doing this in practice. Jaime Quoting "Andrew Beattie" : > Thanks Jaime, How do you get around Optimised recalls? 
from what I > can see the optimised recall process needs a root level account to > retrieve a list of files > https://www.ibm.com/support/knowledgecenter/SSSR2R_7.1.1/com.ibm.itsm.hsmul.doc/c_recall_optimized_tape.html [1] > Regards, Andrew Beattie Software Defined Storage - IT Specialist > Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com[2] ----- > Original message ----- > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Andrew Beattie" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - > Space Management (GPFS HSM) > Date: Fri, Jun 2, 2017 7:28 PM > We have that situation. > Users don't need to login to NSD's > > What you need is to add at least one gpfs client to the cluster (or > multi-cluster), mount the DMAPI enabled file system, and use that > node > as a gateway for end-users. They can access the contents on the mount > > point with their own underprivileged accounts. > > Whether or not on a schedule, the moment an application or linux > command (such as cp, cat, vi, etc) accesses a stub, the file will be > > staged. > > Jaime > > Quoting "Andrew Beattie" : > >> Quick question, Does anyone have a Scale / GPFS environment (HPC) >> where users need the ability to recall data sets after they have > been >> stubbed, but only System Administrators are permitted to log onto > the >> NSD servers for security purposes. And if so how do you provide > the >> ability for the users to schedule their data set recalls? > Regards, >> Andrew Beattie Software Defined Storage - IT Specialist Phone: >> 614-2133-7927 E-mail: abeattie at au1.ibm.com[1] >> >> >> Links: >> ------ >> [1] mailto:abeattie at au1.ibm.com[3] >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials[4] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jun 6 19:15:22 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 6 Jun 2017 18:15:22 +0000 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> <6873.1496767551@turing-police.cc.vt.edu> Message-ID: All, mmlsnode -N waiters is great ? I also appreciate the ?-s? option to it. Very helpful when you know the problem started say, slightly more than half an hour ago and you therefore don?t care about sub-1800 second waiters? Kevin On Jun 6, 2017, at 11:54 AM, Frederick Stock > wrote: On recent releases you can accomplish the same with the command, "mmlsnode -N waiters -L". 
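(A cron-able variant of the same idea: snapshot the cluster-wide waiters and mail them to the admins for review. The recipient address is a placeholder, and because short waiters are normal - see Fred's earlier note - some age filtering is needed before this is useful as an alert.)

#!/bin/bash
# Snapshot cluster-wide waiters and mail them if anything is reported.
# If the "-s" age filter Kevin mentions is available in your release,
# adding something like "-s 300" should restrict this to waiters older
# than five minutes (treat that syntax as an assumption and check the
# man page first).
OUT=$(/usr/lpp/mmfs/bin/mmlsnode -N waiters -L 2>&1)
if [ -n "$OUT" ]; then
    echo "$OUT" | mail -s "GPFS waiters snapshot" gpfs-admins@example.com
fi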
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: valdis.kletnieks at vt.edu To: gpfsug main discussion list > Date: 06/06/2017 12:46 PM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... [attachment "attltepl.dat" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Jun 6 21:31:01 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 06 Jun 2017 16:31:01 -0400 Subject: [gpfsug-discuss] mmapplypolicy and ltfsee - identifying progress... Message-ID: <25944.1496781061@turing-police.cc.vt.edu> So I'm trying to get a handle on where exactly an mmapplypolicy that's doing a premigrate is in its progress. I've already determined that 'ltfsee info jobs' will only report where in the current batch it is, but that still leaves me unable to tell the difference between [I] 2017-06-05 at 17:31:47.995 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 10000 files dispatched. and [I] 2017-06-06 at 02:44:48.236 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 225000 files dispatched. Is there any better way to figure out where it is than writing the cron job to launch it as mmapplypolicy | tee /tmp/something and then go scraping the messages? (And yes, I know not all chunks of 1,000 files are created equal. Sometimes it's 1,000 C source files that total to less than a megabyte, other times it's 1,000 streaming video files that total to over a terabye - but even knowing it's 194,000 into 243,348 files is better than what I have now...) -------------- next part -------------- A non-text attachment was scrubbed... 
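(Along the lines of the tee-and-scrape idea above, a rough sketch: capture the mmapplypolicy output to a log and pull the most recent dispatch message out of it on demand. The paths and policy file name are placeholders; the grep pattern matches the "files dispatched" messages quoted above, and the candidate total appears earlier in the same log in the policy summary lines.)

# Launch the premigration with its output captured (e.g. from cron).
mmapplypolicy /gpfs/archive -P /gpfs/archive/config/premigrate.pol \
    > /tmp/premigrate.log 2>&1 &

# At any point, report the most recent progress message,
# e.g. "... 225000 files dispatched."
grep 'files dispatched' /tmp/premigrate.log | tail -1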
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jun 6 22:20:31 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 6 Jun 2017 21:20:31 +0000 Subject: [gpfsug-discuss] mmapplypolicy and ltfsee - identifying progress... In-Reply-To: <25944.1496781061@turing-police.cc.vt.edu> References: <25944.1496781061@turing-police.cc.vt.edu> Message-ID: <5987990A-39F6-47A5-981D-A34A3054E4D8@vanderbilt.edu> Hi Valdis, I?m not sure this is ?better?, but what I typically do is have mmapplypolicy running from a shell script launched by a cron job and redirecting output to a file in /tmp. Once the mmapplypolicy finishes the SysAdmin?s get the tmp file e-mailed to them and then it gets deleted. Of course, while the mmapplypolicy is running you can ?tail -f /tmp/mmapplypolicy.log? or grep it or whatever. HTHAL? Kevin On Jun 6, 2017, at 3:31 PM, valdis.kletnieks at vt.edu wrote: So I'm trying to get a handle on where exactly an mmapplypolicy that's doing a premigrate is in its progress. I've already determined that 'ltfsee info jobs' will only report where in the current batch it is, but that still leaves me unable to tell the difference between [I] 2017-06-05 at 17:31:47.995 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 10000 files dispatched. and [I] 2017-06-06 at 02:44:48.236 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 225000 files dispatched. Is there any better way to figure out where it is than writing the cron job to launch it as mmapplypolicy | tee /tmp/something and then go scraping the messages? (And yes, I know not all chunks of 1,000 files are created equal. Sometimes it's 1,000 C source files that total to less than a megabyte, other times it's 1,000 streaming video files that total to over a terabye - but even knowing it's 194,000 into 243,348 files is better than what I have now...) ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Jun 7 09:30:17 2017 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 7 Jun 2017 08:30:17 +0000 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting Message-ID: An HTML attachment was scrubbed... URL: From jan.sundermann at kit.edu Wed Jun 7 15:04:28 2017 From: jan.sundermann at kit.edu (Sundermann, Jan Erik (SCC)) Date: Wed, 7 Jun 2017 14:04:28 +0000 Subject: [gpfsug-discuss] Upgrade with architecture change Message-ID: Hi, we are operating a small Spectrum Scale cluster with about 100 clients and 6 NSD servers. The cluster is FPO-enabled. For historical reasons the NSD servers are running on ppc64 while the clients are a mixture of ppc64le and x86_64 machines. Most machines are running Red Hat Enterprise Linux 7 but we also have few machines running AIX. At the moment we have installed Spectrum Scale version 4.1.1 but would like to do an upgrade to 4.2.3. In the course of the upgrade we would like to change the architecture of all NSD servers and reinstall them with ppc64le instead of ppc64. From what I?ve learned so far it should be possible to upgrade directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like to ask for some advice on the best strategy. 
For the NSD servers, one by one, we are thinking about doing the following: 1) Disable auto recovery 2) Unmount GPFS file system 3) Suspend disks 4) Shutdown gpfs 5) Reboot and reinstall with changed architecture ppc64le 6) Install gpfs 4.2.3 7) Recover cluster config using mmsdrrestore 8) Resume and start disks 9) Reenable auto recovery Can GPFS handle the change of the NSD server?s architecture and would it be fine to operate a mixture of different architectures for the NSD servers? Thanks, Jan Erik From tarak.patel at canada.ca Wed Jun 7 16:42:45 2017 From: tarak.patel at canada.ca (Patel, Tarak (SSC/SPC)) Date: Wed, 7 Jun 2017 15:42:45 +0000 Subject: [gpfsug-discuss] Remote cluster gpfs communication on IP different then one for Daemon or Admin node name. Message-ID: <50fd0dc6cf47485c8728fc09b7ae0263@PEVDACDEXC009.birch.int.bell.ca> Hi all, We've been experiencing issues with remote cluster node expelling CES nodes causing remote filesystems to unmount. The issue is related gpfs communication using Ethernet IP rather than IP defined on IB which is used for Daemon node name and Admin node name. So remote cluster is aware of IPs that are not defined in GPFS configuration as Admin/Daemon node name. The CES nodes are configure to have IB as well as Ethernet (for client interactive and NFS access). We've double checked /etc/hosts and DNS and all looks to be in order since the CES IPoIB IP is present in /etc/hosts of remote cluster. I'm unsure where cluster manager for remote cluster is getting the Ethernet IP if there is no mention of it in GPFS configuration. The CES nodes were added later therefore they are not listed as Contact Nodes in 'mmremotecluster show' output. The CES nodes use IP defined on IB for GPFS configuration and we also have Ethernet which has the default route defined. In order to ensure that all IB communication passes via IPoIB, we've even defined a static route so that all GPFS communication will use IPoIB (since we are dealing with a different fabric). 'mmfsadm dump tscomm' reports multiple IPs for CES nodes which includes the Ethernet and also the IPoIB. I'm unsure if there is a way to drop some connections on GPFS (cluster wide) after stopping a specific CES node and ensure that only IB is listed. I realize that one option would be to define subnet parameter for remote cluster which will require a downtime (solution to be explored at later date). Hope that someone can explain how or why remote cluster is picking IPs not used in GPFS config for remote nodes and how to ensure those IPs are not used in future. Thank you, Tarak -- Tarak Patel Chef d'?quipe, Integration HPC, Solution de calcul E-Science Service partag? Canada / Gouvernment du Canada tarak.patel at canada.ca 1-514-421-7299 Team Lead, HPC Integration, E-Science Computing Solution Shared Services Canada, Government of Canada tarak.patel at canada.ca 1-514-421-7299 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Jun 7 23:12:56 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 7 Jun 2017 15:12:56 -0700 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: Hi Jan, I don't have hands-on experience with FPO or ppc64 but your procedure sounds OK to me. How do you currently handle just shutting down an NSD node for maintenance? I guess you'd have the same process except skip 5,6,7 How do you currently handle OS rebuild on NSD node? Maybe try that first without the architecture change. 
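(A command-level sketch of steps 2-4 and 7-8 above, per NSD server. Filesystem, disk and node names are placeholders; steps 1 and 9 are left out because the right auto-recovery toggle depends on the FPO configuration, and the whole sequence should be rehearsed on a test node first.)

mmumount gpfs0 -N nsdserver1               # step 2: unmount on that server
mmchdisk gpfs0 suspend -d "disk1;disk2"    # step 3: suspend its disks
mmshutdown -N nsdserver1                   # step 4: stop the daemon

# ... reinstall the node as ppc64le and install the 4.2.3 packages ...

mmsdrrestore -p configserver1              # step 7: run on the rebuilt node
mmstartup -N nsdserver1
mmchdisk gpfs0 resume -d "disk1;disk2"     # step 8: resume ...
mmchdisk gpfs0 start -d "disk1;disk2"      #         ... and restart the disks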
But I don't see why it would matter so long as you don't touch the GPFS disks. Regards, Alex On 06/07/2017 07:04 AM, Sundermann, Jan Erik (SCC) wrote: > Hi, > > we are operating a small Spectrum Scale cluster with about 100 clients and 6 NSD servers. The cluster is FPO-enabled. For historical reasons the NSD servers are running on ppc64 while the clients are a mixture of ppc64le and x86_64 machines. Most machines are running Red Hat Enterprise Linux 7 but we also have few machines running AIX. > > At the moment we have installed Spectrum Scale version 4.1.1 but would like to do an upgrade to 4.2.3. In the course of the upgrade we would like to change the architecture of all NSD servers and reinstall them with ppc64le instead of ppc64. > > From what I?ve learned so far it should be possible to upgrade directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like to ask for some advice on the best strategy. > > For the NSD servers, one by one, we are thinking about doing the following: > > 1) Disable auto recovery > 2) Unmount GPFS file system > 3) Suspend disks > 4) Shutdown gpfs > 5) Reboot and reinstall with changed architecture ppc64le > 6) Install gpfs 4.2.3 > 7) Recover cluster config using mmsdrrestore > 8) Resume and start disks > 9) Reenable auto recovery > > Can GPFS handle the change of the NSD server?s architecture and would it be fine to operate a mixture of different architectures for the NSD servers? > > > Thanks, > Jan Erik From Philipp.Rehs at uni-duesseldorf.de Thu Jun 8 10:35:57 2017 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Thu, 8 Jun 2017 11:35:57 +0200 Subject: [gpfsug-discuss] GPFS for aarch64? Message-ID: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> Hello, we got a Cavium ThunderX-based Server and would like to use GPFS on it. Are the any package for gpfs on aarch64? Kind regards Philipp Rehs --------------------------- Zentrum f?r Informations- und Medientechnologie Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern Heinrich-Heine-Universit?t D?sseldorf Universit?tsstr. 1 Raum 25.41.00.51 40225 D?sseldorf / Germany Tel: +49-211-81-15557 From abeattie at au1.ibm.com Thu Jun 8 10:45:38 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 8 Jun 2017 09:45:38 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> References: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Jun 8 10:54:15 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 8 Jun 2017 09:54:15 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: Message-ID: And Linux on Z/VM If interested feel free to open a RFE -- Cheers > On 8 Jun 2017, at 12.46, Andrew Beattie wrote: > > Philipp, > > Not to my knowledge, > > AIX > Linux on x86 ( RHEL / SUSE / Ubuntu) > Linux on Power (RHEL / SUSE) > WIndows > > are the current supported platforms > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: Philipp Helo Rehs > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [gpfsug-discuss] GPFS for aarch64? > Date: Thu, Jun 8, 2017 7:36 PM > > Hello, > > we got a Cavium ThunderX-based Server and would like to use GPFS on it. > > Are the any package for gpfs on aarch64? 
> > > Kind regards > > Philipp Rehs > > --------------------------- > > Zentrum f?r Informations- und Medientechnologie > Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern > > Heinrich-Heine-Universit?t D?sseldorf > Universit?tsstr. 1 > Raum 25.41.00.51 > 40225 D?sseldorf / Germany > Tel: +49-211-81-15557 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From Philipp.Rehs at uni-duesseldorf.de Thu Jun 8 11:40:23 2017 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Thu, 8 Jun 2017 12:40:23 +0200 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: References: Message-ID: <9f47c897-74ff-9473-2ab3-343e4ce69d15@uni-duesseldorf.de> Thanks for the Information. I created an RFE: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106218 Kind regards, Philipp Rehs > Message: 6 > Date: Thu, 8 Jun 2017 09:54:15 +0000 > From: "Luis Bolinches" > To: "gpfsug main discussion list" > Subject: Re: [gpfsug-discuss] GPFS for aarch64? > Message-ID: > > > Content-Type: text/plain; charset="utf-8" > > And Linux on Z/VM > > If interested feel free to open a RFE > > -- > Cheers > >> On 8 Jun 2017, at 12.46, Andrew Beattie wrote: >> >> Philipp, >> >> Not to my knowledge, >> >> AIX >> Linux on x86 ( RHEL / SUSE / Ubuntu) >> Linux on Power (RHEL / SUSE) >> WIndows >> >> are the current supported platforms >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> ----- Original message ----- >> From: Philipp Helo Rehs >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [gpfsug-discuss] GPFS for aarch64? >> Date: Thu, Jun 8, 2017 7:36 PM >> >> Hello, >> >> we got a Cavium ThunderX-based Server and would like to use GPFS on it. >> >> Are the any package for gpfs on aarch64? >> >> >> Kind regards >> >> Philipp Rehs >> >> --------------------------- >> >> Zentrum f?r Informations- und Medientechnologie >> Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern >> >> Heinrich-Heine-Universit?t D?sseldorf >> Universit?tsstr. 1 >> Raum 25.41.00.51 >> 40225 D?sseldorf / Germany >> Tel: +49-211-81-15557 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 65, Issue 17 > ********************************************** > From daniel.kidger at uk.ibm.com Thu Jun 8 11:54:04 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 8 Jun 2017 10:54:04 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: Message-ID: I often hear requests for Spectrum Scale on ARM. It is always for clients. In general people are happy to have their NSD servers, etc. on x86 or POWER. It is also an anomaly that for a HPC cluster, IBM supports LSF on ARM v7/v8 but not Spectrum Scale on ARM. Daniel Daniel Kidger Technical Sales Specialist, IBM UK IBM Spectrum Storage Software daniel.kidger at uk.ibm.com +44 (0)7818 522266 > On 8 Jun 2017, at 10:54, Luis Bolinches wrote: > > And Linux on Z/VM > > If interested feel free to open a RFE > > -- > Cheers > >> On 8 Jun 2017, at 12.46, Andrew Beattie wrote: >> >> Philipp, >> >> Not to my knowledge, >> >> AIX >> Linux on x86 ( RHEL / SUSE / Ubuntu) >> Linux on Power (RHEL / SUSE) >> WIndows >> >> are the current supported platforms >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> ----- Original message ----- >> From: Philipp Helo Rehs >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [gpfsug-discuss] GPFS for aarch64? >> Date: Thu, Jun 8, 2017 7:36 PM >> >> Hello, >> >> we got a Cavium ThunderX-based Server and would like to use GPFS on it. >> >> Are the any package for gpfs on aarch64? >> >> >> Kind regards >> >> Philipp Rehs >> >> --------------------------- >> >> Zentrum f?r Informations- und Medientechnologie >> Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern >> >> Heinrich-Heine-Universit?t D?sseldorf >> Universit?tsstr. 1 >> Raum 25.41.00.51 >> 40225 D?sseldorf / Germany >> Tel: +49-211-81-15557 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Thu Jun 8 15:09:03 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Thu, 8 Jun 2017 10:09:03 -0400 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: We have not tested such a procedure. The only route that we have done is a complete mmdelnode/mmaddnode scenario. This would mean an mmdeldisk. It would be more time consuming since data has to move. Operating in a mixed architecture environment is not a problem. We have tested and support that. 
Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York > > Message: 1 > Date: Wed, 7 Jun 2017 14:04:28 +0000 > From: "Sundermann, Jan Erik (SCC)" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Upgrade with architecture change > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi, > > we are operating a small Spectrum Scale cluster with about 100 > clients and 6 NSD servers. The cluster is FPO-enabled. For > historical reasons the NSD servers are running on ppc64 while the > clients are a mixture of ppc64le and x86_64 machines. Most machines > are running Red Hat Enterprise Linux 7 but we also have few machines > running AIX. > > At the moment we have installed Spectrum Scale version 4.1.1 but > would like to do an upgrade to 4.2.3. In the course of the upgrade > we would like to change the architecture of all NSD servers and > reinstall them with ppc64le instead of ppc64. > > From what I?ve learned so far it should be possible to upgrade > directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like > to ask for some advice on the best strategy. > > For the NSD servers, one by one, we are thinking about doing the following: > > 1) Disable auto recovery > 2) Unmount GPFS file system > 3) Suspend disks > 4) Shutdown gpfs > 5) Reboot and reinstall with changed architecture ppc64le > 6) Install gpfs 4.2.3 > 7) Recover cluster config using mmsdrrestore > 8) Resume and start disks > 9) Reenable auto recovery > > Can GPFS handle the change of the NSD server?s architecture and > would it be fine to operate a mixture of different architectures for > the NSD servers? > > > Thanks, > Jan Erik > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jun 8 17:01:07 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Jun 2017 12:01:07 -0400 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: If you proceed carefully, it should not be necessary to mmdeldisk and mmadddisks. Although we may not have tested your exact scenario, GPFS does support fiber channel disks attached to multiple nodes. So the same disk can be attached to multiple GPFS nodes - and those nodes can be running different OSes and different GPFS versions. (That's something we do actually test!) Since GPFS can handle that with several nodes simultaneously active -- it can also handle the case when nodes come and go... Or in your case are killed and then reborn with new software... The key is to be careful... You want to unmount the file system and not re-mount until all of the disks become available again via one or more (NSD) nodes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jun 8 22:34:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 8 Jun 2017 21:34:15 +0000 Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? Message-ID: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> I?m looking to improve performance of the SMB stack. My workload unfortunately has smallish files but in total it will still be large amount. I?m wondering if LROC/HAWC would be one way to speed things up. Is there a precedent for using this with protocol nodes in a cluster? Anyone else thinking/doing this? 
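(For the LROC half of the question, the usual mechanics are to define a local SSD on the protocol node as an NSD with usage=localCache; the daemon on that node then uses it as its read cache. Device, NSD and node names below are placeholders, and as noted earlier in the LROC thread, roughly 1% of the cache size comes out of the pagepool.)

# lroc.stanza (hypothetical): one local NVMe device on a CES node.
%nsd: device=/dev/nvme0n1 nsd=ces1_lroc servers=ces1 usage=localCache

# Create the NSD; it is not added to any filesystem.
mmcrnsd -F lroc.stanza

HAWC is configured separately (it places small synchronous write traffic into the recovery log on fast storage), so whether it helps depends on whether the small-file workload is read- or write-dominated.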
Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Jun 9 08:44:57 2017 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 9 Jun 2017 09:44:57 +0200 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting In-Reply-To: References: Message-ID: There is an update of the agenda for the User Meeting at ISC. We have added a Pawsey Site Report by Chris Schlipalius. Monday June 19, 2016 - 12:00-14:30 - Conference Room Konstant 12:00-12:10 ?[10 min] ?Opening 12:10-12:25 ?[15 min] ?Spectrum Scale Support for Docker - Olaf Weiser (IBM) 12:25-13:05 ?[40 min] ?IBM Spectrum LSF family update - Bill McMillan (IBM) 13:05-13:25 ?[20 min] ?Driving Operational Efficiencies with the IBM Spectrum LSF & Ellexus Mistral - Dr. Rosemary Francis (Ellexus) 13:25-13:40 [15 min] Pawsey Site Report - Chris Schlipalius (Pawsey) 13:40-13:55 ?[15 min] ?IBM Elastic Storage Server (ESS) Update - John Sing (IBM) 13:55-14:20 ?[25 min] ?IBM Spectrum Scale Enhancements for CORAL - Sven Oehme (IBM) 14:20-14:30 ?[10 min] ?Question & Answers -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: gpfsug-discuss at spectrumscale.org Cc: Fabienne Wegener Date: 07.06.2017 10:30 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings: IBM is happy to announce the agenda for the joint IBM Spectrum Scale and IBM Spectrum LSF User Meeting at ISC. As with other user meetings, the agenda includes user stories, updates on IBM Spectrum Scale and IBM Spectrum LSF, and access to IBM experts and your peers. Please join us! To attend, please email Fabienne.Wegener at de.ibm.com so we can have an accurate count of attendees. Monday June 17, 2016 - 12:00-14:30 - Conference Room Konstant 12:00-12:10 [10 min] Opening 12:10-12:30 [20 min] Spectrum Scale Support for Docker - Olaf Weiser (IBM) 12:30-13:10 [40 min] IBM Spectrum LSF family update - Bill McMillan (IBM) 13:10-13:30 [20 min] Driving Operational Efficiencies with the IBM Spectrum LSF & Ellexus Mistral - Dr. Rosemary Francis (Ellexus) 13:30-13:50 [20 min] IBM Elastic Storage Server (ESS) Update - John Sing (IBM) 13:50-14:20 [30 min] IBM Spectrum Scale Enhancements for CORAL - Sven Oehme (IBM) 14:20-14:30 [10 min] Question & Answers Looking forward to seeing you there! 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jun 9 09:38:01 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Jun 2017 08:38:01 +0000 Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? In-Reply-To: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> References: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> Message-ID: I?m wary of spending a lot of money on LROC devices when I don?t know what return I will get.. that said I think the main bottleneck for any SMB installation is samba itself, not the disks, so I remain largely unconvinced that LROC will help much. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 08 June 2017 22:34 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? I?m looking to improve performance of the SMB stack. My workload unfortunately has smallish files but in total it will still be large amount. I?m wondering if LROC/HAWC would be one way to speed things up. Is there a precedent for using this with protocol nodes in a cluster? Anyone else thinking/doing this? Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Jun 9 10:31:45 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Jun 2017 09:31:45 +0000 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS In-Reply-To: References: Message-ID: Thanks Mark, didn?t mean to wait so long to reply. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 June 2017 17:40 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] TSM/SP compatibility with GPFS Upgrading from GPFS 4.2.x to GPFS 4.2.y should not "break" TSM. If it does, someone goofed, that would be a bug. (My opinion) Think of it this way. 
TSM is an application that uses the OS and the FileSystem(s). TSM can't verify it will work with all future versions of OS and Filesystems, and the releases can't be in lock step. Having said that, 4.2.3 has been "out" for a while, so if there were a TSM incompatibility, someone would have likely hit it or will before July... Trust but verify... From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/02/2017 11:51 AM Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I?ve spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version? Cheers Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Jun 10 11:55:54 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 10 Jun 2017 10:55:54 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found Message-ID: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sat Jun 10 13:05:04 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 10 Jun 2017 08:05:04 -0400 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: Message-ID: Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone > On Jun 10, 2017, at 06:55, Frank Tower wrote: > > Hi everybody, > > > I don't get why one of our compute node cannot start GPFS over IB. > > > I have the following error: > > > [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. > [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
> > [I] VERBS RDMA parse verbsPorts mlx4_0/1 > > [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found > > [I] VERBS RDMA library libibverbs.so unloaded. > > [E] VERBS RDMA failed to start, no valid verbsPorts defined. > > > > I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. > > > I have 2 infinibands card, both have an IP and working well. > > > [root at rdx110 ~]# ibstat -l > > mlx4_0 > > mlx4_1 > > [root at rdx110 ~]# > > I tried configuration with both card, and no one work with GPFS. > > > > I also tried with mlx4_0/1, but same problem. > > > > Someone already have the issue ? > > > Kind Regards, > > Frank > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Jun 12 20:41:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 12 Jun 2017 15:41:17 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? Message-ID: <18719.1497296477@turing-police.cc.vt.edu> So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jun 12 21:01:44 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 12 Jun 2017 20:01:44 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: <18719.1497296477@turing-police.cc.vt.edu> References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! 
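As a quick sanity check when CES keeps pulling an address back, one can look at what the monitor thinks of each node and which address-distribution policy is in force. A minimal sketch, assuming Spectrum Scale 4.2.2 or later with the standard CES tooling (exact output varies by release):

mmces state show -a        # per-service state (AUTH, SMB, NFS, ...) on every CES node
mmces address list         # where each floating address currently lives
mmlscluster --ces          # CES node list plus the configured address distribution policy
mmhealth node show         # reason codes behind an unhealthy service on the local node

If the policy is the default even-coverage distribution, a manually moved address is fair game for re-balancing, which matches the behaviour described above.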
________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 12 21:06:09 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 12 Jun 2017 20:06:09 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jun 12 21:17:08 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 12 Jun 2017 16:17:08 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? Category 1, GPFSFileSystemAPI: This metrics gives the following information for each file system (application view). For example: myMachine|GPFSFilesystemAPI|myCluster|myFilesystem|gpfs_fis_bytes_read . gpfs_fis_bytes_read Number of bytes read. gpfs_fis_bytes_written Number of bytes written. gpfs_fis_close_calls Number of close calls. gpfs_fis_disks Number of disks in the file system. gpfs_fis_inodes_written Number of inode updates to disk. gpfs_fis_open_calls Number of open calls. gpfs_fis_read_calls Number of read calls. gpfs_fis_readdir_calls Number of readdir calls. gpfs_fis_write_calls Number of write calls. Category 2, GPFSNodeAPI: This metrics gives the following information for a particular node from its application point of view. For example: myMachine|GPFSNodeAPI|gpfs_is_bytes_read . gpfs_is_bytes_read Number of bytes read. gpfs_is_bytes_written Number of bytes written. gpfs_is_close_calls Number of close calls. gpfs_is_inodes_written Number of inode updates to disk. gpfs_is_open_calls Number of open calls. gpfs_is_readDir_calls Number of readdir calls. gpfs_is_read_calls Number of read calls. gpfs_is_write_calls Number of write calls. Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Mon Jun 12 21:42:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Jun 2017 20:42:47 +0000 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jun 12 22:01:36 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 12 Jun 2017 17:01:36 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 12 23:50:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Jun 2017 22:50:44 +0000 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> Can you tell me how LROC plays into this? I?m trying to understand if the difference between gpfs_ns_bytes_read and gpfs_is_bytes_read on a cluster-wide basis reflects the amount of data that is recalled from pagepool+LROC (assuming the majority of the nodes have LROC. Any insight on LROC stats would helpful as well. [cid:image001.png at 01D2E3A4.63CEE1D0] Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 4:01 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Meaning of API Stats Category Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). 
The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 124065 bytes Desc: image001.png URL: From valdis.kletnieks at vt.edu Tue Jun 13 05:21:26 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 13 Jun 2017 00:21:26 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: <15827.1497327686@turing-police.cc.vt.edu> On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" said: > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. Yeah, I figured that part out. What I couldn't wrap my brain around was what the purpose of 'mmces address move' is if mmsysmon is going to just put it back... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From janfrode at tanso.net Tue Jun 13 05:42:21 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 13 Jun 2017 04:42:21 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: <15827.1497327686@turing-police.cc.vt.edu> References: <18719.1497296477@turing-police.cc.vt.edu> <15827.1497327686@turing-police.cc.vt.edu> Message-ID: Switch to node affinity policy, and it will stick to where you move it. "mmces address policy node-affinity". -jf tir. 13. jun. 2017 kl. 06.21 skrev : > On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" > said: > > > mmces node suspend -N > > > > Is what you want. This will move the address and stop it being assigned > one, > > otherwise the rebalance will occur. > > Yeah, I figured that part out. What I couldn't wrap my brain around was > what the purpose of 'mmces address move' is if mmsysmon is going to just > put it back... > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:08:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:08:52 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it's not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:12:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:12:13 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> <15827.1497327686@turing-police.cc.vt.edu> Message-ID: Or this ? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 13 June 2017 05:42 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Switch to node affinity policy, and it will stick to where you move it. "mmces address policy node-affinity". -jf tir. 13. jun. 2017 kl. 06.21 skrev >: On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" said: > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. Yeah, I figured that part out. What I couldn't wrap my brain around was what the purpose of 'mmces address move' is if mmsysmon is going to just put it back... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jun 13 09:28:25 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 13 Jun 2017 08:28:25 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Suspending the node doesn't stop the services though, we've done a bunch of testing by connecting to the "real" IP on the box we wanted to test and that works fine. 
OK, so you end up connecting to shares like \\192.168.1.20\sharename, but its perfectly fine for testing purposes. In our experience, suspending the node has been fine for this as it moves the IP to a "working" node and keeps user service running whilst we test. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 13 June 2017 at 09:08 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it?s not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From:gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... 
(running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:30:18 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:30:18 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Oh? Nice to know - thanks - will try that method next. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Suspending the node doesn't stop the services though, we've done a bunch of testing by connecting to the "real" IP on the box we wanted to test and that works fine. OK, so you end up connecting to shares like \\192.168.1.20\sharename, but its perfectly fine for testing purposes. In our experience, suspending the node has been fine for this as it moves the IP to a "working" node and keeps user service running whilst we test. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 13 June 2017 at 09:08 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it's not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! 
________________________________ From:gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Tue Jun 13 14:36:30 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 13 Jun 2017 13:36:30 +0000 Subject: [gpfsug-discuss] Infiniband Quality of Service settings? Message-ID: I am investigating setting up Quality of Service parameters on an Infiniband fabric. The specific goal is to reduce the bandwidth which certain servers can use, ie if there are untested or development codes running on these servers in our cluster then they cannot adversely affect production users. I hope I do not show too much of my ignorance here. Perhaps out of date, but I find that Lustre does have a facility for setting the port range and hence associating with an ULP in Infiniband http://www.spinics.net/lists/linux-rdma/msg02150.html https://community.mellanox.com/thread/3660 (There. I said the L word. Is a quick soaping to the mouth needed?) Can anyone comment what the Infiniband Service ID for GPFS traffic is please? If the answer is blindingly obvious and is displayed by a Bat signal in the clouds above every datacenter containing GPFS then I am suitably apologetic. If it is buried in a footnote in a Redbook then a bit less apologetic. If you are familiar with Appendix A of the IBTA Architecture Release then it is truly a joy. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. 
Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Tue Jun 13 15:10:43 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 13 Jun 2017 14:10:43 +0000 Subject: [gpfsug-discuss] Infiniband Quality of Service settings? In-Reply-To: References: Message-ID: Having the bad manners to answer my own question: Example If you define a service level of 2 for GPFS in the InfiniBand subnet manager set verbsRdmaQpRtrSl to 2. mmchconfig verbsRdmaQpRtrSl=2 I guess though that I still need the service ID to set the Service Level in qos-policy.conf From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: Tuesday, June 13, 2017 3:37 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Infiniband Quality of Service settings? I am investigating setting up Quality of Service parameters on an Infiniband fabric. The specific goal is to reduce the bandwidth which certain servers can use, ie if there are untested or development codes running on these servers in our cluster then they cannot adversely affect production users. I hope I do not show too much of my ignorance here. Perhaps out of date, but I find that Lustre does have a facility for setting the port range and hence associating with an ULP in Infiniband http://www.spinics.net/lists/linux-rdma/msg02150.html https://community.mellanox.com/thread/3660 (There. I said the L word. Is a quick soaping to the mouth needed?) Can anyone comment what the Infiniband Service ID for GPFS traffic is please? If the answer is blindingly obvious and is displayed by a Bat signal in the clouds above every datacenter containing GPFS then I am suitably apologetic. If it is buried in a footnote in a Redbook then a bit less apologetic. If you are familiar with Appendix A of the IBTA Architecture Release then it is truly a joy. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From SAnderson at convergeone.com Tue Jun 13 17:31:31 2017 From: SAnderson at convergeone.com (Shaun Anderson) Date: Tue, 13 Jun 2017 16:31:31 +0000 Subject: [gpfsug-discuss] Difference between mmcesnfscrexport and 'mmnfs export add' commands. Message-ID: <2990f67cded849e8b82a4c5d2ac50d5c@NACR502.nacr.com> ?I see both of these, but only the mmnfs command is documented. Is one a wrapper of the other? SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments here to may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Jun 14 01:50:24 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 13 Jun 2017 20:50:24 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> References: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> Message-ID: Hello Bob, Right. Within some observation interval, bytes read by an application will be reflected in gpfs_is_bytes_read, regardless of how the byte values were obtained (by reading from "disk", fetching from pagepool, or fetching from LROC). gpfs_ns_bytes_read is only going to reflect bytes read from "disk" within that observation interval. "mmdiag --lroc" provides some LROC stats. There is also a GPFSLROC sensor; it does not appear to be documented at this point, so I hope I haven't spoken out of turn. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
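To put numbers against that on a given node, a rough sketch (assuming the perfmon/ZIMon collector is running and these metrics are being collected) is to pull both counters for the same window and then compare with the LROC counters:

# application-level reads vs. reads that actually went to the NSD layer, last hour in one bucket
mmperfmon query gpfs_is_bytes_read,gpfs_ns_bytes_read -b 3600 -n 1

# LROC hit/recall statistics on a node with a localCache device
mmdiag --lroc

The gap between gpfs_is_bytes_read and gpfs_ns_bytes_read approximates what was satisfied from pagepool plus LROC on that node, subject to the prefetch and replication caveats mentioned earlier in the thread.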
From: "Oesterlin, Robert" To: gpfsug main discussion list Cc: "scale at us.ibm.com" Date: 06/12/2017 06:50 PM Subject: Re: Meaning of API Stats Category Can you tell me how LROC plays into this? I?m trying to understand if the difference between gpfs_ns_bytes_read and gpfs_is_bytes_read on a cluster-wide basis reflects the amount of data that is recalled from pagepool+LROC (assuming the majority of the nodes have LROC. Any insight on LROC stats would helpful as well. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 4:01 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Meaning of API Stats Category Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 124065 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jun 14 17:11:33 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 14 Jun 2017 16:11:33 +0000 Subject: [gpfsug-discuss] 4.2.3.x and sub-block size Message-ID: Hi All, Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: 2. The GPFS block size determines: * The minimum disk space allocation unit. The minimum amount of space that file data can occupy is a sub?block. A sub?block is 1/32 of the block size. So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Wed Jun 14 18:15:27 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 14 Jun 2017 13:15:27 -0400 Subject: [gpfsug-discuss] 4.2.3.x and sub-block size In-Reply-To: References: Message-ID: Hi, >>Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: >>So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Based on the current plan, this ?a sub-block is 1/32nd of the block size? restriction will be removed in the upcoming GPFS version 4.2.4 (Please NOTE: Support for >32 subblocks per block may subject to be delayed based on internal qualification/validation efforts). Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/14/2017 12:12 PM Subject: [gpfsug-discuss] 4.2.3.x and sub-block size Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: 2. The GPFS block size determines: * The minimum disk space allocation unit. The minimum amount of space that file data can occupy is a sub?block. 
A sub?block is 1/32 of the block size. So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Schlipalius at pawsey.org.au Thu Jun 15 03:30:27 2017 From: Chris.Schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 15 Jun 2017 10:30:27 +0800 Subject: [gpfsug-discuss] Perth Australia - Spectrum Scale User Group event in August 2017 announced - Pawsey Supercomputing Centre Message-ID: Hi please find the eventbrite link (text as http/s links are usually stripped). www.eventbrite.com/e/spectrum-scale-user-group-perth-australia-gpfsugaus-au gust-2017-tickets-35227460282 Please register and let me know if you are keen to present. I have a special group booking offer on accomodation for attendees, well below usually rack rack. I will announce this Usergroup meeting on spectrumscle.org shortly. This event is on the same week and at the same location as HPC Advisory Council also being held in Perth Australia. (Call for papers is now out - I can supply the HPC AC invite separately if you wish to email me directly). If you want to know more in person and you are at ISC2017 next week I will be at the Spectrum Scale Usergroup that Ulf announced or you can catch me on the Pawsey Supercomputing Centre booth. Regards, Chris Schlipalius Senior Storage Infrastructure Specialist/Team Leader Pawsey Supercomputing Centre 12 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 15 21:00:47 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Jun 2017 20:00:47 +0000 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? Message-ID: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> Hi All, I?ve got some very weird problems going on here (and I do have a PMR open with IBM). On Monday I attempted to unlink a fileset, something that I?ve done many times with no issues. This time, however, it hung up the filesystem. I was able to clear things up by shutting down GPFS on the filesystem manager for that filesystem and restarting it. The very next morning we awoke to problems with GPFS. 
I noticed in my messages file on all my NSD servers I had messages like: Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid Jun 12 22:03:32 nsd32 multipathd: uevent trigger error Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: removing target and saving binding Since we use an FC SAN and Linux multi-pathing I was expecting some sort of problem with the switches. Now on the switches I see messages like: [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] Which (while not in this example) do correlate time-wise with the multi path messages on the servers. So it?s not a GPFS problem and I shouldn?t be bugging this list about this EXCEPT? These issues only started on Monday after I ran the mmunlinkfileset command. That?s right ? NO such errors prior to then. And literally NOTHING changed on Monday with my SAN environment (nothing had changed there for months actually). Nothing added to nor removed from the SAN. No changes until today when, in an attempt to solve this issue, I updated the switch firmware on all switches one at a time. I also yum updated to the latest RHEL 7 version of the multipathd packages. I?ve been Googling and haven?t found anything useful on those SYNC_LOSS messages on the QLogic SANbox 5800 switches. Anybody out there happen to have any knowledge of them and what could be causing them? Oh, I?m investigating this now ? but it?s not all ports that are throwing the errors. And the ports that are seem to be random and don?t have one specific type of hardware plugged in ? i.e. some ports have NSD servers plugged in, others have storage arrays. I understand that it makes no sense that mmunlinkfileset hanging would cause problems with my SAN ? but I also don?t believe in coincidences! I?m running GPFS 4.2.2.3. Any help / suggestions apprecaiated! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Jun 15 21:50:10 2017 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 15 Jun 2017 16:50:10 -0400 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? In-Reply-To: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> References: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> Message-ID: <20170615165010.6241c6d3@osc.edu> On Thu, 15 Jun 2017 20:00:47 +0000 "Buterbaugh, Kevin L" wrote: > Hi All, > > I?ve got some very weird problems going on here (and I do have a PMR open > with IBM). On Monday I attempted to unlink a fileset, something that I?ve > done many times with no issues. This time, however, it hung up the > filesystem. 
I was able to clear things up by shutting down GPFS on the > filesystem manager for that filesystem and restarting it. > > The very next morning we awoke to problems with GPFS. I noticed in my > messages file on all my NSD servers I had messages like: > > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write > through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline > device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk > Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) > Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid > Jun 12 22:03:32 nsd32 multipathd: uevent trigger error > Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: > removing target and saving binding > > Since we use an FC SAN and Linux multi-pathing I was expecting some sort of > problem with the switches. Now on the switches I see messages like: > > [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: > 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC > 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] > > Which (while not in this example) do correlate time-wise with the multi path > messages on the servers. So it?s not a GPFS problem and I shouldn?t be > bugging this list about this EXCEPT? > > These issues only started on Monday after I ran the mmunlinkfileset command. > That?s right ? NO such errors prior to then. And literally NOTHING changed > on Monday with my SAN environment (nothing had changed there for months > actually). Nothing added to nor removed from the SAN. No changes until > today when, in an attempt to solve this issue, I updated the switch firmware > on all switches one at a time. I also yum updated to the latest RHEL 7 > version of the multipathd packages. > > I?ve been Googling and haven?t found anything useful on those SYNC_LOSS > messages on the QLogic SANbox 5800 switches. Anybody out there happen to > have any knowledge of them and what could be causing them? Oh, I?m > investigating this now ? but it?s not all ports that are throwing the > errors. And the ports that are seem to be random and don?t have one specific > type of hardware plugged in ? i.e. some ports have NSD servers plugged in, > others have storage arrays. I have a half dozen of the Sanbox 5802 switches, but no GPFS devices going through them any longer. Used to though. We do see that exact same messages when the FC interface on a device goes bad (SFP, HCA, etc) or someone moving cables. This happens when the device cannot properly join the loop with it's login. I've NEVER seen them randomly though. Nor has this been a bad cable type error. I don't recall why, but I froze our Sanbox's at : V7.4.0.16.0 I'm sure I have notes on it somewhere. I've got one right now, in fact, with a bad ancient LTO4 drive. [8124][Thu Jun 15 12:46:00.190 EDT 2017][I][8600.001F][Port][Port: 4][SYNC_ACQ] [8125][Thu Jun 15 12:49:20.920 EDT 2017][I][8600.0020][Port][Port: 4][SYNC_LOSS] Sounds like the sanbox itself is having an issue perhaps? "Show alarm" clean on the sanbox? Array has a bad HCA? 'Show port 9' errors not crazy? All power supplies working? 
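
A quick sanity check of the multipath and NSD state from one of the NSD servers might look like the sketch below; the file system name gpfsfs0 is only a placeholder, and the exact sysfs counter paths depend on the FC HBA driver in use:

   # Any failed or faulty paths according to device-mapper multipath?
   multipath -ll | grep -iE 'failed|faulty|offline'

   # Disks GPFS no longer considers up/ready (-e lists only disks in error states)
   mmlsdisk gpfsfs0 -e

   # Map NSDs to the local block devices on this node
   mmlsnsd -m

   # FC link error counters on the HBAs can point at a flaky port, cable or SFP
   grep . /sys/class/fc_host/host*/statistics/loss_of_sync_count
   grep . /sys/class/fc_host/host*/statistics/link_failure_count

If the counters keep climbing on a single port while the others stay flat, that usually narrows it down to one link rather than the switch as a whole.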
> I understand that it makes no sense that mmunlinkfileset hanging would cause > problems with my SAN ? but I also don?t believe in coincidences! > > I?m running GPFS 4.2.2.3. Any help / suggestions apprecaiated! This does seem like QUITE the coincidence. Increased traffic on the device triggered a failure? (The fear of all RAID users!) Multipath is working properly though? Sounds like mmlsdisk would have shown devices not in 'ready'. We mysteriously lost a MD disk during a recent downtime and it caused an MMFSCK to not run properly until we figured it out. 4.2.2.3 as well. MD Replication is NOT helpful in that case. Ed > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 15 22:14:33 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Jun 2017 21:14:33 +0000 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? In-Reply-To: <20170615165010.6241c6d3@osc.edu> References: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> <20170615165010.6241c6d3@osc.edu> Message-ID: <91D4DDD0-BF4C-4F73-B369-A91C032B0FCD@vanderbilt.edu> Hi Ed, others, I have spent the intervening time since sending my original e-mail taking the logs from the SAN switches and putting them into text files where they can be sorted and grep?d ? and something potentially interesting has come to light ? While there are a number of ports on all switches that have one or two SYNC_LOSS errors on them, on two of the switches port 9 has dozens of SYNC_LOSS errors (looking at the raw logs with other messages interspersed that wasn?t obvious). Turns out that one particular dual-controller storage array is plugged into those ports and - in a stroke of good luck which usually manages to avoid me - that particular storage array is no longer in use! It, and a few others still in use, are older and about to be life-cycled. Since it?s no longer in use, I have unplugged it from the SAN and am monitoring to see if my problems now go away. Yes, correlation is not causation. And sometimes coincidences do happen. I?ll monitor to see if this is one of those occasions. Thanks? Kevin > On Jun 15, 2017, at 3:50 PM, Edward Wahl wrote: > > On Thu, 15 Jun 2017 20:00:47 +0000 > "Buterbaugh, Kevin L" wrote: > >> Hi All, >> >> I?ve got some very weird problems going on here (and I do have a PMR open >> with IBM). On Monday I attempted to unlink a fileset, something that I?ve >> done many times with no issues. This time, however, it hung up the >> filesystem. I was able to clear things up by shutting down GPFS on the >> filesystem manager for that filesystem and restarting it. >> >> The very next morning we awoke to problems with GPFS. 
I noticed in my >> messages file on all my NSD servers I had messages like: >> >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write >> through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline >> device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk >> Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) >> Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid >> Jun 12 22:03:32 nsd32 multipathd: uevent trigger error >> Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: >> removing target and saving binding >> >> Since we use an FC SAN and Linux multi-pathing I was expecting some sort of >> problem with the switches. Now on the switches I see messages like: >> >> [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: >> 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC >> 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] >> >> Which (while not in this example) do correlate time-wise with the multi path >> messages on the servers. So it?s not a GPFS problem and I shouldn?t be >> bugging this list about this EXCEPT? >> >> These issues only started on Monday after I ran the mmunlinkfileset command. >> That?s right ? NO such errors prior to then. And literally NOTHING changed >> on Monday with my SAN environment (nothing had changed there for months >> actually). Nothing added to nor removed from the SAN. No changes until >> today when, in an attempt to solve this issue, I updated the switch firmware >> on all switches one at a time. I also yum updated to the latest RHEL 7 >> version of the multipathd packages. >> >> I?ve been Googling and haven?t found anything useful on those SYNC_LOSS >> messages on the QLogic SANbox 5800 switches. Anybody out there happen to >> have any knowledge of them and what could be causing them? Oh, I?m >> investigating this now ? but it?s not all ports that are throwing the >> errors. And the ports that are seem to be random and don?t have one specific >> type of hardware plugged in ? i.e. some ports have NSD servers plugged in, >> others have storage arrays. > > I have a half dozen of the Sanbox 5802 switches, but no GPFS devices going > through them any longer. Used to though. We do see that exact same messages > when the FC interface on a device goes bad (SFP, HCA, etc) or someone moving > cables. This happens when the device cannot properly join the loop with it's > login. I've NEVER seen them randomly though. Nor has this been a bad cable type > error. I don't recall why, but I froze our Sanbox's at : V7.4.0.16.0 I'm sure > I have notes on it somewhere. > > I've got one right now, in fact, with a bad ancient LTO4 drive. > [8124][Thu Jun 15 12:46:00.190 EDT 2017][I][8600.001F][Port][Port: 4][SYNC_ACQ] > [8125][Thu Jun 15 12:49:20.920 EDT 2017][I][8600.0020][Port][Port: 4][SYNC_LOSS] > > > Sounds like the sanbox itself is having an issue perhaps? "Show alarm" clean on > the sanbox? Array has a bad HCA? 'Show port 9' errors not crazy? All power > supplies working? > > > >> I understand that it makes no sense that mmunlinkfileset hanging would cause >> problems with my SAN ? but I also don?t believe in coincidences! >> >> I?m running GPFS 4.2.2.3. 
Any help / suggestions apprecaiated! > > This does seem like QUITE the coincidence. Increased traffic on the > device triggered a failure? (The fear of all RAID users!) Multipath is working > properly though? Sounds like mmlsdisk would have shown devices not in 'ready'. > We mysteriously lost a MD disk during a recent downtime and it caused an MMFSCK > to not run properly until we figured it out. 4.2.2.3 as well. MD Replication > is NOT helpful in that case. > > Ed > > > >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - >> (615)875-9633 >> >> >> > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 From frank.tower at outlook.com Sun Jun 18 05:57:57 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 18 Jun 2017 04:57:57 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: , Message-ID: Hi, You were right, ibv_devinfo -v doesn't return something if both card are connected. I didn't checked ibv_* tools, I supposed once IP stack and ibstat OK, the rest should work. I'm stupid ? Anyway, once I disconnect one card, ibv_devinfo show me input but with both cards, I don't have any input except "device not found". And what is weird here, it's that it work only when one card are connected, no matter the card (both are similar: model, firmware, revision, company)... Really strange, I will dig more about the issue. Stupid and bad workaround: connected a dual port Infiniband. But production system doesn't wait.. Thank for your help, Frank ________________________________ From: Aaron Knister Sent: Saturday, June 10, 2017 2:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone On Jun 10, 2017, at 06:55, Frank Tower > wrote: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From frank.tower at outlook.com Sun Jun 18 06:06:30 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 18 Jun 2017 05:06:30 +0000 Subject: [gpfsug-discuss] Protocol node: active directory authentication Message-ID: Hi, We finally received protocols node following the recommendations some here provided and the help of the wiki. Now we would like to use kerberized NFS, we dig into spectrumscale documentations and wiki but we would like to know if anyone is using such configuration ? Do you also have any performance issue (vs NFSv4/NFSv3 with sec=sys) ? We will also use Microsoft Active Directory and we are willing to populate all our users with UID/GID, summer is coming, we will have some spare time ? Someone is using kerberized NFSv4 with Microsoft Active Directory ? Thank by advance for your feedback. Kind Regards, Frank. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Sun Jun 18 16:30:55 2017 From: jcatana at gmail.com (Josh Catana) Date: Sun, 18 Jun 2017 11:30:55 -0400 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: Message-ID: Are any cards VPI that can do both eth and ib? I remember reading in documentation that that there is a bus order to having mixed media with mellanox cards. There is a module setting during init where you can set eth ib or auto detect. If the card is on auto it might be coming up eth and making the driver flake out because it's in the wrong order. Responding from my phone so I can't really look it up myself right now about what the proper order is, but maybe this might be some help troubleshooting. On Jun 18, 2017 12:58 AM, "Frank Tower" wrote: > Hi, > > > You were right, ibv_devinfo -v doesn't return something if both card are > connected. I didn't checked ibv_* tools, I supposed once IP stack and > ibstat OK, the rest should work. I'm stupid ? > > > Anyway, once I disconnect one card, ibv_devinfo show me input but with > both cards, I don't have any input except "device not found". > > And what is weird here, it's that it work only when one card are > connected, no matter the card (both are similar: model, firmware, revision, > company)... Really strange, I will dig more about the issue. > > > Stupid and bad workaround: connected a dual port Infiniband. But > production system doesn't wait.. > > > Thank for your help, > Frank > > ------------------------------ > *From:* Aaron Knister > *Sent:* Saturday, June 10, 2017 2:05 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found > > Out of curiosity could you send us the output of "ibv_devinfo -v"? > > -Aaron > > Sent from my iPhone > > On Jun 10, 2017, at 06:55, Frank Tower wrote: > > Hi everybody, > > > I don't get why one of our compute node cannot start GPFS over IB. > > > I have the following error: > > > [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no > verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > > [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and > initialized. > > [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match > (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). > > [I] VERBS RDMA parse verbsPorts mlx4_0/1 > > [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device > mlx4_0 not found > > [I] VERBS RDMA library libibverbs.so unloaded. > > [E] VERBS RDMA failed to start, no valid verbsPorts defined. 
> > > > I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. > > > I have 2 infinibands card, both have an IP and working well. > > > [root at rdx110 ~]# ibstat -l > > mlx4_0 > > mlx4_1 > > [root at rdx110 ~]# > > > I tried configuration with both card, and no one work with GPFS. > > > I also tried with mlx4_0/1, but same problem. > > > Someone already have the issue ? > > > Kind Regards, > > Frank > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Sun Jun 18 18:53:28 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Sun, 18 Jun 2017 17:53:28 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found Message-ID: There used to be issues with the CX-3 cards and specific ports for if you wanted to use IB and Eth, but that went away in later firmwares, as did a whole load of bits with it being slow to detect media type, so see if you are running an up to date Mellanox firmware (assuming it's a VPI card). On CX-4 there is no auto detect media, but default is IB unless you changed it. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jcatana at gmail.com [jcatana at gmail.com] Sent: 18 June 2017 16:30 To: gpfsug main discussion list Subject: ?spam? Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Are any cards VPI that can do both eth and ib? I remember reading in documentation that that there is a bus order to having mixed media with mellanox cards. There is a module setting during init where you can set eth ib or auto detect. If the card is on auto it might be coming up eth and making the driver flake out because it's in the wrong order. Responding from my phone so I can't really look it up myself right now about what the proper order is, but maybe this might be some help troubleshooting. On Jun 18, 2017 12:58 AM, "Frank Tower" > wrote: Hi, You were right, ibv_devinfo -v doesn't return something if both card are connected. I didn't checked ibv_* tools, I supposed once IP stack and ibstat OK, the rest should work. I'm stupid ? Anyway, once I disconnect one card, ibv_devinfo show me input but with both cards, I don't have any input except "device not found". And what is weird here, it's that it work only when one card are connected, no matter the card (both are similar: model, firmware, revision, company)... Really strange, I will dig more about the issue. Stupid and bad workaround: connected a dual port Infiniband. But production system doesn't wait.. Thank for your help, Frank ________________________________ From: Aaron Knister > Sent: Saturday, June 10, 2017 2:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone On Jun 10, 2017, at 06:55, Frank Tower > wrote: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. 
I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Tue Jun 20 17:03:40 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Tue, 20 Jun 2017 12:03:40 -0400 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs Message-ID: These type messages repeat often in our logs: 017-06-20_09:25:13.676-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_09:25:24.292-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_10:00:25.935-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 Is there any way to tell if it is a misconfiguration or communications issue? -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Wed Jun 21 08:12:36 2017 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 21 Jun 2017 09:12:36 +0200 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs In-Reply-To: References: Message-ID: This happens if a the mmhealth system of a node can not forward an event to the GUI - typically on some other node. Resons could be: - Gui is not running - Firewall on used port 80 for older versions of spectrum scale or 443 for newer Mit freundlichen Gr??en / Kind regards Norbert Schuld Dr. IBM Systems Group M925: IBM Spectrum Scale Norbert Software Development Schuld IBM Deutschland R&D GmbH Phone: +49-160 70 70 335 Am Weiher 24 E-Mail: nschuld at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 20/06/2017 18:04 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs Sent by: gpfsug-discuss-bounces at spectrumscale.org These type messages repeat often in our logs: 017-06-20_09:25:13.676-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_09:25:24.292-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_10:00:25.935-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 Is there any way to tell if it is a misconfiguration or communications issue? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 16690161.gif Type: image/gif Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From ewahl at osc.edu Thu Jun 22 20:37:12 2017 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 22 Jun 2017 15:37:12 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: <20170622153712.052d312c@osc.edu> Is there a command to show existing node Address Policy? Or are we left with grep "affinity" on /var/mmfs/gen/mmsdrfs? Ed On Tue, 13 Jun 2017 08:30:18 +0000 "Sobey, Richard A" wrote: > Oh? Nice to know - thanks - will try that method next. > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson > (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main discussion > list Subject: Re: [gpfsug-discuss] 'mmces > address move' weirdness? > > Suspending the node doesn't stop the services though, we've done a bunch of > testing by connecting to the "real" IP on the box we wanted to test and that > works fine. > > OK, so you end up connecting to shares like > \\192.168.1.20\sharename, but its perfectly > fine for testing purposes. > > In our experience, suspending the node has been fine for this as it moves the > IP to a "working" node and keeps user service running whilst we test. > > Simon > > From: > > > on behalf of "Sobey, Richard A" > > Reply-To: > "gpfsug-discuss at spectrumscale.org" > > > Date: Tuesday, 13 June 2017 at 09:08 To: > "gpfsug-discuss at spectrumscale.org" > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > Yes, suspending the node would do it, but in the case where you want to > remove a node from service but keep it running for testing it's not ideal. 
> > I think you can set the IP address balancing policy to none which might do > what we want. From: > gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson > (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion > list > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. I think you can change the way it > balances, but the default is to distribute. > > Simon > > From: > > > on behalf of "Sobey, Richard A" > > Reply-To: > "gpfsug-discuss at spectrumscale.org" > > > Date: Monday, 12 June 2017 at 21:01 To: > "gpfsug-discuss at spectrumscale.org" > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > > I think it's intended but I don't know why. The AUTH service became unhealthy > on one of our CES nodes (SMB only) and we moved its float address elsewhere. > CES decided to move it back again moments later despite the node not being > fit. > > > > Sorry that doesn't really help but at least you're not alone! > > ________________________________ > From:gpfsug-discuss-bounces at spectrumscale.org > > > on behalf of valdis.kletnieks at vt.edu > > Sent: 12 June 2017 > 20:41 To: > gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] 'mmces address move' weirdness? > > So here's our address setup: > > mmces address list > > Address Node Group Attribute > ------------------------------------------------------------------------- > 172.28.45.72 arproto1.ar.nis.isb.internal isb none > 172.28.45.73 arproto2.ar.nis.isb.internal isb none > 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none > 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none > > Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to > move the address over to its pair so I can look around without impacting > users. However, seems like something insists on moving it right back 60 > seconds later... > > Question 1: Is this expected behavior? > Question 2: If it is, what use is 'mmces address move' if it just gets > undone a few seconds later... > > (running on arproto2.ar.nis.vtc.internal): > > ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 > --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr > show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 > 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global > secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 > Mon Jun 12 15:34:42 EDT 2017 > Mon Jun 12 15:34:43 EDT 2017 > (skipped) > Mon Jun 12 15:35:44 EDT 2017 > Mon Jun 12 15:35:45 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > Mon Jun 12 15:35:46 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > Mon Jun 12 15:35:47 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > ^C -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From laurence at qsplace.co.uk Thu Jun 22 23:05:49 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Thu, 22 Jun 2017 23:05:49 +0100 Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
In-Reply-To: <20170622153712.052d312c@osc.edu> References: <18719.1497296477@turing-police.cc.vt.edu> <20170622153712.052d312c@osc.edu> Message-ID: <6B843B3B-4C07-459D-B905-10B16E3590A0@qsplace.co.uk> "mmlscluster --ces" will show the address distribution policy. -- Lauz On 22 June 2017 20:37:12 BST, Edward Wahl wrote: > >Is there a command to show existing node Address Policy? >Or are we left with grep "affinity" on /var/mmfs/gen/mmsdrfs? > >Ed > > >On Tue, 13 Jun 2017 08:30:18 +0000 >"Sobey, Richard A" wrote: > >> Oh? Nice to know - thanks - will try that method next. >> >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson >> (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main >discussion >> list Subject: Re: [gpfsug-discuss] >'mmces >> address move' weirdness? >> >> Suspending the node doesn't stop the services though, we've done a >bunch of >> testing by connecting to the "real" IP on the box we wanted to test >and that >> works fine. >> >> OK, so you end up connecting to shares like >> \\192.168.1.20\sharename, but its >perfectly >> fine for testing purposes. >> >> In our experience, suspending the node has been fine for this as it >moves the >> IP to a "working" node and keeps user service running whilst we test. >> >> Simon >> >> From: >> >> >> on behalf of "Sobey, Richard A" >> > Reply-To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Date: Tuesday, 13 June 2017 at 09:08 To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> Yes, suspending the node would do it, but in the case where you want >to >> remove a node from service but keep it running for testing it's not >ideal. >> >> I think you can set the IP address balancing policy to none which >might do >> what we want. From: >> >gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson >> (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main >discussion >> list >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> mmces node suspend -N >> >> Is what you want. This will move the address and stop it being >assigned one, >> otherwise the rebalance will occur. I think you can change the way it >> balances, but the default is to distribute. >> >> Simon >> >> From: >> >> >> on behalf of "Sobey, Richard A" >> > Reply-To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Date: Monday, 12 June 2017 at 21:01 To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> >> I think it's intended but I don't know why. The AUTH service became >unhealthy >> on one of our CES nodes (SMB only) and we moved its float address >elsewhere. >> CES decided to move it back again moments later despite the node not >being >> fit. >> >> >> >> Sorry that doesn't really help but at least you're not alone! >> >> ________________________________ >> >From:gpfsug-discuss-bounces at spectrumscale.org >> >> >> on behalf of valdis.kletnieks at vt.edu >> > Sent: 12 >June 2017 >> 20:41 To: >> >gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
>> >> So here's our address setup: >> >> mmces address list >> >> Address Node Group >Attribute >> >------------------------------------------------------------------------- >> 172.28.45.72 arproto1.ar.nis.isb.internal isb none >> 172.28.45.73 arproto2.ar.nis.isb.internal isb none >> 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none >> 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none >> >> Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so >I try to >> move the address over to its pair so I can look around without >impacting >> users. However, seems like something insists on moving it right back >60 >> seconds later... >> >> Question 1: Is this expected behavior? >> Question 2: If it is, what use is 'mmces address move' if it just >gets >> undone a few seconds later... >> >> (running on arproto2.ar.nis.vtc.internal): >> >> ## (date; ip addr show | grep '\.72';mmces address move --ces-ip >172.28.46.72 >> --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; >ip addr >> show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon >Jun 12 >> 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global >> secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 >EDT 2017 >> Mon Jun 12 15:34:42 EDT 2017 >> Mon Jun 12 15:34:43 EDT 2017 >> (skipped) >> Mon Jun 12 15:35:44 EDT 2017 >> Mon Jun 12 15:35:45 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> Mon Jun 12 15:35:46 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> Mon Jun 12 15:35:47 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> ^C > > > >-- > >Ed Wahl >Ohio Supercomputer Center >614-292-9302 >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 09:13:38 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 08:13:38 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? Message-ID: I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO 'runs wild' it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Jun 23 09:57:46 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 23 Jun 2017 04:57:46 -0400 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> I unfortunately don't have an answer other than to perhaps check out this presentation from a recent users group meeting: http://files.gpfsug.org/presentations/2017/Manchester/05_Ellexus_SSUG_Manchester.pdf I've never used the product but it might be able to do what you're asking. Sent from my iPhone > On Jun 23, 2017, at 04:13, John Hearns wrote: > > I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. > We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. > The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down > the production storage. If anyone has a setup like this I would be interested in chatting with you. > Is it feasible to create filesets which have higher/lower priority than others? > > Thankyou for any insights or feedback > John Hearns > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 12:04:23 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 11:04:23 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> References: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> Message-ID: Aaron, thankyou. I know Rosemary and that is a good company! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister Sent: Friday, June 23, 2017 10:58 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] IO prioritisation / throttling? 
I unfortunately don't have an answer other than to perhaps check out this presentation from a recent users group meeting: http://files.gpfsug.org/presentations/2017/Manchester/05_Ellexus_SSUG_Manchester.pdf I've never used the product but it might be able to do what you're asking. Sent from my iPhone On Jun 23, 2017, at 04:13, John Hearns > wrote: I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Fri Jun 23 14:36:34 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 23 Jun 2017 09:36:34 -0400 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: Hi John, >>We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. >>The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down >>the production storage. 
You may use the Spectrum Scale Quality of Service for I/O "mmchqos" command (details in link below) to define IOPS limits for the "others" as well as the "maintenance" class for the Dev/Test file-system "pools" (for e.g., mmchqos tds_fs --enable pool=*,other=10000IOPS, maintenance=5000IOPS). This way, the Test and Dev file-system/storage-pools IOPS can be limited/controlled to specified IOPS , giving higher priority to the production GPFS file-system/storage (with production_fs pool=* other=unlimited,maintenance=unlimited - which is the default). https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_qosio_describe.htm#qosio_describe My two cents. Regards, -Kums From: John Hearns To: gpfsug main discussion list Date: 06/23/2017 04:14 AM Subject: [gpfsug-discuss] IO prioritisation / throttling? Sent by: gpfsug-discuss-bounces at spectrumscale.org I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 15:23:19 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 14:23:19 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: Thankyou to Kumaran and Aaaron for your help. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kumaran Rajaram Sent: Friday, June 23, 2017 3:37 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] IO prioritisation / throttling? Hi John, >>We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. >>The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down >>the production storage. 
You may use the Spectrum Scale Quality of Service for I/O "mmchqos" command (details in link below) to define IOPS limits for the "others" as well as the "maintenance" class for the Dev/Test file-system "pools" (for e.g., mmchqos tds_fs --enable pool=*,other=10000IOPS, maintenance=5000IOPS). This way, the Test and Dev file-system/storage-pools IOPS can be limited/controlled to specified IOPS , giving higher priority to the production GPFS file-system/storage (with production_fs pool=* other=unlimited,maintenance=unlimited - which is the default). https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_qosio_describe.htm#qosio_describe My two cents. Regards, -Kums From: John Hearns > To: gpfsug main discussion list > Date: 06/23/2017 04:14 AM Subject: [gpfsug-discuss] IO prioritisation / throttling? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
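
To make the QoS suggestion above concrete, a minimal sketch could look like the following; the file system names (tdsfs, prodfs) and the IOPS figures are placeholders to be sized for the actual hardware:

   # Cap the test/development file system so runaway I/O cannot starve production
   mmchqos tdsfs --enable pool=*,other=10000IOPS,maintenance=5000IOPS

   # Leave the production file system unthrottled (this is also the default)
   mmchqos prodfs --enable pool=*,other=unlimited,maintenance=unlimited

   # Inspect the configured limits and the IOPS actually being consumed
   # (see the mmlsqos man page for options to report over a specific time window)
   mmlsqos tdsfs

Because the limits are applied per file system and storage pool, the test and development workload needs its own file system (or at least its own pool) for this to isolate it from production I/O.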
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 23 17:06:51 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 23 Jun 2017 16:06:51 +0000 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy Message-ID: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Hi All, I haven?t been able to find this explicitly documented, so I?m just wanting to confirm that the behavior that I?m expecting is what GPFS is going to do in this scenario? I have a filesystem with data replication set to two. I?m creating a capacity type pool for it right now which will be used to migrate old files to. I only want to use replication of one on the capacity pool. My policy file has two rules, one to move files with an atime > 30 days to the capacity pool, to which I?ve included ?REPLICATE(1)?. The other rule is to move files from the capacity pool back to the system pool if the atime < 14 days. Since data replication is set to two, I am thinking that I do not need to explicitly have a ?REPLICATE(2)? as part of that rule ? is that correct? I.e., I?m wanting to make sure that a file moved to the capacity pool which therefore has its? replication set to one doesn?t keep that same setting even if moved back out of the capacity pool. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 23 17:58:23 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Jun 2017 12:58:23 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Message-ID: I believe that is correct. If not, let us know! To recap... when running mmapplypolicy with rules like: ... MIGRATE ... REPLICATE(x) ... will change the replication factor to x, for each file selected by this rule and chosen for execution. ... MIGRATE ... /* no REPLICATE keyword */ will not mess with the replication factor From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/23/2017 12:07 PM Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I haven?t been able to find this explicitly documented, so I?m just wanting to confirm that the behavior that I?m expecting is what GPFS is going to do in this scenario? I have a filesystem with data replication set to two. I?m creating a capacity type pool for it right now which will be used to migrate old files to. I only want to use replication of one on the capacity pool. My policy file has two rules, one to move files with an atime > 30 days to the capacity pool, to which I?ve included ?REPLICATE(1)?. The other rule is to move files from the capacity pool back to the system pool if the atime < 14 days. Since data replication is set to two, I am thinking that I do not need to explicitly have a ?REPLICATE(2)? as part of that rule ? is that correct? I.e., I?m wanting to make sure that a file moved to the capacity pool which therefore has its? replication set to one doesn?t keep that same setting even if moved back out of the capacity pool. Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Fri Jun 23 18:55:15 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 23 Jun 2017 13:55:15 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Message-ID: <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> > On Jun 23, 2017, at 12:58 PM, Marc A Kaplan > wrote: > > I believe that is correct. If not, let us know! > > To recap... when running mmapplypolicy with rules like: > > ... MIGRATE ... REPLICATE(x) ... > > will change the replication factor to x, for each file selected by this rule and chosen for execution. > > ... MIGRATE ... /* no REPLICATE keyword */ > > will not mess with the replication factor > > I think I detect an impedance mismatch... By "not mess with the replication factor" do you mean that after the move: the file will have the default replication factor for the file system the file will retain a replication factor previously set on the file You told Kevin that he was correct and I think he meant the first one, but I read what you said as the second one. Liberty, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 23 20:28:28 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Jun 2017 15:28:28 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> Message-ID: Sorry for any confusion. MIGRATing a file does NOT change the replication factor, unless you explicitly use the keyword REPLICATE. The default replication factor, as set/displayed by mm[ch|ls]fs -r only applies at file creation time, unless overriden by a policy SET POOL ... REPLICATE(x) rule. From: Stephen Ulmer To: gpfsug main discussion list Date: 06/23/2017 01:55 PM Subject: Re: [gpfsug-discuss] Replication settings when running mmapplypolicy Sent by: gpfsug-discuss-bounces at spectrumscale.org On Jun 23, 2017, at 12:58 PM, Marc A Kaplan wrote: I believe that is correct. If not, let us know! To recap... when running mmapplypolicy with rules like: ... MIGRATE ... REPLICATE(x) ... will change the replication factor to x, for each file selected by this rule and chosen for execution. ... MIGRATE ... /* no REPLICATE keyword */ will not mess with the replication factor I think I detect an impedance mismatch... By "not mess with the replication factor" do you mean that after the move: the file will have the default replication factor for the file system the file will retain a replication factor previously set on the file You told Kevin that he was correct and I think he meant the first one, but I read what you said as the second one. Liberty, -- Stephen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From ncapit at atos.net Mon Jun 26 08:49:28 2017 From: ncapit at atos.net (CAPIT, NICOLAS) Date: Mon, 26 Jun 2017 07:49:28 +0000 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads Message-ID: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Hello, I don't know if this behavior/bug was already reported on this ML, so in doubt. Context: - SpectrumScale 4.2.2-3 - client node with 64 cores - OS: RHEL7.3 When a MPI job with 64 processes is launched on the node with 64 cores then the FS freezed (only the output log file of the MPI job is put on the GPFS; so it may be related to the 64 processes writing in a same file???). strace -p 3105 # mmfsd pid stucked Process 3105 attached wait4(-1, # stucked at this point strace ls /gpfs stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC # stucked at this point I have no problem with the other nodes of 28 cores. The GPFS command mmgetstate is working and I am able to use mmshutdown to recover the node. If I put workerThreads=72 on the 64 core node then I am not able to reproduce the freeze and I get the right behavior. Is this a known bug with a number of cores > workerThreads? Best regards, [--] Nicolas Capit -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Jun 27 00:57:57 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 26 Jun 2017 19:57:57 -0400 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? 
> > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From ncapit at atos.net Tue Jun 27 07:59:19 2017 From: ncapit at atos.net (CAPIT, NICOLAS) Date: Tue, 27 Jun 2017 06:59:19 +0000 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net>, <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> Message-ID: <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. The "mmgetstate" command says that the node is "active". The only thing is the freeze of the FS. Best regards, Nicolas Capit ________________________________________ De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] Envoy? : mardi 27 juin 2017 01:57 ? : gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? 
> > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stuartb at 4gh.net Tue Jun 27 22:34:34 2017 From: stuartb at 4gh.net (Stuart Barkley) Date: Tue, 27 Jun 2017 17:34:34 -0400 (EDT) Subject: [gpfsug-discuss] express edition vs standard edition Message-ID: Does anyone know what controls whether GPFS (4.1.1) thinks it is Express Edition versus Standard Edition? While rebuilding an old cluster from scratch we are getting the message: mmcrfs: Storage pools are not available in the GPFS Express Edition. The Problem Determination Guide says to "Install the GPFS Standard Edition on all nodes in the cluster" which we think we have done. The cluster is just 3 servers and no clients at this point. We have verified that our purchased license is for Standard Version, but have not been able to figure out what controls the GPFS view of this. mmlslicense tells us that we have Express Edition installed. mmchlicense sets server vs client license information, but does not seem to be able to control the edition. Our normal install process is to install gpfs.base-4.1.1-0.x86_64.rpm first and then install gpfs.base-4.1.1-15.x86_64.update.rpm followed by the other needed 4.1.1-15 rpms. I thought maybe we had the wrong gpfs.base and we have re-downloaded Standard Edition RPM files from IBM in case we had the wrong version. However, reinstalling and recreating the cluster does not seem to have addressed this issue. We must be doing something stupid during our install, but I'm pretty sure we used only Standard Edition rpms for our latest attempt. Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From knop at us.ibm.com Tue Jun 27 22:54:35 2017 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 27 Jun 2017 17:54:35 -0400 Subject: [gpfsug-discuss] express edition vs standard edition In-Reply-To: References: Message-ID: Stuart, I believe you will need to install the gpfs.ext RPMs , otherwise the daemons and commands will think only the Express edition is installed. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Stuart Barkley To: gpfsug-discuss at spectrumscale.org Date: 06/27/2017 05:35 PM Subject: [gpfsug-discuss] express edition vs standard edition Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know what controls whether GPFS (4.1.1) thinks it is Express Edition versus Standard Edition? While rebuilding an old cluster from scratch we are getting the message: mmcrfs: Storage pools are not available in the GPFS Express Edition. The Problem Determination Guide says to "Install the GPFS Standard Edition on all nodes in the cluster" which we think we have done. The cluster is just 3 servers and no clients at this point. We have verified that our purchased license is for Standard Version, but have not been able to figure out what controls the GPFS view of this. mmlslicense tells us that we have Express Edition installed. 
mmchlicense sets server vs client license information, but does not seem to be able to control the edition. Our normal install process is to install gpfs.base-4.1.1-0.x86_64.rpm first and then install gpfs.base-4.1.1-15.x86_64.update.rpm followed by the other needed 4.1.1-15 rpms. I thought maybe we had the wrong gpfs.base and we have re-downloaded Standard Edition RPM files from IBM in case we had the wrong version. However, reinstalling and recreating the cluster does not seem to have addressed this issue. We must be doing something stupid during our install, but I'm pretty sure we used only Standard Edition rpms for our latest attempt. Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuartb at 4gh.net Tue Jun 27 23:48:53 2017 From: stuartb at 4gh.net (Stuart Barkley) Date: Tue, 27 Jun 2017 18:48:53 -0400 (EDT) Subject: [gpfsug-discuss] express edition vs standard edition In-Reply-To: References: Message-ID: On Tue, 27 Jun 2017 at 17:54 -0000, Felipe Knop wrote: > I believe you will need to install the gpfs.ext RPMs , otherwise the > daemons and commands will think only the Express edition is > installed. Yes. This appears to have been my problem. Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From scale at us.ibm.com Fri Jun 30 07:57:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 30 Jun 2017 14:57:49 +0800 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net>, <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: I'm not aware this kind of defects, seems it should not. but lack of data, we don't know what happened. I suggest you can open a PMR for your issue. Thanks. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "CAPIT, NICOLAS" To: gpfsug main discussion list Date: 06/27/2017 02:59 PM Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. The "mmgetstate" command says that the node is "active". 
The only thing is the freeze of the FS. Best regards, Nicolas Capit ________________________________________ De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] Envoy? : mardi 27 juin 2017 01:57 ? : gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? > > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Fri Jun 30 11:07:10 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 30 Jun 2017 10:07:10 +0000 Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch Message-ID: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> Hello I have a short question about AFM prefetch and some more remarks regarding AFM and it?s use for data migration. I understand that many of you have done this for very large amounts of data and number of files. I would welcome an input, comments or remarks. Sorry if this is a bit too long for a mailing list. Short: How can I tell an AFM cache to update a directory when I do prefetch? I know about ?find .? or ?ls ?lsR? but this really is no option for us as it takes too long. Mostly I want to update the directories to make AFM cache aware of file deletions on home. 
On home I can use a policy run to find all directories which changed since the last update and pass them to prefetch on AFM cache. I know that I can find some workaround based on the directory list, like an ?ls ?lsa? just for those directories, but this doesn?t sound very efficient. And depending on cache effects and timeout settings it may work or not (o.k. ? it will work most time). We do regular file deletions and will accumulated millions of deleted files on cache over time if we don?t update the directories to make AFM cache aware of the deletion. Background: We will use AFM to migrate data on filesets to another cluster. We have to do this several times in the next few months, hence I want to get a reliable and easy to use procedure. The old system is home, the new system is a read-only AFM cache. We want to use ?mmafmctl prefetch? to move the data. Home will be in use while we run the migration. Once almost all data is moved we do a (short) break for a last sync and make the read-only AFM cache a ?normal? fileset. During the break I want to use policy runs and prefetch only and no time consuming ?ls ?lsr? or ?find .? I don?t want to use this metadata intensive posix operation during operation, either. More general: AFM can be used for data migration. But I don?t see how to use it efficiently. How to do incremental transfers, how to ensure that the we really have identical copies before we switch and how to keep the switch time short , i.e. the time when both old and new aren?t accessible for clients, Wish ? maybe an RFE? I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same as backup tools use the list to do incremental backups. And a tool to create policy lists of home and cache and to compare the lists would be nice, too. As you do this during the break/switch it should be fast and reliable and leave no doubts. Kind regards, Heiner From vpuvvada at in.ibm.com Fri Jun 30 13:35:18 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 30 Jun 2017 18:05:18 +0530 Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch In-Reply-To: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> References: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> Message-ID: What is the version of GPFS ? >Mostly I want to update the directories to make AFM cache aware of file deletions on home. On home I can use a policy run to find all directories which changed since the last >update and pass them to prefetch on AFM cache. AFM prefetch has undocumented option to delete files from cache without doing lookup from home. It supports all types of list files. Find all deleted file from home and prefetch at cache to delete them. mmafmctl device prefetch -j fileset --list-file --delete >Wish ? maybe an RFE? >I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same >as backup tools use the list to do incremental backups. This feature support was already added but undocumented today. This feature will be externalized in future releases. 
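As a rough sketch of how that deletion list could be built (the device, fileset and path names below are placeholders, and since --delete is undocumented the whole thing should be tried on a test fileset first): run the same full LIST scan against home and against the cache fileset, reduce both to relative paths, and feed the difference to prefetch.

cat > /tmp/listall.pol <<'EOF'
RULE EXTERNAL LIST 'all' EXEC ''
RULE 'everything' LIST 'all'
EOF

# full scans of home and of the cache fileset; with -I defer -f the
# list files are expected as /tmp/home.list.all and /tmp/cache.list.all,
# one "... -- /full/path" record per file
mmapplypolicy /gpfs/homefs/fset  -P /tmp/listall.pol -I defer -f /tmp/home
mmapplypolicy /gpfs/cachefs/fset -P /tmp/listall.pol -I defer -f /tmp/cache

# strip the leading fields and the differing mount prefixes so the paths compare
sed 's|^.* -- /gpfs/homefs/fset/||'  /tmp/home.list.all  | sort > /tmp/home.rel
sed 's|^.* -- /gpfs/cachefs/fset/||' /tmp/cache.list.all | sort > /tmp/cache.rel

# present in cache but gone at home -> remove from cache without a home lookup
comm -23 /tmp/cache.rel /tmp/home.rel | sed 's|^|/gpfs/cachefs/fset/|' > /tmp/deleted.list
mmafmctl cachefs prefetch -j fset --list-file /tmp/deleted.list --delete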
~Venkat (vpuvvada at in.ibm.com) From: "Billich Heinrich Rainer (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06/30/2017 03:37 PM Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello I have a short question about AFM prefetch and some more remarks regarding AFM and it?s use for data migration. I understand that many of you have done this for very large amounts of data and number of files. I would welcome an input, comments or remarks. Sorry if this is a bit too long for a mailing list. Short: How can I tell an AFM cache to update a directory when I do prefetch? I know about ?find .? or ?ls ?lsR? but this really is no option for us as it takes too long. Mostly I want to update the directories to make AFM cache aware of file deletions on home. On home I can use a policy run to find all directories which changed since the last update and pass them to prefetch on AFM cache. I know that I can find some workaround based on the directory list, like an ?ls ?lsa? just for those directories, but this doesn?t sound very efficient. And depending on cache effects and timeout settings it may work or not (o.k. ? it will work most time). We do regular file deletions and will accumulated millions of deleted files on cache over time if we don?t update the directories to make AFM cache aware of the deletion. Background: We will use AFM to migrate data on filesets to another cluster. We have to do this several times in the next few months, hence I want to get a reliable and easy to use procedure. The old system is home, the new system is a read-only AFM cache. We want to use ?mmafmctl prefetch? to move the data. Home will be in use while we run the migration. Once almost all data is moved we do a (short) break for a last sync and make the read-only AFM cache a ?normal? fileset. During the break I want to use policy runs and prefetch only and no time consuming ?ls ?lsr? or ?find .? I don?t want to use this metadata intensive posix operation during operation, either. More general: AFM can be used for data migration. But I don?t see how to use it efficiently. How to do incremental transfers, how to ensure that the we really have identical copies before we switch and how to keep the switch time short , i.e. the time when both old and new aren?t accessible for clients, Wish ? maybe an RFE? I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same as backup tools use the list to do incremental backups. And a tool to create policy lists of home and cache and to compare the lists would be nice, too. As you do this during the break/switch it should be fast and reliable and leave no doubts. Kind regards, Heiner _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc-luke at uconn.edu Fri Jun 30 16:20:27 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Fri, 30 Jun 2017 11:20:27 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Hello, We're trying to change most of our users uids, is there a clean way to migrate all of one users files with say `mmapplypolicy`? 
We have to change the owner of around 273539588 files, and my estimates for runtime are around 6 days. What we've been doing is indexing all of the files and splitting them up by owner which takes around an hour, and then we were locking the user out while we chown their files. I made it multi threaded as it weirdly gave a 10% speedup despite my expectation that multi threading access from a single node would not give any speedup. Generally I'm looking for advice on how to make the chowning faster. Would spreading the chowning processes over multiple nodes improve performance? Should I not stat the files before running lchown on them, since lchown checks the file before changing it? I saw mention of inodescan(), in an old gpfsug email, which speeds up disk read access, by not guaranteeing that the data is up to date. We have a maintenance day coming up where all users will be locked out, so the file handles(?) from GPFS's perspective will not be able to go stale. Is there a function with similar constraints to inodescan that I can use to speed up this process? Thank you for your time, Luke Storrs-HPC University of Connecticut From aaron.knister at gmail.com Fri Jun 30 16:47:40 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 30 Jun 2017 11:47:40 -0400 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: Nicolas, By chance do you have a skylake or kabylake based CPU? Sent from my iPhone > On Jun 30, 2017, at 02:57, IBM Spectrum Scale wrote: > > I'm not aware this kind of defects, seems it should not. but lack of data, we don't know what happened. I suggest you can open a PMR for your issue. Thanks. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > "CAPIT, NICOLAS" ---06/27/2017 02:59:59 PM---Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters") > > From: "CAPIT, NICOLAS" > To: gpfsug main discussion list > Date: 06/27/2017 02:59 PM > Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello, > > When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). > In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. > The "mmgetstate" command says that the node is "active". > The only thing is the freeze of the FS. 
> > Best regards, > Nicolas Capit > ________________________________________ > De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] > Envoy? : mardi 27 juin 2017 01:57 > ? : gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads > > That's a fascinating bug. When the node is locked up what does "mmdiag > --waiters" show from the node in question? I suspect there's more > low-level diagnostic data that's helpful for the gurus at IBM but I'm > just curious what the waiters look like. > > -Aaron > > On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > > Hello, > > > > I don't know if this behavior/bug was already reported on this ML, so in > > doubt. > > > > Context: > > > > - SpectrumScale 4.2.2-3 > > - client node with 64 cores > > - OS: RHEL7.3 > > > > When a MPI job with 64 processes is launched on the node with 64 cores > > then the FS freezed (only the output log file of the MPI job is put on > > the GPFS; so it may be related to the 64 processes writing in a same > > file???). > > > > strace -p 3105 # mmfsd pid stucked > > Process 3105 attached > > wait4(-1, # stucked at this point > > > > strace ls /gpfs > > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > > # stucked at this point > > > > I have no problem with the other nodes of 28 cores. > > The GPFS command mmgetstate is working and I am able to use mmshutdown > > to recover the node. > > > > > > If I put workerThreads=72 on the 64 core node then I am not able to > > reproduce the freeze and I get the right behavior. > > > > Is this a known bug with a number of cores > workerThreads? > > > > Best regards, > > -- > > *Nicolas Capit* > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jun 30 18:14:07 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 30 Jun 2017 13:14:07 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: <487469581.449569.1498832342497.JavaMail.webinst@w30112> References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of the additional check-summing done on those platforms? 
-------- Forwarded Message -------- Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) Date: Fri, 30 Jun 2017 14:19:02 +0000 From: IBM My Notifications To: aaron.s.knister at nasa.gov My Notifications for Storage - 30 Jun 2017 Dear Subscriber (aaron.s.knister at nasa.gov), Here are your updates from IBM My Notifications. Your support Notifications display in English by default. Machine translation based on your IBM profile language setting is added if you specify this option in My defaults within My Notifications. (Note: Not all languages are available at this time, and the English version always takes precedence over the machine translated version.) ------------------------------------------------------------------------------ 1. IBM Spectrum Scale - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error - URL: http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM Spectrum Scale versions where the NSD server is enabled to use RDMA for file IO and the storage used in your GPFS cluster accessed via NSD servers (not fully SAN accessible) includes anything other than IBM Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these conditions, when the RDMA-enabled network adapter fails, the issue may result in undetected data corruption for file write or read operations. ------------------------------------------------------------------------------ Manage your My Notifications subscriptions, or send questions and comments. - Subscribe or Unsubscribe - https://www.ibm.com/support/mynotifications - Feedback - https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html - Follow us on Twitter - https://twitter.com/IBMStorageSupt To ensure proper delivery please add mynotify at stg.events.ihost.com to your address book. You received this email because you are subscribed to IBM My Notifications as: aaron.s.knister at nasa.gov Please do not reply to this message as it is generated by an automated service machine. (C) International Business Machines Corporation 2017. All rights reserved. From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 30 18:25:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 30 Jun 2017 17:25:56 +0000 Subject: [gpfsug-discuss] Mass UID migration suggestions In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Message-ID: <1901859B-7620-42E2-9064-14930AC50EE3@vanderbilt.edu> Hi Luke, I?ve got an off the wall suggestion for you, which may or may not work depending on whether or not you have any UID conflicts with old and new UIDs ? this won?t actually speed things up but it will eliminate the ?downtime? for your users. And the big caveat is that there can?t be any UID conflicts ? i.e. someone?s new UID can?t be someone else?s old UID. Given that ? what if you set an ACL to allow access to both their old and new UIDs ? then change their UID to the new UID ? then chown the files to the new UID and remove the ACL? More work for you, but no downtime for them. We actually may need to do something similar as we will need to change Windows-assigned UID?s based on SIDs to ?correct? UIDs at some point in the future on one of our storage systems. 
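For the chown pass itself (Luke's original speed question), one option that avoids a hand-rolled scanner is to let mmapplypolicy drive it: select the old UID with a LIST rule, point it at a small EXEC script that does the ownership change, and spread the run over several client nodes. A rough sketch only - the UIDs, node names and paths are invented, and the ' -- ' record format of the batch files handed to the script should be checked against the release in use (paths containing ' -- ' would also need sturdier parsing):

# uidmove.pol : everything still owned by the old UID (12345 here)
RULE EXTERNAL LIST 'uidmove' EXEC '/usr/local/sbin/uidmove.sh'
RULE 'oldowner' LIST 'uidmove' WHERE USER_ID = 12345

# /usr/local/sbin/uidmove.sh : a /bin/sh script installed on every node named
# with -N; mmapplypolicy calls it with $1 = phase (TEST or LIST) and
# $2 = a batch file of "... -- /full/path" records
[ "$1" = "TEST" ] && exit 0
sed 's/^.* -- //' "$2" |
while IFS= read -r f; do
    chown -h 54321 -- "$f"    # -h changes symlinks themselves, no extra stat
done

# spread the scan and the chown batches over a handful of client nodes
mmapplypolicy /gpfs/fs0 -P uidmove.pol -N c01,c02,c03,c04 -g /gpfs/fs0/.uidtmp

Kevin's temporary-ACL trick can be layered on top of this if the users need to stay active during the pass.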
If someone has a better way to solve your problem, I hope they?ll post it to the list, as it may help us as well. HTHAL. Thanks? Kevin On Jun 30, 2017, at 10:20 AM, hpc-luke at uconn.edu wrote: Hello, We're trying to change most of our users uids, is there a clean way to migrate all of one users files with say `mmapplypolicy`? We have to change the owner of around 273539588 files, and my estimates for runtime are around 6 days. What we've been doing is indexing all of the files and splitting them up by owner which takes around an hour, and then we were locking the user out while we chown their files. I made it multi threaded as it weirdly gave a 10% speedup despite my expectation that multi threading access from a single node would not give any speedup. Generally I'm looking for advice on how to make the chowning faster. Would spreading the chowning processes over multiple nodes improve performance? Should I not stat the files before running lchown on them, since lchown checks the file before changing it? I saw mention of inodescan(), in an old gpfsug email, which speeds up disk read access, by not guaranteeing that the data is up to date. We have a maintenance day coming up where all users will be locked out, so the file handles(?) from GPFS's perspective will not be able to go stale. Is there a function with similar constraints to inodescan that I can use to speed up this process? Thank you for your time, Luke Storrs-HPC University of Connecticut _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Jun 30 18:37:30 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 30 Jun 2017 19:37:30 +0200 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Jun 30 18:41:43 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 30 Jun 2017 13:41:43 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: Thanks Olaf, that's good to know (and is kind of what I suspected). I've requested a number of times this capability for those of us who can't use or aren't using GNR and the answer is effectively "no". This response is curious to me because I'm sure IBM doesn't believe that data integrity is only important and of value to customers who purchase their hardware *and* software. -Aaron On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser wrote: > yes.. in case of GNR (GPFS native raid) .. we do end-to-end check-summing > ... client --> server --> downToDisk > GNR writes down a chksum to disk (to all pdisks /all "raid" segments ) so > that dropped writes can be detected as well as miss-done writes (bit > flips..) 
> > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 06/30/2017 07:15 PM > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > RDMA-enabled network adapter failure on the NSD server may result in file > IO error (2017.06.30) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of > the additional check-summing done on those platforms? > > > -------- Forwarded Message -------- > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled > network adapter > failure on the NSD server may result in file IO error (2017.06.30) > Date: Fri, 30 Jun 2017 14:19:02 +0000 > From: IBM My Notifications > > To: aaron.s.knister at nasa.gov > > > > > My Notifications for Storage - 30 Jun 2017 > > Dear Subscriber (aaron.s.knister at nasa.gov), > > Here are your updates from IBM My Notifications. > > Your support Notifications display in English by default. Machine > translation based on your IBM profile > language setting is added if you specify this option in My defaults > within My Notifications. > (Note: Not all languages are available at this time, and the English > version always takes precedence > over the machine translated version.) > > ------------------------------------------------------------ > ------------------ > 1. IBM Spectrum Scale > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure > on the NSD server may result in file IO error > - URL: > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233& > myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_- > OCSTXKQY-OCSWJ00-_-E > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > Spectrum Scale versions where the NSD server is enabled to use RDMA for > file IO and the storage used in your GPFS cluster accessed via NSD > servers (not fully SAN accessible) includes anything other than IBM > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these > conditions, when the RDMA-enabled network adapter fails, the issue may > result in undetected data corruption for file write or read operations. > > ------------------------------------------------------------ > ------------------ > Manage your My Notifications subscriptions, or send questions and comments. > - Subscribe or Unsubscribe - https://www.ibm.com/support/mynotifications > - Feedback - > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotif > ications.html > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com to > your address book. > You received this email because you are subscribed to IBM My > Notifications as: > aaron.s.knister at nasa.gov > > Please do not reply to this message as it is generated by an automated > service machine. > > (C) International Business Machines Corporation 2017. All rights reserved. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Fri Jun 30 18:53:16 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 30 Jun 2017 13:53:16 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> In fact the answer was quite literally "no": https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84523 (the RFE was declined and the answer was that the "function is already available in GNR environments"). Regarding GNR, see this RFE request https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=95090 requesting the use of GNR outside of an ESS/GSS environment. It's interesting to note this is the highest voted Public RFE for GPFS that I can see, at least. It too was declined. -Aaron On 6/30/17 1:41 PM, Aaron Knister wrote: > Thanks Olaf, that's good to know (and is kind of what I suspected). I've > requested a number of times this capability for those of us who can't > use or aren't using GNR and the answer is effectively "no". This > response is curious to me because I'm sure IBM doesn't believe that data > integrity is only important and of value to customers who purchase their > hardware *and* software. > > -Aaron > > On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser > wrote: > > yes.. in case of GNR (GPFS native raid) .. we do end-to-end > check-summing ... client --> server --> downToDisk > GNR writes down a chksum to disk (to all pdisks /all "raid" segments > ) so that dropped writes can be detected as well as miss-done > writes (bit flips..) > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 06/30/2017 07:15 PM > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > RDMA-enabled network adapter failure on the NSD server may result in > file IO error (2017.06.30) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of > the additional check-summing done on those platforms? > > > -------- Forwarded Message -------- > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network > adapter > failure on the NSD server may result in file IO error (2017.06.30) > Date: Fri, 30 Jun 2017 14:19:02 +0000 > From: IBM My Notifications > > > To: aaron.s.knister at nasa.gov > > > > > My Notifications for Storage - 30 Jun 2017 > > Dear Subscriber (aaron.s.knister at nasa.gov > ), > > Here are your updates from IBM My Notifications. > > Your support Notifications display in English by default. Machine > translation based on your IBM profile > language setting is added if you specify this option in My defaults > within My Notifications. > (Note: Not all languages are available at this time, and the English > version always takes precedence > over the machine translated version.) > > ------------------------------------------------------------------------------ > 1. 
IBM Spectrum Scale > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter > failure > on the NSD server may result in file IO error > - URL: > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > Spectrum Scale versions where the NSD server is enabled to use RDMA for > file IO and the storage used in your GPFS cluster accessed via NSD > servers (not fully SAN accessible) includes anything other than IBM > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these > conditions, when the RDMA-enabled network adapter fails, the issue may > result in undetected data corruption for file write or read operations. > > ------------------------------------------------------------------------------ > Manage your My Notifications subscriptions, or send questions and > comments. > - Subscribe or Unsubscribe - > https://www.ibm.com/support/mynotifications > > - Feedback - > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com > to > your address book. > You received this email because you are subscribed to IBM My > Notifications as: > aaron.s.knister at nasa.gov > > Please do not reply to this message as it is generated by an automated > service machine. > > (C) International Business Machines Corporation 2017. All rights > reserved. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Jun 30 19:25:28 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 30 Jun 2017 18:25:28 +0000 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> Message-ID: end-to-end data integrity is very important and the reason it hasn't been done in Scale is not because its not important, its because its very hard to do without impacting performance in a very dramatic way. imagine your raid controller blocksize is 1mb and your filesystem blocksize is 1MB . if your application does a 1 MB write this ends up being a perfect full block , full track de-stage to your raid layer and everything works fine and fast. as soon as you add checksum support you need to add data somehow into this, means your 1MB is no longer 1 MB but 1 MB+checksum. 
to store this additional data you have multiple options, inline , outside the data block or some combination ,the net is either you need to do more physical i/o's to different places to get both the data and the corresponding checksum or your per block on disc structure becomes bigger than than what your application reads/or writes, both put massive burden on the Storage layer as e.g. a 1 MB write will now, even the blocks are all aligned from the application down to the raid layer, cause a read/modify/write on the raid layer as the data is bigger than the physical track size. so to get end-to-end checksum in Scale outside of ESS the best way is to get GNR as SW to run on generic HW, this is what people should vote for as RFE if they need that functionality. beside end-to-end checksums you get read/write cache and acceleration , fast rebuild and many other goodies as a added bonus. Sven On Fri, Jun 30, 2017 at 10:53 AM Aaron Knister wrote: > In fact the answer was quite literally "no": > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84523 > (the RFE was declined and the answer was that the "function is already > available in GNR environments"). > > Regarding GNR, see this RFE request > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=95090 > requesting the use of GNR outside of an ESS/GSS environment. It's > interesting to note this is the highest voted Public RFE for GPFS that I > can see, at least. It too was declined. > > -Aaron > > On 6/30/17 1:41 PM, Aaron Knister wrote: > > Thanks Olaf, that's good to know (and is kind of what I suspected). I've > > requested a number of times this capability for those of us who can't > > use or aren't using GNR and the answer is effectively "no". This > > response is curious to me because I'm sure IBM doesn't believe that data > > integrity is only important and of value to customers who purchase their > > hardware *and* software. > > > > -Aaron > > > > On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser > > wrote: > > > > yes.. in case of GNR (GPFS native raid) .. we do end-to-end > > check-summing ... client --> server --> downToDisk > > GNR writes down a chksum to disk (to all pdisks /all "raid" segments > > ) so that dropped writes can be detected as well as miss-done > > writes (bit flips..) > > > > > > > > From: Aaron Knister > > > > To: gpfsug main discussion list > > > > Date: 06/30/2017 07:15 PM > > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > > RDMA-enabled network adapter failure on the NSD server may result in > > file IO error (2017.06.30) > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > ------------------------------------------------------------------------ > > > > > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature > of > > the additional check-summing done on those platforms? > > > > > > -------- Forwarded Message -------- > > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network > > adapter > > failure on the NSD server may result in file IO error (2017.06.30) > > Date: Fri, 30 Jun 2017 14:19:02 +0000 > > From: IBM My Notifications > > >> > > To: aaron.s.knister at nasa.gov > > > > > > > > > > My Notifications for Storage - 30 Jun 2017 > > > > Dear Subscriber (aaron.s.knister at nasa.gov > > ), > > > > Here are your updates from IBM My Notifications. > > > > Your support Notifications display in English by default. 
Machine > > translation based on your IBM profile > > language setting is added if you specify this option in My defaults > > within My Notifications. > > (Note: Not all languages are available at this time, and the English > > version always takes precedence > > over the machine translated version.) > > > > > ------------------------------------------------------------------------------ > > 1. IBM Spectrum Scale > > > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter > > failure > > on the NSD server may result in file IO error > > - URL: > > > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > < > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > > > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > > Spectrum Scale versions where the NSD server is enabled to use RDMA > for > > file IO and the storage used in your GPFS cluster accessed via NSD > > servers (not fully SAN accessible) includes anything other than IBM > > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under > these > > conditions, when the RDMA-enabled network adapter fails, the issue > may > > result in undetected data corruption for file write or read > operations. > > > > > ------------------------------------------------------------------------------ > > Manage your My Notifications subscriptions, or send questions and > > comments. > > - Subscribe or Unsubscribe - > > https://www.ibm.com/support/mynotifications > > > > - Feedback - > > > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > < > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > > > > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > > > > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com > > to > > your address book. > > You received this email because you are subscribed to IBM My > > Notifications as: > > aaron.s.knister at nasa.gov > > > > Please do not reply to this message as it is generated by an > automated > > service machine. > > > > (C) International Business Machines Corporation 2017. All rights > > reserved. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Fri Jun 2 16:12:41 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 02 Jun 2017 11:12:41 -0400 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) In-Reply-To: References: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca>, Message-ID: <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> It has been a while since I used HSM with GPFS via TSM, but as far as I can remember, unprivileged users can run dsmmigrate and dsmrecall. Based on the instructions on the link, dsmrecall may now leverage the Recommended Access Order (RAO) available on enterprise drives, however root would have to be the one to invoke that feature. In that case we may have to develop a middleware/wrapper for dsmrecall that will run as root and act on behalf of the user when optimization is requested. Someone here more familiar with the latest version of TSM-HSM may be able to give us some hints on how people are doing this in practice. Jaime Quoting "Andrew Beattie" : > Thanks Jaime, How do you get around Optimised recalls? from what I > can see the optimised recall process needs a root level account to > retrieve a list of files > https://www.ibm.com/support/knowledgecenter/SSSR2R_7.1.1/com.ibm.itsm.hsmul.doc/c_recall_optimized_tape.html[1] > Regards, Andrew Beattie Software Defined Storage - IT Specialist > Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com[2] ----- > Original message ----- > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Andrew Beattie" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - > Space Management (GPFS HSM) > Date: Fri, Jun 2, 2017 7:28 PM > We have that situation. > Users don't need to login to NSD's > > What you need is to add at least one gpfs client to the cluster (or > multi-cluster), mount the DMAPI enabled file system, and use that > node > as a gateway for end-users. They can access the contents on the mount > > point with their own underprivileged accounts. > > Whether or not on a schedule, the moment an application or linux > command (such as cp, cat, vi, etc) accesses a stub, the file will be > > staged. > > Jaime > > Quoting "Andrew Beattie" : > >> Quick question, Does anyone have a Scale / GPFS environment (HPC) >> where users need the ability to recall data sets after they have > been >> stubbed, but only System Administrators are permitted to log onto > the >> NSD servers for security purposes. And if so how do you provide > the >> ability for the users to schedule their data set recalls? > Regards, >> Andrew Beattie Software Defined Storage - IT Specialist Phone: >> 614-2133-7927 E-mail: abeattie at au1.ibm.com[1] >> >> >> Links: >> ------ >> [1] mailto:abeattie at au1.ibm.com[3] >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials[4] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
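A minimal sketch of the root-run wrapper Jaime describes might look like the following. It is only an illustration: the script path, the /gpfs/fs0 mount point, the idea of exposing it through sudo, and the plain per-file dsmrecall calls (no tape-optimised options) are all assumptions rather than anything from the thread.

#!/bin/bash
# recall-wrapper.sh -- hypothetical sudo wrapper so unprivileged users can
# request recalls without logging on to the NSD/HSM nodes.
# Usage: sudo /usr/local/sbin/recall-wrapper.sh /path/to/filelist.txt
# The list contains one absolute path per line, all under the GPFS mount point.

FSROOT="/gpfs/fs0"                 # assumption: the DMAPI-enabled file system
LIST="$1"

[ -r "$LIST" ] || { echo "cannot read file list: $LIST" >&2; exit 1; }

while IFS= read -r f; do
    case "$f" in
        "$FSROOT"/*) ;;                                # only paths inside the file system
        *) echo "skipping $f (outside $FSROOT)" >&2; continue ;;
    esac
    [ -f "$f" ] || { echo "skipping $f (not a regular file)" >&2; continue; }
    dsmrecall "$f"                                     # runs with the wrapper's (root) privileges
done < "$LIST"

Granted to a trusted group in sudoers, something like this keeps root logins off the servers while still letting users, or their cron jobs, drive recalls; ordering recalls by tape position would still depend on whatever the installed Space Management client provides for that.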
From r.sobey at imperial.ac.uk Fri Jun 2 16:51:12 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 2 Jun 2017 15:51:12 +0000 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Message-ID: Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I've spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 2 17:40:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 2 Jun 2017 12:40:11 -0400 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS In-Reply-To: References: Message-ID: Upgrading from GPFS 4.2.x to GPFS 4.2.y should not "break" TSM. If it does, someone goofed, that would be a bug. (My opinion) Think of it this way. TSM is an application that uses the OS and the FileSystem(s). TSM can't verify it will work with all future versions of OS and Filesystems, and the releases can't be in lock step. Having said that, 4.2.3 has been "out" for a while, so if there were a TSM incompatibility, someone would have likely hit it or will before July... Trust but verify... From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/02/2017 11:51 AM Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I?ve spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version? Cheers Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 08:51:10 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 08:51:10 +0100 Subject: [gpfsug-discuss] NSD access routes Message-ID: Morning all, Just a quick one about NSD access and read only disks. Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 -------------- next part -------------- An HTML attachment was scrubbed... 
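For background to the question above: a Scale NSD does not carry per-server read-only or read-write roles. Each NSD simply has an ordered list of servers, and a node either does local block I/O (if it can see the LUN itself) or ships requests to the first reachable server in that list. A minimal stanza sketch of how that server order is expressed follows; the device, NSD and node names are made up.

# nsd.stanza -- illustrative only; the servers list is the preference order
# clients use when they cannot reach the disk directly, and every listed
# server is assumed to have full access to the LUN.
%nsd:
  nsd=data01nsd
  device=/dev/sdb
  servers=nsdserver01,nsdserver02
  usage=dataAndMetadata
  failureGroup=1
  pool=system

# create the NSD from the stanza file
mmcrnsd -F nsd.stanza
# the server list (and its order) can be changed later, e.g.
mmchnsd "data01nsd:nsdserver02,nsdserver01"

Read traffic can therefore be steered by server order or by local attachment, but a server that can only read the LUN cannot take part, which is why the LROC suggestion that follows is the more natural fit for read caching.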
URL: From luis.bolinches at fi.ibm.com Mon Jun 5 08:52:39 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 5 Jun 2017 07:52:39 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: Message-ID: Hi Have you look at LROC instead? Might fit in simpler way to what your are describing. -- Cheers > On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: > > Morning all, > > Just a quick one about NSD access and read only disks. > > Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. > > Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? > > Cheers, > ----------------------------- > Dave Goodbourn > > Head of Systems > Milk Visual Effects > Tel: +44 (0)20 3697 8448 > Mob: +44 (0)7917 411 069 Ellei edell?? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 09:02:16 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 09:02:16 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: Yeah, that was my back up plan but would be more costly in the cloud. Read only is a limitation of most cloud providers not something that I "want". Just trying to move a network bottleneck. Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 > On 5 Jun 2017, at 08:52, Luis Bolinches wrote: > > Hi > > Have you look at LROC instead? Might fit in simpler way to what your are describing. > > -- > Cheers > >> On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: >> >> Morning all, >> >> Just a quick one about NSD access and read only disks. >> >> Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. >> >> Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? >> >> Cheers, >> ----------------------------- >> Dave Goodbourn >> >> Head of Systems >> Milk Visual Effects >> Tel: +44 (0)20 3697 8448 >> Mob: +44 (0)7917 411 069 > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 13:19:47 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 13:19:47 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: OK scrap my first question, I can't do what I wanted to do anyway! I'm testing out the LROC idea. 
All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. Cheers, ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 08:52, Luis Bolinches wrote: > Hi > > Have you look at LROC instead? Might fit in simpler way to what your are > describing. > > -- > Cheers > > On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: > > Morning all, > > Just a quick one about NSD access and read only disks. > > Can you have 2 NSD servers, one with read/write access to a disk and one > with just read only access to the same disk? I know you can write to a disk > over the network via another NSD server but can you mount the disk in read > only mode to increase the read performance? This is all virtual/cloud based. > > Is GPFS clever enough (or can it be configured) to know to read from the > locally attached read only disk but write back via another NSD server over > the GPFS network? > > Cheers, > ----------------------------- > Dave Goodbourn > > Head of Systems > Milk Visual Effects > Tel: +44 (0)20 3697 8448 <+44%2020%203697%208448> > Mob: +44 (0)7917 411 069 <+44%207917%20411069> > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 5 13:24:27 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 5 Jun 2017 12:24:27 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: mmdiag --lroc ? From: > on behalf of "dave at milk-vfx.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 5 June 2017 at 13:19 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NSD access routes OK scrap my first question, I can't do what I wanted to do anyway! I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. Cheers, ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 08:52, Luis Bolinches > wrote: Hi Have you look at LROC instead? Might fit in simpler way to what your are describing. -- Cheers On 5 Jun 2017, at 10.51, Dave Goodbourn > wrote: Morning all, Just a quick one about NSD access and read only disks. Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. 
Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 5 13:48:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Jun 2017 12:48:48 +0000 Subject: [gpfsug-discuss] NSD access routes Message-ID: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Hi Dave I?ve done a large-scale (600 node) LROC deployment here - feel free to reach out if you have questions. mmdiag --lroc is about all there is but it does give you a pretty good idea how the cache is performing but you can?t tell which files are cached. Also, watch out that the LROC cached will steal pagepool memory (1% of the LROC cache size) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Dave Goodbourn Reply-To: gpfsug main discussion list Date: Monday, June 5, 2017 at 7:19 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD access routes I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:10:21 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:10:21 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: Ah yep, thanks a lot. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 13:24, Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > mmdiag --lroc > > ? > > > From: on behalf of " > dave at milk-vfx.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Monday, 5 June 2017 at 13:19 > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NSD access routes > > OK scrap my first question, I can't do what I wanted to do anyway! > > I'm testing out the LROC idea. All seems to be working well, but, is there > anyway to monitor what's cached? How full it might be? The performance etc?? > > I can see some stats in mmfsadm dump lroc but that's about it. > > Cheers, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 08:52, Luis Bolinches wrote: > >> Hi >> >> Have you look at LROC instead? Might fit in simpler way to what your are >> describing. 
>> >> -- >> Cheers >> >> On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: >> >> Morning all, >> >> Just a quick one about NSD access and read only disks. >> >> Can you have 2 NSD servers, one with read/write access to a disk and one >> with just read only access to the same disk? I know you can write to a disk >> over the network via another NSD server but can you mount the disk in read >> only mode to increase the read performance? This is all virtual/cloud based. >> >> Is GPFS clever enough (or can it be configured) to know to read from the >> locally attached read only disk but write back via another NSD server over >> the GPFS network? >> >> Cheers, >> ----------------------------- >> Dave Goodbourn >> >> Head of Systems >> Milk Visual Effects >> Tel: +44 (0)20 3697 8448 <+44%2020%203697%208448> >> Mob: +44 (0)7917 411 069 <+44%207917%20411069> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:49:55 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:49:55 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: Thanks Bob, That pagepool comment has just answered my next question! But it doesn't seem to be working. Here's my mmdiag output: === mmdiag: lroc === LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 1151997 MB, currently in use: 0 MB Statistics from: Mon Jun 5 13:40:50 2017 Total objects stored 0 (0 MB) recalled 0 (0 MB) objects failed to store 0 failed to recall 0 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 0 (0 MB) Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Data objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 agent inserts=0, reads=0 response times (usec): insert min/max/avg=0/0/0 read min/max/avg=0/0/0 ssd writeIOs=0, writePages=0 readIOs=0, readPages=0 response times (usec): write min/max/avg=0/0/0 read min/max/avg=0/0/0 I've restarted GPFS on that node just in case but that didn't seem to help. I have LROC on a node that DOESN'T have direct access to an NSD so will hopefully cache files that get requested over NFS. How often are these stats updated? The Statistics line doesn't seem to update when running the command again. 
Dave, ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 13:48, Oesterlin, Robert wrote: > Hi Dave > > > > I?ve done a large-scale (600 node) LROC deployment here - feel free to > reach out if you have questions. > > > > mmdiag --lroc is about all there is but it does give you a pretty good > idea how the cache is performing but you can?t tell which files are cached. > Also, watch out that the LROC cached will steal pagepool memory (1% of the > LROC cache size) > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > *From: * on behalf of Dave > Goodbourn > *Reply-To: *gpfsug main discussion list > *Date: *Monday, June 5, 2017 at 7:19 AM > *To: *gpfsug main discussion list > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes > > > > I'm testing out the LROC idea. All seems to be working well, but, is there > anyway to monitor what's cached? How full it might be? The performance etc?? > > > > I can see some stats in mmfsadm dump lroc but that's about it. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:55:22 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:55:22 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: OK slightly ignore that last email. It's still not updating the output but I realise the Stats from line is when they started so probably won't update! :( Still nothing seems to being cached though. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 14:49, Dave Goodbourn wrote: > Thanks Bob, > > That pagepool comment has just answered my next question! > > But it doesn't seem to be working. 
Here's my mmdiag output: > > === mmdiag: lroc === > LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' > status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 > Max capacity: 1151997 MB, currently in use: 0 MB > Statistics from: Mon Jun 5 13:40:50 2017 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Inode objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Directory objects failed to store 0 failed to recall 0 failed to > query 0 failed to inval 0 > > Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Data objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > agent inserts=0, reads=0 > response times (usec): > insert min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > ssd writeIOs=0, writePages=0 > readIOs=0, readPages=0 > response times (usec): > write min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > > I've restarted GPFS on that node just in case but that didn't seem to > help. I have LROC on a node that DOESN'T have direct access to an NSD so > will hopefully cache files that get requested over NFS. > > How often are these stats updated? The Statistics line doesn't seem to > update when running the command again. > > Dave, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: > >> Hi Dave >> >> >> >> I?ve done a large-scale (600 node) LROC deployment here - feel free to >> reach out if you have questions. >> >> >> >> mmdiag --lroc is about all there is but it does give you a pretty good >> idea how the cache is performing but you can?t tell which files are cached. >> Also, watch out that the LROC cached will steal pagepool memory (1% of the >> LROC cache size) >> >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> >> >> >> >> *From: * on behalf of Dave >> Goodbourn >> *Reply-To: *gpfsug main discussion list > > >> *Date: *Monday, June 5, 2017 at 7:19 AM >> *To: *gpfsug main discussion list >> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >> >> >> >> I'm testing out the LROC idea. All seems to be working well, but, is >> there anyway to monitor what's cached? How full it might be? The >> performance etc?? >> >> >> >> I can see some stats in mmfsadm dump lroc but that's about it. >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 5 14:59:07 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 5 Jun 2017 13:59:07 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> , Message-ID: We've seen exactly this behaviour. Removing and readding the lroc nsd device worked for us. 
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of dave at milk-vfx.com [dave at milk-vfx.com] Sent: 05 June 2017 14:55 To: Oesterlin, Robert Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NSD access routes OK slightly ignore that last email. It's still not updating the output but I realise the Stats from line is when they started so probably won't update! :( Still nothing seems to being cached though. ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 14:49, Dave Goodbourn > wrote: Thanks Bob, That pagepool comment has just answered my next question! But it doesn't seem to be working. Here's my mmdiag output: === mmdiag: lroc === LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 1151997 MB, currently in use: 0 MB Statistics from: Mon Jun 5 13:40:50 2017 Total objects stored 0 (0 MB) recalled 0 (0 MB) objects failed to store 0 failed to recall 0 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 0 (0 MB) Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Data objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 agent inserts=0, reads=0 response times (usec): insert min/max/avg=0/0/0 read min/max/avg=0/0/0 ssd writeIOs=0, writePages=0 readIOs=0, readPages=0 response times (usec): write min/max/avg=0/0/0 read min/max/avg=0/0/0 I've restarted GPFS on that node just in case but that didn't seem to help. I have LROC on a node that DOESN'T have direct access to an NSD so will hopefully cache files that get requested over NFS. How often are these stats updated? The Statistics line doesn't seem to update when running the command again. Dave, ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: Hi Dave I?ve done a large-scale (600 node) LROC deployment here - feel free to reach out if you have questions. mmdiag --lroc is about all there is but it does give you a pretty good idea how the cache is performing but you can?t tell which files are cached. 
Also, watch out that the LROC cached will steal pagepool memory (1% of the LROC cache size) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Dave Goodbourn > Reply-To: gpfsug main discussion list > Date: Monday, June 5, 2017 at 7:19 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD access routes I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. From oehmes at gmail.com Mon Jun 5 14:59:44 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 05 Jun 2017 13:59:44 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: if you are using O_DIRECT calls they will be ignored by default for LROC, same for encrypted data. how exactly are you testing this? On Mon, Jun 5, 2017 at 6:50 AM Dave Goodbourn wrote: > Thanks Bob, > > That pagepool comment has just answered my next question! > > But it doesn't seem to be working. Here's my mmdiag output: > > === mmdiag: lroc === > LROC Device(s): > '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' > status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 > Max capacity: 1151997 MB, currently in use: 0 MB > Statistics from: Mon Jun 5 13:40:50 2017 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Inode objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Directory objects failed to store 0 failed to recall 0 failed to > query 0 failed to inval 0 > > Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Data objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > agent inserts=0, reads=0 > response times (usec): > insert min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > ssd writeIOs=0, writePages=0 > readIOs=0, readPages=0 > response times (usec): > write min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > > I've restarted GPFS on that node just in case but that didn't seem to > help. I have LROC on a node that DOESN'T have direct access to an NSD so > will hopefully cache files that get requested over NFS. > > How often are these stats updated? The Statistics line doesn't seem to > update when running the command again. > > Dave, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: > >> Hi Dave >> >> >> >> I?ve done a large-scale (600 node) LROC deployment here - feel free to >> reach out if you have questions. >> >> >> >> mmdiag --lroc is about all there is but it does give you a pretty good >> idea how the cache is performing but you can?t tell which files are cached. 
>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >> LROC cache size) >> >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> >> >> >> >> *From: * on behalf of Dave >> Goodbourn >> *Reply-To: *gpfsug main discussion list > > >> *Date: *Monday, June 5, 2017 at 7:19 AM >> *To: *gpfsug main discussion list >> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >> >> >> >> I'm testing out the LROC idea. All seems to be working well, but, is >> there anyway to monitor what's cached? How full it might be? The >> performance etc?? >> >> >> >> I can see some stats in mmfsadm dump lroc but that's about it. >> >> >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 15:00:45 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 15:00:45 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: OK I'm going to hang my head in the corner...RTFM...I've not filled the memory buffer pool yet so I doubt it will have anything in it yet!! :( ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 14:55, Dave Goodbourn wrote: > OK slightly ignore that last email. It's still not updating the output but > I realise the Stats from line is when they started so probably won't > update! :( > > Still nothing seems to being cached though. > > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 14:49, Dave Goodbourn wrote: > >> Thanks Bob, >> >> That pagepool comment has just answered my next question! >> >> But it doesn't seem to be working. 
Here's my mmdiag output: >> >> === mmdiag: lroc === >> LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF >> 0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >> Max capacity: 1151997 MB, currently in use: 0 MB >> Statistics from: Mon Jun 5 13:40:50 2017 >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) >> >> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Inode objects failed to store 0 failed to recall 0 failed to query >> 0 failed to inval 0 >> >> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Directory objects failed to store 0 failed to recall 0 failed to >> query 0 failed to inval 0 >> >> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Data objects failed to store 0 failed to recall 0 failed to query 0 >> failed to inval 0 >> >> agent inserts=0, reads=0 >> response times (usec): >> insert min/max/avg=0/0/0 >> read min/max/avg=0/0/0 >> >> ssd writeIOs=0, writePages=0 >> readIOs=0, readPages=0 >> response times (usec): >> write min/max/avg=0/0/0 >> read min/max/avg=0/0/0 >> >> >> I've restarted GPFS on that node just in case but that didn't seem to >> help. I have LROC on a node that DOESN'T have direct access to an NSD so >> will hopefully cache files that get requested over NFS. >> >> How often are these stats updated? The Statistics line doesn't seem to >> update when running the command again. >> >> Dave, >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 13:48, Oesterlin, Robert >> wrote: >> >>> Hi Dave >>> >>> >>> >>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>> reach out if you have questions. >>> >>> >>> >>> mmdiag --lroc is about all there is but it does give you a pretty good >>> idea how the cache is performing but you can?t tell which files are cached. >>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>> LROC cache size) >>> >>> >>> >>> Bob Oesterlin >>> Sr Principal Storage Engineer, Nuance >>> >>> >>> >>> >>> >>> >>> >>> *From: * on behalf of Dave >>> Goodbourn >>> *Reply-To: *gpfsug main discussion list >> org> >>> *Date: *Monday, June 5, 2017 at 7:19 AM >>> *To: *gpfsug main discussion list >>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>> >>> >>> >>> I'm testing out the LROC idea. All seems to be working well, but, is >>> there anyway to monitor what's cached? How full it might be? The >>> performance etc?? >>> >>> >>> >>> I can see some stats in mmfsadm dump lroc but that's about it. >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
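A small test sketch along the lines of what this thread works out (paths and loop counts are made up): it reads more data than the pagepool holds using ordinary buffered I/O, since direct I/O is ignored by LROC as noted earlier, and then polls the usage and hit counters.

#!/bin/bash
# lroc-exercise.sh -- hypothetical helper to push data through the pagepool so
# it spills into the LROC device on this node, then watch the counters move.
# Assumes the file system is mounted at /gpfs/fs0 and already holds large files.

SRC="/gpfs/fs0/some/large/dir"       # assumption: existing data to read
for f in "$SRC"/*; do
    [ -f "$f" ] || continue
    dd if="$f" of=/dev/null bs=1M status=none    # buffered reads, not O_DIRECT
done

# poll the LROC figures; "currently in use" and the hit/miss counters should
# start moving once the pagepool has been put under pressure
for i in $(seq 1 30); do
    /usr/lpp/mmfs/bin/mmdiag --lroc | grep -E 'Max capacity|objects stored|objects queried'
    sleep 10
done

Reading the same set twice, once to populate the cache and once to hit it, makes the recall and hit percentages easier to interpret.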
URL: From oehmes at gmail.com Mon Jun 5 15:03:28 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 05 Jun 2017 14:03:28 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: yes as long as you haven't pushed anything to it (means pagepool got under enough pressure to free up space) you won't see anything in the stats :-) sven On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote: > OK I'm going to hang my head in the corner...RTFM...I've not filled the > memory buffer pool yet so I doubt it will have anything in it yet!! :( > > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 14:55, Dave Goodbourn wrote: > >> OK slightly ignore that last email. It's still not updating the output >> but I realise the Stats from line is when they started so probably won't >> update! :( >> >> Still nothing seems to being cached though. >> >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 14:49, Dave Goodbourn wrote: >> >>> Thanks Bob, >>> >>> That pagepool comment has just answered my next question! >>> >>> But it doesn't seem to be working. Here's my mmdiag output: >>> >>> === mmdiag: lroc === >>> LROC Device(s): >>> '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' >>> status Running >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >>> Max capacity: 1151997 MB, currently in use: 0 MB >>> Statistics from: Mon Jun 5 13:40:50 2017 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >>> >>> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Inode objects failed to store 0 failed to recall 0 failed to query >>> 0 failed to inval 0 >>> >>> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Directory objects failed to store 0 failed to recall 0 failed to >>> query 0 failed to inval 0 >>> >>> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Data objects failed to store 0 failed to recall 0 failed to query >>> 0 failed to inval 0 >>> >>> agent inserts=0, reads=0 >>> response times (usec): >>> insert min/max/avg=0/0/0 >>> read min/max/avg=0/0/0 >>> >>> ssd writeIOs=0, writePages=0 >>> readIOs=0, readPages=0 >>> response times (usec): >>> write min/max/avg=0/0/0 >>> read min/max/avg=0/0/0 >>> >>> >>> I've restarted GPFS on that node just in case but that didn't seem to >>> help. I have LROC on a node that DOESN'T have direct access to an NSD so >>> will hopefully cache files that get requested over NFS. >>> >>> How often are these stats updated? The Statistics line doesn't seem to >>> update when running the command again. 
>>> >>> Dave, >>> ---------------------------------------------------- >>> *Dave Goodbourn* >>> Head of Systems >>> *MILK VISUAL EFFECTS* >>> >>> 5th floor, Threeways House, >>> 40-44 Clipstone Street London, W1W 5DW >>> Tel: *+44 (0)20 3697 8448* >>> Mob: *+44 (0)7917 411 069* >>> >>> On 5 June 2017 at 13:48, Oesterlin, Robert >>> wrote: >>> >>>> Hi Dave >>>> >>>> >>>> >>>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>>> reach out if you have questions. >>>> >>>> >>>> >>>> mmdiag --lroc is about all there is but it does give you a pretty good >>>> idea how the cache is performing but you can?t tell which files are cached. >>>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>>> LROC cache size) >>>> >>>> >>>> >>>> Bob Oesterlin >>>> Sr Principal Storage Engineer, Nuance >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *From: * on behalf of Dave >>>> Goodbourn >>>> *Reply-To: *gpfsug main discussion list < >>>> gpfsug-discuss at spectrumscale.org> >>>> *Date: *Monday, June 5, 2017 at 7:19 AM >>>> *To: *gpfsug main discussion list >>>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>>> >>>> >>>> >>>> I'm testing out the LROC idea. All seems to be working well, but, is >>>> there anyway to monitor what's cached? How full it might be? The >>>> performance etc?? >>>> >>>> >>>> >>>> I can see some stats in mmfsadm dump lroc but that's about it. >>>> >>>> >>>> >>>> >>> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 15:15:00 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 15:15:00 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: Ha! A quick shrink of the pagepool and we're in action! Thanks all. Dave. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 15:03, Sven Oehme wrote: > yes as long as you haven't pushed anything to it (means pagepool got under > enough pressure to free up space) you won't see anything in the stats :-) > > sven > > > On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote: > >> OK I'm going to hang my head in the corner...RTFM...I've not filled the >> memory buffer pool yet so I doubt it will have anything in it yet!! :( >> >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 14:55, Dave Goodbourn wrote: >> >>> OK slightly ignore that last email. It's still not updating the output >>> but I realise the Stats from line is when they started so probably won't >>> update! :( >>> >>> Still nothing seems to being cached though. 
>>> >>> ---------------------------------------------------- >>> *Dave Goodbourn* >>> Head of Systems >>> *MILK VISUAL EFFECTS* >>> >>> 5th floor, Threeways House, >>> 40-44 Clipstone Street London, W1W 5DW >>> Tel: *+44 (0)20 3697 8448* >>> Mob: *+44 (0)7917 411 069* >>> >>> On 5 June 2017 at 14:49, Dave Goodbourn wrote: >>> >>>> Thanks Bob, >>>> >>>> That pagepool comment has just answered my next question! >>>> >>>> But it doesn't seem to be working. Here's my mmdiag output: >>>> >>>> === mmdiag: lroc === >>>> LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' >>>> status Running >>>> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >>>> Max capacity: 1151997 MB, currently in use: 0 MB >>>> Statistics from: Mon Jun 5 13:40:50 2017 >>>> >>>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>>> objects failed to store 0 failed to recall 0 failed to inval 0 >>>> objects queried 0 (0 MB) not found 0 = 0.00 % >>>> objects invalidated 0 (0 MB) >>>> >>>> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Inode objects failed to store 0 failed to recall 0 failed to >>>> query 0 failed to inval 0 >>>> >>>> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Directory objects failed to store 0 failed to recall 0 failed to >>>> query 0 failed to inval 0 >>>> >>>> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Data objects failed to store 0 failed to recall 0 failed to query >>>> 0 failed to inval 0 >>>> >>>> agent inserts=0, reads=0 >>>> response times (usec): >>>> insert min/max/avg=0/0/0 >>>> read min/max/avg=0/0/0 >>>> >>>> ssd writeIOs=0, writePages=0 >>>> readIOs=0, readPages=0 >>>> response times (usec): >>>> write min/max/avg=0/0/0 >>>> read min/max/avg=0/0/0 >>>> >>>> >>>> I've restarted GPFS on that node just in case but that didn't seem to >>>> help. I have LROC on a node that DOESN'T have direct access to an NSD so >>>> will hopefully cache files that get requested over NFS. >>>> >>>> How often are these stats updated? The Statistics line doesn't seem to >>>> update when running the command again. >>>> >>>> Dave, >>>> ---------------------------------------------------- >>>> *Dave Goodbourn* >>>> Head of Systems >>>> *MILK VISUAL EFFECTS* >>>> >>>> 5th floor, Threeways House, >>>> 40-44 Clipstone Street London, W1W 5DW >>>> Tel: *+44 (0)20 3697 8448* >>>> Mob: *+44 (0)7917 411 069* >>>> >>>> On 5 June 2017 at 13:48, Oesterlin, Robert >>> > wrote: >>>> >>>>> Hi Dave >>>>> >>>>> >>>>> >>>>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>>>> reach out if you have questions. >>>>> >>>>> >>>>> >>>>> mmdiag --lroc is about all there is but it does give you a pretty good >>>>> idea how the cache is performing but you can?t tell which files are cached. 
>>>>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>>>> LROC cache size) >>>>> >>>>> >>>>> >>>>> Bob Oesterlin >>>>> Sr Principal Storage Engineer, Nuance >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *From: * on behalf of Dave >>>>> Goodbourn >>>>> *Reply-To: *gpfsug main discussion list >>>> org> >>>>> *Date: *Monday, June 5, 2017 at 7:19 AM >>>>> *To: *gpfsug main discussion list >>>>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>>>> >>>>> >>>>> >>>>> I'm testing out the LROC idea. All seems to be working well, but, is >>>>> there anyway to monitor what's cached? How full it might be? The >>>>> performance etc?? >>>>> >>>>> >>>>> >>>>> I can see some stats in mmfsadm dump lroc but that's about it. >>>>> >>>>> >>>>> >>>>> >>>> >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 5 16:54:09 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Jun 2017 15:54:09 +0000 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add Message-ID: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Our node build process re-adds a node to the cluster and then does a ?service gpfs start?, but GPFS doesn?t start. From the build log: + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' + rc=0 + chkconfig gpfs on + service gpfs start The ?service gpfs start? command hangs and never seems to return. If I look at the process tree: [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12292 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$// 21639 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 This is GPFS 4.2.2-1 This seems to occur only on the initial startup after build - if I try to start GPFS again, it works just fine - any ideas on what it?s sitting here waiting? Nothing in mmfslog (does not exist) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Jun 5 20:54:31 2017 From: ewahl at osc.edu (Edward Wahl) Date: Mon, 5 Jun 2017 15:54:31 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Message-ID: <20170605155431.75b42322@osc.edu> Just a thought, as we noticed the EXACT opposite of this, and what I think is new behavior in either mmmount or mmfsfuncs.. 
Does the file system exist in your /etc/fstab (or AIX equiv) yet? Ed On Mon, 5 Jun 2017 15:54:09 +0000 "Oesterlin, Robert" wrote: > Our node build process re-adds a node to the cluster and then does a ?service > gpfs start?, but GPFS doesn?t start. From the build log: > > + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com > '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' > + rc=0 > + chkconfig gpfs on > + service gpfs start > > The ?service gpfs start? command hangs and never seems to return. > > If I look at the process tree: > > [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" > 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post > 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 > 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? > S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S > 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > 12292 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num > 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e > s/\.num$// 21639 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > > This is GPFS 4.2.2-1 > > This seems to occur only on the initial startup after build - if I try to > start GPFS again, it works just fine - any ideas on what it?s sitting here > waiting? Nothing in mmfslog (does not exist) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From ewahl at osc.edu Mon Jun 5 20:56:55 2017 From: ewahl at osc.edu (Edward Wahl) Date: Mon, 5 Jun 2017 15:56:55 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <20170605155431.75b42322@osc.edu> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> <20170605155431.75b42322@osc.edu> Message-ID: <20170605155655.3ce54084@osc.edu> On Mon, 5 Jun 2017 15:54:31 -0400 Edward Wahl wrote: > Just a thought, as we noticed the EXACT opposite of this, and what I think is > new behavior in either mmmount or .. Does the file system exist in > your /etc/fstab (or AIX equiv) yet? Apologies, I meant mmsdrfsdef, not mmfsfuncs. Ed > > Ed > > On Mon, 5 Jun 2017 15:54:09 +0000 > "Oesterlin, Robert" wrote: > > > Our node build process re-adds a node to the cluster and then does a > > ?service gpfs start?, but GPFS doesn?t start. >From the build log: > > > > + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com > > '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' > > + rc=0 > > + chkconfig gpfs on > > + service gpfs start > > > > The ?service gpfs start? command hangs and never seems to return. > > > > If I look at the process tree: > > > > [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" > > 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post > > 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 > > 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? > > S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S > > 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > > 12292 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > > 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num > > 12294 ? 
S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e > > s/\.num$// 21639 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > > > > This is GPFS 4.2.2-1 > > > > This seems to occur only on the initial startup after build - if I try to > > start GPFS again, it works just fine - any ideas on what it?s sitting here > > waiting? Nothing in mmfslog (does not exist) > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > 507-269-0413 > > > > > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From scale at us.ibm.com Mon Jun 5 22:49:23 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 5 Jun 2017 17:49:23 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Message-ID: Looks like a bug in the code. The command hung in grep command. It has missing argument. Please open a PMR to have this fix. Instead of "service gpfs start", can you use mmstartup? You can also try to run mm list command before service gpfs start as a workaround. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/05/2017 11:54 AM Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add Sent by: gpfsug-discuss-bounces at spectrumscale.org Our node build process re-adds a node to the cluster and then does a ?service gpfs start?, but GPFS doesn?t start. From the build log: + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' + rc=0 + chkconfig gpfs on + service gpfs start The ?service gpfs start? command hangs and never seems to return. If I look at the process tree: [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12292 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$// 21639 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 This is GPFS 4.2.2-1 This seems to occur only on the initial startup after build - if I try to start GPFS again, it works just fine - any ideas on what it?s sitting here waiting? 
Nothing in mmfslog (does not exist) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From stijn.deweirdt at ugent.be Tue Jun 6 08:05:06 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 09:05:06 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging Message-ID: hi all, we have recently been hit by quite a few cases that triggered long waiters. we are aware of the excellent slides http://files.gpfsug.org/presentations/2017/NERSC/GPFS-Troubleshooting-Apr-2017.pdf but we are wondering if and how we can cause those waiters ourself, so we can train ourself in debugging and resolving them (either on test system or in controlled environment on the production clusters). all hints welcome. stijn From Robert.Oesterlin at nuance.com Tue Jun 6 12:44:31 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 6 Jun 2017 11:44:31 +0000 Subject: [gpfsug-discuss] gpfs waiters debugging Message-ID: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> Hi Stijn You need to provide some more details on the type and duration of the waiters before the group can offer some advice. Bob Oesterlin Sr Principal Storage Engineer, Nuance On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: but we are wondering if and how we can cause those waiters ourself, so we can train ourself in debugging and resolving them (either on test system or in controlled environment on the production clusters). all hints welcome. stijn _______________________________________________ From stijn.deweirdt at ugent.be Tue Jun 6 13:29:43 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 14:29:43 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> Message-ID: <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> hi bob, waiters from RPC replies and/or threads waiting on mutex are most "popular". but my question is not how to resolve them, the question is how to create such a waiter so we can train ourself in grep and mmfsadm etc etc we want to recreate the waiters a few times, try out some things and either script or at least put instructions on our internal wiki what to do. the instructions in the slides are clear enough, but there are a lot of slides, and typically when this occurs offshift, you don't want to start with rereading the slides and wondering what to do next; let alone debug scripts ;) thanks, stijn On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: > Hi Stijn > > You need to provide some more details on the type and duration of the waiters before the group can offer some advice. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: > > > but we are wondering if and how we can cause those waiters ourself, so > we can train ourself in debugging and resolving them (either on test > system or in controlled environment on the production clusters). > > all hints welcome. 
> > stijn > _______________________________________________ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From stockf at us.ibm.com Tue Jun 6 13:57:00 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 6 Jun 2017 08:57:00 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: Realize that generally any waiter under 1 second should be ignored. In an active GPFS system there are always waiters and the greater the use of the system likely the more waiters you will see. The point is waiters themselves are not an indication your system is having problems. As for creating them any steady level of activity against the file system should cause waiters to appear, though most should be of a short duration. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Stijn De Weirdt To: gpfsug-discuss at spectrumscale.org Date: 06/06/2017 08:31 AM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org hi bob, waiters from RPC replies and/or threads waiting on mutex are most "popular". but my question is not how to resolve them, the question is how to create such a waiter so we can train ourself in grep and mmfsadm etc etc we want to recreate the waiters a few times, try out some things and either script or at least put instructions on our internal wiki what to do. the instructions in the slides are clear enough, but there are a lot of slides, and typically when this occurs offshift, you don't want to start with rereading the slides and wondering what to do next; let alone debug scripts ;) thanks, stijn On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: > Hi Stijn > > You need to provide some more details on the type and duration of the waiters before the group can offer some advice. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: > > > but we are wondering if and how we can cause those waiters ourself, so > we can train ourself in debugging and resolving them (either on test > system or in controlled environment on the production clusters). > > all hints welcome. > > stijn > _______________________________________________ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Tue Jun 6 14:06:57 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 15:06:57 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: oh sure, i meant waiters that last > 300 seconds or so (something that could trigger deadlock). 
obviously we're not interested in debugging the short ones, it's not that gpfs doesn't work or anything ;) stijn On 06/06/2017 02:57 PM, Frederick Stock wrote: > Realize that generally any waiter under 1 second should be ignored. In an > active GPFS system there are always waiters and the greater the use of the > system likely the more waiters you will see. The point is waiters > themselves are not an indication your system is having problems. > > As for creating them any steady level of activity against the file system > should cause waiters to appear, though most should be of a short duration. > > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: Stijn De Weirdt > To: gpfsug-discuss at spectrumscale.org > Date: 06/06/2017 08:31 AM > Subject: Re: [gpfsug-discuss] gpfs waiters debugging > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > hi bob, > > waiters from RPC replies and/or threads waiting on mutex are most > "popular". > > but my question is not how to resolve them, the question is how to > create such a waiter so we can train ourself in grep and mmfsadm etc etc > > we want to recreate the waiters a few times, try out some things and > either script or at least put instructions on our internal wiki what to > do. > > the instructions in the slides are clear enough, but there are a lot of > slides, and typically when this occurs offshift, you don't want to start > with rereading the slides and wondering what to do next; let alone debug > scripts ;) > > thanks, > > stijn > > On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: >> Hi Stijn >> >> You need to provide some more details on the type and duration of the > waiters before the group can offer some advice. >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Stijn De Weirdt" stijn.deweirdt at ugent.be> wrote: >> >> >> but we are wondering if and how we can cause those waiters ourself, > so >> we can train ourself in debugging and resolving them (either on test >> system or in controlled environment on the production clusters). >> >> all hints welcome. >> >> stijn >> _______________________________________________ >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From valdis.kletnieks at vt.edu Tue Jun 6 17:45:51 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 06 Jun 2017 12:45:51 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: <6873.1496767551@turing-police.cc.vt.edu> On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). 
obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From stockf at us.ibm.com Tue Jun 6 17:54:06 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 6 Jun 2017 12:54:06 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <6873.1496767551@turing-police.cc.vt.edu> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com><3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> <6873.1496767551@turing-police.cc.vt.edu> Message-ID: On recent releases you can accomplish the same with the command, "mmlsnode -N waiters -L". Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 06/06/2017 12:46 PM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... 
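For anyone who wants to keep that around, the same idea written out as a small script with an age threshold, so only the long waiters get reported (a rough sketch rather than a polished tool: it assumes passwordless ssh to every node, the standard /usr/lpp/mmfs/bin path, and "dump waiters" lines of the form "... waiting N.NNN seconds, ...", which can vary a little between releases):

#!/bin/bash
# Report waiters older than a threshold (default 300 seconds),
# cluster-wide, longest first.
THRESH=${1:-300}
# mmlsnode prints the cluster member nodes on its last line
for node in $(mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'); do
    # prefix each waiter line with the node it came from
    ssh "$node" /usr/lpp/mmfs/bin/mmfsadm dump waiters 2>/dev/null | sed "s/^/$node /"
done | awk -v t="$THRESH" '{
    # find the "waiting <seconds>" pair and keep lines at or above the threshold;
    # the age is printed first so the final sort orders by it
    for (i = 1; i < NF; i++)
        if ($i == "waiting") { if ($(i+1) + 0 >= t) print $(i+1), $0; break }
}' | sort -nr

On recent releases "mmlsnode -N waiters -L" gets you a similar cluster-wide view without the ssh loop.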
[attachment "attltepl.dat" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jun 6 19:05:15 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 6 Jun 2017 14:05:15 -0400 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) In-Reply-To: <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> References: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca>, <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> Message-ID: Hi, Just as Jaime has explained, any GPFS node in the cluster, can induce a recall (as he called "staged") by access to file data. It is not optimized by tape order, and a dynamic file access of any pattern, such as "find" or "cat *" will surely result in an inefficient processing of the data recall if all data lives in physical tape. But if migrated data lives on spinning disk on the TSM server, there is no harm in such a recall pattern because recalls from a disk pool incur no significant overhead or delay for tape loading and positioning. Unprivileged users may not run "dsmcrecall" because creating a DMAPI session as the dsmrecall program must do, requires admin user privilege on that node. You may be able to wrap dsmrecall in a set-uid wrapper if you want to permit users to run that, but of course that comes with the danger that a recall storm could monopolize resources on your cluster. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Jaime Pinto" To: "Andrew Beattie" Cc: gpfsug-discuss at spectrumscale.org Date: 06/02/2017 11:13 AM Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) Sent by: gpfsug-discuss-bounces at spectrumscale.org It has been a while since I used HSM with GPFS via TSM, but as far as I can remember, unprivileged users can run dsmmigrate and dsmrecall. Based on the instructions on the link, dsmrecall may now leverage the Recommended Access Order (RAO) available on enterprise drives, however root would have to be the one to invoke that feature. In that case we may have to develop a middleware/wrapper for dsmrecall that will run as root and act on behalf of the user when optimization is requested. Someone here more familiar with the latest version of TSM-HSM may be able to give us some hints on how people are doing this in practice. Jaime Quoting "Andrew Beattie" : > Thanks Jaime, How do you get around Optimised recalls? 
from what I > can see the optimised recall process needs a root level account to > retrieve a list of files > https://www.ibm.com/support/knowledgecenter/SSSR2R_7.1.1/com.ibm.itsm.hsmul.doc/c_recall_optimized_tape.html [1] > Regards, Andrew Beattie Software Defined Storage - IT Specialist > Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com[2] ----- > Original message ----- > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Andrew Beattie" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - > Space Management (GPFS HSM) > Date: Fri, Jun 2, 2017 7:28 PM > We have that situation. > Users don't need to login to NSD's > > What you need is to add at least one gpfs client to the cluster (or > multi-cluster), mount the DMAPI enabled file system, and use that > node > as a gateway for end-users. They can access the contents on the mount > > point with their own underprivileged accounts. > > Whether or not on a schedule, the moment an application or linux > command (such as cp, cat, vi, etc) accesses a stub, the file will be > > staged. > > Jaime > > Quoting "Andrew Beattie" : > >> Quick question, Does anyone have a Scale / GPFS environment (HPC) >> where users need the ability to recall data sets after they have > been >> stubbed, but only System Administrators are permitted to log onto > the >> NSD servers for security purposes. And if so how do you provide > the >> ability for the users to schedule their data set recalls? > Regards, >> Andrew Beattie Software Defined Storage - IT Specialist Phone: >> 614-2133-7927 E-mail: abeattie at au1.ibm.com[1] >> >> >> Links: >> ------ >> [1] mailto:abeattie at au1.ibm.com[3] >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials[4] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jun 6 19:15:22 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 6 Jun 2017 18:15:22 +0000 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> <6873.1496767551@turing-police.cc.vt.edu> Message-ID: All, mmlsnode -N waiters is great ? I also appreciate the ?-s? option to it. Very helpful when you know the problem started say, slightly more than half an hour ago and you therefore don?t care about sub-1800 second waiters? Kevin On Jun 6, 2017, at 11:54 AM, Frederick Stock > wrote: On recent releases you can accomplish the same with the command, "mmlsnode -N waiters -L". 
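So in practice the options mentioned in this thread boil down to something like the following (the argument form for -s is from memory, so check the man page on your release):

  mmlsnode -N waiters -L        # cluster-wide list of current waiters
  mmlsnode -N waiters -s 1800   # only waiters older than ~30 minutes
  mmdiag --waiters              # per-node view, run locally on a node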
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: valdis.kletnieks at vt.edu To: gpfsug main discussion list > Date: 06/06/2017 12:46 PM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... [attachment "attltepl.dat" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Jun 6 21:31:01 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 06 Jun 2017 16:31:01 -0400 Subject: [gpfsug-discuss] mmapplypolicy and ltfsee - identifying progress... Message-ID: <25944.1496781061@turing-police.cc.vt.edu> So I'm trying to get a handle on where exactly an mmapplypolicy that's doing a premigrate is in its progress. I've already determined that 'ltfsee info jobs' will only report where in the current batch it is, but that still leaves me unable to tell the difference between [I] 2017-06-05 at 17:31:47.995 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 10000 files dispatched. and [I] 2017-06-06 at 02:44:48.236 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 225000 files dispatched. Is there any better way to figure out where it is than writing the cron job to launch it as mmapplypolicy | tee /tmp/something and then go scraping the messages? (And yes, I know not all chunks of 1,000 files are created equal. Sometimes it's 1,000 C source files that total to less than a megabyte, other times it's 1,000 streaming video files that total to over a terabye - but even knowing it's 194,000 into 243,348 files is better than what I have now...) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jun 6 22:20:31 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 6 Jun 2017 21:20:31 +0000 Subject: [gpfsug-discuss] mmapplypolicy and ltfsee - identifying progress... In-Reply-To: <25944.1496781061@turing-police.cc.vt.edu> References: <25944.1496781061@turing-police.cc.vt.edu> Message-ID: <5987990A-39F6-47A5-981D-A34A3054E4D8@vanderbilt.edu> Hi Valdis, I?m not sure this is ?better?, but what I typically do is have mmapplypolicy running from a shell script launched by a cron job and redirecting output to a file in /tmp. Once the mmapplypolicy finishes the SysAdmin?s get the tmp file e-mailed to them and then it gets deleted. Of course, while the mmapplypolicy is running you can ?tail -f /tmp/mmapplypolicy.log? or grep it or whatever. HTHAL? Kevin On Jun 6, 2017, at 3:31 PM, valdis.kletnieks at vt.edu wrote: So I'm trying to get a handle on where exactly an mmapplypolicy that's doing a premigrate is in its progress. I've already determined that 'ltfsee info jobs' will only report where in the current batch it is, but that still leaves me unable to tell the difference between [I] 2017-06-05 at 17:31:47.995 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 10000 files dispatched. and [I] 2017-06-06 at 02:44:48.236 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 225000 files dispatched. Is there any better way to figure out where it is than writing the cron job to launch it as mmapplypolicy | tee /tmp/something and then go scraping the messages? (And yes, I know not all chunks of 1,000 files are created equal. Sometimes it's 1,000 C source files that total to less than a megabyte, other times it's 1,000 streaming video files that total to over a terabye - but even knowing it's 194,000 into 243,348 files is better than what I have now...) ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Jun 7 09:30:17 2017 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 7 Jun 2017 08:30:17 +0000 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting Message-ID: An HTML attachment was scrubbed... URL: From jan.sundermann at kit.edu Wed Jun 7 15:04:28 2017 From: jan.sundermann at kit.edu (Sundermann, Jan Erik (SCC)) Date: Wed, 7 Jun 2017 14:04:28 +0000 Subject: [gpfsug-discuss] Upgrade with architecture change Message-ID: Hi, we are operating a small Spectrum Scale cluster with about 100 clients and 6 NSD servers. The cluster is FPO-enabled. For historical reasons the NSD servers are running on ppc64 while the clients are a mixture of ppc64le and x86_64 machines. Most machines are running Red Hat Enterprise Linux 7 but we also have few machines running AIX. At the moment we have installed Spectrum Scale version 4.1.1 but would like to do an upgrade to 4.2.3. In the course of the upgrade we would like to change the architecture of all NSD servers and reinstall them with ppc64le instead of ppc64. From what I?ve learned so far it should be possible to upgrade directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like to ask for some advice on the best strategy. 
For the NSD servers, one by one, we are thinking about doing the following: 1) Disable auto recovery 2) Unmount GPFS file system 3) Suspend disks 4) Shutdown gpfs 5) Reboot and reinstall with changed architecture ppc64le 6) Install gpfs 4.2.3 7) Recover cluster config using mmsdrrestore 8) Resume and start disks 9) Reenable auto recovery Can GPFS handle the change of the NSD server?s architecture and would it be fine to operate a mixture of different architectures for the NSD servers? Thanks, Jan Erik From tarak.patel at canada.ca Wed Jun 7 16:42:45 2017 From: tarak.patel at canada.ca (Patel, Tarak (SSC/SPC)) Date: Wed, 7 Jun 2017 15:42:45 +0000 Subject: [gpfsug-discuss] Remote cluster gpfs communication on IP different then one for Daemon or Admin node name. Message-ID: <50fd0dc6cf47485c8728fc09b7ae0263@PEVDACDEXC009.birch.int.bell.ca> Hi all, We've been experiencing issues with remote cluster node expelling CES nodes causing remote filesystems to unmount. The issue is related gpfs communication using Ethernet IP rather than IP defined on IB which is used for Daemon node name and Admin node name. So remote cluster is aware of IPs that are not defined in GPFS configuration as Admin/Daemon node name. The CES nodes are configure to have IB as well as Ethernet (for client interactive and NFS access). We've double checked /etc/hosts and DNS and all looks to be in order since the CES IPoIB IP is present in /etc/hosts of remote cluster. I'm unsure where cluster manager for remote cluster is getting the Ethernet IP if there is no mention of it in GPFS configuration. The CES nodes were added later therefore they are not listed as Contact Nodes in 'mmremotecluster show' output. The CES nodes use IP defined on IB for GPFS configuration and we also have Ethernet which has the default route defined. In order to ensure that all IB communication passes via IPoIB, we've even defined a static route so that all GPFS communication will use IPoIB (since we are dealing with a different fabric). 'mmfsadm dump tscomm' reports multiple IPs for CES nodes which includes the Ethernet and also the IPoIB. I'm unsure if there is a way to drop some connections on GPFS (cluster wide) after stopping a specific CES node and ensure that only IB is listed. I realize that one option would be to define subnet parameter for remote cluster which will require a downtime (solution to be explored at later date). Hope that someone can explain how or why remote cluster is picking IPs not used in GPFS config for remote nodes and how to ensure those IPs are not used in future. Thank you, Tarak -- Tarak Patel Chef d'?quipe, Integration HPC, Solution de calcul E-Science Service partag? Canada / Gouvernment du Canada tarak.patel at canada.ca 1-514-421-7299 Team Lead, HPC Integration, E-Science Computing Solution Shared Services Canada, Government of Canada tarak.patel at canada.ca 1-514-421-7299 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Jun 7 23:12:56 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 7 Jun 2017 15:12:56 -0700 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: Hi Jan, I don't have hands-on experience with FPO or ppc64 but your procedure sounds OK to me. How do you currently handle just shutting down an NSD node for maintenance? I guess you'd have the same process except skip 5,6,7 How do you currently handle OS rebuild on NSD node? Maybe try that first without the architecture change. 
But I don't see why it would matter so long as you don't touch the GPFS disks. Regards, Alex On 06/07/2017 07:04 AM, Sundermann, Jan Erik (SCC) wrote: > Hi, > > we are operating a small Spectrum Scale cluster with about 100 clients and 6 NSD servers. The cluster is FPO-enabled. For historical reasons the NSD servers are running on ppc64 while the clients are a mixture of ppc64le and x86_64 machines. Most machines are running Red Hat Enterprise Linux 7 but we also have few machines running AIX. > > At the moment we have installed Spectrum Scale version 4.1.1 but would like to do an upgrade to 4.2.3. In the course of the upgrade we would like to change the architecture of all NSD servers and reinstall them with ppc64le instead of ppc64. > > From what I?ve learned so far it should be possible to upgrade directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like to ask for some advice on the best strategy. > > For the NSD servers, one by one, we are thinking about doing the following: > > 1) Disable auto recovery > 2) Unmount GPFS file system > 3) Suspend disks > 4) Shutdown gpfs > 5) Reboot and reinstall with changed architecture ppc64le > 6) Install gpfs 4.2.3 > 7) Recover cluster config using mmsdrrestore > 8) Resume and start disks > 9) Reenable auto recovery > > Can GPFS handle the change of the NSD server?s architecture and would it be fine to operate a mixture of different architectures for the NSD servers? > > > Thanks, > Jan Erik From Philipp.Rehs at uni-duesseldorf.de Thu Jun 8 10:35:57 2017 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Thu, 8 Jun 2017 11:35:57 +0200 Subject: [gpfsug-discuss] GPFS for aarch64? Message-ID: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> Hello, we got a Cavium ThunderX-based Server and would like to use GPFS on it. Are the any package for gpfs on aarch64? Kind regards Philipp Rehs --------------------------- Zentrum f?r Informations- und Medientechnologie Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern Heinrich-Heine-Universit?t D?sseldorf Universit?tsstr. 1 Raum 25.41.00.51 40225 D?sseldorf / Germany Tel: +49-211-81-15557 From abeattie at au1.ibm.com Thu Jun 8 10:45:38 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 8 Jun 2017 09:45:38 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> References: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Jun 8 10:54:15 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 8 Jun 2017 09:54:15 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: Message-ID: And Linux on Z/VM If interested feel free to open a RFE -- Cheers > On 8 Jun 2017, at 12.46, Andrew Beattie wrote: > > Philipp, > > Not to my knowledge, > > AIX > Linux on x86 ( RHEL / SUSE / Ubuntu) > Linux on Power (RHEL / SUSE) > WIndows > > are the current supported platforms > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: Philipp Helo Rehs > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [gpfsug-discuss] GPFS for aarch64? > Date: Thu, Jun 8, 2017 7:36 PM > > Hello, > > we got a Cavium ThunderX-based Server and would like to use GPFS on it. > > Are the any package for gpfs on aarch64? 
> > > Kind regards > > Philipp Rehs > > --------------------------- > > Zentrum f?r Informations- und Medientechnologie > Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern > > Heinrich-Heine-Universit?t D?sseldorf > Universit?tsstr. 1 > Raum 25.41.00.51 > 40225 D?sseldorf / Germany > Tel: +49-211-81-15557 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From Philipp.Rehs at uni-duesseldorf.de Thu Jun 8 11:40:23 2017 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Thu, 8 Jun 2017 12:40:23 +0200 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: References: Message-ID: <9f47c897-74ff-9473-2ab3-343e4ce69d15@uni-duesseldorf.de> Thanks for the Information. I created an RFE: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106218 Kind regards, Philipp Rehs > Message: 6 > Date: Thu, 8 Jun 2017 09:54:15 +0000 > From: "Luis Bolinches" > To: "gpfsug main discussion list" > Subject: Re: [gpfsug-discuss] GPFS for aarch64? > Message-ID: > > > Content-Type: text/plain; charset="utf-8" > > And Linux on Z/VM > > If interested feel free to open a RFE > > -- > Cheers > >> On 8 Jun 2017, at 12.46, Andrew Beattie wrote: >> >> Philipp, >> >> Not to my knowledge, >> >> AIX >> Linux on x86 ( RHEL / SUSE / Ubuntu) >> Linux on Power (RHEL / SUSE) >> WIndows >> >> are the current supported platforms >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> ----- Original message ----- >> From: Philipp Helo Rehs >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [gpfsug-discuss] GPFS for aarch64? >> Date: Thu, Jun 8, 2017 7:36 PM >> >> Hello, >> >> we got a Cavium ThunderX-based Server and would like to use GPFS on it. >> >> Are the any package for gpfs on aarch64? >> >> >> Kind regards >> >> Philipp Rehs >> >> --------------------------- >> >> Zentrum f?r Informations- und Medientechnologie >> Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern >> >> Heinrich-Heine-Universit?t D?sseldorf >> Universit?tsstr. 1 >> Raum 25.41.00.51 >> 40225 D?sseldorf / Germany >> Tel: +49-211-81-15557 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 65, Issue 17 > ********************************************** > From daniel.kidger at uk.ibm.com Thu Jun 8 11:54:04 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 8 Jun 2017 10:54:04 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: Message-ID: I often hear requests for Spectrum Scale on ARM. It is always for clients. In general people are happy to have their NSD servers, etc. on x86 or POWER. It is also an anomaly that for a HPC cluster, IBM supports LSF on ARM v7/v8 but not Spectrum Scale on ARM. Daniel Daniel Kidger Technical Sales Specialist, IBM UK IBM Spectrum Storage Software daniel.kidger at uk.ibm.com +44 (0)7818 522266 > On 8 Jun 2017, at 10:54, Luis Bolinches wrote: > > And Linux on Z/VM > > If interested feel free to open a RFE > > -- > Cheers > >> On 8 Jun 2017, at 12.46, Andrew Beattie wrote: >> >> Philipp, >> >> Not to my knowledge, >> >> AIX >> Linux on x86 ( RHEL / SUSE / Ubuntu) >> Linux on Power (RHEL / SUSE) >> WIndows >> >> are the current supported platforms >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> ----- Original message ----- >> From: Philipp Helo Rehs >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [gpfsug-discuss] GPFS for aarch64? >> Date: Thu, Jun 8, 2017 7:36 PM >> >> Hello, >> >> we got a Cavium ThunderX-based Server and would like to use GPFS on it. >> >> Are the any package for gpfs on aarch64? >> >> >> Kind regards >> >> Philipp Rehs >> >> --------------------------- >> >> Zentrum f?r Informations- und Medientechnologie >> Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern >> >> Heinrich-Heine-Universit?t D?sseldorf >> Universit?tsstr. 1 >> Raum 25.41.00.51 >> 40225 D?sseldorf / Germany >> Tel: +49-211-81-15557 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Thu Jun 8 15:09:03 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Thu, 8 Jun 2017 10:09:03 -0400 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: We have not tested such a procedure. The only route that we have done is a complete mmdelnode/mmaddnode scenario. This would mean an mmdeldisk. It would be more time consuming since data has to move. Operating in a mixed architecture environment is not a problem. We have tested and support that. 
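In outline that route looks roughly like the following for each NSD server in turn (a sketch only, not a tested procedure -- the file system, disk and node names are placeholders, and the mmdeldisk/mmadddisk steps move data, so allow for the extra time):

  # drain and remove the disks served by this node, then remove the node
  mmdeldisk fs0 "nsd101;nsd102"
  mmdelnsd "nsd101;nsd102"
  mmshutdown -N nsdserver1
  mmdelnode -N nsdserver1

  # reinstall the node as ppc64le with GPFS 4.2.3, then bring it back
  mmaddnode -N nsdserver1
  mmchlicense server --accept -N nsdserver1
  mmstartup -N nsdserver1
  mmcrnsd -F /tmp/nsdserver1-disks.stanza
  mmadddisk fs0 -F /tmp/nsdserver1-disks.stanza

with an mmrestripefs at the end if you want to rebalance.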
Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York > > Message: 1 > Date: Wed, 7 Jun 2017 14:04:28 +0000 > From: "Sundermann, Jan Erik (SCC)" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Upgrade with architecture change > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi, > > we are operating a small Spectrum Scale cluster with about 100 > clients and 6 NSD servers. The cluster is FPO-enabled. For > historical reasons the NSD servers are running on ppc64 while the > clients are a mixture of ppc64le and x86_64 machines. Most machines > are running Red Hat Enterprise Linux 7 but we also have few machines > running AIX. > > At the moment we have installed Spectrum Scale version 4.1.1 but > would like to do an upgrade to 4.2.3. In the course of the upgrade > we would like to change the architecture of all NSD servers and > reinstall them with ppc64le instead of ppc64. > > From what I?ve learned so far it should be possible to upgrade > directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like > to ask for some advice on the best strategy. > > For the NSD servers, one by one, we are thinking about doing the following: > > 1) Disable auto recovery > 2) Unmount GPFS file system > 3) Suspend disks > 4) Shutdown gpfs > 5) Reboot and reinstall with changed architecture ppc64le > 6) Install gpfs 4.2.3 > 7) Recover cluster config using mmsdrrestore > 8) Resume and start disks > 9) Reenable auto recovery > > Can GPFS handle the change of the NSD server?s architecture and > would it be fine to operate a mixture of different architectures for > the NSD servers? > > > Thanks, > Jan Erik > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jun 8 17:01:07 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Jun 2017 12:01:07 -0400 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: If you proceed carefully, it should not be necessary to mmdeldisk and mmadddisks. Although we may not have tested your exact scenario, GPFS does support fiber channel disks attached to multiple nodes. So the same disk can be attached to multiple GPFS nodes - and those nodes can be running different OSes and different GPFS versions. (That's something we do actually test!) Since GPFS can handle that with several nodes simultaneously active -- it can also handle the case when nodes come and go... Or in your case are killed and then reborn with new software... The key is to be careful... You want to unmount the file system and not re-mount until all of the disks become available again via one or more (NSD) nodes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jun 8 22:34:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 8 Jun 2017 21:34:15 +0000 Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? Message-ID: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> I?m looking to improve performance of the SMB stack. My workload unfortunately has smallish files but in total it will still be large amount. I?m wondering if LROC/HAWC would be one way to speed things up. Is there a precedent for using this with protocol nodes in a cluster? Anyone else thinking/doing this? 
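The sort of setup I have in mind, going from the docs (so treat the exact parameter names as unverified), is an SSD or NVMe device in each protocol node defined as a local cache NSD, roughly:

  %nsd: device=/dev/nvme0n1 nsd=ces1_lroc servers=ces1 usage=localCache

  mmcrnsd -F lroc.stanza
  mmchconfig lrocData=yes,lrocInodes=yes,lrocDirectories=yes -N cesNodes

and, for HAWC, letting small synchronous writes be absorbed by the recovery log with something like "mmchfs fs0 --write-cache-threshold 64K", with the log itself sitting on fast storage.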
Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Jun 9 08:44:57 2017 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 9 Jun 2017 09:44:57 +0200 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting In-Reply-To: References: Message-ID: There is an update of the agenda for the User Meeting at ISC. We have added a Pawsey Site Report by Chris Schlipalius. Monday June 19, 2016 - 12:00-14:30 - Conference Room Konstant 12:00-12:10 ?[10 min] ?Opening 12:10-12:25 ?[15 min] ?Spectrum Scale Support for Docker - Olaf Weiser (IBM) 12:25-13:05 ?[40 min] ?IBM Spectrum LSF family update - Bill McMillan (IBM) 13:05-13:25 ?[20 min] ?Driving Operational Efficiencies with the IBM Spectrum LSF & Ellexus Mistral - Dr. Rosemary Francis (Ellexus) 13:25-13:40 [15 min] Pawsey Site Report - Chris Schlipalius (Pawsey) 13:40-13:55 ?[15 min] ?IBM Elastic Storage Server (ESS) Update - John Sing (IBM) 13:55-14:20 ?[25 min] ?IBM Spectrum Scale Enhancements for CORAL - Sven Oehme (IBM) 14:20-14:30 ?[10 min] ?Question & Answers -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: gpfsug-discuss at spectrumscale.org Cc: Fabienne Wegener Date: 07.06.2017 10:30 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings: IBM is happy to announce the agenda for the joint IBM Spectrum Scale and IBM Spectrum LSF User Meeting at ISC. As with other user meetings, the agenda includes user stories, updates on IBM Spectrum Scale and IBM Spectrum LSF, and access to IBM experts and your peers. Please join us! To attend, please email Fabienne.Wegener at de.ibm.com so we can have an accurate count of attendees. Monday June 17, 2016 - 12:00-14:30 - Conference Room Konstant 12:00-12:10 [10 min] Opening 12:10-12:30 [20 min] Spectrum Scale Support for Docker - Olaf Weiser (IBM) 12:30-13:10 [40 min] IBM Spectrum LSF family update - Bill McMillan (IBM) 13:10-13:30 [20 min] Driving Operational Efficiencies with the IBM Spectrum LSF & Ellexus Mistral - Dr. Rosemary Francis (Ellexus) 13:30-13:50 [20 min] IBM Elastic Storage Server (ESS) Update - John Sing (IBM) 13:50-14:20 [30 min] IBM Spectrum Scale Enhancements for CORAL - Sven Oehme (IBM) 14:20-14:30 [10 min] Question & Answers Looking forward to seeing you there! 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jun 9 09:38:01 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Jun 2017 08:38:01 +0000 Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? In-Reply-To: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> References: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> Message-ID: I?m wary of spending a lot of money on LROC devices when I don?t know what return I will get.. that said I think the main bottleneck for any SMB installation is samba itself, not the disks, so I remain largely unconvinced that LROC will help much. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 08 June 2017 22:34 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? I?m looking to improve performance of the SMB stack. My workload unfortunately has smallish files but in total it will still be large amount. I?m wondering if LROC/HAWC would be one way to speed things up. Is there a precedent for using this with protocol nodes in a cluster? Anyone else thinking/doing this? Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Jun 9 10:31:45 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Jun 2017 09:31:45 +0000 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS In-Reply-To: References: Message-ID: Thanks Mark, didn?t mean to wait so long to reply. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 June 2017 17:40 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] TSM/SP compatibility with GPFS Upgrading from GPFS 4.2.x to GPFS 4.2.y should not "break" TSM. If it does, someone goofed, that would be a bug. (My opinion) Think of it this way. 
TSM is an application that uses the OS and the FileSystem(s). TSM can't verify it will work with all future versions of OS and Filesystems, and the releases can't be in lock step. Having said that, 4.2.3 has been "out" for a while, so if there were a TSM incompatibility, someone would have likely hit it or will before July... Trust but verify... From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/02/2017 11:51 AM Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I?ve spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version? Cheers Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Jun 10 11:55:54 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 10 Jun 2017 10:55:54 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found Message-ID: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sat Jun 10 13:05:04 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 10 Jun 2017 08:05:04 -0400 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: Message-ID: Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone > On Jun 10, 2017, at 06:55, Frank Tower wrote: > > Hi everybody, > > > I don't get why one of our compute node cannot start GPFS over IB. > > > I have the following error: > > > [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. > [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
> > [I] VERBS RDMA parse verbsPorts mlx4_0/1 > > [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found > > [I] VERBS RDMA library libibverbs.so unloaded. > > [E] VERBS RDMA failed to start, no valid verbsPorts defined. > > > > I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. > > > I have 2 infinibands card, both have an IP and working well. > > > [root at rdx110 ~]# ibstat -l > > mlx4_0 > > mlx4_1 > > [root at rdx110 ~]# > > I tried configuration with both card, and no one work with GPFS. > > > > I also tried with mlx4_0/1, but same problem. > > > > Someone already have the issue ? > > > Kind Regards, > > Frank > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Jun 12 20:41:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 12 Jun 2017 15:41:17 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? Message-ID: <18719.1497296477@turing-police.cc.vt.edu> So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jun 12 21:01:44 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 12 Jun 2017 20:01:44 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: <18719.1497296477@turing-police.cc.vt.edu> References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! 
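For anyone else chasing the same behaviour, here is a rough sketch of the checks and the workaround that come up later in this thread. The command names are as we know them from Scale 4.2.x, but some of the flags are from memory, so please verify them against the mmces and mmlscluster man pages before relying on them; the node names and IP are simply the ones from Valdis' example.

# Show the CES address layout and the configured address distribution policy
mmces address list
mmlscluster --ces
# Check the protocol service state across the CES nodes
mmces state show -a
# Suspend the node first so the rebalancer will not hand the address straight back,
# then move the address to its partner
mmces node suspend -N arproto2.ar.nis.vtc.internal
mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal
# Alternatively, switch to a node-affinity policy so that manual moves stick
mmces address policy node-affinity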
________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 12 21:06:09 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 12 Jun 2017 20:06:09 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jun 12 21:17:08 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 12 Jun 2017 16:17:08 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? Category 1, GPFSFileSystemAPI: This metrics gives the following information for each file system (application view). For example: myMachine|GPFSFilesystemAPI|myCluster|myFilesystem|gpfs_fis_bytes_read . gpfs_fis_bytes_read Number of bytes read. gpfs_fis_bytes_written Number of bytes written. gpfs_fis_close_calls Number of close calls. gpfs_fis_disks Number of disks in the file system. gpfs_fis_inodes_written Number of inode updates to disk. gpfs_fis_open_calls Number of open calls. gpfs_fis_read_calls Number of read calls. gpfs_fis_readdir_calls Number of readdir calls. gpfs_fis_write_calls Number of write calls. Category 2, GPFSNodeAPI: This metrics gives the following information for a particular node from its application point of view. For example: myMachine|GPFSNodeAPI|gpfs_is_bytes_read . gpfs_is_bytes_read Number of bytes read. gpfs_is_bytes_written Number of bytes written. gpfs_is_close_calls Number of close calls. gpfs_is_inodes_written Number of inode updates to disk. gpfs_is_open_calls Number of open calls. gpfs_is_readDir_calls Number of readdir calls. gpfs_is_read_calls Number of read calls. gpfs_is_write_calls Number of write calls. Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Mon Jun 12 21:42:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Jun 2017 20:42:47 +0000 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jun 12 22:01:36 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 12 Jun 2017 17:01:36 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 12 23:50:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Jun 2017 22:50:44 +0000 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> Can you tell me how LROC plays into this? I?m trying to understand if the difference between gpfs_ns_bytes_read and gpfs_is_bytes_read on a cluster-wide basis reflects the amount of data that is recalled from pagepool+LROC (assuming the majority of the nodes have LROC. Any insight on LROC stats would helpful as well. [cid:image001.png at 01D2E3A4.63CEE1D0] Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 4:01 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Meaning of API Stats Category Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). 
The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 124065 bytes Desc: image001.png URL: From valdis.kletnieks at vt.edu Tue Jun 13 05:21:26 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 13 Jun 2017 00:21:26 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: <15827.1497327686@turing-police.cc.vt.edu> On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" said: > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. Yeah, I figured that part out. What I couldn't wrap my brain around was what the purpose of 'mmces address move' is if mmsysmon is going to just put it back... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From janfrode at tanso.net Tue Jun 13 05:42:21 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 13 Jun 2017 04:42:21 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: <15827.1497327686@turing-police.cc.vt.edu> References: <18719.1497296477@turing-police.cc.vt.edu> <15827.1497327686@turing-police.cc.vt.edu> Message-ID: Switch to node affinity policy, and it will stick to where you move it. "mmces address policy node-affinity". -jf tir. 13. jun. 2017 kl. 06.21 skrev : > On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" > said: > > > mmces node suspend -N > > > > Is what you want. This will move the address and stop it being assigned > one, > > otherwise the rebalance will occur. > > Yeah, I figured that part out. What I couldn't wrap my brain around was > what the purpose of 'mmces address move' is if mmsysmon is going to just > put it back... > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:08:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:08:52 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it's not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:12:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:12:13 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> <15827.1497327686@turing-police.cc.vt.edu> Message-ID: Or this ? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 13 June 2017 05:42 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Switch to node affinity policy, and it will stick to where you move it. "mmces address policy node-affinity". -jf tir. 13. jun. 2017 kl. 06.21 skrev >: On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" said: > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. Yeah, I figured that part out. What I couldn't wrap my brain around was what the purpose of 'mmces address move' is if mmsysmon is going to just put it back... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jun 13 09:28:25 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 13 Jun 2017 08:28:25 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Suspending the node doesn't stop the services though, we've done a bunch of testing by connecting to the "real" IP on the box we wanted to test and that works fine. 
OK, so you end up connecting to shares like \\192.168.1.20\sharename, but its perfectly fine for testing purposes. In our experience, suspending the node has been fine for this as it moves the IP to a "working" node and keeps user service running whilst we test. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 13 June 2017 at 09:08 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it?s not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From:gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... 
(running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:30:18 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:30:18 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Oh? Nice to know - thanks - will try that method next. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Suspending the node doesn't stop the services though, we've done a bunch of testing by connecting to the "real" IP on the box we wanted to test and that works fine. OK, so you end up connecting to shares like \\192.168.1.20\sharename, but its perfectly fine for testing purposes. In our experience, suspending the node has been fine for this as it moves the IP to a "working" node and keeps user service running whilst we test. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 13 June 2017 at 09:08 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it's not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! 
________________________________ From:gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Tue Jun 13 14:36:30 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 13 Jun 2017 13:36:30 +0000 Subject: [gpfsug-discuss] Infiniband Quality of Service settings? Message-ID: I am investigating setting up Quality of Service parameters on an Infiniband fabric. The specific goal is to reduce the bandwidth which certain servers can use, ie if there are untested or development codes running on these servers in our cluster then they cannot adversely affect production users. I hope I do not show too much of my ignorance here. Perhaps out of date, but I find that Lustre does have a facility for setting the port range and hence associating with an ULP in Infiniband http://www.spinics.net/lists/linux-rdma/msg02150.html https://community.mellanox.com/thread/3660 (There. I said the L word. Is a quick soaping to the mouth needed?) Can anyone comment what the Infiniband Service ID for GPFS traffic is please? If the answer is blindingly obvious and is displayed by a Bat signal in the clouds above every datacenter containing GPFS then I am suitably apologetic. If it is buried in a footnote in a Redbook then a bit less apologetic. If you are familiar with Appendix A of the IBTA Architecture Release then it is truly a joy. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. 
Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Tue Jun 13 15:10:43 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 13 Jun 2017 14:10:43 +0000 Subject: [gpfsug-discuss] Infiniband Quality of Service settings? In-Reply-To: References: Message-ID: Having the bad manners to answer my own question: Example If you define a service level of 2 for GPFS in the InfiniBand subnet manager set verbsRdmaQpRtrSl to 2. mmchconfig verbsRdmaQpRtrSl=2 I guess though that I still need the service ID to set the Service Level in qos-policy.conf From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: Tuesday, June 13, 2017 3:37 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Infiniband Quality of Service settings? I am investigating setting up Quality of Service parameters on an Infiniband fabric. The specific goal is to reduce the bandwidth which certain servers can use, ie if there are untested or development codes running on these servers in our cluster then they cannot adversely affect production users. I hope I do not show too much of my ignorance here. Perhaps out of date, but I find that Lustre does have a facility for setting the port range and hence associating with an ULP in Infiniband http://www.spinics.net/lists/linux-rdma/msg02150.html https://community.mellanox.com/thread/3660 (There. I said the L word. Is a quick soaping to the mouth needed?) Can anyone comment what the Infiniband Service ID for GPFS traffic is please? If the answer is blindingly obvious and is displayed by a Bat signal in the clouds above every datacenter containing GPFS then I am suitably apologetic. If it is buried in a footnote in a Redbook then a bit less apologetic. If you are familiar with Appendix A of the IBTA Architecture Release then it is truly a joy. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
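A small follow-on sketch of the GPFS side of the change John describes, for anyone trying the same thing. The parameter name is taken from his mail; the daemon restart is our assumption (the verbs settings only appear to be read at startup), and "ib_nodes" is just a placeholder node class, so check the mmchconfig documentation for your level before applying this.

# Set the IB service level GPFS should use for its RDMA queue pairs
mmchconfig verbsRdmaQpRtrSl=2
# Confirm the value has been recorded
mmlsconfig verbsRdmaQpRtrSl
# Restart GPFS on the affected nodes so the RDMA layer picks the value up
mmshutdown -N ib_nodes
mmstartup -N ib_nodes

The matching SL-to-VL mapping and any bandwidth limits still have to be defined on the subnet manager side (for example in opensm's QoS policy), which is outside GPFS itself.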
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From SAnderson at convergeone.com Tue Jun 13 17:31:31 2017 From: SAnderson at convergeone.com (Shaun Anderson) Date: Tue, 13 Jun 2017 16:31:31 +0000 Subject: [gpfsug-discuss] Difference between mmcesnfscrexport and 'mmnfs export add' commands. Message-ID: <2990f67cded849e8b82a4c5d2ac50d5c@NACR502.nacr.com> ?I see both of these, but only the mmnfs command is documented. Is one a wrapper of the other? SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments here to may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Jun 14 01:50:24 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 13 Jun 2017 20:50:24 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> References: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> Message-ID: Hello Bob, Right. Within some observation interval, bytes read by an application will be reflected in gpfs_is_bytes_read, regardless of how the byte values were obtained (by reading from "disk", fetching from pagepool, or fetching from LROC). gpfs_ns_bytes_read is only going to reflect bytes read from "disk" within that observation interval. "mmdiag --lroc" provides some LROC stats. There is also a GPFSLROC sensor; it does not appear to be documented at this point, so I hope I haven't spoken out of turn. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
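For what it's worth, a sketch of how one might try to answer Bob's question from the command line. The two metric names are the ones quoted earlier in this thread; the mmperfmon flags are from memory and should be checked against "mmperfmon query --help" on your level, and the LROC counters obviously only exist on nodes that actually have an LROC device.

# LROC inventory and hit/recall counters on an LROC-equipped client
mmdiag --lroc
# Application-level reads vs reads that went down to the NSD layer, last hour in one bucket
mmperfmon query gpfs_is_bytes_read,gpfs_ns_bytes_read -b 3600 -n 1

The gap between the two byte counts is roughly what was satisfied from pagepool plus LROC rather than from the NSD servers, which is the comparison Bob describes.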
From: "Oesterlin, Robert" To: gpfsug main discussion list Cc: "scale at us.ibm.com" Date: 06/12/2017 06:50 PM Subject: Re: Meaning of API Stats Category Can you tell me how LROC plays into this? I?m trying to understand if the difference between gpfs_ns_bytes_read and gpfs_is_bytes_read on a cluster-wide basis reflects the amount of data that is recalled from pagepool+LROC (assuming the majority of the nodes have LROC. Any insight on LROC stats would helpful as well. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 4:01 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Meaning of API Stats Category Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 124065 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jun 14 17:11:33 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 14 Jun 2017 16:11:33 +0000 Subject: [gpfsug-discuss] 4.2.3.x and sub-block size Message-ID: Hi All, Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: 2. The GPFS block size determines: * The minimum disk space allocation unit. The minimum amount of space that file data can occupy is a sub?block. A sub?block is 1/32 of the block size. So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Wed Jun 14 18:15:27 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 14 Jun 2017 13:15:27 -0400 Subject: [gpfsug-discuss] 4.2.3.x and sub-block size In-Reply-To: References: Message-ID: Hi, >>Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: >>So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Based on the current plan, this ?a sub-block is 1/32nd of the block size? restriction will be removed in the upcoming GPFS version 4.2.4 (Please NOTE: Support for >32 subblocks per block may subject to be delayed based on internal qualification/validation efforts). Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/14/2017 12:12 PM Subject: [gpfsug-discuss] 4.2.3.x and sub-block size Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: 2. The GPFS block size determines: * The minimum disk space allocation unit. The minimum amount of space that file data can occupy is a sub?block. 
A sub?block is 1/32 of the block size. So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Schlipalius at pawsey.org.au Thu Jun 15 03:30:27 2017 From: Chris.Schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 15 Jun 2017 10:30:27 +0800 Subject: [gpfsug-discuss] Perth Australia - Spectrum Scale User Group event in August 2017 announced - Pawsey Supercomputing Centre Message-ID: Hi please find the eventbrite link (text as http/s links are usually stripped). www.eventbrite.com/e/spectrum-scale-user-group-perth-australia-gpfsugaus-au gust-2017-tickets-35227460282 Please register and let me know if you are keen to present. I have a special group booking offer on accomodation for attendees, well below usually rack rack. I will announce this Usergroup meeting on spectrumscle.org shortly. This event is on the same week and at the same location as HPC Advisory Council also being held in Perth Australia. (Call for papers is now out - I can supply the HPC AC invite separately if you wish to email me directly). If you want to know more in person and you are at ISC2017 next week I will be at the Spectrum Scale Usergroup that Ulf announced or you can catch me on the Pawsey Supercomputing Centre booth. Regards, Chris Schlipalius Senior Storage Infrastructure Specialist/Team Leader Pawsey Supercomputing Centre 12 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 15 21:00:47 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Jun 2017 20:00:47 +0000 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? Message-ID: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> Hi All, I?ve got some very weird problems going on here (and I do have a PMR open with IBM). On Monday I attempted to unlink a fileset, something that I?ve done many times with no issues. This time, however, it hung up the filesystem. I was able to clear things up by shutting down GPFS on the filesystem manager for that filesystem and restarting it. The very next morning we awoke to problems with GPFS. 
I noticed in my messages file on all my NSD servers I had messages like: Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid Jun 12 22:03:32 nsd32 multipathd: uevent trigger error Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: removing target and saving binding Since we use an FC SAN and Linux multi-pathing I was expecting some sort of problem with the switches. Now on the switches I see messages like: [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] Which (while not in this example) do correlate time-wise with the multipath messages on the servers. So it's not a GPFS problem and I shouldn't be bugging this list about this EXCEPT... These issues only started on Monday after I ran the mmunlinkfileset command. That's right -- NO such errors prior to then. And literally NOTHING changed on Monday with my SAN environment (nothing had changed there for months actually). Nothing added to nor removed from the SAN. No changes until today when, in an attempt to solve this issue, I updated the switch firmware on all switches one at a time. I also yum updated to the latest RHEL 7 version of the multipathd packages. I've been Googling and haven't found anything useful on those SYNC_LOSS messages on the QLogic SANbox 5800 switches. Anybody out there happen to have any knowledge of them and what could be causing them? Oh, I'm investigating this now -- but it's not all ports that are throwing the errors. And the ports that are seem to be random and don't have one specific type of hardware plugged in -- i.e. some ports have NSD servers plugged in, others have storage arrays. I understand that it makes no sense that mmunlinkfileset hanging would cause problems with my SAN -- but I also don't believe in coincidences! I'm running GPFS 4.2.2.3. Any help / suggestions appreciated! -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Jun 15 21:50:10 2017 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 15 Jun 2017 16:50:10 -0400 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? In-Reply-To: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> References: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> Message-ID: <20170615165010.6241c6d3@osc.edu> On Thu, 15 Jun 2017 20:00:47 +0000 "Buterbaugh, Kevin L" wrote: > Hi All, > > I've got some very weird problems going on here (and I do have a PMR open > with IBM). On Monday I attempted to unlink a fileset, something that I've > done many times with no issues. This time, however, it hung up the > filesystem.
I was able to clear things up by shutting down GPFS on the > filesystem manager for that filesystem and restarting it. > > The very next morning we awoke to problems with GPFS. I noticed in my > messages file on all my NSD servers I had messages like: > > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write > through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline > device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk > Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) > Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid > Jun 12 22:03:32 nsd32 multipathd: uevent trigger error > Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: > removing target and saving binding > > Since we use an FC SAN and Linux multi-pathing I was expecting some sort of > problem with the switches. Now on the switches I see messages like: > > [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: > 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC > 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] > > Which (while not in this example) do correlate time-wise with the multi path > messages on the servers. So it?s not a GPFS problem and I shouldn?t be > bugging this list about this EXCEPT? > > These issues only started on Monday after I ran the mmunlinkfileset command. > That?s right ? NO such errors prior to then. And literally NOTHING changed > on Monday with my SAN environment (nothing had changed there for months > actually). Nothing added to nor removed from the SAN. No changes until > today when, in an attempt to solve this issue, I updated the switch firmware > on all switches one at a time. I also yum updated to the latest RHEL 7 > version of the multipathd packages. > > I?ve been Googling and haven?t found anything useful on those SYNC_LOSS > messages on the QLogic SANbox 5800 switches. Anybody out there happen to > have any knowledge of them and what could be causing them? Oh, I?m > investigating this now ? but it?s not all ports that are throwing the > errors. And the ports that are seem to be random and don?t have one specific > type of hardware plugged in ? i.e. some ports have NSD servers plugged in, > others have storage arrays. I have a half dozen of the Sanbox 5802 switches, but no GPFS devices going through them any longer. Used to though. We do see that exact same messages when the FC interface on a device goes bad (SFP, HCA, etc) or someone moving cables. This happens when the device cannot properly join the loop with it's login. I've NEVER seen them randomly though. Nor has this been a bad cable type error. I don't recall why, but I froze our Sanbox's at : V7.4.0.16.0 I'm sure I have notes on it somewhere. I've got one right now, in fact, with a bad ancient LTO4 drive. [8124][Thu Jun 15 12:46:00.190 EDT 2017][I][8600.001F][Port][Port: 4][SYNC_ACQ] [8125][Thu Jun 15 12:49:20.920 EDT 2017][I][8600.0020][Port][Port: 4][SYNC_LOSS] Sounds like the sanbox itself is having an issue perhaps? "Show alarm" clean on the sanbox? Array has a bad HCA? 'Show port 9' errors not crazy? All power supplies working? 
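To add to Ed's list, a sketch of the host-side checks one could run alongside the switch-side ones. The file system name is a placeholder, and the fc_host statistics paths can vary a little between HBA drivers.

# Path state as multipathd sees it
multipath -ll
# Map the multipath devices back to NSDs and confirm GPFS still sees the disks as up/ready
mmlsnsd -m
mmlsdisk <filesystem> -e
# HBA-side sync-loss and link-failure counters, to see whether the host ports agree with the switch log
grep . /sys/class/fc_host/host*/statistics/loss_of_sync_count /sys/class/fc_host/host*/statistics/link_failure_count 2>/dev/null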
> I understand that it makes no sense that mmunlinkfileset hanging would cause > problems with my SAN ? but I also don?t believe in coincidences! > > I?m running GPFS 4.2.2.3. Any help / suggestions apprecaiated! This does seem like QUITE the coincidence. Increased traffic on the device triggered a failure? (The fear of all RAID users!) Multipath is working properly though? Sounds like mmlsdisk would have shown devices not in 'ready'. We mysteriously lost a MD disk during a recent downtime and it caused an MMFSCK to not run properly until we figured it out. 4.2.2.3 as well. MD Replication is NOT helpful in that case. Ed > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 15 22:14:33 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Jun 2017 21:14:33 +0000 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? In-Reply-To: <20170615165010.6241c6d3@osc.edu> References: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> <20170615165010.6241c6d3@osc.edu> Message-ID: <91D4DDD0-BF4C-4F73-B369-A91C032B0FCD@vanderbilt.edu> Hi Ed, others, I have spent the intervening time since sending my original e-mail taking the logs from the SAN switches and putting them into text files where they can be sorted and grep?d ? and something potentially interesting has come to light ? While there are a number of ports on all switches that have one or two SYNC_LOSS errors on them, on two of the switches port 9 has dozens of SYNC_LOSS errors (looking at the raw logs with other messages interspersed that wasn?t obvious). Turns out that one particular dual-controller storage array is plugged into those ports and - in a stroke of good luck which usually manages to avoid me - that particular storage array is no longer in use! It, and a few others still in use, are older and about to be life-cycled. Since it?s no longer in use, I have unplugged it from the SAN and am monitoring to see if my problems now go away. Yes, correlation is not causation. And sometimes coincidences do happen. I?ll monitor to see if this is one of those occasions. Thanks? Kevin > On Jun 15, 2017, at 3:50 PM, Edward Wahl wrote: > > On Thu, 15 Jun 2017 20:00:47 +0000 > "Buterbaugh, Kevin L" wrote: > >> Hi All, >> >> I?ve got some very weird problems going on here (and I do have a PMR open >> with IBM). On Monday I attempted to unlink a fileset, something that I?ve >> done many times with no issues. This time, however, it hung up the >> filesystem. I was able to clear things up by shutting down GPFS on the >> filesystem manager for that filesystem and restarting it. >> >> The very next morning we awoke to problems with GPFS. 
I noticed in my >> messages file on all my NSD servers I had messages like: >> >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write >> through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline >> device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk >> Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) >> Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid >> Jun 12 22:03:32 nsd32 multipathd: uevent trigger error >> Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: >> removing target and saving binding >> >> Since we use an FC SAN and Linux multi-pathing I was expecting some sort of >> problem with the switches. Now on the switches I see messages like: >> >> [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: >> 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC >> 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] >> >> Which (while not in this example) do correlate time-wise with the multi path >> messages on the servers. So it?s not a GPFS problem and I shouldn?t be >> bugging this list about this EXCEPT? >> >> These issues only started on Monday after I ran the mmunlinkfileset command. >> That?s right ? NO such errors prior to then. And literally NOTHING changed >> on Monday with my SAN environment (nothing had changed there for months >> actually). Nothing added to nor removed from the SAN. No changes until >> today when, in an attempt to solve this issue, I updated the switch firmware >> on all switches one at a time. I also yum updated to the latest RHEL 7 >> version of the multipathd packages. >> >> I?ve been Googling and haven?t found anything useful on those SYNC_LOSS >> messages on the QLogic SANbox 5800 switches. Anybody out there happen to >> have any knowledge of them and what could be causing them? Oh, I?m >> investigating this now ? but it?s not all ports that are throwing the >> errors. And the ports that are seem to be random and don?t have one specific >> type of hardware plugged in ? i.e. some ports have NSD servers plugged in, >> others have storage arrays. > > I have a half dozen of the Sanbox 5802 switches, but no GPFS devices going > through them any longer. Used to though. We do see that exact same messages > when the FC interface on a device goes bad (SFP, HCA, etc) or someone moving > cables. This happens when the device cannot properly join the loop with it's > login. I've NEVER seen them randomly though. Nor has this been a bad cable type > error. I don't recall why, but I froze our Sanbox's at : V7.4.0.16.0 I'm sure > I have notes on it somewhere. > > I've got one right now, in fact, with a bad ancient LTO4 drive. > [8124][Thu Jun 15 12:46:00.190 EDT 2017][I][8600.001F][Port][Port: 4][SYNC_ACQ] > [8125][Thu Jun 15 12:49:20.920 EDT 2017][I][8600.0020][Port][Port: 4][SYNC_LOSS] > > > Sounds like the sanbox itself is having an issue perhaps? "Show alarm" clean on > the sanbox? Array has a bad HCA? 'Show port 9' errors not crazy? All power > supplies working? > > > >> I understand that it makes no sense that mmunlinkfileset hanging would cause >> problems with my SAN ? but I also don?t believe in coincidences! >> >> I?m running GPFS 4.2.2.3. 
Any help / suggestions apprecaiated! > > This does seem like QUITE the coincidence. Increased traffic on the > device triggered a failure? (The fear of all RAID users!) Multipath is working > properly though? Sounds like mmlsdisk would have shown devices not in 'ready'. > We mysteriously lost a MD disk during a recent downtime and it caused an MMFSCK > to not run properly until we figured it out. 4.2.2.3 as well. MD Replication > is NOT helpful in that case. > > Ed > > > >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - >> (615)875-9633 >> >> >> > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 From frank.tower at outlook.com Sun Jun 18 05:57:57 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 18 Jun 2017 04:57:57 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: , Message-ID: Hi, You were right, ibv_devinfo -v doesn't return something if both card are connected. I didn't checked ibv_* tools, I supposed once IP stack and ibstat OK, the rest should work. I'm stupid ? Anyway, once I disconnect one card, ibv_devinfo show me input but with both cards, I don't have any input except "device not found". And what is weird here, it's that it work only when one card are connected, no matter the card (both are similar: model, firmware, revision, company)... Really strange, I will dig more about the issue. Stupid and bad workaround: connected a dual port Infiniband. But production system doesn't wait.. Thank for your help, Frank ________________________________ From: Aaron Knister Sent: Saturday, June 10, 2017 2:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone On Jun 10, 2017, at 06:55, Frank Tower > wrote: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
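For anyone else chasing the same "device mlx4_0 not found" message, a rough checklist, assuming the HCAs and firmware are otherwise healthy (the node name below is a placeholder):

# list the verbs devices the RDMA stack can actually open
# (ibstat seeing a card is not enough; libibverbs has to be able to open it)
ibv_devices
ibv_devinfo -d mlx4_0
ibv_devinfo -d mlx4_1

# check which device/port pairs GPFS has been told to use, then correct them
mmlsconfig verbsPorts
mmchconfig verbsPorts="mlx4_0/1 mlx4_1/1" -N ibnode01

# verbsPorts is read at daemon startup, so restart GPFS on that node
mmshutdown -N ibnode01 && mmstartup -N ibnode01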
URL: From frank.tower at outlook.com Sun Jun 18 06:06:30 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 18 Jun 2017 05:06:30 +0000 Subject: [gpfsug-discuss] Protocol node: active directory authentication Message-ID: Hi, We finally received protocols node following the recommendations some here provided and the help of the wiki. Now we would like to use kerberized NFS, we dig into spectrumscale documentations and wiki but we would like to know if anyone is using such configuration ? Do you also have any performance issue (vs NFSv4/NFSv3 with sec=sys) ? We will also use Microsoft Active Directory and we are willing to populate all our users with UID/GID, summer is coming, we will have some spare time ? Someone is using kerberized NFSv4 with Microsoft Active Directory ? Thank by advance for your feedback. Kind Regards, Frank. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Sun Jun 18 16:30:55 2017 From: jcatana at gmail.com (Josh Catana) Date: Sun, 18 Jun 2017 11:30:55 -0400 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: Message-ID: Are any cards VPI that can do both eth and ib? I remember reading in documentation that that there is a bus order to having mixed media with mellanox cards. There is a module setting during init where you can set eth ib or auto detect. If the card is on auto it might be coming up eth and making the driver flake out because it's in the wrong order. Responding from my phone so I can't really look it up myself right now about what the proper order is, but maybe this might be some help troubleshooting. On Jun 18, 2017 12:58 AM, "Frank Tower" wrote: > Hi, > > > You were right, ibv_devinfo -v doesn't return something if both card are > connected. I didn't checked ibv_* tools, I supposed once IP stack and > ibstat OK, the rest should work. I'm stupid ? > > > Anyway, once I disconnect one card, ibv_devinfo show me input but with > both cards, I don't have any input except "device not found". > > And what is weird here, it's that it work only when one card are > connected, no matter the card (both are similar: model, firmware, revision, > company)... Really strange, I will dig more about the issue. > > > Stupid and bad workaround: connected a dual port Infiniband. But > production system doesn't wait.. > > > Thank for your help, > Frank > > ------------------------------ > *From:* Aaron Knister > *Sent:* Saturday, June 10, 2017 2:05 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found > > Out of curiosity could you send us the output of "ibv_devinfo -v"? > > -Aaron > > Sent from my iPhone > > On Jun 10, 2017, at 06:55, Frank Tower wrote: > > Hi everybody, > > > I don't get why one of our compute node cannot start GPFS over IB. > > > I have the following error: > > > [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no > verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > > [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and > initialized. > > [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match > (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). > > [I] VERBS RDMA parse verbsPorts mlx4_0/1 > > [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device > mlx4_0 not found > > [I] VERBS RDMA library libibverbs.so unloaded. > > [E] VERBS RDMA failed to start, no valid verbsPorts defined. 
> > > > I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. > > > I have 2 infinibands card, both have an IP and working well. > > > [root at rdx110 ~]# ibstat -l > > mlx4_0 > > mlx4_1 > > [root at rdx110 ~]# > > > I tried configuration with both card, and no one work with GPFS. > > > I also tried with mlx4_0/1, but same problem. > > > Someone already have the issue ? > > > Kind Regards, > > Frank > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Sun Jun 18 18:53:28 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Sun, 18 Jun 2017 17:53:28 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found Message-ID: There used to be issues with the CX-3 cards and specific ports for if you wanted to use IB and Eth, but that went away in later firmwares, as did a whole load of bits with it being slow to detect media type, so see if you are running an up to date Mellanox firmware (assuming it's a VPI card). On CX-4 there is no auto detect media, but default is IB unless you changed it. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jcatana at gmail.com [jcatana at gmail.com] Sent: 18 June 2017 16:30 To: gpfsug main discussion list Subject: ?spam? Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Are any cards VPI that can do both eth and ib? I remember reading in documentation that that there is a bus order to having mixed media with mellanox cards. There is a module setting during init where you can set eth ib or auto detect. If the card is on auto it might be coming up eth and making the driver flake out because it's in the wrong order. Responding from my phone so I can't really look it up myself right now about what the proper order is, but maybe this might be some help troubleshooting. On Jun 18, 2017 12:58 AM, "Frank Tower" > wrote: Hi, You were right, ibv_devinfo -v doesn't return something if both card are connected. I didn't checked ibv_* tools, I supposed once IP stack and ibstat OK, the rest should work. I'm stupid ? Anyway, once I disconnect one card, ibv_devinfo show me input but with both cards, I don't have any input except "device not found". And what is weird here, it's that it work only when one card are connected, no matter the card (both are similar: model, firmware, revision, company)... Really strange, I will dig more about the issue. Stupid and bad workaround: connected a dual port Infiniband. But production system doesn't wait.. Thank for your help, Frank ________________________________ From: Aaron Knister > Sent: Saturday, June 10, 2017 2:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone On Jun 10, 2017, at 06:55, Frank Tower > wrote: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. 
I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Tue Jun 20 17:03:40 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Tue, 20 Jun 2017 12:03:40 -0400 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs Message-ID: These type messages repeat often in our logs: 017-06-20_09:25:13.676-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_09:25:24.292-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_10:00:25.935-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 Is there any way to tell if it is a misconfiguration or communications issue? -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Wed Jun 21 08:12:36 2017 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 21 Jun 2017 09:12:36 +0200 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs In-Reply-To: References: Message-ID: This happens if a the mmhealth system of a node can not forward an event to the GUI - typically on some other node. Resons could be: - Gui is not running - Firewall on used port 80 for older versions of spectrum scale or 443 for newer Mit freundlichen Gr??en / Kind regards Norbert Schuld Dr. IBM Systems Group M925: IBM Spectrum Scale Norbert Software Development Schuld IBM Deutschland R&D GmbH Phone: +49-160 70 70 335 Am Weiher 24 E-Mail: nschuld at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 20/06/2017 18:04 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs Sent by: gpfsug-discuss-bounces at spectrumscale.org These type messages repeat often in our logs: 017-06-20_09:25:13.676-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_09:25:24.292-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_10:00:25.935-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 Is there any way to tell if it is a misconfiguration or communications issue? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 16690161.gif Type: image/gif Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From ewahl at osc.edu Thu Jun 22 20:37:12 2017 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 22 Jun 2017 15:37:12 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: <20170622153712.052d312c@osc.edu> Is there a command to show existing node Address Policy? Or are we left with grep "affinity" on /var/mmfs/gen/mmsdrfs? Ed On Tue, 13 Jun 2017 08:30:18 +0000 "Sobey, Richard A" wrote: > Oh? Nice to know - thanks - will try that method next. > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson > (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main discussion > list Subject: Re: [gpfsug-discuss] 'mmces > address move' weirdness? > > Suspending the node doesn't stop the services though, we've done a bunch of > testing by connecting to the "real" IP on the box we wanted to test and that > works fine. > > OK, so you end up connecting to shares like > \\192.168.1.20\sharename, but its perfectly > fine for testing purposes. > > In our experience, suspending the node has been fine for this as it moves the > IP to a "working" node and keeps user service running whilst we test. > > Simon > > From: > > > on behalf of "Sobey, Richard A" > > Reply-To: > "gpfsug-discuss at spectrumscale.org" > > > Date: Tuesday, 13 June 2017 at 09:08 To: > "gpfsug-discuss at spectrumscale.org" > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > Yes, suspending the node would do it, but in the case where you want to > remove a node from service but keep it running for testing it's not ideal. 
> > I think you can set the IP address balancing policy to none which might do > what we want. From: > gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson > (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion > list > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. I think you can change the way it > balances, but the default is to distribute. > > Simon > > From: > > > on behalf of "Sobey, Richard A" > > Reply-To: > "gpfsug-discuss at spectrumscale.org" > > > Date: Monday, 12 June 2017 at 21:01 To: > "gpfsug-discuss at spectrumscale.org" > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > > I think it's intended but I don't know why. The AUTH service became unhealthy > on one of our CES nodes (SMB only) and we moved its float address elsewhere. > CES decided to move it back again moments later despite the node not being > fit. > > > > Sorry that doesn't really help but at least you're not alone! > > ________________________________ > From:gpfsug-discuss-bounces at spectrumscale.org > > > on behalf of valdis.kletnieks at vt.edu > > Sent: 12 June 2017 > 20:41 To: > gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] 'mmces address move' weirdness? > > So here's our address setup: > > mmces address list > > Address Node Group Attribute > ------------------------------------------------------------------------- > 172.28.45.72 arproto1.ar.nis.isb.internal isb none > 172.28.45.73 arproto2.ar.nis.isb.internal isb none > 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none > 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none > > Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to > move the address over to its pair so I can look around without impacting > users. However, seems like something insists on moving it right back 60 > seconds later... > > Question 1: Is this expected behavior? > Question 2: If it is, what use is 'mmces address move' if it just gets > undone a few seconds later... > > (running on arproto2.ar.nis.vtc.internal): > > ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 > --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr > show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 > 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global > secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 > Mon Jun 12 15:34:42 EDT 2017 > Mon Jun 12 15:34:43 EDT 2017 > (skipped) > Mon Jun 12 15:35:44 EDT 2017 > Mon Jun 12 15:35:45 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > Mon Jun 12 15:35:46 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > Mon Jun 12 15:35:47 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > ^C -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From laurence at qsplace.co.uk Thu Jun 22 23:05:49 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Thu, 22 Jun 2017 23:05:49 +0100 Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
In-Reply-To: <20170622153712.052d312c@osc.edu> References: <18719.1497296477@turing-police.cc.vt.edu> <20170622153712.052d312c@osc.edu> Message-ID: <6B843B3B-4C07-459D-B905-10B16E3590A0@qsplace.co.uk> "mmlscluster --ces" will show the address distribution policy. -- Lauz On 22 June 2017 20:37:12 BST, Edward Wahl wrote: > >Is there a command to show existing node Address Policy? >Or are we left with grep "affinity" on /var/mmfs/gen/mmsdrfs? > >Ed > > >On Tue, 13 Jun 2017 08:30:18 +0000 >"Sobey, Richard A" wrote: > >> Oh? Nice to know - thanks - will try that method next. >> >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson >> (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main >discussion >> list Subject: Re: [gpfsug-discuss] >'mmces >> address move' weirdness? >> >> Suspending the node doesn't stop the services though, we've done a >bunch of >> testing by connecting to the "real" IP on the box we wanted to test >and that >> works fine. >> >> OK, so you end up connecting to shares like >> \\192.168.1.20\sharename, but its >perfectly >> fine for testing purposes. >> >> In our experience, suspending the node has been fine for this as it >moves the >> IP to a "working" node and keeps user service running whilst we test. >> >> Simon >> >> From: >> >> >> on behalf of "Sobey, Richard A" >> > Reply-To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Date: Tuesday, 13 June 2017 at 09:08 To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> Yes, suspending the node would do it, but in the case where you want >to >> remove a node from service but keep it running for testing it's not >ideal. >> >> I think you can set the IP address balancing policy to none which >might do >> what we want. From: >> >gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson >> (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main >discussion >> list >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> mmces node suspend -N >> >> Is what you want. This will move the address and stop it being >assigned one, >> otherwise the rebalance will occur. I think you can change the way it >> balances, but the default is to distribute. >> >> Simon >> >> From: >> >> >> on behalf of "Sobey, Richard A" >> > Reply-To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Date: Monday, 12 June 2017 at 21:01 To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> >> I think it's intended but I don't know why. The AUTH service became >unhealthy >> on one of our CES nodes (SMB only) and we moved its float address >elsewhere. >> CES decided to move it back again moments later despite the node not >being >> fit. >> >> >> >> Sorry that doesn't really help but at least you're not alone! >> >> ________________________________ >> >From:gpfsug-discuss-bounces at spectrumscale.org >> >> >> on behalf of valdis.kletnieks at vt.edu >> > Sent: 12 >June 2017 >> 20:41 To: >> >gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
>> >> So here's our address setup: >> >> mmces address list >> >> Address Node Group >Attribute >> >------------------------------------------------------------------------- >> 172.28.45.72 arproto1.ar.nis.isb.internal isb none >> 172.28.45.73 arproto2.ar.nis.isb.internal isb none >> 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none >> 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none >> >> Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so >I try to >> move the address over to its pair so I can look around without >impacting >> users. However, seems like something insists on moving it right back >60 >> seconds later... >> >> Question 1: Is this expected behavior? >> Question 2: If it is, what use is 'mmces address move' if it just >gets >> undone a few seconds later... >> >> (running on arproto2.ar.nis.vtc.internal): >> >> ## (date; ip addr show | grep '\.72';mmces address move --ces-ip >172.28.46.72 >> --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; >ip addr >> show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon >Jun 12 >> 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global >> secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 >EDT 2017 >> Mon Jun 12 15:34:42 EDT 2017 >> Mon Jun 12 15:34:43 EDT 2017 >> (skipped) >> Mon Jun 12 15:35:44 EDT 2017 >> Mon Jun 12 15:35:45 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> Mon Jun 12 15:35:46 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> Mon Jun 12 15:35:47 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> ^C > > > >-- > >Ed Wahl >Ohio Supercomputer Center >614-292-9302 >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 09:13:38 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 08:13:38 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? Message-ID: I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO 'runs wild' it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Jun 23 09:57:46 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 23 Jun 2017 04:57:46 -0400 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> I unfortunately don't have an answer other than to perhaps check out this presentation from a recent users group meeting: http://files.gpfsug.org/presentations/2017/Manchester/05_Ellexus_SSUG_Manchester.pdf I've never used the product but it might be able to do what you're asking. Sent from my iPhone > On Jun 23, 2017, at 04:13, John Hearns wrote: > > I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. > We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. > The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down > the production storage. If anyone has a setup like this I would be interested in chatting with you. > Is it feasible to create filesets which have higher/lower priority than others? > > Thankyou for any insights or feedback > John Hearns > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 12:04:23 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 11:04:23 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> References: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> Message-ID: Aaron, thankyou. I know Rosemary and that is a good company! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister Sent: Friday, June 23, 2017 10:58 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] IO prioritisation / throttling? 
I unfortunately don't have an answer other than to perhaps check out this presentation from a recent users group meeting: http://files.gpfsug.org/presentations/2017/Manchester/05_Ellexus_SSUG_Manchester.pdf I've never used the product but it might be able to do what you're asking. Sent from my iPhone On Jun 23, 2017, at 04:13, John Hearns > wrote: I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Fri Jun 23 14:36:34 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 23 Jun 2017 09:36:34 -0400 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: Hi John, >>We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. >>The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down >>the production storage. 
You may use the Spectrum Scale Quality of Service for I/O "mmchqos" command (details in link below) to define IOPS limits for the "others" as well as the "maintenance" class for the Dev/Test file-system "pools" (for e.g., mmchqos tds_fs --enable pool=*,other=10000IOPS, maintenance=5000IOPS). This way, the Test and Dev file-system/storage-pools IOPS can be limited/controlled to specified IOPS , giving higher priority to the production GPFS file-system/storage (with production_fs pool=* other=unlimited,maintenance=unlimited - which is the default). https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_qosio_describe.htm#qosio_describe My two cents. Regards, -Kums From: John Hearns To: gpfsug main discussion list Date: 06/23/2017 04:14 AM Subject: [gpfsug-discuss] IO prioritisation / throttling? Sent by: gpfsug-discuss-bounces at spectrumscale.org I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 15:23:19 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 14:23:19 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: Thankyou to Kumaran and Aaaron for your help. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kumaran Rajaram Sent: Friday, June 23, 2017 3:37 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] IO prioritisation / throttling? Hi John, >>We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. >>The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down >>the production storage. 
You may use the Spectrum Scale Quality of Service for I/O "mmchqos" command (details in link below) to define IOPS limits for the "others" as well as the "maintenance" class for the Dev/Test file-system "pools" (for e.g., mmchqos tds_fs --enable pool=*,other=10000IOPS, maintenance=5000IOPS). This way, the Test and Dev file-system/storage-pools IOPS can be limited/controlled to specified IOPS , giving higher priority to the production GPFS file-system/storage (with production_fs pool=* other=unlimited,maintenance=unlimited - which is the default). https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_qosio_describe.htm#qosio_describe My two cents. Regards, -Kums From: John Hearns > To: gpfsug main discussion list > Date: 06/23/2017 04:14 AM Subject: [gpfsug-discuss] IO prioritisation / throttling? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
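To make that concrete, a hedged example of fencing off a test/dev file system with QoS (the file system name and IOPS figures are illustrative only):

# cap normal user I/O ('other') and maintenance traffic on the test file system
mmchqos testfs --enable pool=*,other=10000IOPS,maintenance=5000IOPS

# watch consumption against those limits
mmlsqos testfs --seconds 60

# lift the limits again
mmchqos testfs --disable

The production file system keeps its default unlimited classes, so a runaway job on the test side is much less able to starve the shared back end of IOPS.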
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 23 17:06:51 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 23 Jun 2017 16:06:51 +0000 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy Message-ID: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Hi All, I haven?t been able to find this explicitly documented, so I?m just wanting to confirm that the behavior that I?m expecting is what GPFS is going to do in this scenario? I have a filesystem with data replication set to two. I?m creating a capacity type pool for it right now which will be used to migrate old files to. I only want to use replication of one on the capacity pool. My policy file has two rules, one to move files with an atime > 30 days to the capacity pool, to which I?ve included ?REPLICATE(1)?. The other rule is to move files from the capacity pool back to the system pool if the atime < 14 days. Since data replication is set to two, I am thinking that I do not need to explicitly have a ?REPLICATE(2)? as part of that rule ? is that correct? I.e., I?m wanting to make sure that a file moved to the capacity pool which therefore has its? replication set to one doesn?t keep that same setting even if moved back out of the capacity pool. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 23 17:58:23 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Jun 2017 12:58:23 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Message-ID: I believe that is correct. If not, let us know! To recap... when running mmapplypolicy with rules like: ... MIGRATE ... REPLICATE(x) ... will change the replication factor to x, for each file selected by this rule and chosen for execution. ... MIGRATE ... /* no REPLICATE keyword */ will not mess with the replication factor From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/23/2017 12:07 PM Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I haven?t been able to find this explicitly documented, so I?m just wanting to confirm that the behavior that I?m expecting is what GPFS is going to do in this scenario? I have a filesystem with data replication set to two. I?m creating a capacity type pool for it right now which will be used to migrate old files to. I only want to use replication of one on the capacity pool. My policy file has two rules, one to move files with an atime > 30 days to the capacity pool, to which I?ve included ?REPLICATE(1)?. The other rule is to move files from the capacity pool back to the system pool if the atime < 14 days. Since data replication is set to two, I am thinking that I do not need to explicitly have a ?REPLICATE(2)? as part of that rule ? is that correct? I.e., I?m wanting to make sure that a file moved to the capacity pool which therefore has its? replication set to one doesn?t keep that same setting even if moved back out of the capacity pool. Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Fri Jun 23 18:55:15 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 23 Jun 2017 13:55:15 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Message-ID: <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> > On Jun 23, 2017, at 12:58 PM, Marc A Kaplan > wrote: > > I believe that is correct. If not, let us know! > > To recap... when running mmapplypolicy with rules like: > > ... MIGRATE ... REPLICATE(x) ... > > will change the replication factor to x, for each file selected by this rule and chosen for execution. > > ... MIGRATE ... /* no REPLICATE keyword */ > > will not mess with the replication factor > > I think I detect an impedance mismatch... By "not mess with the replication factor" do you mean that after the move: the file will have the default replication factor for the file system the file will retain a replication factor previously set on the file You told Kevin that he was correct and I think he meant the first one, but I read what you said as the second one. Liberty, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 23 20:28:28 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Jun 2017 15:28:28 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> Message-ID: Sorry for any confusion. MIGRATing a file does NOT change the replication factor, unless you explicitly use the keyword REPLICATE. The default replication factor, as set/displayed by mm[ch|ls]fs -r only applies at file creation time, unless overriden by a policy SET POOL ... REPLICATE(x) rule. From: Stephen Ulmer To: gpfsug main discussion list Date: 06/23/2017 01:55 PM Subject: Re: [gpfsug-discuss] Replication settings when running mmapplypolicy Sent by: gpfsug-discuss-bounces at spectrumscale.org On Jun 23, 2017, at 12:58 PM, Marc A Kaplan wrote: I believe that is correct. If not, let us know! To recap... when running mmapplypolicy with rules like: ... MIGRATE ... REPLICATE(x) ... will change the replication factor to x, for each file selected by this rule and chosen for execution. ... MIGRATE ... /* no REPLICATE keyword */ will not mess with the replication factor I think I detect an impedance mismatch... By "not mess with the replication factor" do you mean that after the move: the file will have the default replication factor for the file system the file will retain a replication factor previously set on the file You told Kevin that he was correct and I think he meant the first one, but I read what you said as the second one. Liberty, -- Stephen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
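Putting that recap into a testable form, a sketch that assumes a file system named fs1 with pools named system and capacity (all names are placeholders); running with -I test only reports what would be done:

cat > /tmp/repl-rules.pol <<'EOF'
RULE 'old2capacity' MIGRATE FROM POOL 'system' TO POOL 'capacity' REPLICATE(1)
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
RULE 'back2system' MIGRATE FROM POOL 'capacity' TO POOL 'system' REPLICATE(2)
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14
EOF
mmapplypolicy fs1 -P /tmp/repl-rules.pol -I test

# after a real run, spot-check that the factor went back to 2
mmlsattr -L /gpfs/fs1/path/to/some/file

Note the explicit REPLICATE(2) on the way back: per Marc's explanation, leaving it off would leave the file at whatever replication factor it was given when it moved to the capacity pool.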
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From ncapit at atos.net Mon Jun 26 08:49:28 2017 From: ncapit at atos.net (CAPIT, NICOLAS) Date: Mon, 26 Jun 2017 07:49:28 +0000 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads Message-ID: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Hello, I don't know if this behavior/bug was already reported on this ML, so in doubt. Context: - SpectrumScale 4.2.2-3 - client node with 64 cores - OS: RHEL7.3 When a MPI job with 64 processes is launched on the node with 64 cores then the FS freezed (only the output log file of the MPI job is put on the GPFS; so it may be related to the 64 processes writing in a same file???). strace -p 3105 # mmfsd pid stucked Process 3105 attached wait4(-1, # stucked at this point strace ls /gpfs stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC # stucked at this point I have no problem with the other nodes of 28 cores. The GPFS command mmgetstate is working and I am able to use mmshutdown to recover the node. If I put workerThreads=72 on the 64 core node then I am not able to reproduce the freeze and I get the right behavior. Is this a known bug with a number of cores > workerThreads? Best regards, [--] Nicolas Capit -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Jun 27 00:57:57 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 26 Jun 2017 19:57:57 -0400 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? 
> > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From ncapit at atos.net Tue Jun 27 07:59:19 2017 From: ncapit at atos.net (CAPIT, NICOLAS) Date: Tue, 27 Jun 2017 06:59:19 +0000 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net>, <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> Message-ID: <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. The "mmgetstate" command says that the node is "active". The only thing is the freeze of the FS. Best regards, Nicolas Capit ________________________________________ De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] Envoy? : mardi 27 juin 2017 01:57 ? : gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? 
> > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stuartb at 4gh.net Tue Jun 27 22:34:34 2017 From: stuartb at 4gh.net (Stuart Barkley) Date: Tue, 27 Jun 2017 17:34:34 -0400 (EDT) Subject: [gpfsug-discuss] express edition vs standard edition Message-ID: Does anyone know what controls whether GPFS (4.1.1) thinks it is Express Edition versus Standard Edition? While rebuilding an old cluster from scratch we are getting the message: mmcrfs: Storage pools are not available in the GPFS Express Edition. The Problem Determination Guide says to "Install the GPFS Standard Edition on all nodes in the cluster" which we think we have done. The cluster is just 3 servers and no clients at this point. We have verified that our purchased license is for Standard Version, but have not been able to figure out what controls the GPFS view of this. mmlslicense tells us that we have Express Edition installed. mmchlicense sets server vs client license information, but does not seem to be able to control the edition. Our normal install process is to install gpfs.base-4.1.1-0.x86_64.rpm first and then install gpfs.base-4.1.1-15.x86_64.update.rpm followed by the other needed 4.1.1-15 rpms. I thought maybe we had the wrong gpfs.base and we have re-downloaded Standard Edition RPM files from IBM in case we had the wrong version. However, reinstalling and recreating the cluster does not seem to have addressed this issue. We must be doing something stupid during our install, but I'm pretty sure we used only Standard Edition rpms for our latest attempt. Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From knop at us.ibm.com Tue Jun 27 22:54:35 2017 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 27 Jun 2017 17:54:35 -0400 Subject: [gpfsug-discuss] express edition vs standard edition In-Reply-To: References: Message-ID: Stuart, I believe you will need to install the gpfs.ext RPMs , otherwise the daemons and commands will think only the Express edition is installed. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Stuart Barkley To: gpfsug-discuss at spectrumscale.org Date: 06/27/2017 05:35 PM Subject: [gpfsug-discuss] express edition vs standard edition Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know what controls whether GPFS (4.1.1) thinks it is Express Edition versus Standard Edition? While rebuilding an old cluster from scratch we are getting the message: mmcrfs: Storage pools are not available in the GPFS Express Edition. The Problem Determination Guide says to "Install the GPFS Standard Edition on all nodes in the cluster" which we think we have done. The cluster is just 3 servers and no clients at this point. We have verified that our purchased license is for Standard Version, but have not been able to figure out what controls the GPFS view of this. mmlslicense tells us that we have Express Edition installed. 
mmchlicense sets server vs client license information, but does not seem to be able to control the edition. Our normal install process is to install gpfs.base-4.1.1-0.x86_64.rpm first and then install gpfs.base-4.1.1-15.x86_64.update.rpm followed by the other needed 4.1.1-15 rpms. I thought maybe we had the wrong gpfs.base and we have re-downloaded Standard Edition RPM files from IBM in case we had the wrong version. However, reinstalling and recreating the cluster does not seem to have addressed this issue. We must be doing something stupid during our install, but I'm pretty sure we used only Standard Edition rpms for our latest attempt. Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuartb at 4gh.net Tue Jun 27 23:48:53 2017 From: stuartb at 4gh.net (Stuart Barkley) Date: Tue, 27 Jun 2017 18:48:53 -0400 (EDT) Subject: [gpfsug-discuss] express edition vs standard edition In-Reply-To: References: Message-ID: On Tue, 27 Jun 2017 at 17:54 -0000, Felipe Knop wrote: > I believe you will need to install the gpfs.ext RPMs , otherwise the > daemons and commands will think only the Express edition is > installed. Yes. This appears to have been my problem. Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From scale at us.ibm.com Fri Jun 30 07:57:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 30 Jun 2017 14:57:49 +0800 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net>, <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: I'm not aware this kind of defects, seems it should not. but lack of data, we don't know what happened. I suggest you can open a PMR for your issue. Thanks. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "CAPIT, NICOLAS" To: gpfsug main discussion list Date: 06/27/2017 02:59 PM Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. The "mmgetstate" command says that the node is "active". 
The only thing is the freeze of the FS. Best regards, Nicolas Capit ________________________________________ De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] Envoy? : mardi 27 juin 2017 01:57 ? : gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? > > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Fri Jun 30 11:07:10 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 30 Jun 2017 10:07:10 +0000 Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch Message-ID: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> Hello I have a short question about AFM prefetch and some more remarks regarding AFM and it?s use for data migration. I understand that many of you have done this for very large amounts of data and number of files. I would welcome an input, comments or remarks. Sorry if this is a bit too long for a mailing list. Short: How can I tell an AFM cache to update a directory when I do prefetch? I know about ?find .? or ?ls ?lsR? but this really is no option for us as it takes too long. Mostly I want to update the directories to make AFM cache aware of file deletions on home. 
On home I can use a policy run to find all directories which changed since the last update and pass them to prefetch on AFM cache. I know that I can find some workaround based on the directory list, like an ?ls ?lsa? just for those directories, but this doesn?t sound very efficient. And depending on cache effects and timeout settings it may work or not (o.k. ? it will work most time). We do regular file deletions and will accumulated millions of deleted files on cache over time if we don?t update the directories to make AFM cache aware of the deletion. Background: We will use AFM to migrate data on filesets to another cluster. We have to do this several times in the next few months, hence I want to get a reliable and easy to use procedure. The old system is home, the new system is a read-only AFM cache. We want to use ?mmafmctl prefetch? to move the data. Home will be in use while we run the migration. Once almost all data is moved we do a (short) break for a last sync and make the read-only AFM cache a ?normal? fileset. During the break I want to use policy runs and prefetch only and no time consuming ?ls ?lsr? or ?find .? I don?t want to use this metadata intensive posix operation during operation, either. More general: AFM can be used for data migration. But I don?t see how to use it efficiently. How to do incremental transfers, how to ensure that the we really have identical copies before we switch and how to keep the switch time short , i.e. the time when both old and new aren?t accessible for clients, Wish ? maybe an RFE? I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same as backup tools use the list to do incremental backups. And a tool to create policy lists of home and cache and to compare the lists would be nice, too. As you do this during the break/switch it should be fast and reliable and leave no doubts. Kind regards, Heiner From vpuvvada at in.ibm.com Fri Jun 30 13:35:18 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 30 Jun 2017 18:05:18 +0530 Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch In-Reply-To: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> References: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> Message-ID: What is the version of GPFS ? >Mostly I want to update the directories to make AFM cache aware of file deletions on home. On home I can use a policy run to find all directories which changed since the last >update and pass them to prefetch on AFM cache. AFM prefetch has undocumented option to delete files from cache without doing lookup from home. It supports all types of list files. Find all deleted file from home and prefetch at cache to delete them. mmafmctl device prefetch -j fileset --list-file --delete >Wish ? maybe an RFE? >I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same >as backup tools use the list to do incremental backups. This feature support was already added but undocumented today. This feature will be externalized in future releases. 
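A sketch of how those two pieces could be wired together for the deletion case -- the filesystem and fileset names, paths and list handling below are illustrative only, and since --delete is the undocumented option mentioned above it is worth proving the whole sequence on a small test fileset first:

# 1) the same simple LIST policy, run once against home and once against the cache
#    fileset (each on a node of the respective cluster)
cat > /tmp/list-all.pol <<'EOF'
RULE EXTERNAL LIST 'allfiles' EXEC ''
RULE 'all' LIST 'allfiles'
EOF
mmapplypolicy /gpfs/homefs/fset1  -P /tmp/list-all.pol -I defer -f /tmp/home
mmapplypolicy /gpfs/cachefs/fset1 -P /tmp/list-all.pol -I defer -f /tmp/cache

# 2) reduce both lists to relative path names (records end in " -- /full/path"),
#    then keep the names that exist only in the cache, i.e. were deleted at home
awk -F' -- ' '{sub("^/gpfs/homefs/fset1/","",$2); print $2}'  /tmp/home.list.allfiles  | sort > /tmp/home.paths
awk -F' -- ' '{sub("^/gpfs/cachefs/fset1/","",$2); print $2}' /tmp/cache.list.allfiles | sort > /tmp/cache.paths
comm -13 /tmp/home.paths /tmp/cache.paths | sed 's|^|/gpfs/cachefs/fset1/|' > /tmp/deleted.list

# 3) have the cache drop those entries without doing per-file lookups at home
mmafmctl cachefs prefetch -j fset1 --list-file /tmp/deleted.list --delete

Which list-file formats prefetch accepts can vary by release, so a dry run on a handful of files is a cheap sanity check.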
~Venkat (vpuvvada at in.ibm.com) From: "Billich Heinrich Rainer (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06/30/2017 03:37 PM Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello I have a short question about AFM prefetch and some more remarks regarding AFM and it?s use for data migration. I understand that many of you have done this for very large amounts of data and number of files. I would welcome an input, comments or remarks. Sorry if this is a bit too long for a mailing list. Short: How can I tell an AFM cache to update a directory when I do prefetch? I know about ?find .? or ?ls ?lsR? but this really is no option for us as it takes too long. Mostly I want to update the directories to make AFM cache aware of file deletions on home. On home I can use a policy run to find all directories which changed since the last update and pass them to prefetch on AFM cache. I know that I can find some workaround based on the directory list, like an ?ls ?lsa? just for those directories, but this doesn?t sound very efficient. And depending on cache effects and timeout settings it may work or not (o.k. ? it will work most time). We do regular file deletions and will accumulated millions of deleted files on cache over time if we don?t update the directories to make AFM cache aware of the deletion. Background: We will use AFM to migrate data on filesets to another cluster. We have to do this several times in the next few months, hence I want to get a reliable and easy to use procedure. The old system is home, the new system is a read-only AFM cache. We want to use ?mmafmctl prefetch? to move the data. Home will be in use while we run the migration. Once almost all data is moved we do a (short) break for a last sync and make the read-only AFM cache a ?normal? fileset. During the break I want to use policy runs and prefetch only and no time consuming ?ls ?lsr? or ?find .? I don?t want to use this metadata intensive posix operation during operation, either. More general: AFM can be used for data migration. But I don?t see how to use it efficiently. How to do incremental transfers, how to ensure that the we really have identical copies before we switch and how to keep the switch time short , i.e. the time when both old and new aren?t accessible for clients, Wish ? maybe an RFE? I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same as backup tools use the list to do incremental backups. And a tool to create policy lists of home and cache and to compare the lists would be nice, too. As you do this during the break/switch it should be fast and reliable and leave no doubts. Kind regards, Heiner _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc-luke at uconn.edu Fri Jun 30 16:20:27 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Fri, 30 Jun 2017 11:20:27 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Hello, We're trying to change most of our users uids, is there a clean way to migrate all of one users files with say `mmapplypolicy`? 
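One way to do exactly that with mmapplypolicy, sketched with placeholder UIDs (21001 to 31001), paths, node names and a hypothetical helper script -- an EXTERNAL LIST rule selects every inode owned by the old UID during the fast metadata scan and hands batches of paths to a chown helper that runs in parallel on several nodes:

cat > /tmp/uid-move.pol <<'EOF'
RULE EXTERNAL LIST 'uidmove' EXEC '/usr/local/sbin/chown-batch.sh'
RULE 'pick' LIST 'uidmove' DIRECTORIES_PLUS WHERE USER_ID = 21001
EOF

cat > /usr/local/sbin/chown-batch.sh <<'EOF'
#!/bin/bash
# mmapplypolicy calls this as: $0 TEST|LIST <filelist>; each record ends in " -- /path"
[ "$1" = "TEST" ] && exit 0
[ "$1" = "LIST" ] && awk -F' -- ' '{print $2}' "$2" | xargs -d '\n' -r chown -h 31001 --
exit 0
EOF
chmod +x /usr/local/sbin/chown-batch.sh      # the helper must exist on every node named with -N

# -N spreads the scan and the chown batches over several nodes; -g needs a shared directory
mmapplypolicy /gpfs/fs0 -P /tmp/uid-move.pol -N node1,node2,node3 -g /gpfs/fs0/.policytmp -m 8 -B 1000

The discovery phase uses the policy engine's inode scan, so finding the files takes minutes rather than hours; the chown calls themselves are still one syscall per file and dominate the runtime, which is where running the batches on several nodes with -N actually pays off.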
We have to change the owner of around 273539588 files, and my estimates for runtime are around 6 days. What we've been doing is indexing all of the files and splitting them up by owner which takes around an hour, and then we were locking the user out while we chown their files. I made it multi threaded as it weirdly gave a 10% speedup despite my expectation that multi threading access from a single node would not give any speedup. Generally I'm looking for advice on how to make the chowning faster. Would spreading the chowning processes over multiple nodes improve performance? Should I not stat the files before running lchown on them, since lchown checks the file before changing it? I saw mention of inodescan(), in an old gpfsug email, which speeds up disk read access, by not guaranteeing that the data is up to date. We have a maintenance day coming up where all users will be locked out, so the file handles(?) from GPFS's perspective will not be able to go stale. Is there a function with similar constraints to inodescan that I can use to speed up this process? Thank you for your time, Luke Storrs-HPC University of Connecticut From aaron.knister at gmail.com Fri Jun 30 16:47:40 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 30 Jun 2017 11:47:40 -0400 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: Nicolas, By chance do you have a skylake or kabylake based CPU? Sent from my iPhone > On Jun 30, 2017, at 02:57, IBM Spectrum Scale wrote: > > I'm not aware this kind of defects, seems it should not. but lack of data, we don't know what happened. I suggest you can open a PMR for your issue. Thanks. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > "CAPIT, NICOLAS" ---06/27/2017 02:59:59 PM---Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters") > > From: "CAPIT, NICOLAS" > To: gpfsug main discussion list > Date: 06/27/2017 02:59 PM > Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello, > > When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). > In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. > The "mmgetstate" command says that the node is "active". > The only thing is the freeze of the FS. 
> > Best regards, > Nicolas Capit > ________________________________________ > De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] > Envoy? : mardi 27 juin 2017 01:57 > ? : gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads > > That's a fascinating bug. When the node is locked up what does "mmdiag > --waiters" show from the node in question? I suspect there's more > low-level diagnostic data that's helpful for the gurus at IBM but I'm > just curious what the waiters look like. > > -Aaron > > On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > > Hello, > > > > I don't know if this behavior/bug was already reported on this ML, so in > > doubt. > > > > Context: > > > > - SpectrumScale 4.2.2-3 > > - client node with 64 cores > > - OS: RHEL7.3 > > > > When a MPI job with 64 processes is launched on the node with 64 cores > > then the FS freezed (only the output log file of the MPI job is put on > > the GPFS; so it may be related to the 64 processes writing in a same > > file???). > > > > strace -p 3105 # mmfsd pid stucked > > Process 3105 attached > > wait4(-1, # stucked at this point > > > > strace ls /gpfs > > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > > # stucked at this point > > > > I have no problem with the other nodes of 28 cores. > > The GPFS command mmgetstate is working and I am able to use mmshutdown > > to recover the node. > > > > > > If I put workerThreads=72 on the 64 core node then I am not able to > > reproduce the freeze and I get the right behavior. > > > > Is this a known bug with a number of cores > workerThreads? > > > > Best regards, > > -- > > *Nicolas Capit* > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jun 30 18:14:07 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 30 Jun 2017 13:14:07 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: <487469581.449569.1498832342497.JavaMail.webinst@w30112> References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of the additional check-summing done on those platforms? 
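For anyone wanting to check whether their own cluster matches the exposure described in the flash quoted below, a quick non-authoritative sketch:

mmlsconfig verbsRdma      # "enable" means the daemon uses RDMA for its traffic
mmlsconfig verbsPorts     # which HCA ports are configured for it
mmlsnsd                   # which disks are served through NSD servers rather than SAN-attached everywhere

The combination the flash calls out is RDMA-enabled NSD servers in front of storage that is not ESS/GSS.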
-------- Forwarded Message -------- Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) Date: Fri, 30 Jun 2017 14:19:02 +0000 From: IBM My Notifications To: aaron.s.knister at nasa.gov My Notifications for Storage - 30 Jun 2017 Dear Subscriber (aaron.s.knister at nasa.gov), Here are your updates from IBM My Notifications. Your support Notifications display in English by default. Machine translation based on your IBM profile language setting is added if you specify this option in My defaults within My Notifications. (Note: Not all languages are available at this time, and the English version always takes precedence over the machine translated version.) ------------------------------------------------------------------------------ 1. IBM Spectrum Scale - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error - URL: http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM Spectrum Scale versions where the NSD server is enabled to use RDMA for file IO and the storage used in your GPFS cluster accessed via NSD servers (not fully SAN accessible) includes anything other than IBM Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these conditions, when the RDMA-enabled network adapter fails, the issue may result in undetected data corruption for file write or read operations. ------------------------------------------------------------------------------ Manage your My Notifications subscriptions, or send questions and comments. - Subscribe or Unsubscribe - https://www.ibm.com/support/mynotifications - Feedback - https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html - Follow us on Twitter - https://twitter.com/IBMStorageSupt To ensure proper delivery please add mynotify at stg.events.ihost.com to your address book. You received this email because you are subscribed to IBM My Notifications as: aaron.s.knister at nasa.gov Please do not reply to this message as it is generated by an automated service machine. (C) International Business Machines Corporation 2017. All rights reserved. From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 30 18:25:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 30 Jun 2017 17:25:56 +0000 Subject: [gpfsug-discuss] Mass UID migration suggestions In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Message-ID: <1901859B-7620-42E2-9064-14930AC50EE3@vanderbilt.edu> Hi Luke, I?ve got an off the wall suggestion for you, which may or may not work depending on whether or not you have any UID conflicts with old and new UIDs ? this won?t actually speed things up but it will eliminate the ?downtime? for your users. And the big caveat is that there can?t be any UID conflicts ? i.e. someone?s new UID can?t be someone else?s old UID. Given that ? what if you set an ACL to allow access to both their old and new UIDs ? then change their UID to the new UID ? then chown the files to the new UID and remove the ACL? More work for you, but no downtime for them. We actually may need to do something similar as we will need to change Windows-assigned UID?s based on SIDs to ?correct? UIDs at some point in the future on one of our storage systems. 
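Spelled out in commands for a single user, with placeholder UIDs (old 21001, new 31001) and path, the idea might look like the sketch below. GPFS manages ACLs with mmgetacl/mmputacl rather than the usual setfacl, and the blanket "rwxc" entry is purely illustrative -- check the entry syntax against mmgetacl output on your own filesystem before scripting it at scale:

OLD=21001; NEW=31001; DIR=/gpfs/fs0/home/user1

# 1) while the files still belong to $OLD, add an entry letting $NEW in
find "$DIR" | while IFS= read -r f; do
    mmgetacl "$f" > /tmp/acl.$$                                        # current ACL, traditional format
    echo "user:$NEW:rwxc" >> /tmp/acl.$$                               # illustrative blanket grant
    grep -q '^mask::' /tmp/acl.$$ || echo "mask::rwx-" >> /tmp/acl.$$  # named entries need a mask entry
    mmputacl -i /tmp/acl.$$ "$f"
done

# 2) switch the account itself to $NEW (LDAP / usermod -- outside GPFS), then
# 3) chown at leisure and finally drop or restore the temporary ACLs (mmdelacl / mmputacl)
find "$DIR" -print0 | xargs -0 -r chown -h "$NEW"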
If someone has a better way to solve your problem, I hope they?ll post it to the list, as it may help us as well. HTHAL. Thanks? Kevin On Jun 30, 2017, at 10:20 AM, hpc-luke at uconn.edu wrote: Hello, We're trying to change most of our users uids, is there a clean way to migrate all of one users files with say `mmapplypolicy`? We have to change the owner of around 273539588 files, and my estimates for runtime are around 6 days. What we've been doing is indexing all of the files and splitting them up by owner which takes around an hour, and then we were locking the user out while we chown their files. I made it multi threaded as it weirdly gave a 10% speedup despite my expectation that multi threading access from a single node would not give any speedup. Generally I'm looking for advice on how to make the chowning faster. Would spreading the chowning processes over multiple nodes improve performance? Should I not stat the files before running lchown on them, since lchown checks the file before changing it? I saw mention of inodescan(), in an old gpfsug email, which speeds up disk read access, by not guaranteeing that the data is up to date. We have a maintenance day coming up where all users will be locked out, so the file handles(?) from GPFS's perspective will not be able to go stale. Is there a function with similar constraints to inodescan that I can use to speed up this process? Thank you for your time, Luke Storrs-HPC University of Connecticut _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Jun 30 18:37:30 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 30 Jun 2017 19:37:30 +0200 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Jun 30 18:41:43 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 30 Jun 2017 13:41:43 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: Thanks Olaf, that's good to know (and is kind of what I suspected). I've requested a number of times this capability for those of us who can't use or aren't using GNR and the answer is effectively "no". This response is curious to me because I'm sure IBM doesn't believe that data integrity is only important and of value to customers who purchase their hardware *and* software. -Aaron On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser wrote: > yes.. in case of GNR (GPFS native raid) .. we do end-to-end check-summing > ... client --> server --> downToDisk > GNR writes down a chksum to disk (to all pdisks /all "raid" segments ) so > that dropped writes can be detected as well as miss-done writes (bit > flips..) 
> > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 06/30/2017 07:15 PM > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > RDMA-enabled network adapter failure on the NSD server may result in file > IO error (2017.06.30) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of > the additional check-summing done on those platforms? > > > -------- Forwarded Message -------- > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled > network adapter > failure on the NSD server may result in file IO error (2017.06.30) > Date: Fri, 30 Jun 2017 14:19:02 +0000 > From: IBM My Notifications > > To: aaron.s.knister at nasa.gov > > > > > My Notifications for Storage - 30 Jun 2017 > > Dear Subscriber (aaron.s.knister at nasa.gov), > > Here are your updates from IBM My Notifications. > > Your support Notifications display in English by default. Machine > translation based on your IBM profile > language setting is added if you specify this option in My defaults > within My Notifications. > (Note: Not all languages are available at this time, and the English > version always takes precedence > over the machine translated version.) > > ------------------------------------------------------------ > ------------------ > 1. IBM Spectrum Scale > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure > on the NSD server may result in file IO error > - URL: > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233& > myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_- > OCSTXKQY-OCSWJ00-_-E > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > Spectrum Scale versions where the NSD server is enabled to use RDMA for > file IO and the storage used in your GPFS cluster accessed via NSD > servers (not fully SAN accessible) includes anything other than IBM > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these > conditions, when the RDMA-enabled network adapter fails, the issue may > result in undetected data corruption for file write or read operations. > > ------------------------------------------------------------ > ------------------ > Manage your My Notifications subscriptions, or send questions and comments. > - Subscribe or Unsubscribe - https://www.ibm.com/support/mynotifications > - Feedback - > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotif > ications.html > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com to > your address book. > You received this email because you are subscribed to IBM My > Notifications as: > aaron.s.knister at nasa.gov > > Please do not reply to this message as it is generated by an automated > service machine. > > (C) International Business Machines Corporation 2017. All rights reserved. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Fri Jun 30 18:53:16 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 30 Jun 2017 13:53:16 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> In fact the answer was quite literally "no": https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84523 (the RFE was declined and the answer was that the "function is already available in GNR environments"). Regarding GNR, see this RFE request https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=95090 requesting the use of GNR outside of an ESS/GSS environment. It's interesting to note this is the highest voted Public RFE for GPFS that I can see, at least. It too was declined. -Aaron On 6/30/17 1:41 PM, Aaron Knister wrote: > Thanks Olaf, that's good to know (and is kind of what I suspected). I've > requested a number of times this capability for those of us who can't > use or aren't using GNR and the answer is effectively "no". This > response is curious to me because I'm sure IBM doesn't believe that data > integrity is only important and of value to customers who purchase their > hardware *and* software. > > -Aaron > > On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser > wrote: > > yes.. in case of GNR (GPFS native raid) .. we do end-to-end > check-summing ... client --> server --> downToDisk > GNR writes down a chksum to disk (to all pdisks /all "raid" segments > ) so that dropped writes can be detected as well as miss-done > writes (bit flips..) > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 06/30/2017 07:15 PM > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > RDMA-enabled network adapter failure on the NSD server may result in > file IO error (2017.06.30) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of > the additional check-summing done on those platforms? > > > -------- Forwarded Message -------- > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network > adapter > failure on the NSD server may result in file IO error (2017.06.30) > Date: Fri, 30 Jun 2017 14:19:02 +0000 > From: IBM My Notifications > > > To: aaron.s.knister at nasa.gov > > > > > My Notifications for Storage - 30 Jun 2017 > > Dear Subscriber (aaron.s.knister at nasa.gov > ), > > Here are your updates from IBM My Notifications. > > Your support Notifications display in English by default. Machine > translation based on your IBM profile > language setting is added if you specify this option in My defaults > within My Notifications. > (Note: Not all languages are available at this time, and the English > version always takes precedence > over the machine translated version.) > > ------------------------------------------------------------------------------ > 1. 
IBM Spectrum Scale > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter > failure > on the NSD server may result in file IO error > - URL: > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > Spectrum Scale versions where the NSD server is enabled to use RDMA for > file IO and the storage used in your GPFS cluster accessed via NSD > servers (not fully SAN accessible) includes anything other than IBM > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these > conditions, when the RDMA-enabled network adapter fails, the issue may > result in undetected data corruption for file write or read operations. > > ------------------------------------------------------------------------------ > Manage your My Notifications subscriptions, or send questions and > comments. > - Subscribe or Unsubscribe - > https://www.ibm.com/support/mynotifications > > - Feedback - > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com > to > your address book. > You received this email because you are subscribed to IBM My > Notifications as: > aaron.s.knister at nasa.gov > > Please do not reply to this message as it is generated by an automated > service machine. > > (C) International Business Machines Corporation 2017. All rights > reserved. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Jun 30 19:25:28 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 30 Jun 2017 18:25:28 +0000 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> Message-ID: end-to-end data integrity is very important and the reason it hasn't been done in Scale is not because its not important, its because its very hard to do without impacting performance in a very dramatic way. imagine your raid controller blocksize is 1mb and your filesystem blocksize is 1MB . if your application does a 1 MB write this ends up being a perfect full block , full track de-stage to your raid layer and everything works fine and fast. as soon as you add checksum support you need to add data somehow into this, means your 1MB is no longer 1 MB but 1 MB+checksum. 
to store this additional data you have multiple options, inline , outside the data block or some combination ,the net is either you need to do more physical i/o's to different places to get both the data and the corresponding checksum or your per block on disc structure becomes bigger than than what your application reads/or writes, both put massive burden on the Storage layer as e.g. a 1 MB write will now, even the blocks are all aligned from the application down to the raid layer, cause a read/modify/write on the raid layer as the data is bigger than the physical track size. so to get end-to-end checksum in Scale outside of ESS the best way is to get GNR as SW to run on generic HW, this is what people should vote for as RFE if they need that functionality. beside end-to-end checksums you get read/write cache and acceleration , fast rebuild and many other goodies as a added bonus. Sven On Fri, Jun 30, 2017 at 10:53 AM Aaron Knister wrote: > In fact the answer was quite literally "no": > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84523 > (the RFE was declined and the answer was that the "function is already > available in GNR environments"). > > Regarding GNR, see this RFE request > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=95090 > requesting the use of GNR outside of an ESS/GSS environment. It's > interesting to note this is the highest voted Public RFE for GPFS that I > can see, at least. It too was declined. > > -Aaron > > On 6/30/17 1:41 PM, Aaron Knister wrote: > > Thanks Olaf, that's good to know (and is kind of what I suspected). I've > > requested a number of times this capability for those of us who can't > > use or aren't using GNR and the answer is effectively "no". This > > response is curious to me because I'm sure IBM doesn't believe that data > > integrity is only important and of value to customers who purchase their > > hardware *and* software. > > > > -Aaron > > > > On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser > > wrote: > > > > yes.. in case of GNR (GPFS native raid) .. we do end-to-end > > check-summing ... client --> server --> downToDisk > > GNR writes down a chksum to disk (to all pdisks /all "raid" segments > > ) so that dropped writes can be detected as well as miss-done > > writes (bit flips..) > > > > > > > > From: Aaron Knister > > > > To: gpfsug main discussion list > > > > Date: 06/30/2017 07:15 PM > > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > > RDMA-enabled network adapter failure on the NSD server may result in > > file IO error (2017.06.30) > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > ------------------------------------------------------------------------ > > > > > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature > of > > the additional check-summing done on those platforms? > > > > > > -------- Forwarded Message -------- > > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network > > adapter > > failure on the NSD server may result in file IO error (2017.06.30) > > Date: Fri, 30 Jun 2017 14:19:02 +0000 > > From: IBM My Notifications > > >> > > To: aaron.s.knister at nasa.gov > > > > > > > > > > My Notifications for Storage - 30 Jun 2017 > > > > Dear Subscriber (aaron.s.knister at nasa.gov > > ), > > > > Here are your updates from IBM My Notifications. > > > > Your support Notifications display in English by default. 
Machine > > translation based on your IBM profile > > language setting is added if you specify this option in My defaults > > within My Notifications. > > (Note: Not all languages are available at this time, and the English > > version always takes precedence > > over the machine translated version.) > > > > > ------------------------------------------------------------------------------ > > 1. IBM Spectrum Scale > > > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter > > failure > > on the NSD server may result in file IO error > > - URL: > > > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > < > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > > > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > > Spectrum Scale versions where the NSD server is enabled to use RDMA > for > > file IO and the storage used in your GPFS cluster accessed via NSD > > servers (not fully SAN accessible) includes anything other than IBM > > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under > these > > conditions, when the RDMA-enabled network adapter fails, the issue > may > > result in undetected data corruption for file write or read > operations. > > > > > ------------------------------------------------------------------------------ > > Manage your My Notifications subscriptions, or send questions and > > comments. > > - Subscribe or Unsubscribe - > > https://www.ibm.com/support/mynotifications > > > > - Feedback - > > > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > < > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > > > > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > > > > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com > > to > > your address book. > > You received this email because you are subscribed to IBM My > > Notifications as: > > aaron.s.knister at nasa.gov > > > > Please do not reply to this message as it is generated by an > automated > > service machine. > > > > (C) International Business Machines Corporation 2017. All rights > > reserved. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Fri Jun 2 16:12:41 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 02 Jun 2017 11:12:41 -0400 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) In-Reply-To: References: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca>, Message-ID: <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> It has been a while since I used HSM with GPFS via TSM, but as far as I can remember, unprivileged users can run dsmmigrate and dsmrecall. Based on the instructions on the link, dsmrecall may now leverage the Recommended Access Order (RAO) available on enterprise drives, however root would have to be the one to invoke that feature. In that case we may have to develop a middleware/wrapper for dsmrecall that will run as root and act on behalf of the user when optimization is requested. Someone here more familiar with the latest version of TSM-HSM may be able to give us some hints on how people are doing this in practice. Jaime Quoting "Andrew Beattie" : > Thanks Jaime, How do you get around Optimised recalls? from what I > can see the optimised recall process needs a root level account to > retrieve a list of files > https://www.ibm.com/support/knowledgecenter/SSSR2R_7.1.1/com.ibm.itsm.hsmul.doc/c_recall_optimized_tape.html[1] > Regards, Andrew Beattie Software Defined Storage - IT Specialist > Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com[2] ----- > Original message ----- > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Andrew Beattie" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - > Space Management (GPFS HSM) > Date: Fri, Jun 2, 2017 7:28 PM > We have that situation. > Users don't need to login to NSD's > > What you need is to add at least one gpfs client to the cluster (or > multi-cluster), mount the DMAPI enabled file system, and use that > node > as a gateway for end-users. They can access the contents on the mount > > point with their own underprivileged accounts. > > Whether or not on a schedule, the moment an application or linux > command (such as cp, cat, vi, etc) accesses a stub, the file will be > > staged. > > Jaime > > Quoting "Andrew Beattie" : > >> Quick question, Does anyone have a Scale / GPFS environment (HPC) >> where users need the ability to recall data sets after they have > been >> stubbed, but only System Administrators are permitted to log onto > the >> NSD servers for security purposes. And if so how do you provide > the >> ability for the users to schedule their data set recalls? > Regards, >> Andrew Beattie Software Defined Storage - IT Specialist Phone: >> 614-2133-7927 E-mail: abeattie at au1.ibm.com[1] >> >> >> Links: >> ------ >> [1] mailto:abeattie at au1.ibm.com[3] >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials[4] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
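A minimal sketch of the kind of root-level wrapper Jaime describes -- nothing below is an IBM-provided tool: the script path, the sudoers line and the validation logic are all hypothetical, and it simply calls plain dsmrecall on the vetted paths (a site using the tape-optimised recall from the Knowledge Center page above would substitute that invocation at the marked line):

#!/bin/bash
# /usr/local/sbin/recall-for-user.sh -- run as root through sudo, e.g. with
#   %hsmusers ALL=(root) NOPASSWD: /usr/local/sbin/recall-for-user.sh
# Argument: a file containing one path per line; only files owned by the caller are recalled.
set -euo pipefail
LIST="$1"
CALLER="${SUDO_USER:?must be invoked through sudo}"
TMP=$(mktemp)
while IFS= read -r f; do
    [ -e "$f" ] || continue
    [ "$(stat -c %U -- "$f")" = "$CALLER" ] || { echo "skipping (not owner): $f" >&2; continue; }
    printf '%s\n' "$f"
done < "$LIST" > "$TMP"
xargs -d '\n' -r dsmrecall < "$TMP"     # <-- swap in the optimised recall here if available
rm -f "$TMP"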
From r.sobey at imperial.ac.uk Fri Jun 2 16:51:12 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 2 Jun 2017 15:51:12 +0000 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Message-ID: Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I've spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version... Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 2 17:40:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 2 Jun 2017 12:40:11 -0400 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS In-Reply-To: References: Message-ID: Upgrading from GPFS 4.2.x to GPFS 4.2.y should not "break" TSM. If it does, someone goofed, that would be a bug. (My opinion) Think of it this way. TSM is an application that uses the OS and the FileSystem(s). TSM can't verify it will work with all future versions of OS and Filesystems, and the releases can't be in lock step. Having said that, 4.2.3 has been "out" for a while, so if there were a TSM incompatibility, someone would have likely hit it or will before July... Trust but verify... From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/02/2017 11:51 AM Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I?ve spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version? Cheers Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 08:51:10 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 08:51:10 +0100 Subject: [gpfsug-discuss] NSD access routes Message-ID: Morning all, Just a quick one about NSD access and read only disks. Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From luis.bolinches at fi.ibm.com Mon Jun 5 08:52:39 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 5 Jun 2017 07:52:39 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: Message-ID: Hi Have you look at LROC instead? Might fit in simpler way to what your are describing. -- Cheers > On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: > > Morning all, > > Just a quick one about NSD access and read only disks. > > Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. > > Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? > > Cheers, > ----------------------------- > Dave Goodbourn > > Head of Systems > Milk Visual Effects > Tel: +44 (0)20 3697 8448 > Mob: +44 (0)7917 411 069 Ellei edell?? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 09:02:16 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 09:02:16 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: Yeah, that was my back up plan but would be more costly in the cloud. Read only is a limitation of most cloud providers not something that I "want". Just trying to move a network bottleneck. Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 > On 5 Jun 2017, at 08:52, Luis Bolinches wrote: > > Hi > > Have you look at LROC instead? Might fit in simpler way to what your are describing. > > -- > Cheers > >> On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: >> >> Morning all, >> >> Just a quick one about NSD access and read only disks. >> >> Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. >> >> Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? >> >> Cheers, >> ----------------------------- >> Dave Goodbourn >> >> Head of Systems >> Milk Visual Effects >> Tel: +44 (0)20 3697 8448 >> Mob: +44 (0)7917 411 069 > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 13:19:47 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 13:19:47 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: OK scrap my first question, I can't do what I wanted to do anyway! I'm testing out the LROC idea. 
All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. Cheers, ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 08:52, Luis Bolinches wrote: > Hi > > Have you look at LROC instead? Might fit in simpler way to what your are > describing. > > -- > Cheers > > On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: > > Morning all, > > Just a quick one about NSD access and read only disks. > > Can you have 2 NSD servers, one with read/write access to a disk and one > with just read only access to the same disk? I know you can write to a disk > over the network via another NSD server but can you mount the disk in read > only mode to increase the read performance? This is all virtual/cloud based. > > Is GPFS clever enough (or can it be configured) to know to read from the > locally attached read only disk but write back via another NSD server over > the GPFS network? > > Cheers, > ----------------------------- > Dave Goodbourn > > Head of Systems > Milk Visual Effects > Tel: +44 (0)20 3697 8448 <+44%2020%203697%208448> > Mob: +44 (0)7917 411 069 <+44%207917%20411069> > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 5 13:24:27 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 5 Jun 2017 12:24:27 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: mmdiag --lroc ? From: > on behalf of "dave at milk-vfx.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 5 June 2017 at 13:19 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NSD access routes OK scrap my first question, I can't do what I wanted to do anyway! I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. Cheers, ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 08:52, Luis Bolinches > wrote: Hi Have you look at LROC instead? Might fit in simpler way to what your are describing. -- Cheers On 5 Jun 2017, at 10.51, Dave Goodbourn > wrote: Morning all, Just a quick one about NSD access and read only disks. Can you have 2 NSD servers, one with read/write access to a disk and one with just read only access to the same disk? I know you can write to a disk over the network via another NSD server but can you mount the disk in read only mode to increase the read performance? This is all virtual/cloud based. 
Is GPFS clever enough (or can it be configured) to know to read from the locally attached read only disk but write back via another NSD server over the GPFS network? Cheers, ----------------------------- Dave Goodbourn Head of Systems Milk Visual Effects Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 5 13:48:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Jun 2017 12:48:48 +0000 Subject: [gpfsug-discuss] NSD access routes Message-ID: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Hi Dave I?ve done a large-scale (600 node) LROC deployment here - feel free to reach out if you have questions. mmdiag --lroc is about all there is but it does give you a pretty good idea how the cache is performing but you can?t tell which files are cached. Also, watch out that the LROC cached will steal pagepool memory (1% of the LROC cache size) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Dave Goodbourn Reply-To: gpfsug main discussion list Date: Monday, June 5, 2017 at 7:19 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD access routes I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:10:21 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:10:21 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: Message-ID: Ah yep, thanks a lot. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 13:24, Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > mmdiag --lroc > > ? > > > From: on behalf of " > dave at milk-vfx.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: Monday, 5 June 2017 at 13:19 > To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NSD access routes > > OK scrap my first question, I can't do what I wanted to do anyway! > > I'm testing out the LROC idea. All seems to be working well, but, is there > anyway to monitor what's cached? How full it might be? The performance etc?? > > I can see some stats in mmfsadm dump lroc but that's about it. > > Cheers, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 08:52, Luis Bolinches wrote: > >> Hi >> >> Have you look at LROC instead? Might fit in simpler way to what your are >> describing. 
>> >> -- >> Cheers >> >> On 5 Jun 2017, at 10.51, Dave Goodbourn wrote: >> >> Morning all, >> >> Just a quick one about NSD access and read only disks. >> >> Can you have 2 NSD servers, one with read/write access to a disk and one >> with just read only access to the same disk? I know you can write to a disk >> over the network via another NSD server but can you mount the disk in read >> only mode to increase the read performance? This is all virtual/cloud based. >> >> Is GPFS clever enough (or can it be configured) to know to read from the >> locally attached read only disk but write back via another NSD server over >> the GPFS network? >> >> Cheers, >> ----------------------------- >> Dave Goodbourn >> >> Head of Systems >> Milk Visual Effects >> Tel: +44 (0)20 3697 8448 <+44%2020%203697%208448> >> Mob: +44 (0)7917 411 069 <+44%207917%20411069> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:49:55 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:49:55 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: Thanks Bob, That pagepool comment has just answered my next question! But it doesn't seem to be working. Here's my mmdiag output: === mmdiag: lroc === LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 1151997 MB, currently in use: 0 MB Statistics from: Mon Jun 5 13:40:50 2017 Total objects stored 0 (0 MB) recalled 0 (0 MB) objects failed to store 0 failed to recall 0 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 0 (0 MB) Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Data objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 agent inserts=0, reads=0 response times (usec): insert min/max/avg=0/0/0 read min/max/avg=0/0/0 ssd writeIOs=0, writePages=0 readIOs=0, readPages=0 response times (usec): write min/max/avg=0/0/0 read min/max/avg=0/0/0 I've restarted GPFS on that node just in case but that didn't seem to help. I have LROC on a node that DOESN'T have direct access to an NSD so will hopefully cache files that get requested over NFS. How often are these stats updated? The Statistics line doesn't seem to update when running the command again. 
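One crude way to answer the "how often" question is to sample the counters on an interval and compare the snapshots afterwards; a rough sketch (the interval and log path are arbitrary):

#!/bin/bash
# append a timestamped mmdiag --lroc snapshot once a minute; stop with Ctrl-C
LOG=/var/tmp/lroc_stats.log
while true; do
    echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
    /usr/lpp/mmfs/bin/mmdiag --lroc
    sleep 60
done >> "$LOG"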
Dave, ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 13:48, Oesterlin, Robert wrote: > Hi Dave > > > > I?ve done a large-scale (600 node) LROC deployment here - feel free to > reach out if you have questions. > > > > mmdiag --lroc is about all there is but it does give you a pretty good > idea how the cache is performing but you can?t tell which files are cached. > Also, watch out that the LROC cached will steal pagepool memory (1% of the > LROC cache size) > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > *From: * on behalf of Dave > Goodbourn > *Reply-To: *gpfsug main discussion list > *Date: *Monday, June 5, 2017 at 7:19 AM > *To: *gpfsug main discussion list > *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes > > > > I'm testing out the LROC idea. All seems to be working well, but, is there > anyway to monitor what's cached? How full it might be? The performance etc?? > > > > I can see some stats in mmfsadm dump lroc but that's about it. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 14:55:22 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 14:55:22 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: OK slightly ignore that last email. It's still not updating the output but I realise the Stats from line is when they started so probably won't update! :( Still nothing seems to being cached though. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 14:49, Dave Goodbourn wrote: > Thanks Bob, > > That pagepool comment has just answered my next question! > > But it doesn't seem to be working. 
Here's my mmdiag output: > > === mmdiag: lroc === > LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' > status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 > Max capacity: 1151997 MB, currently in use: 0 MB > Statistics from: Mon Jun 5 13:40:50 2017 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Inode objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Directory objects failed to store 0 failed to recall 0 failed to > query 0 failed to inval 0 > > Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Data objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > agent inserts=0, reads=0 > response times (usec): > insert min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > ssd writeIOs=0, writePages=0 > readIOs=0, readPages=0 > response times (usec): > write min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > > I've restarted GPFS on that node just in case but that didn't seem to > help. I have LROC on a node that DOESN'T have direct access to an NSD so > will hopefully cache files that get requested over NFS. > > How often are these stats updated? The Statistics line doesn't seem to > update when running the command again. > > Dave, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: > >> Hi Dave >> >> >> >> I?ve done a large-scale (600 node) LROC deployment here - feel free to >> reach out if you have questions. >> >> >> >> mmdiag --lroc is about all there is but it does give you a pretty good >> idea how the cache is performing but you can?t tell which files are cached. >> Also, watch out that the LROC cached will steal pagepool memory (1% of the >> LROC cache size) >> >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> >> >> >> >> *From: * on behalf of Dave >> Goodbourn >> *Reply-To: *gpfsug main discussion list > > >> *Date: *Monday, June 5, 2017 at 7:19 AM >> *To: *gpfsug main discussion list >> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >> >> >> >> I'm testing out the LROC idea. All seems to be working well, but, is >> there anyway to monitor what's cached? How full it might be? The >> performance etc?? >> >> >> >> I can see some stats in mmfsadm dump lroc but that's about it. >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 5 14:59:07 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 5 Jun 2017 13:59:07 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> , Message-ID: We've seen exactly this behaviour. Removing and readding the lroc nsd device worked for us. 
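For anyone who has to do the same, recreating the local cache device is normally a delete and re-create of the localCache NSD from a stanza file. A rough sketch follows; the NSD name, device and node are placeholders, and the exact sequence (including whether the node needs to be quiesced first) should be checked against the mmdelnsd/mmcrnsd documentation for your release.

# drop the existing LROC NSD on the affected node
mmdelnsd lroc_node1_sdb

# describe the local SSD as an LROC device again
cat > /tmp/lroc.stanza <<'EOF'
%nsd:
  nsd=lroc_node1_sdb
  device=/dev/sdb
  servers=node1
  usage=localCache
EOF

mmcrnsd -F /tmp/lroc.stanza

After that, mmdiag --lroc on the node should list the device again with its counters starting from zero.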
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of dave at milk-vfx.com [dave at milk-vfx.com] Sent: 05 June 2017 14:55 To: Oesterlin, Robert Cc: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NSD access routes OK slightly ignore that last email. It's still not updating the output but I realise the Stats from line is when they started so probably won't update! :( Still nothing seems to being cached though. ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 14:49, Dave Goodbourn > wrote: Thanks Bob, That pagepool comment has just answered my next question! But it doesn't seem to be working. Here's my mmdiag output: === mmdiag: lroc === LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 1151997 MB, currently in use: 0 MB Statistics from: Mon Jun 5 13:40:50 2017 Total objects stored 0 (0 MB) recalled 0 (0 MB) objects failed to store 0 failed to recall 0 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 0 (0 MB) Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) Data objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 agent inserts=0, reads=0 response times (usec): insert min/max/avg=0/0/0 read min/max/avg=0/0/0 ssd writeIOs=0, writePages=0 readIOs=0, readPages=0 response times (usec): write min/max/avg=0/0/0 read min/max/avg=0/0/0 I've restarted GPFS on that node just in case but that didn't seem to help. I have LROC on a node that DOESN'T have direct access to an NSD so will hopefully cache files that get requested over NFS. How often are these stats updated? The Statistics line doesn't seem to update when running the command again. Dave, ---------------------------------------------------- Dave Goodbourn Head of Systems MILK VISUAL EFFECTS [http://www.milk-vfx.com/src/milk_email_logo.jpg] 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: +44 (0)20 3697 8448 Mob: +44 (0)7917 411 069 On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: Hi Dave I?ve done a large-scale (600 node) LROC deployment here - feel free to reach out if you have questions. mmdiag --lroc is about all there is but it does give you a pretty good idea how the cache is performing but you can?t tell which files are cached. 
Also, watch out that the LROC cached will steal pagepool memory (1% of the LROC cache size) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Dave Goodbourn > Reply-To: gpfsug main discussion list > Date: Monday, June 5, 2017 at 7:19 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD access routes I'm testing out the LROC idea. All seems to be working well, but, is there anyway to monitor what's cached? How full it might be? The performance etc?? I can see some stats in mmfsadm dump lroc but that's about it. From oehmes at gmail.com Mon Jun 5 14:59:44 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 05 Jun 2017 13:59:44 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: if you are using O_DIRECT calls they will be ignored by default for LROC, same for encrypted data. how exactly are you testing this? On Mon, Jun 5, 2017 at 6:50 AM Dave Goodbourn wrote: > Thanks Bob, > > That pagepool comment has just answered my next question! > > But it doesn't seem to be working. Here's my mmdiag output: > > === mmdiag: lroc === > LROC Device(s): > '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' > status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 > Max capacity: 1151997 MB, currently in use: 0 MB > Statistics from: Mon Jun 5 13:40:50 2017 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Inode objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Directory objects failed to store 0 failed to recall 0 failed to > query 0 failed to inval 0 > > Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % > Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) > Data objects failed to store 0 failed to recall 0 failed to query 0 > failed to inval 0 > > agent inserts=0, reads=0 > response times (usec): > insert min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > ssd writeIOs=0, writePages=0 > readIOs=0, readPages=0 > response times (usec): > write min/max/avg=0/0/0 > read min/max/avg=0/0/0 > > > I've restarted GPFS on that node just in case but that didn't seem to > help. I have LROC on a node that DOESN'T have direct access to an NSD so > will hopefully cache files that get requested over NFS. > > How often are these stats updated? The Statistics line doesn't seem to > update when running the command again. > > Dave, > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 13:48, Oesterlin, Robert > wrote: > >> Hi Dave >> >> >> >> I?ve done a large-scale (600 node) LROC deployment here - feel free to >> reach out if you have questions. >> >> >> >> mmdiag --lroc is about all there is but it does give you a pretty good >> idea how the cache is performing but you can?t tell which files are cached. 
>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >> LROC cache size) >> >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> >> >> >> >> *From: * on behalf of Dave >> Goodbourn >> *Reply-To: *gpfsug main discussion list > > >> *Date: *Monday, June 5, 2017 at 7:19 AM >> *To: *gpfsug main discussion list >> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >> >> >> >> I'm testing out the LROC idea. All seems to be working well, but, is >> there anyway to monitor what's cached? How full it might be? The >> performance etc?? >> >> >> >> I can see some stats in mmfsadm dump lroc but that's about it. >> >> >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 15:00:45 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 15:00:45 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: OK I'm going to hang my head in the corner...RTFM...I've not filled the memory buffer pool yet so I doubt it will have anything in it yet!! :( ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 14:55, Dave Goodbourn wrote: > OK slightly ignore that last email. It's still not updating the output but > I realise the Stats from line is when they started so probably won't > update! :( > > Still nothing seems to being cached though. > > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 14:49, Dave Goodbourn wrote: > >> Thanks Bob, >> >> That pagepool comment has just answered my next question! >> >> But it doesn't seem to be working. 
Here's my mmdiag output: >> >> === mmdiag: lroc === >> LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF >> 0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >> Max capacity: 1151997 MB, currently in use: 0 MB >> Statistics from: Mon Jun 5 13:40:50 2017 >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) >> >> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Inode objects failed to store 0 failed to recall 0 failed to query >> 0 failed to inval 0 >> >> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Directory objects failed to store 0 failed to recall 0 failed to >> query 0 failed to inval 0 >> >> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >> Data objects failed to store 0 failed to recall 0 failed to query 0 >> failed to inval 0 >> >> agent inserts=0, reads=0 >> response times (usec): >> insert min/max/avg=0/0/0 >> read min/max/avg=0/0/0 >> >> ssd writeIOs=0, writePages=0 >> readIOs=0, readPages=0 >> response times (usec): >> write min/max/avg=0/0/0 >> read min/max/avg=0/0/0 >> >> >> I've restarted GPFS on that node just in case but that didn't seem to >> help. I have LROC on a node that DOESN'T have direct access to an NSD so >> will hopefully cache files that get requested over NFS. >> >> How often are these stats updated? The Statistics line doesn't seem to >> update when running the command again. >> >> Dave, >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 13:48, Oesterlin, Robert >> wrote: >> >>> Hi Dave >>> >>> >>> >>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>> reach out if you have questions. >>> >>> >>> >>> mmdiag --lroc is about all there is but it does give you a pretty good >>> idea how the cache is performing but you can?t tell which files are cached. >>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>> LROC cache size) >>> >>> >>> >>> Bob Oesterlin >>> Sr Principal Storage Engineer, Nuance >>> >>> >>> >>> >>> >>> >>> >>> *From: * on behalf of Dave >>> Goodbourn >>> *Reply-To: *gpfsug main discussion list >> org> >>> *Date: *Monday, June 5, 2017 at 7:19 AM >>> *To: *gpfsug main discussion list >>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>> >>> >>> >>> I'm testing out the LROC idea. All seems to be working well, but, is >>> there anyway to monitor what's cached? How full it might be? The >>> performance etc?? >>> >>> >>> >>> I can see some stats in mmfsadm dump lroc but that's about it. >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Mon Jun 5 15:03:28 2017 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 05 Jun 2017 14:03:28 +0000 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: yes as long as you haven't pushed anything to it (means pagepool got under enough pressure to free up space) you won't see anything in the stats :-) sven On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote: > OK I'm going to hang my head in the corner...RTFM...I've not filled the > memory buffer pool yet so I doubt it will have anything in it yet!! :( > > ---------------------------------------------------- > *Dave Goodbourn* > Head of Systems > *MILK VISUAL EFFECTS* > > 5th floor, Threeways House, > 40-44 Clipstone Street London, W1W 5DW > Tel: *+44 (0)20 3697 8448* > Mob: *+44 (0)7917 411 069* > > On 5 June 2017 at 14:55, Dave Goodbourn wrote: > >> OK slightly ignore that last email. It's still not updating the output >> but I realise the Stats from line is when they started so probably won't >> update! :( >> >> Still nothing seems to being cached though. >> >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 14:49, Dave Goodbourn wrote: >> >>> Thanks Bob, >>> >>> That pagepool comment has just answered my next question! >>> >>> But it doesn't seem to be working. Here's my mmdiag output: >>> >>> === mmdiag: lroc === >>> LROC Device(s): >>> '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' >>> status Running >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >>> Max capacity: 1151997 MB, currently in use: 0 MB >>> Statistics from: Mon Jun 5 13:40:50 2017 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >>> >>> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Inode objects failed to store 0 failed to recall 0 failed to query >>> 0 failed to inval 0 >>> >>> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Directory objects failed to store 0 failed to recall 0 failed to >>> query 0 failed to inval 0 >>> >>> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>> Data objects failed to store 0 failed to recall 0 failed to query >>> 0 failed to inval 0 >>> >>> agent inserts=0, reads=0 >>> response times (usec): >>> insert min/max/avg=0/0/0 >>> read min/max/avg=0/0/0 >>> >>> ssd writeIOs=0, writePages=0 >>> readIOs=0, readPages=0 >>> response times (usec): >>> write min/max/avg=0/0/0 >>> read min/max/avg=0/0/0 >>> >>> >>> I've restarted GPFS on that node just in case but that didn't seem to >>> help. I have LROC on a node that DOESN'T have direct access to an NSD so >>> will hopefully cache files that get requested over NFS. >>> >>> How often are these stats updated? The Statistics line doesn't seem to >>> update when running the command again. 
>>> >>> Dave, >>> ---------------------------------------------------- >>> *Dave Goodbourn* >>> Head of Systems >>> *MILK VISUAL EFFECTS* >>> >>> 5th floor, Threeways House, >>> 40-44 Clipstone Street London, W1W 5DW >>> Tel: *+44 (0)20 3697 8448* >>> Mob: *+44 (0)7917 411 069* >>> >>> On 5 June 2017 at 13:48, Oesterlin, Robert >>> wrote: >>> >>>> Hi Dave >>>> >>>> >>>> >>>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>>> reach out if you have questions. >>>> >>>> >>>> >>>> mmdiag --lroc is about all there is but it does give you a pretty good >>>> idea how the cache is performing but you can?t tell which files are cached. >>>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>>> LROC cache size) >>>> >>>> >>>> >>>> Bob Oesterlin >>>> Sr Principal Storage Engineer, Nuance >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *From: * on behalf of Dave >>>> Goodbourn >>>> *Reply-To: *gpfsug main discussion list < >>>> gpfsug-discuss at spectrumscale.org> >>>> *Date: *Monday, June 5, 2017 at 7:19 AM >>>> *To: *gpfsug main discussion list >>>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>>> >>>> >>>> >>>> I'm testing out the LROC idea. All seems to be working well, but, is >>>> there anyway to monitor what's cached? How full it might be? The >>>> performance etc?? >>>> >>>> >>>> >>>> I can see some stats in mmfsadm dump lroc but that's about it. >>>> >>>> >>>> >>>> >>> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at milk-vfx.com Mon Jun 5 15:15:00 2017 From: dave at milk-vfx.com (Dave Goodbourn) Date: Mon, 5 Jun 2017 15:15:00 +0100 Subject: [gpfsug-discuss] NSD access routes In-Reply-To: References: <259A2E45-7B84-4637-A37E-B18C4271DDB2@nuance.com> Message-ID: Ha! A quick shrink of the pagepool and we're in action! Thanks all. Dave. ---------------------------------------------------- *Dave Goodbourn* Head of Systems *MILK VISUAL EFFECTS* 5th floor, Threeways House, 40-44 Clipstone Street London, W1W 5DW Tel: *+44 (0)20 3697 8448* Mob: *+44 (0)7917 411 069* On 5 June 2017 at 15:03, Sven Oehme wrote: > yes as long as you haven't pushed anything to it (means pagepool got under > enough pressure to free up space) you won't see anything in the stats :-) > > sven > > > On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote: > >> OK I'm going to hang my head in the corner...RTFM...I've not filled the >> memory buffer pool yet so I doubt it will have anything in it yet!! :( >> >> ---------------------------------------------------- >> *Dave Goodbourn* >> Head of Systems >> *MILK VISUAL EFFECTS* >> >> 5th floor, Threeways House, >> 40-44 Clipstone Street London, W1W 5DW >> Tel: *+44 (0)20 3697 8448* >> Mob: *+44 (0)7917 411 069* >> >> On 5 June 2017 at 14:55, Dave Goodbourn wrote: >> >>> OK slightly ignore that last email. It's still not updating the output >>> but I realise the Stats from line is when they started so probably won't >>> update! :( >>> >>> Still nothing seems to being cached though. 
>>> >>> ---------------------------------------------------- >>> *Dave Goodbourn* >>> Head of Systems >>> *MILK VISUAL EFFECTS* >>> >>> 5th floor, Threeways House, >>> 40-44 Clipstone Street London, W1W 5DW >>> Tel: *+44 (0)20 3697 8448* >>> Mob: *+44 (0)7917 411 069* >>> >>> On 5 June 2017 at 14:49, Dave Goodbourn wrote: >>> >>>> Thanks Bob, >>>> >>>> That pagepool comment has just answered my next question! >>>> >>>> But it doesn't seem to be working. Here's my mmdiag output: >>>> >>>> === mmdiag: lroc === >>>> LROC Device(s): '0AF0000259355BA8#/dev/sdb;0AF0000259355BA9#/dev/sdc;0AF0000259355BAA#/dev/sdd;' >>>> status Running >>>> Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 >>>> Max capacity: 1151997 MB, currently in use: 0 MB >>>> Statistics from: Mon Jun 5 13:40:50 2017 >>>> >>>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>>> objects failed to store 0 failed to recall 0 failed to inval 0 >>>> objects queried 0 (0 MB) not found 0 = 0.00 % >>>> objects invalidated 0 (0 MB) >>>> >>>> Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Inode objects failed to store 0 failed to recall 0 failed to >>>> query 0 failed to inval 0 >>>> >>>> Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Directory objects failed to store 0 failed to recall 0 failed to >>>> query 0 failed to inval 0 >>>> >>>> Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 % >>>> Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB) >>>> Data objects failed to store 0 failed to recall 0 failed to query >>>> 0 failed to inval 0 >>>> >>>> agent inserts=0, reads=0 >>>> response times (usec): >>>> insert min/max/avg=0/0/0 >>>> read min/max/avg=0/0/0 >>>> >>>> ssd writeIOs=0, writePages=0 >>>> readIOs=0, readPages=0 >>>> response times (usec): >>>> write min/max/avg=0/0/0 >>>> read min/max/avg=0/0/0 >>>> >>>> >>>> I've restarted GPFS on that node just in case but that didn't seem to >>>> help. I have LROC on a node that DOESN'T have direct access to an NSD so >>>> will hopefully cache files that get requested over NFS. >>>> >>>> How often are these stats updated? The Statistics line doesn't seem to >>>> update when running the command again. >>>> >>>> Dave, >>>> ---------------------------------------------------- >>>> *Dave Goodbourn* >>>> Head of Systems >>>> *MILK VISUAL EFFECTS* >>>> >>>> 5th floor, Threeways House, >>>> 40-44 Clipstone Street London, W1W 5DW >>>> Tel: *+44 (0)20 3697 8448* >>>> Mob: *+44 (0)7917 411 069* >>>> >>>> On 5 June 2017 at 13:48, Oesterlin, Robert >>> > wrote: >>>> >>>>> Hi Dave >>>>> >>>>> >>>>> >>>>> I?ve done a large-scale (600 node) LROC deployment here - feel free to >>>>> reach out if you have questions. >>>>> >>>>> >>>>> >>>>> mmdiag --lroc is about all there is but it does give you a pretty good >>>>> idea how the cache is performing but you can?t tell which files are cached. 
>>>>> Also, watch out that the LROC cached will steal pagepool memory (1% of the >>>>> LROC cache size) >>>>> >>>>> >>>>> >>>>> Bob Oesterlin >>>>> Sr Principal Storage Engineer, Nuance >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *From: * on behalf of Dave >>>>> Goodbourn >>>>> *Reply-To: *gpfsug main discussion list >>>> org> >>>>> *Date: *Monday, June 5, 2017 at 7:19 AM >>>>> *To: *gpfsug main discussion list >>>>> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] NSD access routes >>>>> >>>>> >>>>> >>>>> I'm testing out the LROC idea. All seems to be working well, but, is >>>>> there anyway to monitor what's cached? How full it might be? The >>>>> performance etc?? >>>>> >>>>> >>>>> >>>>> I can see some stats in mmfsadm dump lroc but that's about it. >>>>> >>>>> >>>>> >>>>> >>>> >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 5 16:54:09 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Jun 2017 15:54:09 +0000 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add Message-ID: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Our node build process re-adds a node to the cluster and then does a ?service gpfs start?, but GPFS doesn?t start. From the build log: + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' + rc=0 + chkconfig gpfs on + service gpfs start The ?service gpfs start? command hangs and never seems to return. If I look at the process tree: [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12292 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$// 21639 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 This is GPFS 4.2.2-1 This seems to occur only on the initial startup after build - if I try to start GPFS again, it works just fine - any ideas on what it?s sitting here waiting? Nothing in mmfslog (does not exist) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Jun 5 20:54:31 2017 From: ewahl at osc.edu (Edward Wahl) Date: Mon, 5 Jun 2017 15:54:31 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Message-ID: <20170605155431.75b42322@osc.edu> Just a thought, as we noticed the EXACT opposite of this, and what I think is new behavior in either mmmount or mmfsfuncs.. 
Does the file system exist in your /etc/fstab (or AIX equiv) yet? Ed On Mon, 5 Jun 2017 15:54:09 +0000 "Oesterlin, Robert" wrote: > Our node build process re-adds a node to the cluster and then does a ?service > gpfs start?, but GPFS doesn?t start. From the build log: > > + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com > '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' > + rc=0 > + chkconfig gpfs on > + service gpfs start > > The ?service gpfs start? command hangs and never seems to return. > > If I look at the process tree: > > [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" > 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post > 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 > 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? > S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S > 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > 12292 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num > 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e > s/\.num$// 21639 ? S > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > > This is GPFS 4.2.2-1 > > This seems to occur only on the initial startup after build - if I try to > start GPFS again, it works just fine - any ideas on what it?s sitting here > waiting? Nothing in mmfslog (does not exist) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From ewahl at osc.edu Mon Jun 5 20:56:55 2017 From: ewahl at osc.edu (Edward Wahl) Date: Mon, 5 Jun 2017 15:56:55 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <20170605155431.75b42322@osc.edu> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> <20170605155431.75b42322@osc.edu> Message-ID: <20170605155655.3ce54084@osc.edu> On Mon, 5 Jun 2017 15:54:31 -0400 Edward Wahl wrote: > Just a thought, as we noticed the EXACT opposite of this, and what I think is > new behavior in either mmmount or .. Does the file system exist in > your /etc/fstab (or AIX equiv) yet? Apologies, I meant mmsdrfsdef, not mmfsfuncs. Ed > > Ed > > On Mon, 5 Jun 2017 15:54:09 +0000 > "Oesterlin, Robert" wrote: > > > Our node build process re-adds a node to the cluster and then does a > > ?service gpfs start?, but GPFS doesn?t start. >From the build log: > > > > + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com > > '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' > > + rc=0 > > + chkconfig gpfs on > > + service gpfs start > > > > The ?service gpfs start? command hangs and never seems to return. > > > > If I look at the process tree: > > > > [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" > > 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post > > 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 > > 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? > > S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S > > 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > > 12292 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot > > 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num > > 12294 ? 
S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e > > s/\.num$// 21639 ? S > > 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > > > > This is GPFS 4.2.2-1 > > > > This seems to occur only on the initial startup after build - if I try to > > start GPFS again, it works just fine - any ideas on what it?s sitting here > > waiting? Nothing in mmfslog (does not exist) > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > 507-269-0413 > > > > > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From scale at us.ibm.com Mon Jun 5 22:49:23 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 5 Jun 2017 17:49:23 -0400 Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add In-Reply-To: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> References: <1314020E-D554-47AC-81A1-371B5A526817@nuance.com> Message-ID: Looks like a bug in the code. The command hung in grep command. It has missing argument. Please open a PMR to have this fix. Instead of "service gpfs start", can you use mmstartup? You can also try to run mm list command before service gpfs start as a workaround. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/05/2017 11:54 AM Subject: [gpfsug-discuss] Odd behavior - GPSF failed to start after initial node add Sent by: gpfsug-discuss-bounces at spectrumscale.org Our node build process re-adds a node to the cluster and then does a ?service gpfs start?, but GPFS doesn?t start. From the build log: + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com' + rc=0 + chkconfig gpfs on + service gpfs start The ?service gpfs start? command hangs and never seems to return. If I look at the process tree: [root at cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs" 11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post 12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no 12208 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 12271 ? S 0:00 /bin/sh /sbin/service gpfs start 12276 ? S 0:00 /bin/sh /etc/init.d/gpfs start 12278 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12292 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot 12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num 12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$// 21639 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 This is GPFS 4.2.2-1 This seems to occur only on the initial startup after build - if I try to start GPFS again, it works just fine - any ideas on what it?s sitting here waiting? 
Nothing in mmfslog (does not exist) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From stijn.deweirdt at ugent.be Tue Jun 6 08:05:06 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 09:05:06 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging Message-ID: hi all, we have recently been hit by quite a few cases that triggered long waiters. we are aware of the excellent slides http://files.gpfsug.org/presentations/2017/NERSC/GPFS-Troubleshooting-Apr-2017.pdf but we are wondering if and how we can cause those waiters ourself, so we can train ourself in debugging and resolving them (either on test system or in controlled environment on the production clusters). all hints welcome. stijn From Robert.Oesterlin at nuance.com Tue Jun 6 12:44:31 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 6 Jun 2017 11:44:31 +0000 Subject: [gpfsug-discuss] gpfs waiters debugging Message-ID: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> Hi Stijn You need to provide some more details on the type and duration of the waiters before the group can offer some advice. Bob Oesterlin Sr Principal Storage Engineer, Nuance On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: but we are wondering if and how we can cause those waiters ourself, so we can train ourself in debugging and resolving them (either on test system or in controlled environment on the production clusters). all hints welcome. stijn _______________________________________________ From stijn.deweirdt at ugent.be Tue Jun 6 13:29:43 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 14:29:43 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> Message-ID: <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> hi bob, waiters from RPC replies and/or threads waiting on mutex are most "popular". but my question is not how to resolve them, the question is how to create such a waiter so we can train ourself in grep and mmfsadm etc etc we want to recreate the waiters a few times, try out some things and either script or at least put instructions on our internal wiki what to do. the instructions in the slides are clear enough, but there are a lot of slides, and typically when this occurs offshift, you don't want to start with rereading the slides and wondering what to do next; let alone debug scripts ;) thanks, stijn On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: > Hi Stijn > > You need to provide some more details on the type and duration of the waiters before the group can offer some advice. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: > > > but we are wondering if and how we can cause those waiters ourself, so > we can train ourself in debugging and resolving them (either on test > system or in controlled environment on the production clusters). > > all hints welcome. 
> > stijn > _______________________________________________ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From stockf at us.ibm.com Tue Jun 6 13:57:00 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 6 Jun 2017 08:57:00 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: Realize that generally any waiter under 1 second should be ignored. In an active GPFS system there are always waiters and the greater the use of the system likely the more waiters you will see. The point is waiters themselves are not an indication your system is having problems. As for creating them any steady level of activity against the file system should cause waiters to appear, though most should be of a short duration. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Stijn De Weirdt To: gpfsug-discuss at spectrumscale.org Date: 06/06/2017 08:31 AM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org hi bob, waiters from RPC replies and/or threads waiting on mutex are most "popular". but my question is not how to resolve them, the question is how to create such a waiter so we can train ourself in grep and mmfsadm etc etc we want to recreate the waiters a few times, try out some things and either script or at least put instructions on our internal wiki what to do. the instructions in the slides are clear enough, but there are a lot of slides, and typically when this occurs offshift, you don't want to start with rereading the slides and wondering what to do next; let alone debug scripts ;) thanks, stijn On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: > Hi Stijn > > You need to provide some more details on the type and duration of the waiters before the group can offer some advice. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stijn De Weirdt" wrote: > > > but we are wondering if and how we can cause those waiters ourself, so > we can train ourself in debugging and resolving them (either on test > system or in controlled environment on the production clusters). > > all hints welcome. > > stijn > _______________________________________________ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Tue Jun 6 14:06:57 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Tue, 6 Jun 2017 15:06:57 +0200 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: oh sure, i meant waiters that last > 300 seconds or so (something that could trigger deadlock). 
obviously we're not interested in debugging the short ones, it's not that gpfs doesn't work or anything ;) stijn On 06/06/2017 02:57 PM, Frederick Stock wrote: > Realize that generally any waiter under 1 second should be ignored. In an > active GPFS system there are always waiters and the greater the use of the > system likely the more waiters you will see. The point is waiters > themselves are not an indication your system is having problems. > > As for creating them any steady level of activity against the file system > should cause waiters to appear, though most should be of a short duration. > > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: Stijn De Weirdt > To: gpfsug-discuss at spectrumscale.org > Date: 06/06/2017 08:31 AM > Subject: Re: [gpfsug-discuss] gpfs waiters debugging > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > hi bob, > > waiters from RPC replies and/or threads waiting on mutex are most > "popular". > > but my question is not how to resolve them, the question is how to > create such a waiter so we can train ourself in grep and mmfsadm etc etc > > we want to recreate the waiters a few times, try out some things and > either script or at least put instructions on our internal wiki what to > do. > > the instructions in the slides are clear enough, but there are a lot of > slides, and typically when this occurs offshift, you don't want to start > with rereading the slides and wondering what to do next; let alone debug > scripts ;) > > thanks, > > stijn > > On 06/06/2017 01:44 PM, Oesterlin, Robert wrote: >> Hi Stijn >> >> You need to provide some more details on the type and duration of the > waiters before the group can offer some advice. >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> >> >> On 6/6/17, 2:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Stijn De Weirdt" stijn.deweirdt at ugent.be> wrote: >> >> >> but we are wondering if and how we can cause those waiters ourself, > so >> we can train ourself in debugging and resolving them (either on test >> system or in controlled environment on the production clusters). >> >> all hints welcome. >> >> stijn >> _______________________________________________ >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From valdis.kletnieks at vt.edu Tue Jun 6 17:45:51 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 06 Jun 2017 12:45:51 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> Message-ID: <6873.1496767551@turing-police.cc.vt.edu> On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). 
obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From stockf at us.ibm.com Tue Jun 6 17:54:06 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 6 Jun 2017 12:54:06 -0400 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: <6873.1496767551@turing-police.cc.vt.edu> References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com><3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> <6873.1496767551@turing-police.cc.vt.edu> Message-ID: On recent releases you can accomplish the same with the command, "mmlsnode -N waiters -L". Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Date: 06/06/2017 12:46 PM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... 
[attachment "attltepl.dat" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jun 6 19:05:15 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 6 Jun 2017 14:05:15 -0400 Subject: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) In-Reply-To: <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> References: <20170602052836.11563o7dj205wptw@support.scinet.utoronto.ca>, <20170602111241.56882fx2qr2yz2ax@support.scinet.utoronto.ca> Message-ID: Hi, Just as Jaime has explained, any GPFS node in the cluster, can induce a recall (as he called "staged") by access to file data. It is not optimized by tape order, and a dynamic file access of any pattern, such as "find" or "cat *" will surely result in an inefficient processing of the data recall if all data lives in physical tape. But if migrated data lives on spinning disk on the TSM server, there is no harm in such a recall pattern because recalls from a disk pool incur no significant overhead or delay for tape loading and positioning. Unprivileged users may not run "dsmcrecall" because creating a DMAPI session as the dsmrecall program must do, requires admin user privilege on that node. You may be able to wrap dsmrecall in a set-uid wrapper if you want to permit users to run that, but of course that comes with the danger that a recall storm could monopolize resources on your cluster. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Jaime Pinto" To: "Andrew Beattie" Cc: gpfsug-discuss at spectrumscale.org Date: 06/02/2017 11:13 AM Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM) Sent by: gpfsug-discuss-bounces at spectrumscale.org It has been a while since I used HSM with GPFS via TSM, but as far as I can remember, unprivileged users can run dsmmigrate and dsmrecall. Based on the instructions on the link, dsmrecall may now leverage the Recommended Access Order (RAO) available on enterprise drives, however root would have to be the one to invoke that feature. In that case we may have to develop a middleware/wrapper for dsmrecall that will run as root and act on behalf of the user when optimization is requested. Someone here more familiar with the latest version of TSM-HSM may be able to give us some hints on how people are doing this in practice. Jaime Quoting "Andrew Beattie" : > Thanks Jaime, How do you get around Optimised recalls? 
from what I > can see the optimised recall process needs a root level account to > retrieve a list of files > https://www.ibm.com/support/knowledgecenter/SSSR2R_7.1.1/com.ibm.itsm.hsmul.doc/c_recall_optimized_tape.html [1] > Regards, Andrew Beattie Software Defined Storage - IT Specialist > Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com[2] ----- > Original message ----- > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Andrew Beattie" > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - > Space Management (GPFS HSM) > Date: Fri, Jun 2, 2017 7:28 PM > We have that situation. > Users don't need to login to NSD's > > What you need is to add at least one gpfs client to the cluster (or > multi-cluster), mount the DMAPI enabled file system, and use that > node > as a gateway for end-users. They can access the contents on the mount > > point with their own underprivileged accounts. > > Whether or not on a schedule, the moment an application or linux > command (such as cp, cat, vi, etc) accesses a stub, the file will be > > staged. > > Jaime > > Quoting "Andrew Beattie" : > >> Quick question, Does anyone have a Scale / GPFS environment (HPC) >> where users need the ability to recall data sets after they have > been >> stubbed, but only System Administrators are permitted to log onto > the >> NSD servers for security purposes. And if so how do you provide > the >> ability for the users to schedule their data set recalls? > Regards, >> Andrew Beattie Software Defined Storage - IT Specialist Phone: >> 614-2133-7927 E-mail: abeattie at au1.ibm.com[1] >> >> >> Links: >> ------ >> [1] mailto:abeattie at au1.ibm.com[3] >> > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials[4] > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jun 6 19:15:22 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 6 Jun 2017 18:15:22 +0000 Subject: [gpfsug-discuss] gpfs waiters debugging In-Reply-To: References: <3152E93F-DC76-456F-BBAC-E203D06E597E@nuance.com> <3cbb9375-86c9-3f2e-ec3a-bd4cea1455d8@ugent.be> <6873.1496767551@turing-police.cc.vt.edu> Message-ID: All, mmlsnode -N waiters is great ? I also appreciate the ?-s? option to it. Very helpful when you know the problem started say, slightly more than half an hour ago and you therefore don?t care about sub-1800 second waiters? Kevin On Jun 6, 2017, at 11:54 AM, Frederick Stock > wrote: On recent releases you can accomplish the same with the command, "mmlsnode -N waiters -L". 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: valdis.kletnieks at vt.edu To: gpfsug main discussion list > Date: 06/06/2017 12:46 PM Subject: Re: [gpfsug-discuss] gpfs waiters debugging Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said: > oh sure, i meant waiters that last > 300 seconds or so (something that > could trigger deadlock). obviously we're not interested in debugging the > short ones, it's not that gpfs doesn't work or anything ;) At least at one time, a lot of the mm(whatever) administrative commands would leave one dangling waiter for the duration of the command - which could be a while if the command was mmdeldisk or mmrestripefs. I admit not having specifically checked for gpfs 4.2, but it was true for 3.2 through 4.1.... And my addition to the collective debugging knowledge: A bash one-liner to dump all the waiters across a cluster, sorted by wait time. Note that our clusters tend to be 5-8 servers, this may be painful for those of you who have 400+ node clusters. :) ##!/bin/bash for i in ` mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'`; do ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"; done | sort -n -r -k 3 -t' ' We've found it useful - if you have 1 waiter on one node that's 1278 seconds old, and 3 other nodes have waiters that are 1275 seconds old, it's a good chance the other 3 nodes waiters are waiting on the first node's waiter to resolve itself.... [attachment "attltepl.dat" deleted by Frederick Stock/Pittsburgh/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Jun 6 21:31:01 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 06 Jun 2017 16:31:01 -0400 Subject: [gpfsug-discuss] mmapplypolicy and ltfsee - identifying progress... Message-ID: <25944.1496781061@turing-police.cc.vt.edu> So I'm trying to get a handle on where exactly an mmapplypolicy that's doing a premigrate is in its progress. I've already determined that 'ltfsee info jobs' will only report where in the current batch it is, but that still leaves me unable to tell the difference between [I] 2017-06-05 at 17:31:47.995 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 10000 files dispatched. and [I] 2017-06-06 at 02:44:48.236 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 225000 files dispatched. Is there any better way to figure out where it is than writing the cron job to launch it as mmapplypolicy | tee /tmp/something and then go scraping the messages? (And yes, I know not all chunks of 1,000 files are created equal. Sometimes it's 1,000 C source files that total to less than a megabyte, other times it's 1,000 streaming video files that total to over a terabye - but even knowing it's 194,000 into 243,348 files is better than what I have now...) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jun 6 22:20:31 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 6 Jun 2017 21:20:31 +0000 Subject: [gpfsug-discuss] mmapplypolicy and ltfsee - identifying progress... In-Reply-To: <25944.1496781061@turing-police.cc.vt.edu> References: <25944.1496781061@turing-police.cc.vt.edu> Message-ID: <5987990A-39F6-47A5-981D-A34A3054E4D8@vanderbilt.edu> Hi Valdis, I?m not sure this is ?better?, but what I typically do is have mmapplypolicy running from a shell script launched by a cron job and redirecting output to a file in /tmp. Once the mmapplypolicy finishes the SysAdmin?s get the tmp file e-mailed to them and then it gets deleted. Of course, while the mmapplypolicy is running you can ?tail -f /tmp/mmapplypolicy.log? or grep it or whatever. HTHAL? Kevin On Jun 6, 2017, at 3:31 PM, valdis.kletnieks at vt.edu wrote: So I'm trying to get a handle on where exactly an mmapplypolicy that's doing a premigrate is in its progress. I've already determined that 'ltfsee info jobs' will only report where in the current batch it is, but that still leaves me unable to tell the difference between [I] 2017-06-05 at 17:31:47.995 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 10000 files dispatched. and [I] 2017-06-06 at 02:44:48.236 Executing file list: /gpfs/archive/config/tmp/ mmPolicy.chosnlist.97168.79FD2A24.pre. 225000 files dispatched. Is there any better way to figure out where it is than writing the cron job to launch it as mmapplypolicy | tee /tmp/something and then go scraping the messages? (And yes, I know not all chunks of 1,000 files are created equal. Sometimes it's 1,000 C source files that total to less than a megabyte, other times it's 1,000 streaming video files that total to over a terabye - but even knowing it's 194,000 into 243,348 files is better than what I have now...) ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Jun 7 09:30:17 2017 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 7 Jun 2017 08:30:17 +0000 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting Message-ID: An HTML attachment was scrubbed... URL: From jan.sundermann at kit.edu Wed Jun 7 15:04:28 2017 From: jan.sundermann at kit.edu (Sundermann, Jan Erik (SCC)) Date: Wed, 7 Jun 2017 14:04:28 +0000 Subject: [gpfsug-discuss] Upgrade with architecture change Message-ID: Hi, we are operating a small Spectrum Scale cluster with about 100 clients and 6 NSD servers. The cluster is FPO-enabled. For historical reasons the NSD servers are running on ppc64 while the clients are a mixture of ppc64le and x86_64 machines. Most machines are running Red Hat Enterprise Linux 7 but we also have few machines running AIX. At the moment we have installed Spectrum Scale version 4.1.1 but would like to do an upgrade to 4.2.3. In the course of the upgrade we would like to change the architecture of all NSD servers and reinstall them with ppc64le instead of ppc64. From what I?ve learned so far it should be possible to upgrade directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like to ask for some advice on the best strategy. 
For the NSD servers, one by one, we are thinking about doing the following: 1) Disable auto recovery 2) Unmount GPFS file system 3) Suspend disks 4) Shutdown gpfs 5) Reboot and reinstall with changed architecture ppc64le 6) Install gpfs 4.2.3 7) Recover cluster config using mmsdrrestore 8) Resume and start disks 9) Reenable auto recovery Can GPFS handle the change of the NSD server?s architecture and would it be fine to operate a mixture of different architectures for the NSD servers? Thanks, Jan Erik From tarak.patel at canada.ca Wed Jun 7 16:42:45 2017 From: tarak.patel at canada.ca (Patel, Tarak (SSC/SPC)) Date: Wed, 7 Jun 2017 15:42:45 +0000 Subject: [gpfsug-discuss] Remote cluster gpfs communication on IP different then one for Daemon or Admin node name. Message-ID: <50fd0dc6cf47485c8728fc09b7ae0263@PEVDACDEXC009.birch.int.bell.ca> Hi all, We've been experiencing issues with remote cluster node expelling CES nodes causing remote filesystems to unmount. The issue is related gpfs communication using Ethernet IP rather than IP defined on IB which is used for Daemon node name and Admin node name. So remote cluster is aware of IPs that are not defined in GPFS configuration as Admin/Daemon node name. The CES nodes are configure to have IB as well as Ethernet (for client interactive and NFS access). We've double checked /etc/hosts and DNS and all looks to be in order since the CES IPoIB IP is present in /etc/hosts of remote cluster. I'm unsure where cluster manager for remote cluster is getting the Ethernet IP if there is no mention of it in GPFS configuration. The CES nodes were added later therefore they are not listed as Contact Nodes in 'mmremotecluster show' output. The CES nodes use IP defined on IB for GPFS configuration and we also have Ethernet which has the default route defined. In order to ensure that all IB communication passes via IPoIB, we've even defined a static route so that all GPFS communication will use IPoIB (since we are dealing with a different fabric). 'mmfsadm dump tscomm' reports multiple IPs for CES nodes which includes the Ethernet and also the IPoIB. I'm unsure if there is a way to drop some connections on GPFS (cluster wide) after stopping a specific CES node and ensure that only IB is listed. I realize that one option would be to define subnet parameter for remote cluster which will require a downtime (solution to be explored at later date). Hope that someone can explain how or why remote cluster is picking IPs not used in GPFS config for remote nodes and how to ensure those IPs are not used in future. Thank you, Tarak -- Tarak Patel Chef d'?quipe, Integration HPC, Solution de calcul E-Science Service partag? Canada / Gouvernment du Canada tarak.patel at canada.ca 1-514-421-7299 Team Lead, HPC Integration, E-Science Computing Solution Shared Services Canada, Government of Canada tarak.patel at canada.ca 1-514-421-7299 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Wed Jun 7 23:12:56 2017 From: chekh at stanford.edu (Alex Chekholko) Date: Wed, 7 Jun 2017 15:12:56 -0700 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: Hi Jan, I don't have hands-on experience with FPO or ppc64 but your procedure sounds OK to me. How do you currently handle just shutting down an NSD node for maintenance? I guess you'd have the same process except skip 5,6,7 How do you currently handle OS rebuild on NSD node? Maybe try that first without the architecture change. 
But I don't see why it would matter so long as you don't touch the GPFS disks. Regards, Alex On 06/07/2017 07:04 AM, Sundermann, Jan Erik (SCC) wrote: > Hi, > > we are operating a small Spectrum Scale cluster with about 100 clients and 6 NSD servers. The cluster is FPO-enabled. For historical reasons the NSD servers are running on ppc64 while the clients are a mixture of ppc64le and x86_64 machines. Most machines are running Red Hat Enterprise Linux 7 but we also have few machines running AIX. > > At the moment we have installed Spectrum Scale version 4.1.1 but would like to do an upgrade to 4.2.3. In the course of the upgrade we would like to change the architecture of all NSD servers and reinstall them with ppc64le instead of ppc64. > > From what I?ve learned so far it should be possible to upgrade directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like to ask for some advice on the best strategy. > > For the NSD servers, one by one, we are thinking about doing the following: > > 1) Disable auto recovery > 2) Unmount GPFS file system > 3) Suspend disks > 4) Shutdown gpfs > 5) Reboot and reinstall with changed architecture ppc64le > 6) Install gpfs 4.2.3 > 7) Recover cluster config using mmsdrrestore > 8) Resume and start disks > 9) Reenable auto recovery > > Can GPFS handle the change of the NSD server?s architecture and would it be fine to operate a mixture of different architectures for the NSD servers? > > > Thanks, > Jan Erik From Philipp.Rehs at uni-duesseldorf.de Thu Jun 8 10:35:57 2017 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Thu, 8 Jun 2017 11:35:57 +0200 Subject: [gpfsug-discuss] GPFS for aarch64? Message-ID: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> Hello, we got a Cavium ThunderX-based Server and would like to use GPFS on it. Are the any package for gpfs on aarch64? Kind regards Philipp Rehs --------------------------- Zentrum f?r Informations- und Medientechnologie Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern Heinrich-Heine-Universit?t D?sseldorf Universit?tsstr. 1 Raum 25.41.00.51 40225 D?sseldorf / Germany Tel: +49-211-81-15557 From abeattie at au1.ibm.com Thu Jun 8 10:45:38 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 8 Jun 2017 09:45:38 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> References: <5848d2c0-d526-3d81-a469-6b7a10b9bf3a@uni-duesseldorf.de> Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Jun 8 10:54:15 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 8 Jun 2017 09:54:15 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: Message-ID: And Linux on Z/VM If interested feel free to open a RFE -- Cheers > On 8 Jun 2017, at 12.46, Andrew Beattie wrote: > > Philipp, > > Not to my knowledge, > > AIX > Linux on x86 ( RHEL / SUSE / Ubuntu) > Linux on Power (RHEL / SUSE) > WIndows > > are the current supported platforms > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: Philipp Helo Rehs > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [gpfsug-discuss] GPFS for aarch64? > Date: Thu, Jun 8, 2017 7:36 PM > > Hello, > > we got a Cavium ThunderX-based Server and would like to use GPFS on it. > > Are the any package for gpfs on aarch64? 
> > > Kind regards > > Philipp Rehs > > --------------------------- > > Zentrum f?r Informations- und Medientechnologie > Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern > > Heinrich-Heine-Universit?t D?sseldorf > Universit?tsstr. 1 > Raum 25.41.00.51 > 40225 D?sseldorf / Germany > Tel: +49-211-81-15557 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From Philipp.Rehs at uni-duesseldorf.de Thu Jun 8 11:40:23 2017 From: Philipp.Rehs at uni-duesseldorf.de (Philipp Helo Rehs) Date: Thu, 8 Jun 2017 12:40:23 +0200 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: References: Message-ID: <9f47c897-74ff-9473-2ab3-343e4ce69d15@uni-duesseldorf.de> Thanks for the Information. I created an RFE: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106218 Kind regards, Philipp Rehs > Message: 6 > Date: Thu, 8 Jun 2017 09:54:15 +0000 > From: "Luis Bolinches" > To: "gpfsug main discussion list" > Subject: Re: [gpfsug-discuss] GPFS for aarch64? > Message-ID: > > > Content-Type: text/plain; charset="utf-8" > > And Linux on Z/VM > > If interested feel free to open a RFE > > -- > Cheers > >> On 8 Jun 2017, at 12.46, Andrew Beattie wrote: >> >> Philipp, >> >> Not to my knowledge, >> >> AIX >> Linux on x86 ( RHEL / SUSE / Ubuntu) >> Linux on Power (RHEL / SUSE) >> WIndows >> >> are the current supported platforms >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> ----- Original message ----- >> From: Philipp Helo Rehs >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [gpfsug-discuss] GPFS for aarch64? >> Date: Thu, Jun 8, 2017 7:36 PM >> >> Hello, >> >> we got a Cavium ThunderX-based Server and would like to use GPFS on it. >> >> Are the any package for gpfs on aarch64? >> >> >> Kind regards >> >> Philipp Rehs >> >> --------------------------- >> >> Zentrum f?r Informations- und Medientechnologie >> Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern >> >> Heinrich-Heine-Universit?t D?sseldorf >> Universit?tsstr. 1 >> Raum 25.41.00.51 >> 40225 D?sseldorf / Germany >> Tel: +49-211-81-15557 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 65, Issue 17 > ********************************************** > From daniel.kidger at uk.ibm.com Thu Jun 8 11:54:04 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 8 Jun 2017 10:54:04 +0000 Subject: [gpfsug-discuss] GPFS for aarch64? In-Reply-To: Message-ID: I often hear requests for Spectrum Scale on ARM. It is always for clients. In general people are happy to have their NSD servers, etc. on x86 or POWER. It is also an anomaly that for a HPC cluster, IBM supports LSF on ARM v7/v8 but not Spectrum Scale on ARM. Daniel Daniel Kidger Technical Sales Specialist, IBM UK IBM Spectrum Storage Software daniel.kidger at uk.ibm.com +44 (0)7818 522266 > On 8 Jun 2017, at 10:54, Luis Bolinches wrote: > > And Linux on Z/VM > > If interested feel free to open a RFE > > -- > Cheers > >> On 8 Jun 2017, at 12.46, Andrew Beattie wrote: >> >> Philipp, >> >> Not to my knowledge, >> >> AIX >> Linux on x86 ( RHEL / SUSE / Ubuntu) >> Linux on Power (RHEL / SUSE) >> WIndows >> >> are the current supported platforms >> Andrew Beattie >> Software Defined Storage - IT Specialist >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> >> ----- Original message ----- >> From: Philipp Helo Rehs >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [gpfsug-discuss] GPFS for aarch64? >> Date: Thu, Jun 8, 2017 7:36 PM >> >> Hello, >> >> we got a Cavium ThunderX-based Server and would like to use GPFS on it. >> >> Are the any package for gpfs on aarch64? >> >> >> Kind regards >> >> Philipp Rehs >> >> --------------------------- >> >> Zentrum f?r Informations- und Medientechnologie >> Kompetenzzentrum f?r wissenschaftliches Rechnen und Speichern >> >> Heinrich-Heine-Universit?t D?sseldorf >> Universit?tsstr. 1 >> Raum 25.41.00.51 >> 40225 D?sseldorf / Germany >> Tel: +49-211-81-15557 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Thu Jun 8 15:09:03 2017 From: duersch at us.ibm.com (Steve Duersch) Date: Thu, 8 Jun 2017 10:09:03 -0400 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: We have not tested such a procedure. The only route that we have done is a complete mmdelnode/mmaddnode scenario. This would mean an mmdeldisk. It would be more time consuming since data has to move. Operating in a mixed architecture environment is not a problem. We have tested and support that. 
Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York > > Message: 1 > Date: Wed, 7 Jun 2017 14:04:28 +0000 > From: "Sundermann, Jan Erik (SCC)" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Upgrade with architecture change > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi, > > we are operating a small Spectrum Scale cluster with about 100 > clients and 6 NSD servers. The cluster is FPO-enabled. For > historical reasons the NSD servers are running on ppc64 while the > clients are a mixture of ppc64le and x86_64 machines. Most machines > are running Red Hat Enterprise Linux 7 but we also have few machines > running AIX. > > At the moment we have installed Spectrum Scale version 4.1.1 but > would like to do an upgrade to 4.2.3. In the course of the upgrade > we would like to change the architecture of all NSD servers and > reinstall them with ppc64le instead of ppc64. > > From what I?ve learned so far it should be possible to upgrade > directly from 4.1.1 to 4.2.3. Before doing the upgrade we would like > to ask for some advice on the best strategy. > > For the NSD servers, one by one, we are thinking about doing the following: > > 1) Disable auto recovery > 2) Unmount GPFS file system > 3) Suspend disks > 4) Shutdown gpfs > 5) Reboot and reinstall with changed architecture ppc64le > 6) Install gpfs 4.2.3 > 7) Recover cluster config using mmsdrrestore > 8) Resume and start disks > 9) Reenable auto recovery > > Can GPFS handle the change of the NSD server?s architecture and > would it be fine to operate a mixture of different architectures for > the NSD servers? > > > Thanks, > Jan Erik > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jun 8 17:01:07 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 8 Jun 2017 12:01:07 -0400 Subject: [gpfsug-discuss] Upgrade with architecture change In-Reply-To: References: Message-ID: If you proceed carefully, it should not be necessary to mmdeldisk and mmadddisks. Although we may not have tested your exact scenario, GPFS does support fiber channel disks attached to multiple nodes. So the same disk can be attached to multiple GPFS nodes - and those nodes can be running different OSes and different GPFS versions. (That's something we do actually test!) Since GPFS can handle that with several nodes simultaneously active -- it can also handle the case when nodes come and go... Or in your case are killed and then reborn with new software... The key is to be careful... You want to unmount the file system and not re-mount until all of the disks become available again via one or more (NSD) nodes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jun 8 22:34:15 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Thu, 8 Jun 2017 21:34:15 +0000 Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? Message-ID: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> I?m looking to improve performance of the SMB stack. My workload unfortunately has smallish files but in total it will still be large amount. I?m wondering if LROC/HAWC would be one way to speed things up. Is there a precedent for using this with protocol nodes in a cluster? Anyone else thinking/doing this? 
Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Jun 9 08:44:57 2017 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 9 Jun 2017 09:44:57 +0200 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting In-Reply-To: References: Message-ID: There is an update of the agenda for the User Meeting at ISC. We have added a Pawsey Site Report by Chris Schlipalius. Monday June 19, 2016 - 12:00-14:30 - Conference Room Konstant 12:00-12:10 ?[10 min] ?Opening 12:10-12:25 ?[15 min] ?Spectrum Scale Support for Docker - Olaf Weiser (IBM) 12:25-13:05 ?[40 min] ?IBM Spectrum LSF family update - Bill McMillan (IBM) 13:05-13:25 ?[20 min] ?Driving Operational Efficiencies with the IBM Spectrum LSF & Ellexus Mistral - Dr. Rosemary Francis (Ellexus) 13:25-13:40 [15 min] Pawsey Site Report - Chris Schlipalius (Pawsey) 13:40-13:55 ?[15 min] ?IBM Elastic Storage Server (ESS) Update - John Sing (IBM) 13:55-14:20 ?[25 min] ?IBM Spectrum Scale Enhancements for CORAL - Sven Oehme (IBM) 14:20-14:30 ?[10 min] ?Question & Answers -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Ulf Troppens" To: gpfsug-discuss at spectrumscale.org Cc: Fabienne Wegener Date: 07.06.2017 10:30 Subject: [gpfsug-discuss] ISC 2017 - Agenda User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings: IBM is happy to announce the agenda for the joint IBM Spectrum Scale and IBM Spectrum LSF User Meeting at ISC. As with other user meetings, the agenda includes user stories, updates on IBM Spectrum Scale and IBM Spectrum LSF, and access to IBM experts and your peers. Please join us! To attend, please email Fabienne.Wegener at de.ibm.com so we can have an accurate count of attendees. Monday June 17, 2016 - 12:00-14:30 - Conference Room Konstant 12:00-12:10 [10 min] Opening 12:10-12:30 [20 min] Spectrum Scale Support for Docker - Olaf Weiser (IBM) 12:30-13:10 [40 min] IBM Spectrum LSF family update - Bill McMillan (IBM) 13:10-13:30 [20 min] Driving Operational Efficiencies with the IBM Spectrum LSF & Ellexus Mistral - Dr. Rosemary Francis (Ellexus) 13:30-13:50 [20 min] IBM Elastic Storage Server (ESS) Update - John Sing (IBM) 13:50-14:20 [30 min] IBM Spectrum Scale Enhancements for CORAL - Sven Oehme (IBM) 14:20-14:30 [10 min] Question & Answers Looking forward to seeing you there! 
-- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jun 9 09:38:01 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Jun 2017 08:38:01 +0000 Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? In-Reply-To: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> References: <288751C9-7CB6-48E6-968E-938A4E56E786@siriuscom.com> Message-ID: I?m wary of spending a lot of money on LROC devices when I don?t know what return I will get.. that said I think the main bottleneck for any SMB installation is samba itself, not the disks, so I remain largely unconvinced that LROC will help much. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mark Bush Sent: 08 June 2017 22:34 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] LROC/HAWC for CES nodes? I?m looking to improve performance of the SMB stack. My workload unfortunately has smallish files but in total it will still be large amount. I?m wondering if LROC/HAWC would be one way to speed things up. Is there a precedent for using this with protocol nodes in a cluster? Anyone else thinking/doing this? Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Jun 9 10:31:45 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Jun 2017 09:31:45 +0000 Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS In-Reply-To: References: Message-ID: Thanks Mark, didn?t mean to wait so long to reply. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 June 2017 17:40 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] TSM/SP compatibility with GPFS Upgrading from GPFS 4.2.x to GPFS 4.2.y should not "break" TSM. If it does, someone goofed, that would be a bug. (My opinion) Think of it this way. 
TSM is an application that uses the OS and the FileSystem(s). TSM can't verify it will work with all future versions of OS and Filesystems, and the releases can't be in lock step. Having said that, 4.2.3 has been "out" for a while, so if there were a TSM incompatibility, someone would have likely hit it or will before July... Trust but verify... From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 06/02/2017 11:51 AM Subject: [gpfsug-discuss] TSM/SP compatibility with GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, Where should I start looking for a compatibility matrix between TSM and GPFS? Specifically, we are currently running TSM 7.1.6-2 and GPFS 4.2.1-2 with the intent to upgrade to GPFS 4.2.3-latest in early July. I?ve spent 30 minutes looking over various documents and the best I can find is this: http://www-01.ibm.com/support/docview.wss?uid=swg21248771 ..which talks about TSM in a Space Management context and would suggest that we need to upgrade to Spectrum Protect i.e. 8.1 and that GPFS 4.2.2.x is the maximum supported version? Cheers Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank.tower at outlook.com Sat Jun 10 11:55:54 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sat, 10 Jun 2017 10:55:54 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found Message-ID: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sat Jun 10 13:05:04 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 10 Jun 2017 08:05:04 -0400 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: Message-ID: Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone > On Jun 10, 2017, at 06:55, Frank Tower wrote: > > Hi everybody, > > > I don't get why one of our compute node cannot start GPFS over IB. > > > I have the following error: > > > [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. > [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
> > [I] VERBS RDMA parse verbsPorts mlx4_0/1 > > [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found > > [I] VERBS RDMA library libibverbs.so unloaded. > > [E] VERBS RDMA failed to start, no valid verbsPorts defined. > > > > I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. > > > I have 2 infinibands card, both have an IP and working well. > > > [root at rdx110 ~]# ibstat -l > > mlx4_0 > > mlx4_1 > > [root at rdx110 ~]# > > I tried configuration with both card, and no one work with GPFS. > > > > I also tried with mlx4_0/1, but same problem. > > > > Someone already have the issue ? > > > Kind Regards, > > Frank > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Jun 12 20:41:17 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 12 Jun 2017 15:41:17 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? Message-ID: <18719.1497296477@turing-police.cc.vt.edu> So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jun 12 21:01:44 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 12 Jun 2017 20:01:44 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: <18719.1497296477@turing-police.cc.vt.edu> References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! 
________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Jun 12 21:06:09 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 12 Jun 2017 20:06:09 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jun 12 21:17:08 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 12 Jun 2017 16:17:08 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? Category 1, GPFSFileSystemAPI: This metrics gives the following information for each file system (application view). For example: myMachine|GPFSFilesystemAPI|myCluster|myFilesystem|gpfs_fis_bytes_read . gpfs_fis_bytes_read Number of bytes read. gpfs_fis_bytes_written Number of bytes written. gpfs_fis_close_calls Number of close calls. gpfs_fis_disks Number of disks in the file system. gpfs_fis_inodes_written Number of inode updates to disk. gpfs_fis_open_calls Number of open calls. gpfs_fis_read_calls Number of read calls. gpfs_fis_readdir_calls Number of readdir calls. gpfs_fis_write_calls Number of write calls. Category 2, GPFSNodeAPI: This metrics gives the following information for a particular node from its application point of view. For example: myMachine|GPFSNodeAPI|gpfs_is_bytes_read . gpfs_is_bytes_read Number of bytes read. gpfs_is_bytes_written Number of bytes written. gpfs_is_close_calls Number of close calls. gpfs_is_inodes_written Number of inode updates to disk. gpfs_is_open_calls Number of open calls. gpfs_is_readDir_calls Number of readdir calls. gpfs_is_read_calls Number of read calls. gpfs_is_write_calls Number of write calls. Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Mon Jun 12 21:42:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Jun 2017 20:42:47 +0000 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Jun 12 22:01:36 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 12 Jun 2017 17:01:36 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: References: Message-ID: Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jun 12 23:50:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Jun 2017 22:50:44 +0000 Subject: [gpfsug-discuss] Meaning of API Stats Category Message-ID: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> Can you tell me how LROC plays into this? I?m trying to understand if the difference between gpfs_ns_bytes_read and gpfs_is_bytes_read on a cluster-wide basis reflects the amount of data that is recalled from pagepool+LROC (assuming the majority of the nodes have LROC. Any insight on LROC stats would helpful as well. [cid:image001.png at 01D2E3A4.63CEE1D0] Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 4:01 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Meaning of API Stats Category Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). 
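As a hedged illustration of the two views described above, the counters can be pulled straight from the collector with mmperfmon; the metric names are the ones quoted in this thread, but the option spellings here are from memory and should be checked against the mmperfmon man page for your release:

  # application (API) view vs NSD/disk view of read traffic on this node,
  # last 10 one-minute buckets
  mmperfmon query gpfs_is_bytes_read,gpfs_ns_bytes_read -b 60 -n 10

A large gap between the two on an LROC-equipped client is exactly the effect being asked about here: reads satisfied from the pagepool or LROC show up in the API-side counter but never reach the NSD-side one.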
The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 124065 bytes Desc: image001.png URL: From valdis.kletnieks at vt.edu Tue Jun 13 05:21:26 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 13 Jun 2017 00:21:26 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: <15827.1497327686@turing-police.cc.vt.edu> On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" said: > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. Yeah, I figured that part out. What I couldn't wrap my brain around was what the purpose of 'mmces address move' is if mmsysmon is going to just put it back... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From janfrode at tanso.net Tue Jun 13 05:42:21 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 13 Jun 2017 04:42:21 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: <15827.1497327686@turing-police.cc.vt.edu> References: <18719.1497296477@turing-police.cc.vt.edu> <15827.1497327686@turing-police.cc.vt.edu> Message-ID: Switch to node affinity policy, and it will stick to where you move it. "mmces address policy node-affinity". -jf tir. 13. jun. 2017 kl. 06.21 skrev : > On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" > said: > > > mmces node suspend -N > > > > Is what you want. This will move the address and stop it being assigned > one, > > otherwise the rebalance will occur. > > Yeah, I figured that part out. What I couldn't wrap my brain around was > what the purpose of 'mmces address move' is if mmsysmon is going to just > put it back... > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:08:52 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:08:52 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it's not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:12:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:12:13 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> <15827.1497327686@turing-police.cc.vt.edu> Message-ID: Or this ? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 13 June 2017 05:42 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Switch to node affinity policy, and it will stick to where you move it. "mmces address policy node-affinity". -jf tir. 13. jun. 2017 kl. 06.21 skrev >: On Mon, 12 Jun 2017 20:06:09 -0000, "Simon Thompson (IT Research Support)" said: > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. Yeah, I figured that part out. What I couldn't wrap my brain around was what the purpose of 'mmces address move' is if mmsysmon is going to just put it back... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jun 13 09:28:25 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 13 Jun 2017 08:28:25 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Suspending the node doesn't stop the services though, we've done a bunch of testing by connecting to the "real" IP on the box we wanted to test and that works fine. 
OK, so you end up connecting to shares like \\192.168.1.20\sharename, but its perfectly fine for testing purposes. In our experience, suspending the node has been fine for this as it moves the IP to a "working" node and keeps user service running whilst we test. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 13 June 2017 at 09:08 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it?s not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! ________________________________ From:gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... 
(running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jun 13 09:30:18 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 13 Jun 2017 08:30:18 +0000 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: Oh? Nice to know - thanks - will try that method next. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Suspending the node doesn't stop the services though, we've done a bunch of testing by connecting to the "real" IP on the box we wanted to test and that works fine. OK, so you end up connecting to shares like \\192.168.1.20\sharename, but its perfectly fine for testing purposes. In our experience, suspending the node has been fine for this as it moves the IP to a "working" node and keeps user service running whilst we test. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Tuesday, 13 June 2017 at 09:08 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? Yes, suspending the node would do it, but in the case where you want to remove a node from service but keep it running for testing it's not ideal. I think you can set the IP address balancing policy to none which might do what we want. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? mmces node suspend -N Is what you want. This will move the address and stop it being assigned one, otherwise the rebalance will occur. I think you can change the way it balances, but the default is to distribute. Simon From: > on behalf of "Sobey, Richard A" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 12 June 2017 at 21:01 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? I think it's intended but I don't know why. The AUTH service became unhealthy on one of our CES nodes (SMB only) and we moved its float address elsewhere. CES decided to move it back again moments later despite the node not being fit. Sorry that doesn't really help but at least you're not alone! 
________________________________ From:gpfsug-discuss-bounces at spectrumscale.org > on behalf of valdis.kletnieks at vt.edu > Sent: 12 June 2017 20:41 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 'mmces address move' weirdness? So here's our address setup: mmces address list Address Node Group Attribute ------------------------------------------------------------------------- 172.28.45.72 arproto1.ar.nis.isb.internal isb none 172.28.45.73 arproto2.ar.nis.isb.internal isb none 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to move the address over to its pair so I can look around without impacting users. However, seems like something insists on moving it right back 60 seconds later... Question 1: Is this expected behavior? Question 2: If it is, what use is 'mmces address move' if it just gets undone a few seconds later... (running on arproto2.ar.nis.vtc.internal): ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 Mon Jun 12 15:34:42 EDT 2017 Mon Jun 12 15:34:43 EDT 2017 (skipped) Mon Jun 12 15:35:44 EDT 2017 Mon Jun 12 15:35:45 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:46 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 Mon Jun 12 15:35:47 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 ^C -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Tue Jun 13 14:36:30 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 13 Jun 2017 13:36:30 +0000 Subject: [gpfsug-discuss] Infiniband Quality of Service settings? Message-ID: I am investigating setting up Quality of Service parameters on an Infiniband fabric. The specific goal is to reduce the bandwidth which certain servers can use, ie if there are untested or development codes running on these servers in our cluster then they cannot adversely affect production users. I hope I do not show too much of my ignorance here. Perhaps out of date, but I find that Lustre does have a facility for setting the port range and hence associating with an ULP in Infiniband http://www.spinics.net/lists/linux-rdma/msg02150.html https://community.mellanox.com/thread/3660 (There. I said the L word. Is a quick soaping to the mouth needed?) Can anyone comment what the Infiniband Service ID for GPFS traffic is please? If the answer is blindingly obvious and is displayed by a Bat signal in the clouds above every datacenter containing GPFS then I am suitably apologetic. If it is buried in a footnote in a Redbook then a bit less apologetic. If you are familiar with Appendix A of the IBTA Architecture Release then it is truly a joy. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. 
Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Tue Jun 13 15:10:43 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 13 Jun 2017 14:10:43 +0000 Subject: [gpfsug-discuss] Infiniband Quality of Service settings? In-Reply-To: References: Message-ID: Having the bad manners to answer my own question: Example If you define a service level of 2 for GPFS in the InfiniBand subnet manager set verbsRdmaQpRtrSl to 2. mmchconfig verbsRdmaQpRtrSl=2 I guess though that I still need the service ID to set the Service Level in qos-policy.conf From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: Tuesday, June 13, 2017 3:37 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Infiniband Quality of Service settings? I am investigating setting up Quality of Service parameters on an Infiniband fabric. The specific goal is to reduce the bandwidth which certain servers can use, ie if there are untested or development codes running on these servers in our cluster then they cannot adversely affect production users. I hope I do not show too much of my ignorance here. Perhaps out of date, but I find that Lustre does have a facility for setting the port range and hence associating with an ULP in Infiniband http://www.spinics.net/lists/linux-rdma/msg02150.html https://community.mellanox.com/thread/3660 (There. I said the L word. Is a quick soaping to the mouth needed?) Can anyone comment what the Infiniband Service ID for GPFS traffic is please? If the answer is blindingly obvious and is displayed by a Bat signal in the clouds above every datacenter containing GPFS then I am suitably apologetic. If it is buried in a footnote in a Redbook then a bit less apologetic. If you are familiar with Appendix A of the IBTA Architecture Release then it is truly a joy. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From SAnderson at convergeone.com Tue Jun 13 17:31:31 2017 From: SAnderson at convergeone.com (Shaun Anderson) Date: Tue, 13 Jun 2017 16:31:31 +0000 Subject: [gpfsug-discuss] Difference between mmcesnfscrexport and 'mmnfs export add' commands. Message-ID: <2990f67cded849e8b82a4c5d2ac50d5c@NACR502.nacr.com> ?I see both of these, but only the mmnfs command is documented. Is one a wrapper of the other? SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments here to may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Jun 14 01:50:24 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 13 Jun 2017 20:50:24 -0400 Subject: [gpfsug-discuss] Meaning of API Stats Category In-Reply-To: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> References: <163FC574-4191-4C20-A4C7-E66DB1868BF3@nuance.com> Message-ID: Hello Bob, Right. Within some observation interval, bytes read by an application will be reflected in gpfs_is_bytes_read, regardless of how the byte values were obtained (by reading from "disk", fetching from pagepool, or fetching from LROC). gpfs_ns_bytes_read is only going to reflect bytes read from "disk" within that observation interval. "mmdiag --lroc" provides some LROC stats. There is also a GPFSLROC sensor; it does not appear to be documented at this point, so I hope I haven't spoken out of turn. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: "Oesterlin, Robert" To: gpfsug main discussion list Cc: "scale at us.ibm.com" Date: 06/12/2017 06:50 PM Subject: Re: Meaning of API Stats Category Can you tell me how LROC plays into this? I?m trying to understand if the difference between gpfs_ns_bytes_read and gpfs_is_bytes_read on a cluster-wide basis reflects the amount of data that is recalled from pagepool+LROC (assuming the majority of the nodes have LROC. Any insight on LROC stats would helpful as well. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 4:01 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Meaning of API Stats Category Hello Kristy, The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause a the writing of disk blocks due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. I hope this helps. Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 06/12/2017 04:43 PM Subject: Re: [gpfsug-discuss] Meaning of API Stats Category Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kristy What I *think* the difference is: gpfs_fis: - calls to the GPFS file system interface gpfs_fs: calls from the node that actually make it to the NSD server/metadata The difference being what?s served out of the local node pagepool. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Monday, June 12, 2017 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Meaning of API Stats Category Hi, Can anyone provide more detail about what is meant by the following two categories of stats? The PDG has a limited description as far as I could see. I'm not sure what is meant by Application PoV. Would the Grafana bridge count as an "application"? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 124065 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jun 14 17:11:33 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 14 Jun 2017 16:11:33 +0000 Subject: [gpfsug-discuss] 4.2.3.x and sub-block size Message-ID: Hi All, Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: 2. The GPFS block size determines: * The minimum disk space allocation unit. The minimum amount of space that file data can occupy is a sub?block. A sub?block is 1/32 of the block size. So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Wed Jun 14 18:15:27 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Wed, 14 Jun 2017 13:15:27 -0400 Subject: [gpfsug-discuss] 4.2.3.x and sub-block size In-Reply-To: References: Message-ID: Hi, >>Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: >>So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Based on the current plan, this ?a sub-block is 1/32nd of the block size? restriction will be removed in the upcoming GPFS version 4.2.4 (Please NOTE: Support for >32 subblocks per block may subject to be delayed based on internal qualification/validation efforts). Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/14/2017 12:12 PM Subject: [gpfsug-discuss] 4.2.3.x and sub-block size Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Back at SC16 I was told that GPFS 4.2.3.x would remove the ?a sub-block is 1/32nd of the block size? restriction. However, I have installed GPFS 4.2.3.1 on my test cluster and in the man page for mmcrfs I still see: 2. The GPFS block size determines: * The minimum disk space allocation unit. The minimum amount of space that file data can occupy is a sub?block. 
A sub?block is 1/32 of the block size. So has the restriction been removed? If not, is there an update on which version of GPFS will remove it? If so, can the documentation be updated to reflect the change and how to take advantage of it? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Schlipalius at pawsey.org.au Thu Jun 15 03:30:27 2017 From: Chris.Schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 15 Jun 2017 10:30:27 +0800 Subject: [gpfsug-discuss] Perth Australia - Spectrum Scale User Group event in August 2017 announced - Pawsey Supercomputing Centre Message-ID: Hi please find the eventbrite link (text as http/s links are usually stripped). www.eventbrite.com/e/spectrum-scale-user-group-perth-australia-gpfsugaus-au gust-2017-tickets-35227460282 Please register and let me know if you are keen to present. I have a special group booking offer on accomodation for attendees, well below usually rack rack. I will announce this Usergroup meeting on spectrumscle.org shortly. This event is on the same week and at the same location as HPC Advisory Council also being held in Perth Australia. (Call for papers is now out - I can supply the HPC AC invite separately if you wish to email me directly). If you want to know more in person and you are at ISC2017 next week I will be at the Spectrum Scale Usergroup that Ulf announced or you can catch me on the Pawsey Supercomputing Centre booth. Regards, Chris Schlipalius Senior Storage Infrastructure Specialist/Team Leader Pawsey Supercomputing Centre 12 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 15 21:00:47 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Jun 2017 20:00:47 +0000 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? Message-ID: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> Hi All, I?ve got some very weird problems going on here (and I do have a PMR open with IBM). On Monday I attempted to unlink a fileset, something that I?ve done many times with no issues. This time, however, it hung up the filesystem. I was able to clear things up by shutting down GPFS on the filesystem manager for that filesystem and restarting it. The very next morning we awoke to problems with GPFS. 
I noticed in my messages file on all my NSD servers I had messages like: Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid Jun 12 22:03:32 nsd32 multipathd: uevent trigger error Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: removing target and saving binding Since we use an FC SAN and Linux multi-pathing I was expecting some sort of problem with the switches. Now on the switches I see messages like: [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] Which (while not in this example) do correlate time-wise with the multi path messages on the servers. So it?s not a GPFS problem and I shouldn?t be bugging this list about this EXCEPT? These issues only started on Monday after I ran the mmunlinkfileset command. That?s right ? NO such errors prior to then. And literally NOTHING changed on Monday with my SAN environment (nothing had changed there for months actually). Nothing added to nor removed from the SAN. No changes until today when, in an attempt to solve this issue, I updated the switch firmware on all switches one at a time. I also yum updated to the latest RHEL 7 version of the multipathd packages. I?ve been Googling and haven?t found anything useful on those SYNC_LOSS messages on the QLogic SANbox 5800 switches. Anybody out there happen to have any knowledge of them and what could be causing them? Oh, I?m investigating this now ? but it?s not all ports that are throwing the errors. And the ports that are seem to be random and don?t have one specific type of hardware plugged in ? i.e. some ports have NSD servers plugged in, others have storage arrays. I understand that it makes no sense that mmunlinkfileset hanging would cause problems with my SAN ? but I also don?t believe in coincidences! I?m running GPFS 4.2.2.3. Any help / suggestions apprecaiated! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Jun 15 21:50:10 2017 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 15 Jun 2017 16:50:10 -0400 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? In-Reply-To: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> References: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> Message-ID: <20170615165010.6241c6d3@osc.edu> On Thu, 15 Jun 2017 20:00:47 +0000 "Buterbaugh, Kevin L" wrote: > Hi All, > > I?ve got some very weird problems going on here (and I do have a PMR open > with IBM). On Monday I attempted to unlink a fileset, something that I?ve > done many times with no issues. This time, however, it hung up the > filesystem. 
I was able to clear things up by shutting down GPFS on the > filesystem manager for that filesystem and restarting it. > > The very next morning we awoke to problems with GPFS. I noticed in my > messages file on all my NSD servers I had messages like: > > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed > Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write > through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline > device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk > Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) > Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid > Jun 12 22:03:32 nsd32 multipathd: uevent trigger error > Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: > removing target and saving binding > > Since we use an FC SAN and Linux multi-pathing I was expecting some sort of > problem with the switches. Now on the switches I see messages like: > > [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: > 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC > 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] > > Which (while not in this example) do correlate time-wise with the multi path > messages on the servers. So it?s not a GPFS problem and I shouldn?t be > bugging this list about this EXCEPT? > > These issues only started on Monday after I ran the mmunlinkfileset command. > That?s right ? NO such errors prior to then. And literally NOTHING changed > on Monday with my SAN environment (nothing had changed there for months > actually). Nothing added to nor removed from the SAN. No changes until > today when, in an attempt to solve this issue, I updated the switch firmware > on all switches one at a time. I also yum updated to the latest RHEL 7 > version of the multipathd packages. > > I?ve been Googling and haven?t found anything useful on those SYNC_LOSS > messages on the QLogic SANbox 5800 switches. Anybody out there happen to > have any knowledge of them and what could be causing them? Oh, I?m > investigating this now ? but it?s not all ports that are throwing the > errors. And the ports that are seem to be random and don?t have one specific > type of hardware plugged in ? i.e. some ports have NSD servers plugged in, > others have storage arrays. I have a half dozen of the Sanbox 5802 switches, but no GPFS devices going through them any longer. Used to though. We do see that exact same messages when the FC interface on a device goes bad (SFP, HCA, etc) or someone moving cables. This happens when the device cannot properly join the loop with it's login. I've NEVER seen them randomly though. Nor has this been a bad cable type error. I don't recall why, but I froze our Sanbox's at : V7.4.0.16.0 I'm sure I have notes on it somewhere. I've got one right now, in fact, with a bad ancient LTO4 drive. [8124][Thu Jun 15 12:46:00.190 EDT 2017][I][8600.001F][Port][Port: 4][SYNC_ACQ] [8125][Thu Jun 15 12:49:20.920 EDT 2017][I][8600.0020][Port][Port: 4][SYNC_LOSS] Sounds like the sanbox itself is having an issue perhaps? "Show alarm" clean on the sanbox? Array has a bad HCA? 'Show port 9' errors not crazy? All power supplies working? 
> I understand that it makes no sense that mmunlinkfileset hanging would cause > problems with my SAN ? but I also don?t believe in coincidences! > > I?m running GPFS 4.2.2.3. Any help / suggestions apprecaiated! This does seem like QUITE the coincidence. Increased traffic on the device triggered a failure? (The fear of all RAID users!) Multipath is working properly though? Sounds like mmlsdisk would have shown devices not in 'ready'. We mysteriously lost a MD disk during a recent downtime and it caused an MMFSCK to not run properly until we figured it out. 4.2.2.3 as well. MD Replication is NOT helpful in that case. Ed > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jun 15 22:14:33 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Jun 2017 21:14:33 +0000 Subject: [gpfsug-discuss] SAN problem ... multipathd ... mmunlinkfileset ... ??? In-Reply-To: <20170615165010.6241c6d3@osc.edu> References: <35CB524D-E657-4006-8689-833127720023@vanderbilt.edu> <20170615165010.6241c6d3@osc.edu> Message-ID: <91D4DDD0-BF4C-4F73-B369-A91C032B0FCD@vanderbilt.edu> Hi Ed, others, I have spent the intervening time since sending my original e-mail taking the logs from the SAN switches and putting them into text files where they can be sorted and grep?d ? and something potentially interesting has come to light ? While there are a number of ports on all switches that have one or two SYNC_LOSS errors on them, on two of the switches port 9 has dozens of SYNC_LOSS errors (looking at the raw logs with other messages interspersed that wasn?t obvious). Turns out that one particular dual-controller storage array is plugged into those ports and - in a stroke of good luck which usually manages to avoid me - that particular storage array is no longer in use! It, and a few others still in use, are older and about to be life-cycled. Since it?s no longer in use, I have unplugged it from the SAN and am monitoring to see if my problems now go away. Yes, correlation is not causation. And sometimes coincidences do happen. I?ll monitor to see if this is one of those occasions. Thanks? Kevin > On Jun 15, 2017, at 3:50 PM, Edward Wahl wrote: > > On Thu, 15 Jun 2017 20:00:47 +0000 > "Buterbaugh, Kevin L" wrote: > >> Hi All, >> >> I?ve got some very weird problems going on here (and I do have a PMR open >> with IBM). On Monday I attempted to unlink a fileset, something that I?ve >> done many times with no issues. This time, however, it hung up the >> filesystem. I was able to clear things up by shutting down GPFS on the >> filesystem manager for that filesystem and restarting it. >> >> The very next morning we awoke to problems with GPFS. 
I noticed in my >> messages file on all my NSD servers I had messages like: >> >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Write Protect is off >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline device >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Asking for cache data failed >> Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Assuming drive cache: write >> through Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: rejecting I/O to offline >> device Jun 12 22:03:32 nsd32 kernel: sd 0:0:4:2: [sdab] Attached SCSI disk >> Jun 12 22:03:32 nsd32 multipathd: sdab: add path (uevent) >> Jun 12 22:03:32 nsd32 multipathd: sdab: failed to get path uid >> Jun 12 22:03:32 nsd32 multipathd: uevent trigger error >> Jun 12 22:03:42 nsd32 kernel: rport-0:0-4: blocked FC remote port time out: >> removing target and saving binding >> >> Since we use an FC SAN and Linux multi-pathing I was expecting some sort of >> problem with the switches. Now on the switches I see messages like: >> >> [114][Thu Jun 15 19:02:05.411 UTC 2017][I][8600.0020][Port][Port: >> 9][SYNC_LOSS] [115][Thu Jun 15 19:03:49.988 UTC >> 2017][I][8600.001F][Port][Port: 9][SYNC_ACQ] >> >> Which (while not in this example) do correlate time-wise with the multi path >> messages on the servers. So it?s not a GPFS problem and I shouldn?t be >> bugging this list about this EXCEPT? >> >> These issues only started on Monday after I ran the mmunlinkfileset command. >> That?s right ? NO such errors prior to then. And literally NOTHING changed >> on Monday with my SAN environment (nothing had changed there for months >> actually). Nothing added to nor removed from the SAN. No changes until >> today when, in an attempt to solve this issue, I updated the switch firmware >> on all switches one at a time. I also yum updated to the latest RHEL 7 >> version of the multipathd packages. >> >> I?ve been Googling and haven?t found anything useful on those SYNC_LOSS >> messages on the QLogic SANbox 5800 switches. Anybody out there happen to >> have any knowledge of them and what could be causing them? Oh, I?m >> investigating this now ? but it?s not all ports that are throwing the >> errors. And the ports that are seem to be random and don?t have one specific >> type of hardware plugged in ? i.e. some ports have NSD servers plugged in, >> others have storage arrays. > > I have a half dozen of the Sanbox 5802 switches, but no GPFS devices going > through them any longer. Used to though. We do see that exact same messages > when the FC interface on a device goes bad (SFP, HCA, etc) or someone moving > cables. This happens when the device cannot properly join the loop with it's > login. I've NEVER seen them randomly though. Nor has this been a bad cable type > error. I don't recall why, but I froze our Sanbox's at : V7.4.0.16.0 I'm sure > I have notes on it somewhere. > > I've got one right now, in fact, with a bad ancient LTO4 drive. > [8124][Thu Jun 15 12:46:00.190 EDT 2017][I][8600.001F][Port][Port: 4][SYNC_ACQ] > [8125][Thu Jun 15 12:49:20.920 EDT 2017][I][8600.0020][Port][Port: 4][SYNC_LOSS] > > > Sounds like the sanbox itself is having an issue perhaps? "Show alarm" clean on > the sanbox? Array has a bad HCA? 'Show port 9' errors not crazy? All power > supplies working? > > > >> I understand that it makes no sense that mmunlinkfileset hanging would cause >> problems with my SAN ? but I also don?t believe in coincidences! >> >> I?m running GPFS 4.2.2.3. 
Any help / suggestions apprecaiated! > > This does seem like QUITE the coincidence. Increased traffic on the > device triggered a failure? (The fear of all RAID users!) Multipath is working > properly though? Sounds like mmlsdisk would have shown devices not in 'ready'. > We mysteriously lost a MD disk during a recent downtime and it caused an MMFSCK > to not run properly until we figured it out. 4.2.2.3 as well. MD Replication > is NOT helpful in that case. > > Ed > > > >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - >> (615)875-9633 >> >> >> > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 From frank.tower at outlook.com Sun Jun 18 05:57:57 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 18 Jun 2017 04:57:57 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: , Message-ID: Hi, You were right, ibv_devinfo -v doesn't return something if both card are connected. I didn't checked ibv_* tools, I supposed once IP stack and ibstat OK, the rest should work. I'm stupid ? Anyway, once I disconnect one card, ibv_devinfo show me input but with both cards, I don't have any input except "device not found". And what is weird here, it's that it work only when one card are connected, no matter the card (both are similar: model, firmware, revision, company)... Really strange, I will dig more about the issue. Stupid and bad workaround: connected a dual port Infiniband. But production system doesn't wait.. Thank for your help, Frank ________________________________ From: Aaron Knister Sent: Saturday, June 10, 2017 2:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone On Jun 10, 2017, at 06:55, Frank Tower > wrote: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From frank.tower at outlook.com Sun Jun 18 06:06:30 2017 From: frank.tower at outlook.com (Frank Tower) Date: Sun, 18 Jun 2017 05:06:30 +0000 Subject: [gpfsug-discuss] Protocol node: active directory authentication Message-ID: Hi, We finally received protocols node following the recommendations some here provided and the help of the wiki. Now we would like to use kerberized NFS, we dig into spectrumscale documentations and wiki but we would like to know if anyone is using such configuration ? Do you also have any performance issue (vs NFSv4/NFSv3 with sec=sys) ? We will also use Microsoft Active Directory and we are willing to populate all our users with UID/GID, summer is coming, we will have some spare time ? Someone is using kerberized NFSv4 with Microsoft Active Directory ? Thank by advance for your feedback. Kind Regards, Frank. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcatana at gmail.com Sun Jun 18 16:30:55 2017 From: jcatana at gmail.com (Josh Catana) Date: Sun, 18 Jun 2017 11:30:55 -0400 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found In-Reply-To: References: Message-ID: Are any cards VPI that can do both eth and ib? I remember reading in documentation that that there is a bus order to having mixed media with mellanox cards. There is a module setting during init where you can set eth ib or auto detect. If the card is on auto it might be coming up eth and making the driver flake out because it's in the wrong order. Responding from my phone so I can't really look it up myself right now about what the proper order is, but maybe this might be some help troubleshooting. On Jun 18, 2017 12:58 AM, "Frank Tower" wrote: > Hi, > > > You were right, ibv_devinfo -v doesn't return something if both card are > connected. I didn't checked ibv_* tools, I supposed once IP stack and > ibstat OK, the rest should work. I'm stupid ? > > > Anyway, once I disconnect one card, ibv_devinfo show me input but with > both cards, I don't have any input except "device not found". > > And what is weird here, it's that it work only when one card are > connected, no matter the card (both are similar: model, firmware, revision, > company)... Really strange, I will dig more about the issue. > > > Stupid and bad workaround: connected a dual port Infiniband. But > production system doesn't wait.. > > > Thank for your help, > Frank > > ------------------------------ > *From:* Aaron Knister > *Sent:* Saturday, June 10, 2017 2:05 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found > > Out of curiosity could you send us the output of "ibv_devinfo -v"? > > -Aaron > > Sent from my iPhone > > On Jun 10, 2017, at 06:55, Frank Tower wrote: > > Hi everybody, > > > I don't get why one of our compute node cannot start GPFS over IB. > > > I have the following error: > > > [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no > verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > > [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and > initialized. > > [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match > (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). > > [I] VERBS RDMA parse verbsPorts mlx4_0/1 > > [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device > mlx4_0 not found > > [I] VERBS RDMA library libibverbs.so unloaded. > > [E] VERBS RDMA failed to start, no valid verbsPorts defined. 
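[The [W]/[E] lines above are what mmfsd prints when the device named in verbsPorts cannot be found at startup. As a minimal sketch of the setup being discussed - assuming the device names from the thread (mlx4_0 and mlx4_1, port 1 on each) and the node name rdx110, both taken from the original post - one might check and set the ports like this:

# Confirm both HCAs are visible to the verbs layer and that their ports are ACTIVE
ibstat -l
ibv_devinfo | grep -E 'hca_id|port:|state'

# List every device/port pair GPFS should use for RDMA (space separated),
# then restart the daemon on that node so the change takes effect
mmchconfig verbsRdma=enable -N rdx110
mmchconfig verbsPorts="mlx4_0/1 mlx4_1/1" -N rdx110
mmshutdown -N rdx110
mmstartup -N rdx110

If ibv_devinfo itself only reports one of the two devices, as in this case, the problem sits below GPFS in the driver/firmware layer and no verbsPorts setting will help until that is resolved.]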
> > > > I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. > > > I have 2 infinibands card, both have an IP and working well. > > > [root at rdx110 ~]# ibstat -l > > mlx4_0 > > mlx4_1 > > [root at rdx110 ~]# > > > I tried configuration with both card, and no one work with GPFS. > > > I also tried with mlx4_0/1, but same problem. > > > Someone already have the issue ? > > > Kind Regards, > > Frank > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Sun Jun 18 18:53:28 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Sun, 18 Jun 2017 17:53:28 +0000 Subject: [gpfsug-discuss] Infiniband: device mlx4_0 not found Message-ID: There used to be issues with the CX-3 cards and specific ports for if you wanted to use IB and Eth, but that went away in later firmwares, as did a whole load of bits with it being slow to detect media type, so see if you are running an up to date Mellanox firmware (assuming it's a VPI card). On CX-4 there is no auto detect media, but default is IB unless you changed it. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jcatana at gmail.com [jcatana at gmail.com] Sent: 18 June 2017 16:30 To: gpfsug main discussion list Subject: ?spam? Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Are any cards VPI that can do both eth and ib? I remember reading in documentation that that there is a bus order to having mixed media with mellanox cards. There is a module setting during init where you can set eth ib or auto detect. If the card is on auto it might be coming up eth and making the driver flake out because it's in the wrong order. Responding from my phone so I can't really look it up myself right now about what the proper order is, but maybe this might be some help troubleshooting. On Jun 18, 2017 12:58 AM, "Frank Tower" > wrote: Hi, You were right, ibv_devinfo -v doesn't return something if both card are connected. I didn't checked ibv_* tools, I supposed once IP stack and ibstat OK, the rest should work. I'm stupid ? Anyway, once I disconnect one card, ibv_devinfo show me input but with both cards, I don't have any input except "device not found". And what is weird here, it's that it work only when one card are connected, no matter the card (both are similar: model, firmware, revision, company)... Really strange, I will dig more about the issue. Stupid and bad workaround: connected a dual port Infiniband. But production system doesn't wait.. Thank for your help, Frank ________________________________ From: Aaron Knister > Sent: Saturday, June 10, 2017 2:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found Out of curiosity could you send us the output of "ibv_devinfo -v"? -Aaron Sent from my iPhone On Jun 10, 2017, at 06:55, Frank Tower > wrote: Hi everybody, I don't get why one of our compute node cannot start GPFS over IB. 
I have the following error: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized. [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). [I] VERBS RDMA parse verbsPorts mlx4_0/1 [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to device mlx4_0 not found [I] VERBS RDMA library libibverbs.so unloaded. [E] VERBS RDMA failed to start, no valid verbsPorts defined. I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64. I have 2 infinibands card, both have an IP and working well. [root at rdx110 ~]# ibstat -l mlx4_0 mlx4_1 [root at rdx110 ~]# I tried configuration with both card, and no one work with GPFS. I also tried with mlx4_0/1, but same problem. Someone already have the issue ? Kind Regards, Frank _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Tue Jun 20 17:03:40 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Tue, 20 Jun 2017 12:03:40 -0400 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs Message-ID: These type messages repeat often in our logs: 017-06-20_09:25:13.676-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_09:25:24.292-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_10:00:25.935-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 Is there any way to tell if it is a misconfiguration or communications issue? -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Wed Jun 21 08:12:36 2017 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Wed, 21 Jun 2017 09:12:36 +0200 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs In-Reply-To: References: Message-ID: This happens if a the mmhealth system of a node can not forward an event to the GUI - typically on some other node. Resons could be: - Gui is not running - Firewall on used port 80 for older versions of spectrum scale or 443 for newer Mit freundlichen Gr??en / Kind regards Norbert Schuld Dr. IBM Systems Group M925: IBM Spectrum Scale Norbert Software Development Schuld IBM Deutschland R&D GmbH Phone: +49-160 70 70 335 Am Weiher 24 E-Mail: nschuld at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Koederitz /Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "J. 
Eric Wonderley" To: gpfsug main discussion list Date: 20/06/2017 18:04 Subject: [gpfsug-discuss] gui related connection fail in gpfs logs Sent by: gpfsug-discuss-bounces at spectrumscale.org These type messages repeat often in our logs: 017-06-20_09:25:13.676-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_09:25:24.292-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 2017-06-20_10:00:25.935-0400: [E] An%20attempt%20to%20send%20notification%20to%20the%20GUI%20subsystem%20failed%2E%20response%3Dcurl%3A%20%287%29%20Failed%20connect%20to%20arproto2%2Ear%2Enis%2Eisb%2Einternal%3A443%3B%20Connection%20refused%20rc%3D7 rc=1 Is there any way to tell if it is a misconfiguration or communications issue? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 16690161.gif Type: image/gif Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From ewahl at osc.edu Thu Jun 22 20:37:12 2017 From: ewahl at osc.edu (Edward Wahl) Date: Thu, 22 Jun 2017 15:37:12 -0400 Subject: [gpfsug-discuss] 'mmces address move' weirdness? In-Reply-To: References: <18719.1497296477@turing-police.cc.vt.edu> Message-ID: <20170622153712.052d312c@osc.edu> Is there a command to show existing node Address Policy? Or are we left with grep "affinity" on /var/mmfs/gen/mmsdrfs? Ed On Tue, 13 Jun 2017 08:30:18 +0000 "Sobey, Richard A" wrote: > Oh? Nice to know - thanks - will try that method next. > > From: gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson > (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main discussion > list Subject: Re: [gpfsug-discuss] 'mmces > address move' weirdness? > > Suspending the node doesn't stop the services though, we've done a bunch of > testing by connecting to the "real" IP on the box we wanted to test and that > works fine. > > OK, so you end up connecting to shares like > \\192.168.1.20\sharename, but its perfectly > fine for testing purposes. > > In our experience, suspending the node has been fine for this as it moves the > IP to a "working" node and keeps user service running whilst we test. > > Simon > > From: > > > on behalf of "Sobey, Richard A" > > Reply-To: > "gpfsug-discuss at spectrumscale.org" > > > Date: Tuesday, 13 June 2017 at 09:08 To: > "gpfsug-discuss at spectrumscale.org" > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > Yes, suspending the node would do it, but in the case where you want to > remove a node from service but keep it running for testing it's not ideal. 
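[For reference, a rough sketch of the CES commands this thread is circling around, using 4.2.x syntax; the node name is the illustrative one from the earlier example and should be replaced with your own:

# Show CES nodes, their assigned addresses and the address distribution policy
mmlscluster --ces
mmces address list

# Stop automatic rebalancing so a manually moved address stays where it was put
mmces address policy none

# Or suspend the node: its addresses move away and stay away until it is resumed
mmces node suspend -N arproto2.ar.nis.vtc.internal
mmces node resume -N arproto2.ar.nis.vtc.internal

The accepted policy values (even-coverage, balanced-load, node-affinity, none) vary slightly by release, so check the mmces man page for the level you are running.]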
> > I think you can set the IP address balancing policy to none which might do > what we want. From: > gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson > (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main discussion > list > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > mmces node suspend -N > > Is what you want. This will move the address and stop it being assigned one, > otherwise the rebalance will occur. I think you can change the way it > balances, but the default is to distribute. > > Simon > > From: > > > on behalf of "Sobey, Richard A" > > Reply-To: > "gpfsug-discuss at spectrumscale.org" > > > Date: Monday, 12 June 2017 at 21:01 To: > "gpfsug-discuss at spectrumscale.org" > > > Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? > > > I think it's intended but I don't know why. The AUTH service became unhealthy > on one of our CES nodes (SMB only) and we moved its float address elsewhere. > CES decided to move it back again moments later despite the node not being > fit. > > > > Sorry that doesn't really help but at least you're not alone! > > ________________________________ > From:gpfsug-discuss-bounces at spectrumscale.org > > > on behalf of valdis.kletnieks at vt.edu > > Sent: 12 June 2017 > 20:41 To: > gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] 'mmces address move' weirdness? > > So here's our address setup: > > mmces address list > > Address Node Group Attribute > ------------------------------------------------------------------------- > 172.28.45.72 arproto1.ar.nis.isb.internal isb none > 172.28.45.73 arproto2.ar.nis.isb.internal isb none > 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none > 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none > > Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to > move the address over to its pair so I can look around without impacting > users. However, seems like something insists on moving it right back 60 > seconds later... > > Question 1: Is this expected behavior? > Question 2: If it is, what use is 'mmces address move' if it just gets > undone a few seconds later... > > (running on arproto2.ar.nis.vtc.internal): > > ## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 > --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; ip addr > show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon Jun 12 > 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global > secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 EDT 2017 > Mon Jun 12 15:34:42 EDT 2017 > Mon Jun 12 15:34:43 EDT 2017 > (skipped) > Mon Jun 12 15:35:44 EDT 2017 > Mon Jun 12 15:35:45 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > Mon Jun 12 15:35:46 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > Mon Jun 12 15:35:47 EDT 2017 > inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0 > ^C -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From laurence at qsplace.co.uk Thu Jun 22 23:05:49 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Thu, 22 Jun 2017 23:05:49 +0100 Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
In-Reply-To: <20170622153712.052d312c@osc.edu> References: <18719.1497296477@turing-police.cc.vt.edu> <20170622153712.052d312c@osc.edu> Message-ID: <6B843B3B-4C07-459D-B905-10B16E3590A0@qsplace.co.uk> "mmlscluster --ces" will show the address distribution policy. -- Lauz On 22 June 2017 20:37:12 BST, Edward Wahl wrote: > >Is there a command to show existing node Address Policy? >Or are we left with grep "affinity" on /var/mmfs/gen/mmsdrfs? > >Ed > > >On Tue, 13 Jun 2017 08:30:18 +0000 >"Sobey, Richard A" wrote: > >> Oh? Nice to know - thanks - will try that method next. >> >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson >> (IT Research Support) Sent: 13 June 2017 09:28 To: gpfsug main >discussion >> list Subject: Re: [gpfsug-discuss] >'mmces >> address move' weirdness? >> >> Suspending the node doesn't stop the services though, we've done a >bunch of >> testing by connecting to the "real" IP on the box we wanted to test >and that >> works fine. >> >> OK, so you end up connecting to shares like >> \\192.168.1.20\sharename, but its >perfectly >> fine for testing purposes. >> >> In our experience, suspending the node has been fine for this as it >moves the >> IP to a "working" node and keeps user service running whilst we test. >> >> Simon >> >> From: >> >> >> on behalf of "Sobey, Richard A" >> > Reply-To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Date: Tuesday, 13 June 2017 at 09:08 To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> Yes, suspending the node would do it, but in the case where you want >to >> remove a node from service but keep it running for testing it's not >ideal. >> >> I think you can set the IP address balancing policy to none which >might do >> what we want. From: >> >gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson >> (IT Research Support) Sent: 12 June 2017 21:06 To: gpfsug main >discussion >> list >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> mmces node suspend -N >> >> Is what you want. This will move the address and stop it being >assigned one, >> otherwise the rebalance will occur. I think you can change the way it >> balances, but the default is to distribute. >> >> Simon >> >> From: >> >> >> on behalf of "Sobey, Richard A" >> > Reply-To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Date: Monday, 12 June 2017 at 21:01 To: >> >"gpfsug-discuss at spectrumscale.org" >> >> >> Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness? >> >> >> I think it's intended but I don't know why. The AUTH service became >unhealthy >> on one of our CES nodes (SMB only) and we moved its float address >elsewhere. >> CES decided to move it back again moments later despite the node not >being >> fit. >> >> >> >> Sorry that doesn't really help but at least you're not alone! >> >> ________________________________ >> >From:gpfsug-discuss-bounces at spectrumscale.org >> >> >> on behalf of valdis.kletnieks at vt.edu >> > Sent: 12 >June 2017 >> 20:41 To: >> >gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] 'mmces address move' weirdness? 
>> >> So here's our address setup: >> >> mmces address list >> >> Address Node Group >Attribute >> >------------------------------------------------------------------------- >> 172.28.45.72 arproto1.ar.nis.isb.internal isb none >> 172.28.45.73 arproto2.ar.nis.isb.internal isb none >> 172.28.46.72 arproto2.ar.nis.vtc.internal vtc none >> 172.28.46.73 arproto1.ar.nis.vtc.internal vtc none >> >> Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so >I try to >> move the address over to its pair so I can look around without >impacting >> users. However, seems like something insists on moving it right back >60 >> seconds later... >> >> Question 1: Is this expected behavior? >> Question 2: If it is, what use is 'mmces address move' if it just >gets >> undone a few seconds later... >> >> (running on arproto2.ar.nis.vtc.internal): >> >> ## (date; ip addr show | grep '\.72';mmces address move --ces-ip >172.28.46.72 >> --ces-node arproto1.ar.nis.vtc.internal; while (/bin/true); do date; >ip addr >> show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down Mon >Jun 12 >> 15:34:33 EDT 2017 inet 172.28.46.72/26 brd 172.28.46.127 scope global >> secondary bond1:0 Mon Jun 12 15:34:40 EDT 2017 Mon Jun 12 15:34:41 >EDT 2017 >> Mon Jun 12 15:34:42 EDT 2017 >> Mon Jun 12 15:34:43 EDT 2017 >> (skipped) >> Mon Jun 12 15:35:44 EDT 2017 >> Mon Jun 12 15:35:45 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> Mon Jun 12 15:35:46 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> Mon Jun 12 15:35:47 EDT 2017 >> inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary >bond1:0 >> ^C > > > >-- > >Ed Wahl >Ohio Supercomputer Center >614-292-9302 >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 09:13:38 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 08:13:38 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? Message-ID: I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO 'runs wild' it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Jun 23 09:57:46 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 23 Jun 2017 04:57:46 -0400 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> I unfortunately don't have an answer other than to perhaps check out this presentation from a recent users group meeting: http://files.gpfsug.org/presentations/2017/Manchester/05_Ellexus_SSUG_Manchester.pdf I've never used the product but it might be able to do what you're asking. Sent from my iPhone > On Jun 23, 2017, at 04:13, John Hearns wrote: > > I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. > We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. > The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down > the production storage. If anyone has a setup like this I would be interested in chatting with you. > Is it feasible to create filesets which have higher/lower priority than others? > > Thankyou for any insights or feedback > John Hearns > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 12:04:23 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 11:04:23 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> References: <11158C89-C712-4C79-8B1D-CAA9D3D8641F@gmail.com> Message-ID: Aaron, thankyou. I know Rosemary and that is a good company! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister Sent: Friday, June 23, 2017 10:58 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] IO prioritisation / throttling? 
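[One possible sketch of the kind of workflow being asked about here: drive revalidation from a list of changed directories generated at home instead of walking the whole cache. The file system name fs1, fileset fset1, cache path /gpfs/cache and the two list files are placeholders, and whether a plain readdir is enough to surface deletions depends on the AFM refresh intervals, so this should be verified on a test fileset first:

# Optionally shorten directory revalidation on the cache fileset
# (afmDirOpenRefreshInterval is the related knob for open/readdir;
# some AFM attributes can only be changed while the fileset is unlinked)
mmchfileset fs1 fset1 -p afmDirLookupRefreshInterval=60

# At home, produce dirs.list containing only directories changed since the last run
# (for example from a policy LIST rule selecting directories by MODIFICATION_TIME)

# At cache, read just those directories so AFM revalidates them against home;
# one non-recursive readdir per changed directory is far cheaper than a full
# recursive find or ls -lsR over the whole fileset
while read -r d; do
    ls "/gpfs/cache/$d" > /dev/null
done < dirs.list

# Queue the actual data prefetch from a file list as usual
mmafmctl fs1 prefetch -j fset1 --list-file files.list

The prefetch itself only pulls data for the files named in files.list; it is the per-directory readdir pass above that gives AFM a chance to notice entries that were removed at home.]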
I unfortunately don't have an answer other than to perhaps check out this presentation from a recent users group meeting: http://files.gpfsug.org/presentations/2017/Manchester/05_Ellexus_SSUG_Manchester.pdf I've never used the product but it might be able to do what you're asking. Sent from my iPhone On Jun 23, 2017, at 04:13, John Hearns > wrote: I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Fri Jun 23 14:36:34 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Fri, 23 Jun 2017 09:36:34 -0400 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: Hi John, >>We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. >>The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down >>the production storage. 
You may use the Spectrum Scale Quality of Service for I/O "mmchqos" command (details in link below) to define IOPS limits for the "others" as well as the "maintenance" class for the Dev/Test file-system "pools" (for e.g., mmchqos tds_fs --enable pool=*,other=10000IOPS, maintenance=5000IOPS). This way, the Test and Dev file-system/storage-pools IOPS can be limited/controlled to specified IOPS , giving higher priority to the production GPFS file-system/storage (with production_fs pool=* other=unlimited,maintenance=unlimited - which is the default). https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_qosio_describe.htm#qosio_describe My two cents. Regards, -Kums From: John Hearns To: gpfsug main discussion list Date: 06/23/2017 04:14 AM Subject: [gpfsug-discuss] IO prioritisation / throttling? Sent by: gpfsug-discuss-bounces at spectrumscale.org I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Jun 23 15:23:19 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 23 Jun 2017 14:23:19 +0000 Subject: [gpfsug-discuss] IO prioritisation / throttling? In-Reply-To: References: Message-ID: Thankyou to Kumaran and Aaaron for your help. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kumaran Rajaram Sent: Friday, June 23, 2017 3:37 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] IO prioritisation / throttling? Hi John, >>We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. >>The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down >>the production storage. 
You may use the Spectrum Scale Quality of Service for I/O "mmchqos" command (details in link below) to define IOPS limits for the "others" as well as the "maintenance" class for the Dev/Test file-system "pools" (for e.g., mmchqos tds_fs --enable pool=*,other=10000IOPS, maintenance=5000IOPS). This way, the Test and Dev file-system/storage-pools IOPS can be limited/controlled to specified IOPS , giving higher priority to the production GPFS file-system/storage (with production_fs pool=* other=unlimited,maintenance=unlimited - which is the default). https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_qosio_describe.htm#qosio_describe My two cents. Regards, -Kums From: John Hearns > To: gpfsug main discussion list > Date: 06/23/2017 04:14 AM Subject: [gpfsug-discuss] IO prioritisation / throttling? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I guess this is a rather ill-defined question, and I realise it will be open to a lot of interpretations. We have a GPFS Setup using Fujitsu filers and Mellanox infiniband. The desire it to set up an environment for test and development where if IO ?runs wild? it will not bring down the production storage. If anyone has a setup like this I would be interested in chatting with you. Is it feasible to create filesets which have higher/lower priority than others? Thankyou for any insights or feedback John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 23 17:06:51 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 23 Jun 2017 16:06:51 +0000 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy Message-ID: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Hi All, I haven?t been able to find this explicitly documented, so I?m just wanting to confirm that the behavior that I?m expecting is what GPFS is going to do in this scenario? I have a filesystem with data replication set to two. I?m creating a capacity type pool for it right now which will be used to migrate old files to. I only want to use replication of one on the capacity pool. My policy file has two rules, one to move files with an atime > 30 days to the capacity pool, to which I?ve included ?REPLICATE(1)?. The other rule is to move files from the capacity pool back to the system pool if the atime < 14 days. Since data replication is set to two, I am thinking that I do not need to explicitly have a ?REPLICATE(2)? as part of that rule ? is that correct? I.e., I?m wanting to make sure that a file moved to the capacity pool which therefore has its? replication set to one doesn?t keep that same setting even if moved back out of the capacity pool. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 23 17:58:23 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Jun 2017 12:58:23 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Message-ID: I believe that is correct. If not, let us know! To recap... when running mmapplypolicy with rules like: ... MIGRATE ... REPLICATE(x) ... will change the replication factor to x, for each file selected by this rule and chosen for execution. ... MIGRATE ... /* no REPLICATE keyword */ will not mess with the replication factor From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 06/23/2017 12:07 PM Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I haven?t been able to find this explicitly documented, so I?m just wanting to confirm that the behavior that I?m expecting is what GPFS is going to do in this scenario? I have a filesystem with data replication set to two. I?m creating a capacity type pool for it right now which will be used to migrate old files to. I only want to use replication of one on the capacity pool. My policy file has two rules, one to move files with an atime > 30 days to the capacity pool, to which I?ve included ?REPLICATE(1)?. The other rule is to move files from the capacity pool back to the system pool if the atime < 14 days. Since data replication is set to two, I am thinking that I do not need to explicitly have a ?REPLICATE(2)? as part of that rule ? is that correct? I.e., I?m wanting to make sure that a file moved to the capacity pool which therefore has its? replication set to one doesn?t keep that same setting even if moved back out of the capacity pool. Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Fri Jun 23 18:55:15 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 23 Jun 2017 13:55:15 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> Message-ID: <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> > On Jun 23, 2017, at 12:58 PM, Marc A Kaplan > wrote: > > I believe that is correct. If not, let us know! > > To recap... when running mmapplypolicy with rules like: > > ... MIGRATE ... REPLICATE(x) ... > > will change the replication factor to x, for each file selected by this rule and chosen for execution. > > ... MIGRATE ... /* no REPLICATE keyword */ > > will not mess with the replication factor > > I think I detect an impedance mismatch... By "not mess with the replication factor" do you mean that after the move: the file will have the default replication factor for the file system the file will retain a replication factor previously set on the file You told Kevin that he was correct and I think he meant the first one, but I read what you said as the second one. Liberty, -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jun 23 20:28:28 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 23 Jun 2017 15:28:28 -0400 Subject: [gpfsug-discuss] Replication settings when running mmapplypolicy In-Reply-To: <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> References: <32D51306-988F-4F18-9883-31A00975A9AC@vanderbilt.edu> <82717215-1C59-4B12-BAF6-09908044688D@ulmer.org> Message-ID: Sorry for any confusion. MIGRATing a file does NOT change the replication factor, unless you explicitly use the keyword REPLICATE. The default replication factor, as set/displayed by mm[ch|ls]fs -r only applies at file creation time, unless overriden by a policy SET POOL ... REPLICATE(x) rule. From: Stephen Ulmer To: gpfsug main discussion list Date: 06/23/2017 01:55 PM Subject: Re: [gpfsug-discuss] Replication settings when running mmapplypolicy Sent by: gpfsug-discuss-bounces at spectrumscale.org On Jun 23, 2017, at 12:58 PM, Marc A Kaplan wrote: I believe that is correct. If not, let us know! To recap... when running mmapplypolicy with rules like: ... MIGRATE ... REPLICATE(x) ... will change the replication factor to x, for each file selected by this rule and chosen for execution. ... MIGRATE ... /* no REPLICATE keyword */ will not mess with the replication factor I think I detect an impedance mismatch... By "not mess with the replication factor" do you mean that after the move: the file will have the default replication factor for the file system the file will retain a replication factor previously set on the file You told Kevin that he was correct and I think he meant the first one, but I read what you said as the second one. Liberty, -- Stephen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From ncapit at atos.net Mon Jun 26 08:49:28 2017 From: ncapit at atos.net (CAPIT, NICOLAS) Date: Mon, 26 Jun 2017 07:49:28 +0000 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads Message-ID: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Hello, I don't know if this behavior/bug was already reported on this ML, so in doubt. Context: - SpectrumScale 4.2.2-3 - client node with 64 cores - OS: RHEL7.3 When a MPI job with 64 processes is launched on the node with 64 cores then the FS freezed (only the output log file of the MPI job is put on the GPFS; so it may be related to the 64 processes writing in a same file???). strace -p 3105 # mmfsd pid stucked Process 3105 attached wait4(-1, # stucked at this point strace ls /gpfs stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC # stucked at this point I have no problem with the other nodes of 28 cores. The GPFS command mmgetstate is working and I am able to use mmshutdown to recover the node. If I put workerThreads=72 on the 64 core node then I am not able to reproduce the freeze and I get the right behavior. Is this a known bug with a number of cores > workerThreads? Best regards, [--] Nicolas Capit -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Jun 27 00:57:57 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 26 Jun 2017 19:57:57 -0400 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? 
> > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From ncapit at atos.net Tue Jun 27 07:59:19 2017 From: ncapit at atos.net (CAPIT, NICOLAS) Date: Tue, 27 Jun 2017 06:59:19 +0000 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net>, <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> Message-ID: <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. The "mmgetstate" command says that the node is "active". The only thing is the freeze of the FS. Best regards, Nicolas Capit ________________________________________ De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] Envoy? : mardi 27 juin 2017 01:57 ? : gpfsug-discuss at spectrumscale.org Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? 
> > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stuartb at 4gh.net Tue Jun 27 22:34:34 2017 From: stuartb at 4gh.net (Stuart Barkley) Date: Tue, 27 Jun 2017 17:34:34 -0400 (EDT) Subject: [gpfsug-discuss] express edition vs standard edition Message-ID: Does anyone know what controls whether GPFS (4.1.1) thinks it is Express Edition versus Standard Edition? While rebuilding an old cluster from scratch we are getting the message: mmcrfs: Storage pools are not available in the GPFS Express Edition. The Problem Determination Guide says to "Install the GPFS Standard Edition on all nodes in the cluster" which we think we have done. The cluster is just 3 servers and no clients at this point. We have verified that our purchased license is for Standard Version, but have not been able to figure out what controls the GPFS view of this. mmlslicense tells us that we have Express Edition installed. mmchlicense sets server vs client license information, but does not seem to be able to control the edition. Our normal install process is to install gpfs.base-4.1.1-0.x86_64.rpm first and then install gpfs.base-4.1.1-15.x86_64.update.rpm followed by the other needed 4.1.1-15 rpms. I thought maybe we had the wrong gpfs.base and we have re-downloaded Standard Edition RPM files from IBM in case we had the wrong version. However, reinstalling and recreating the cluster does not seem to have addressed this issue. We must be doing something stupid during our install, but I'm pretty sure we used only Standard Edition rpms for our latest attempt. Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From knop at us.ibm.com Tue Jun 27 22:54:35 2017 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 27 Jun 2017 17:54:35 -0400 Subject: [gpfsug-discuss] express edition vs standard edition In-Reply-To: References: Message-ID: Stuart, I believe you will need to install the gpfs.ext RPMs , otherwise the daemons and commands will think only the Express edition is installed. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Stuart Barkley To: gpfsug-discuss at spectrumscale.org Date: 06/27/2017 05:35 PM Subject: [gpfsug-discuss] express edition vs standard edition Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know what controls whether GPFS (4.1.1) thinks it is Express Edition versus Standard Edition? While rebuilding an old cluster from scratch we are getting the message: mmcrfs: Storage pools are not available in the GPFS Express Edition. The Problem Determination Guide says to "Install the GPFS Standard Edition on all nodes in the cluster" which we think we have done. The cluster is just 3 servers and no clients at this point. We have verified that our purchased license is for Standard Version, but have not been able to figure out what controls the GPFS view of this. mmlslicense tells us that we have Express Edition installed. 
mmchlicense sets server vs client license information, but does not seem to be able to control the edition. Our normal install process is to install gpfs.base-4.1.1-0.x86_64.rpm first and then install gpfs.base-4.1.1-15.x86_64.update.rpm followed by the other needed 4.1.1-15 rpms. I thought maybe we had the wrong gpfs.base and we have re-downloaded Standard Edition RPM files from IBM in case we had the wrong version. However, reinstalling and recreating the cluster does not seem to have addressed this issue. We must be doing something stupid during our install, but I'm pretty sure we used only Standard Edition rpms for our latest attempt. Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuartb at 4gh.net Tue Jun 27 23:48:53 2017 From: stuartb at 4gh.net (Stuart Barkley) Date: Tue, 27 Jun 2017 18:48:53 -0400 (EDT) Subject: [gpfsug-discuss] express edition vs standard edition In-Reply-To: References: Message-ID: On Tue, 27 Jun 2017 at 17:54 -0000, Felipe Knop wrote: > I believe you will need to install the gpfs.ext RPMs , otherwise the > daemons and commands will think only the Express edition is > installed. Yes. This appears to have been my problem. Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From scale at us.ibm.com Fri Jun 30 07:57:49 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 30 Jun 2017 14:57:49 +0800 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net>, <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: I'm not aware this kind of defects, seems it should not. but lack of data, we don't know what happened. I suggest you can open a PMR for your issue. Thanks. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "CAPIT, NICOLAS" To: gpfsug main discussion list Date: 06/27/2017 02:59 PM Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. The "mmgetstate" command says that the node is "active". 
The only thing is the freeze of the FS. Best regards, Nicolas Capit ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov] Sent: Tuesday, 27 June 2017 01:57 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads That's a fascinating bug. When the node is locked up what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM but I'm just curious what the waiters look like. -Aaron On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > Hello, > > I don't know if this behavior/bug was already reported on this ML, so in > doubt. > > Context: > > - SpectrumScale 4.2.2-3 > - client node with 64 cores > - OS: RHEL7.3 > > When a MPI job with 64 processes is launched on the node with 64 cores > then the FS freezed (only the output log file of the MPI job is put on > the GPFS; so it may be related to the 64 processes writing in a same > file???). > > strace -p 3105 # mmfsd pid stucked > Process 3105 attached > wait4(-1, # stucked at this point > > strace ls /gpfs > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > # stucked at this point > > I have no problem with the other nodes of 28 cores. > The GPFS command mmgetstate is working and I am able to use mmshutdown > to recover the node. > > > If I put workerThreads=72 on the 64 core node then I am not able to > reproduce the freeze and I get the right behavior. > > Is this a known bug with a number of cores > workerThreads? > > Best regards, > -- > *Nicolas Capit* > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
From heiner.billich at psi.ch Fri Jun 30 11:07:10 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 30 Jun 2017 10:07:10 +0000 Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch Message-ID: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch>
Hello I have a short question about AFM prefetch and some more remarks regarding AFM and its use for data migration. I understand that many of you have done this for very large amounts of data and numbers of files. I would welcome any input, comments or remarks. Sorry if this is a bit too long for a mailing list.
Short: How can I tell an AFM cache to update a directory when I do prefetch? I know about 'find .' or 'ls -lsR' but this really is no option for us as it takes too long. Mostly I want to update the directories to make AFM cache aware of file deletions on home.
On home I can use a policy run to find all directories which changed since the last update and pass them to prefetch on AFM cache. I know that I can find some workaround based on the directory list, like an 'ls -lsa' just for those directories, but this doesn't sound very efficient. And depending on cache effects and timeout settings it may work or not (o.k. - it will work most of the time). We do regular file deletions and will accumulate millions of deleted files on cache over time if we don't update the directories to make AFM cache aware of the deletions.
Background: We will use AFM to migrate data on filesets to another cluster. We have to do this several times in the next few months, hence I want to get a reliable and easy to use procedure. The old system is home, the new system is a read-only AFM cache. We want to use 'mmafmctl prefetch' to move the data. Home will be in use while we run the migration. Once almost all data is moved we do a (short) break for a last sync and make the read-only AFM cache a 'normal' fileset. During the break I want to use policy runs and prefetch only, and no time-consuming 'ls -lsr' or 'find .'. I don't want to use these metadata-intensive POSIX operations during normal operation, either.
More general: AFM can be used for data migration. But I don't see how to use it efficiently. How to do incremental transfers, how to ensure that we really have identical copies before we switch, and how to keep the switch time short, i.e. the time when both old and new aren't accessible for clients.
Wish - maybe an RFE? I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same as backup tools use the list to do incremental backups. And a tool to create policy lists of home and cache and to compare the lists would be nice, too. As you do this during the break/switch it should be fast and reliable and leave no doubts. Kind regards, Heiner
From vpuvvada at in.ibm.com Fri Jun 30 13:35:18 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 30 Jun 2017 18:05:18 +0530 Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch In-Reply-To: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> References: <76D2B41B-37A6-410B-9ECE-F5FA4C7FF1EE@psi.ch> Message-ID:
What is the version of GPFS?
>Mostly I want to update the directories to make AFM cache aware of file deletions on home. On home I can use a policy run to find all directories which changed since the last >update and pass them to prefetch on AFM cache.
AFM prefetch has an undocumented option to delete files from the cache without doing a lookup to home. It supports all types of list files. Find all deleted files at home and run prefetch at cache to delete them. mmafmctl device prefetch -j fileset --list-file --delete
>Wish - maybe an RFE? >I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same >as backup tools use the list to do incremental backups.
This feature support was already added but is undocumented today. It will be externalized in future releases.
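A rough sketch of how those two pieces could fit together - the policy rules, paths, file system and fileset names below are placeholders, the list-file record format can differ slightly between releases, and the --delete option is the undocumented one mentioned above, so the exact syntax should be confirmed with IBM before relying on it:

# on home: dump the full list of file paths with a minimal LIST policy (all.pol):
#   RULE 'ext' EXTERNAL LIST 'all' EXEC ''
#   RULE 'files' LIST 'all'
mmapplypolicy /gpfs/home/fileset1 -P all.pol -I defer -f /tmp/home-$(date +%F)
# strip the leading "inode generation snapid --" bookkeeping fields, then compare
# today's listing with the previous one; paths present only in the older listing
# were deleted on home since the last run
sed 's/.* -- //' /tmp/home-2017-06-29.list.all | sort > /tmp/paths.old
sed 's/.* -- //' /tmp/home-2017-06-30.list.all | sort > /tmp/paths.new
comm -23 /tmp/paths.old /tmp/paths.new > /tmp/deleted.list
# on the cache cluster: drop those entries from the cache fileset without a
# lookup to home (rewrite the paths from the home mount point to the cache
# fileset path first if they differ)
mmafmctl fs0 prefetch -j fileset1 --list-file /tmp/deleted.list --delete

Run from cron, the same pair of listings also yields a "changed since last run" list that a normal prefetch could consume for the incremental transfers asked about above.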
~Venkat (vpuvvada at in.ibm.com) From: "Billich Heinrich Rainer (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06/30/2017 03:37 PM Subject: [gpfsug-discuss] AFM - how to update directories with deleted files during prefetch Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello I have a short question about AFM prefetch and some more remarks regarding AFM and it?s use for data migration. I understand that many of you have done this for very large amounts of data and number of files. I would welcome an input, comments or remarks. Sorry if this is a bit too long for a mailing list. Short: How can I tell an AFM cache to update a directory when I do prefetch? I know about ?find .? or ?ls ?lsR? but this really is no option for us as it takes too long. Mostly I want to update the directories to make AFM cache aware of file deletions on home. On home I can use a policy run to find all directories which changed since the last update and pass them to prefetch on AFM cache. I know that I can find some workaround based on the directory list, like an ?ls ?lsa? just for those directories, but this doesn?t sound very efficient. And depending on cache effects and timeout settings it may work or not (o.k. ? it will work most time). We do regular file deletions and will accumulated millions of deleted files on cache over time if we don?t update the directories to make AFM cache aware of the deletion. Background: We will use AFM to migrate data on filesets to another cluster. We have to do this several times in the next few months, hence I want to get a reliable and easy to use procedure. The old system is home, the new system is a read-only AFM cache. We want to use ?mmafmctl prefetch? to move the data. Home will be in use while we run the migration. Once almost all data is moved we do a (short) break for a last sync and make the read-only AFM cache a ?normal? fileset. During the break I want to use policy runs and prefetch only and no time consuming ?ls ?lsr? or ?find .? I don?t want to use this metadata intensive posix operation during operation, either. More general: AFM can be used for data migration. But I don?t see how to use it efficiently. How to do incremental transfers, how to ensure that the we really have identical copies before we switch and how to keep the switch time short , i.e. the time when both old and new aren?t accessible for clients, Wish ? maybe an RFE? I can use policy runs to collect all changed items on home since the last update. I wish that I can pass this list to afm prefetch to do all updates on AFM cache, too. Same as backup tools use the list to do incremental backups. And a tool to create policy lists of home and cache and to compare the lists would be nice, too. As you do this during the break/switch it should be fast and reliable and leave no doubts. Kind regards, Heiner _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc-luke at uconn.edu Fri Jun 30 16:20:27 2017 From: hpc-luke at uconn.edu (hpc-luke at uconn.edu) Date: Fri, 30 Jun 2017 11:20:27 -0400 Subject: [gpfsug-discuss] Mass UID migration suggestions Message-ID: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Hello, We're trying to change most of our users uids, is there a clean way to migrate all of one users files with say `mmapplypolicy`? 
We have to change the owner of around 273539588 files, and my estimates for runtime are around 6 days. What we've been doing is indexing all of the files and splitting them up by owner which takes around an hour, and then we were locking the user out while we chown their files. I made it multi threaded as it weirdly gave a 10% speedup despite my expectation that multi threading access from a single node would not give any speedup. Generally I'm looking for advice on how to make the chowning faster. Would spreading the chowning processes over multiple nodes improve performance? Should I not stat the files before running lchown on them, since lchown checks the file before changing it? I saw mention of inodescan(), in an old gpfsug email, which speeds up disk read access, by not guaranteeing that the data is up to date. We have a maintenance day coming up where all users will be locked out, so the file handles(?) from GPFS's perspective will not be able to go stale. Is there a function with similar constraints to inodescan that I can use to speed up this process? Thank you for your time, Luke Storrs-HPC University of Connecticut From aaron.knister at gmail.com Fri Jun 30 16:47:40 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 30 Jun 2017 11:47:40 -0400 Subject: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads In-Reply-To: References: <441FC013797C0F4B9004428065AD55CE18409C6E@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> <57681d30-7daa-848b-6d64-f74650cf5787@nasa.gov> <441FC013797C0F4B9004428065AD55CE1840BEFC@FRCRPVV9EX6MSX.ww931.my-it-solutions.net> Message-ID: Nicolas, By chance do you have a skylake or kabylake based CPU? Sent from my iPhone > On Jun 30, 2017, at 02:57, IBM Spectrum Scale wrote: > > I'm not aware this kind of defects, seems it should not. but lack of data, we don't know what happened. I suggest you can open a PMR for your issue. Thanks. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > "CAPIT, NICOLAS" ---06/27/2017 02:59:59 PM---Hello, When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters") > > From: "CAPIT, NICOLAS" > To: gpfsug main discussion list > Date: 06/27/2017 02:59 PM > Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello, > > When the node is locked up there is no waiters ("mmdiad --waiters" or "mmfsadm dump waiters"). > In the GPFS log file "/var/mmfs/gen/mmfslog" there is nothing and nothing in the dmesg output or system log. > The "mmgetstate" command says that the node is "active". > The only thing is the freeze of the FS. 
> > Best regards, > Nicolas Capit > ________________________________________ > De : gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] de la part de Aaron Knister [aaron.s.knister at nasa.gov] > Envoy? : mardi 27 juin 2017 01:57 > ? : gpfsug-discuss at spectrumscale.org > Objet : Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads > > That's a fascinating bug. When the node is locked up what does "mmdiag > --waiters" show from the node in question? I suspect there's more > low-level diagnostic data that's helpful for the gurus at IBM but I'm > just curious what the waiters look like. > > -Aaron > > On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote: > > Hello, > > > > I don't know if this behavior/bug was already reported on this ML, so in > > doubt. > > > > Context: > > > > - SpectrumScale 4.2.2-3 > > - client node with 64 cores > > - OS: RHEL7.3 > > > > When a MPI job with 64 processes is launched on the node with 64 cores > > then the FS freezed (only the output log file of the MPI job is put on > > the GPFS; so it may be related to the 64 processes writing in a same > > file???). > > > > strace -p 3105 # mmfsd pid stucked > > Process 3105 attached > > wait4(-1, # stucked at this point > > > > strace ls /gpfs > > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0 > > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC > > # stucked at this point > > > > I have no problem with the other nodes of 28 cores. > > The GPFS command mmgetstate is working and I am able to use mmshutdown > > to recover the node. > > > > > > If I put workerThreads=72 on the 64 core node then I am not able to > > reproduce the freeze and I get the right behavior. > > > > Is this a known bug with a number of cores > workerThreads? > > > > Best regards, > > -- > > *Nicolas Capit* > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jun 30 18:14:07 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 30 Jun 2017 13:14:07 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: <487469581.449569.1498832342497.JavaMail.webinst@w30112> References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of the additional check-summing done on those platforms? 
-------- Forwarded Message --------
Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30)
Date: Fri, 30 Jun 2017 14:19:02 +0000
From: IBM My Notifications
To: aaron.s.knister at nasa.gov
My Notifications for Storage - 30 Jun 2017
Dear Subscriber (aaron.s.knister at nasa.gov),
Here are your updates from IBM My Notifications.
Your support Notifications display in English by default. Machine translation based on your IBM profile language setting is added if you specify this option in My defaults within My Notifications. (Note: Not all languages are available at this time, and the English version always takes precedence over the machine translated version.)
------------------------------------------------------------------------------
1. IBM Spectrum Scale
- TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error
- URL: http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E
- ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM Spectrum Scale versions where the NSD server is enabled to use RDMA for file IO and the storage used in your GPFS cluster accessed via NSD servers (not fully SAN accessible) includes anything other than IBM Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these conditions, when the RDMA-enabled network adapter fails, the issue may result in undetected data corruption for file write or read operations.
------------------------------------------------------------------------------
Manage your My Notifications subscriptions, or send questions and comments.
- Subscribe or Unsubscribe - https://www.ibm.com/support/mynotifications
- Feedback - https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html
- Follow us on Twitter - https://twitter.com/IBMStorageSupt
To ensure proper delivery please add mynotify at stg.events.ihost.com to your address book.
You received this email because you are subscribed to IBM My Notifications as: aaron.s.knister at nasa.gov
Please do not reply to this message as it is generated by an automated service machine.
(C) International Business Machines Corporation 2017. All rights reserved.
From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jun 30 18:25:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 30 Jun 2017 17:25:56 +0000 Subject: [gpfsug-discuss] Mass UID migration suggestions In-Reply-To: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> References: <59566c3b.kqOSy9e2kg80XZ/k%hpc-luke@uconn.edu> Message-ID: <1901859B-7620-42E2-9064-14930AC50EE3@vanderbilt.edu>
Hi Luke, I've got an off the wall suggestion for you, which may or may not work depending on whether or not you have any UID conflicts with old and new UIDs - this won't actually speed things up but it will eliminate the "downtime" for your users. And the big caveat is that there can't be any UID conflicts - i.e. someone's new UID can't be someone else's old UID. Given that - what if you set an ACL to allow access to both their old and new UIDs, then change their UID to the new UID, then chown the files to the new UID and remove the ACL? More work for you, but no downtime for them. We actually may need to do something similar as we will need to change Windows-assigned UIDs based on SIDs to "correct" UIDs at some point in the future on one of our storage systems.
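For what it's worth, a minimal sketch of that ACL-bridge idea, assuming the file system was created with POSIX ACL semantics so that setfacl works (mmgetacl/mmputacl would be the GPFS-native route, and NFSv4 ACLs would need different handling), and with purely made-up UIDs and paths:

OLDUID=21234     # the user's current UID (placeholder)
NEWUID=121234    # the user's new UID (placeholder)
# 1. while the files still belong to $OLDUID, grant the new UID access as well
#    (this grants rw to the new UID even where the owner had less; acceptable
#    here since old and new UID are the same person)
setfacl -R -m u:$NEWUID:rwX /gpfs/fs0/home/alice
# 2. switch the account over to $NEWUID in LDAP / passwd; the user keeps working
#    via the ACL entry while the files are still owned by $OLDUID
# 3. re-own the files at leisure; -h changes symlinks themselves, like lchown()
find /gpfs/fs0/home/alice -uid $OLDUID -print0 | xargs -0 -n 500 -P 8 chown -h $NEWUID
# 4. once everything is re-owned, drop the now-redundant bridging ACL entries
setfacl -R -x u:$NEWUID /gpfs/fs0/home/alice

Anything that already carries NFSv4 ACLs, and group ownership if the group IDs change too, would need extra passes, so this is only a starting point rather than a recipe.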
If someone has a better way to solve your problem, I hope they?ll post it to the list, as it may help us as well. HTHAL. Thanks? Kevin On Jun 30, 2017, at 10:20 AM, hpc-luke at uconn.edu wrote: Hello, We're trying to change most of our users uids, is there a clean way to migrate all of one users files with say `mmapplypolicy`? We have to change the owner of around 273539588 files, and my estimates for runtime are around 6 days. What we've been doing is indexing all of the files and splitting them up by owner which takes around an hour, and then we were locking the user out while we chown their files. I made it multi threaded as it weirdly gave a 10% speedup despite my expectation that multi threading access from a single node would not give any speedup. Generally I'm looking for advice on how to make the chowning faster. Would spreading the chowning processes over multiple nodes improve performance? Should I not stat the files before running lchown on them, since lchown checks the file before changing it? I saw mention of inodescan(), in an old gpfsug email, which speeds up disk read access, by not guaranteeing that the data is up to date. We have a maintenance day coming up where all users will be locked out, so the file handles(?) from GPFS's perspective will not be able to go stale. Is there a function with similar constraints to inodescan that I can use to speed up this process? Thank you for your time, Luke Storrs-HPC University of Connecticut _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Jun 30 18:37:30 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 30 Jun 2017 19:37:30 +0200 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Jun 30 18:41:43 2017 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 30 Jun 2017 13:41:43 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: Thanks Olaf, that's good to know (and is kind of what I suspected). I've requested a number of times this capability for those of us who can't use or aren't using GNR and the answer is effectively "no". This response is curious to me because I'm sure IBM doesn't believe that data integrity is only important and of value to customers who purchase their hardware *and* software. -Aaron On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser wrote: > yes.. in case of GNR (GPFS native raid) .. we do end-to-end check-summing > ... client --> server --> downToDisk > GNR writes down a chksum to disk (to all pdisks /all "raid" segments ) so > that dropped writes can be detected as well as miss-done writes (bit > flips..) 
> > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 06/30/2017 07:15 PM > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > RDMA-enabled network adapter failure on the NSD server may result in file > IO error (2017.06.30) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of > the additional check-summing done on those platforms? > > > -------- Forwarded Message -------- > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled > network adapter > failure on the NSD server may result in file IO error (2017.06.30) > Date: Fri, 30 Jun 2017 14:19:02 +0000 > From: IBM My Notifications > > To: aaron.s.knister at nasa.gov > > > > > My Notifications for Storage - 30 Jun 2017 > > Dear Subscriber (aaron.s.knister at nasa.gov), > > Here are your updates from IBM My Notifications. > > Your support Notifications display in English by default. Machine > translation based on your IBM profile > language setting is added if you specify this option in My defaults > within My Notifications. > (Note: Not all languages are available at this time, and the English > version always takes precedence > over the machine translated version.) > > ------------------------------------------------------------ > ------------------ > 1. IBM Spectrum Scale > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure > on the NSD server may result in file IO error > - URL: > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233& > myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_- > OCSTXKQY-OCSWJ00-_-E > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > Spectrum Scale versions where the NSD server is enabled to use RDMA for > file IO and the storage used in your GPFS cluster accessed via NSD > servers (not fully SAN accessible) includes anything other than IBM > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these > conditions, when the RDMA-enabled network adapter fails, the issue may > result in undetected data corruption for file write or read operations. > > ------------------------------------------------------------ > ------------------ > Manage your My Notifications subscriptions, or send questions and comments. > - Subscribe or Unsubscribe - https://www.ibm.com/support/mynotifications > - Feedback - > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotif > ications.html > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com to > your address book. > You received this email because you are subscribed to IBM My > Notifications as: > aaron.s.knister at nasa.gov > > Please do not reply to this message as it is generated by an automated > service machine. > > (C) International Business Machines Corporation 2017. All rights reserved. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Fri Jun 30 18:53:16 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 30 Jun 2017 13:53:16 -0400 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> Message-ID: <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> In fact the answer was quite literally "no": https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84523 (the RFE was declined and the answer was that the "function is already available in GNR environments"). Regarding GNR, see this RFE request https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=95090 requesting the use of GNR outside of an ESS/GSS environment. It's interesting to note this is the highest voted Public RFE for GPFS that I can see, at least. It too was declined. -Aaron On 6/30/17 1:41 PM, Aaron Knister wrote: > Thanks Olaf, that's good to know (and is kind of what I suspected). I've > requested a number of times this capability for those of us who can't > use or aren't using GNR and the answer is effectively "no". This > response is curious to me because I'm sure IBM doesn't believe that data > integrity is only important and of value to customers who purchase their > hardware *and* software. > > -Aaron > > On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser > wrote: > > yes.. in case of GNR (GPFS native raid) .. we do end-to-end > check-summing ... client --> server --> downToDisk > GNR writes down a chksum to disk (to all pdisks /all "raid" segments > ) so that dropped writes can be detected as well as miss-done > writes (bit flips..) > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 06/30/2017 07:15 PM > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > RDMA-enabled network adapter failure on the NSD server may result in > file IO error (2017.06.30) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of > the additional check-summing done on those platforms? > > > -------- Forwarded Message -------- > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network > adapter > failure on the NSD server may result in file IO error (2017.06.30) > Date: Fri, 30 Jun 2017 14:19:02 +0000 > From: IBM My Notifications > > > To: aaron.s.knister at nasa.gov > > > > > My Notifications for Storage - 30 Jun 2017 > > Dear Subscriber (aaron.s.knister at nasa.gov > ), > > Here are your updates from IBM My Notifications. > > Your support Notifications display in English by default. Machine > translation based on your IBM profile > language setting is added if you specify this option in My defaults > within My Notifications. > (Note: Not all languages are available at this time, and the English > version always takes precedence > over the machine translated version.) > > ------------------------------------------------------------------------------ > 1. 
IBM Spectrum Scale > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter > failure > on the NSD server may result in file IO error > - URL: > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > Spectrum Scale versions where the NSD server is enabled to use RDMA for > file IO and the storage used in your GPFS cluster accessed via NSD > servers (not fully SAN accessible) includes anything other than IBM > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these > conditions, when the RDMA-enabled network adapter fails, the issue may > result in undetected data corruption for file write or read operations. > > ------------------------------------------------------------------------------ > Manage your My Notifications subscriptions, or send questions and > comments. > - Subscribe or Unsubscribe - > https://www.ibm.com/support/mynotifications > > - Feedback - > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com > to > your address book. > You received this email because you are subscribed to IBM My > Notifications as: > aaron.s.knister at nasa.gov > > Please do not reply to this message as it is generated by an automated > service machine. > > (C) International Business Machines Corporation 2017. All rights > reserved. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Jun 30 19:25:28 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 30 Jun 2017 18:25:28 +0000 Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30) In-Reply-To: <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> References: <487469581.449569.1498832342497.JavaMail.webinst@w30112> <2689cf86-eca2-dab6-c6aa-7fc54d923e55@nasa.gov> Message-ID: end-to-end data integrity is very important and the reason it hasn't been done in Scale is not because its not important, its because its very hard to do without impacting performance in a very dramatic way. imagine your raid controller blocksize is 1mb and your filesystem blocksize is 1MB . if your application does a 1 MB write this ends up being a perfect full block , full track de-stage to your raid layer and everything works fine and fast. as soon as you add checksum support you need to add data somehow into this, means your 1MB is no longer 1 MB but 1 MB+checksum. 
to store this additional data you have multiple options, inline , outside the data block or some combination ,the net is either you need to do more physical i/o's to different places to get both the data and the corresponding checksum or your per block on disc structure becomes bigger than than what your application reads/or writes, both put massive burden on the Storage layer as e.g. a 1 MB write will now, even the blocks are all aligned from the application down to the raid layer, cause a read/modify/write on the raid layer as the data is bigger than the physical track size. so to get end-to-end checksum in Scale outside of ESS the best way is to get GNR as SW to run on generic HW, this is what people should vote for as RFE if they need that functionality. beside end-to-end checksums you get read/write cache and acceleration , fast rebuild and many other goodies as a added bonus. Sven On Fri, Jun 30, 2017 at 10:53 AM Aaron Knister wrote: > In fact the answer was quite literally "no": > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84523 > (the RFE was declined and the answer was that the "function is already > available in GNR environments"). > > Regarding GNR, see this RFE request > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=95090 > requesting the use of GNR outside of an ESS/GSS environment. It's > interesting to note this is the highest voted Public RFE for GPFS that I > can see, at least. It too was declined. > > -Aaron > > On 6/30/17 1:41 PM, Aaron Knister wrote: > > Thanks Olaf, that's good to know (and is kind of what I suspected). I've > > requested a number of times this capability for those of us who can't > > use or aren't using GNR and the answer is effectively "no". This > > response is curious to me because I'm sure IBM doesn't believe that data > > integrity is only important and of value to customers who purchase their > > hardware *and* software. > > > > -Aaron > > > > On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser > > wrote: > > > > yes.. in case of GNR (GPFS native raid) .. we do end-to-end > > check-summing ... client --> server --> downToDisk > > GNR writes down a chksum to disk (to all pdisks /all "raid" segments > > ) so that dropped writes can be detected as well as miss-done > > writes (bit flips..) > > > > > > > > From: Aaron Knister > > > > To: gpfsug main discussion list > > > > Date: 06/30/2017 07:15 PM > > Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): > > RDMA-enabled network adapter failure on the NSD server may result in > > file IO error (2017.06.30) > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > ------------------------------------------------------------------------ > > > > > > > > I'm curious to know why this doesn't affect GSS/ESS? Is it a feature > of > > the additional check-summing done on those platforms? > > > > > > -------- Forwarded Message -------- > > Subject: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network > > adapter > > failure on the NSD server may result in file IO error (2017.06.30) > > Date: Fri, 30 Jun 2017 14:19:02 +0000 > > From: IBM My Notifications > > >> > > To: aaron.s.knister at nasa.gov > > > > > > > > > > My Notifications for Storage - 30 Jun 2017 > > > > Dear Subscriber (aaron.s.knister at nasa.gov > > ), > > > > Here are your updates from IBM My Notifications. > > > > Your support Notifications display in English by default. 
Machine > > translation based on your IBM profile > > language setting is added if you specify this option in My defaults > > within My Notifications. > > (Note: Not all languages are available at this time, and the English > > version always takes precedence > > over the machine translated version.) > > > > > ------------------------------------------------------------------------------ > > 1. IBM Spectrum Scale > > > > - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter > > failure > > on the NSD server may result in file IO error > > - URL: > > > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > < > http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E > > > > - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM > > Spectrum Scale versions where the NSD server is enabled to use RDMA > for > > file IO and the storage used in your GPFS cluster accessed via NSD > > servers (not fully SAN accessible) includes anything other than IBM > > Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under > these > > conditions, when the RDMA-enabled network adapter fails, the issue > may > > result in undetected data corruption for file write or read > operations. > > > > > ------------------------------------------------------------------------------ > > Manage your My Notifications subscriptions, or send questions and > > comments. > > - Subscribe or Unsubscribe - > > https://www.ibm.com/support/mynotifications > > > > - Feedback - > > > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > < > https://www-01.ibm.com/support/feedback/techFeedbackCardContentMyNotifications.html > > > > > > - Follow us on Twitter - https://twitter.com/IBMStorageSupt > > > > > > > > > > To ensure proper delivery please add mynotify at stg.events.ihost.com > > to > > your address book. > > You received this email because you are subscribed to IBM My > > Notifications as: > > aaron.s.knister at nasa.gov > > > > Please do not reply to this message as it is generated by an > automated > > service machine. > > > > (C) International Business Machines Corporation 2017. All rights > > reserved. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: