From ulmer at ulmer.org Thu Dec 1 02:45:48 2016
From: ulmer at ulmer.org (Stephen Ulmer)
Date: Wed, 30 Nov 2016 21:45:48 -0500
Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
In-Reply-To: 
References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com>
Message-ID: <3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org>

I don't understand what FPO provides here that mirroring doesn't:

You can still use failure domains -- one for each node.
Both still have redundancy for the data; you can lose a disk or a node.
The data has to be re-striped in the event of a disk failure -- no matter what.

Also, the FPO license doesn't allow regular clients to access the data -- only server and FPO nodes.

What am I missing?

Liberty,

-- 
Stephen


> On Nov 30, 2016, at 3:51 PM, Andrew Beattie wrote:
> 
> Bob,
> 
> If you're not going to use integrated RAID controllers in the servers, then FPO would seem to be the most resilient scenario.
> Yes, it has its own overheads, but with that many drives to manage, a JBOD architecture and manual restriping doesn't sound like fun.
> 
> If you are going down the path of integrated RAID controllers, then any form of distributed RAID is probably the best scenario -- RAID 6, obviously.
> 
> How many nodes are you planning on building? The more nodes, the more value FPO is likely to bring, as you can be more specific in how the data is written to the nodes.
> 
> Andrew Beattie
> Software Defined Storage - IT Specialist
> Phone: 614-2133-7927
> E-mail: abeattie at au1.ibm.com
> 
> 
> ----- Original message -----
> From: "Oesterlin, Robert"
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list
> Cc:
> Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
> Date: Thu, Dec 1, 2016 12:34 AM
> 
> Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives.
> 
> Options I'm considering:
> 
> - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives)
> 
> - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD -- but then I need to fix up the data replication via restripe
> 
> - FPO -- with multiple failure groups, letting the system manage replica placement and then have GPFS do the restripe on disk failure automatically
> 
> Comments or other ideas welcome.
> 
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
> 507-269-0413
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kenh at us.ibm.com Thu Dec 1 03:55:38 2016
From: kenh at us.ibm.com (Ken Hill)
Date: Wed, 30 Nov 2016 22:55:38 -0500
Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
In-Reply-To: <3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org>
References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com>
	<3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org>
Message-ID: 

Hello Stephen,

There are three licensing models for Spectrum Scale | GPFS:

	Server
	FPO
	Client

I think the thing you might be missing is the associated cost per function.
Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone: 1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: Stephen Ulmer To: gpfsug main discussion list Date: 11/30/2016 09:46 PM Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org I don?t understand what FPO provides here that mirroring doesn?t: You can still use failure domains ? one for each node. Both still have redundancy for the data; you can lose a disk or a node. The data has to be re-striped in the event of a disk failure ? no matter what. Also, the FPO license doesn?t allow for regular clients to access the data -- only server and FPO nodes. What am I missing? Liberty, -- Stephen On Nov 30, 2016, at 3:51 PM, Andrew Beattie wrote: Bob, If your not going to use integrated Raid controllers in the servers, then FPO would seem to be the most resilient scenario. yes it has its own overheads, but with that many drives to manage, a JOBD architecture and manual restriping doesn't sound like fun If you are going down the path of integrated raid controllers then any form of distributed raid is probably the best scenario, Raid 6 obviously. How many Nodes are you planning on building? The more nodes the more value FPO is likely to bring as you can be more specific in how the data is written to the nodes. Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Date: Thu, Dec 1, 2016 12:34 AM Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
From zgiles at gmail.com Thu Dec 1 03:59:40 2016
From: zgiles at gmail.com (Zachary Giles)
Date: Wed, 30 Nov 2016 22:59:40 -0500
Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
In-Reply-To: 
References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com>
Message-ID: 

Just remember that replication protects data availability, not integrity. GPFS still requires the underlying block device to return good data.

If you're using it on plain disks (SAS or SSD) and a drive returns corrupt data, GPFS won't know any better and will just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of the second replica if you realize the first is bad, short of taking the first entirely offline. And while migrating data in that case, there's no good way to prevent read-rewrite of other corrupt data on the drive that holds the "good copy" while restriping off a faulty drive.

Ideally RAID only returns data that passed the RAID algorithm, so it shouldn't be corrupt, or it is made good by recreating it from parity. However, as we all know, RAID controllers are definitely prone to failures as well, for many reasons -- but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc.) without (hopefully) silent corruption.

Just something to think about while considering replication.


On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke wrote:
> I have once set up a small system with just a few SSDs in two NSD servers,
> providing a scratch file system in a computing cluster.
> No RAID, two replicas.
> It works, as long as the admins do not do silly things (like rebooting servers
> in sequence without checking for disks being up in between).
> Going for RAIDs without GPFS replication protects you against single disk
> failures, but you're lost if just one of your NSD servers goes off.
>
> FPO only makes sense IMHO if your NSD servers are also processing
> the data (and then you need to control that somehow).
>
> Other ideas? What else can you do with GPFS and local disks than what you
> considered? I suppose nothing reasonable ...
> > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Dec 1 04:03:52 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 30 Nov 2016 23:03:52 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org> Message-ID: <7F68B673-EA06-4E99-BE51-B76C06FE416E@ulmer.org> The licensing model was my last point ? if the OP uses FPO just to create data resiliency they increase their cost (or curtail their access). I was really asking if there was a real, technical positive for using FPO in this example, as I could only come up with equivalences and negatives. -- Stephen > On Nov 30, 2016, at 10:55 PM, Ken Hill wrote: > > Hello Stephen, > > There are three licensing models for Spectrum Scale | GPFS: > > Server > FPO > Client > > I think the thing you might be missing is the associated cost per function. 
> > Regards, > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > From: Stephen Ulmer > To: gpfsug main discussion list > Date: 11/30/2016 09:46 PM > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > I don?t understand what FPO provides here that mirroring doesn?t: > You can still use failure domains ? one for each node. > Both still have redundancy for the data; you can lose a disk or a node. > The data has to be re-striped in the event of a disk failure ? no matter what. > > Also, the FPO license doesn?t allow for regular clients to access the data -- only server and FPO nodes. > > What am I missing? > > Liberty, > > -- > Stephen > > > > On Nov 30, 2016, at 3:51 PM, Andrew Beattie > wrote: > > Bob, > > If your not going to use integrated Raid controllers in the servers, then FPO would seem to be the most resilient scenario. > yes it has its own overheads, but with that many drives to manage, a JOBD architecture and manual restriping doesn't sound like fun > > If you are going down the path of integrated raid controllers then any form of distributed raid is probably the best scenario, Raid 6 obviously. > > How many Nodes are you planning on building? The more nodes the more value FPO is likely to bring as you can be more specific in how the data is written to the nodes. > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Oesterlin, Robert" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: [gpfsug-discuss] Strategies - servers with local SAS disks > Date: Thu, Dec 1, 2016 12:34 AM > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. > > > Options I?m considering: > > > - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) > > - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe > > - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically > > > Comments or other ideas welcome. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Thu Dec 1 04:15:17 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 30 Nov 2016 23:15:17 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> Message-ID: <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). > > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... 
> > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > > To: gpfsug main discussion list > > > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Zach Giles > zgiles at gmail.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From zgiles at gmail.com Thu Dec 1 04:27:27 2016 From: zgiles at gmail.com (Zachary Giles) Date: Wed, 30 Nov 2016 23:27:27 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. 
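
For anyone who wants to experiment, here is a very rough sketch of the ZFS and md flavours feeding an NSD. It is untested here -- device names, pool names and sizes are invented -- and since GPFS only auto-discovers standard device types, a /var/mmfs/etc/nsddevices user exit is probably needed before it will see zvol or md devices:

    # per node: one RAID set built from the local disks
    zpool create nsdpool raidz2 /dev/sd[b-k]       # ZFS flavour
    zfs create -V 20T nsdpool/nsd01                # zvol appears under /dev/zvol/nsdpool/
    # ...or the md flavour instead:
    # mdadm --create /dev/md0 --level=6 --raid-devices=10 /dev/sd[b-k]

    # nsd.stanza -- describe the resulting block device to GPFS:
    %nsd:
      device=/dev/zvol/nsdpool/nsd01
      nsd=node01_nsd01
      servers=node01
      usage=dataAndMetadata
      failureGroup=1

    mmcrnsd -F nsd.stanza

A raidz pool also gets you checksumming, which speaks to the integrity concern earlier in the thread; plain md RAID 6 does not verify parity on normal reads. GPFS replication on top then only has to cover node loss.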
On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton > of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet > available, but if one scours the interwebs they can find mention of > something called Mestor. > > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/at > tachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x- > gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog ( > https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a > similar situation. It's perhaps at a very high conceptually level similar > to Mestor. You erasure code your data across the nodes w/ the SAS disks and > then present those block devices to your NSD servers. I proved it could > work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then > replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > >> Just remember that replication protects against data availability, not >> integrity. GPFS still requires the underlying block device to return >> good data. >> >> If you're using it on plain disks (SAS or SSD), and the drive returns >> corrupt data, GPFS won't know any better and just deliver it to the >> client. Further, if you do a partial read followed by a write, both >> replicas could be destroyed. There's also no efficient way to force use >> of a second replica if you realize the first is bad, short of taking the >> first entirely offline. In that case while migrating data, there's no >> good way to prevent read-rewrite of other corrupt data on your drive >> that has the "good copy" while restriping off a faulty drive. >> >> Ideally RAID would have a goal of only returning data that passed the >> RAID algorithm, so shouldn't be corrupt, or made good by recreating from >> parity. However, as we all know RAID controllers are definitely prone to >> failures as well for many reasons, but at least a drive can go bad in >> various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) >> without (hopefully) silent corruption.. >> >> Just something to think about while considering replication .. >> >> >> >> On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > > wrote: >> >> I have once set up a small system with just a few SSDs in two NSD >> servers, >> providin a scratch file system in a computing cluster. >> No RAID, two replica. >> works, as long the admins do not do silly things (like rebooting >> servers >> in sequence without checking for disks being up in between). >> Going for RAIDs without GPFS replication protects you against single >> disk >> failures, but you're lost if just one of your NSD servers goes off. >> >> FPO makes sense only sense IMHO if your NSD servers are also >> processing >> the data (and then you need to control that somehow). >> >> Other ideas? what else can you do with GPFS and local disks than >> what you >> considered? I suppose nothing reasonable ... >> >> >> Mit freundlichen Gr??en / Kind regards >> >> >> Dr. 
Uwe Falke >> >> IT Specialist >> High Performance Computing Services / Integrated Technology Services / >> Data Center Services >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> ------------------- >> IBM Deutschland >> Rathausstr. 7 >> 09111 Chemnitz >> Phone: +49 371 6978 2165 >> Mobile: +49 175 575 2877 >> E-Mail: uwefalke at de.ibm.com >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> ------------------- >> IBM Deutschland Business & Technology Services GmbH / >> Gesch?ftsf?hrung: >> Frank Hammer, Thorsten Moehring >> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht >> Stuttgart, >> HRB 17122 >> >> >> >> >> From: "Oesterlin, Robert" > > >> To: gpfsug main discussion list >> > > >> Date: 11/30/2016 03:34 PM >> Subject: [gpfsug-discuss] Strategies - servers with local SAS >> disks >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> >> Looking for feedback/strategies in setting up several GPFS servers >> with >> local SAS. They would all be part of the same file system. The >> systems are >> all similar in configuration - 70 4TB drives. >> >> Options I?m considering: >> >> - Create RAID arrays of the disks on each server (worried about the >> RAID >> rebuild time when a drive fails with 4, 6, 8TB drives) >> - No RAID with 2 replicas, single drive per NSD. When a drive fails, >> recreate the NSD ? but then I need to fix up the data replication via >> restripe >> - FPO ? with multiple failure groups - letting the system manage >> replica >> placement and then have GPFS due the restripe on disk failure >> automatically >> >> Comments or other ideas welcome. >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> 507-269-0413 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> -- >> Zach Giles >> zgiles at gmail.com >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 1 12:47:43 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Dec 2016 12:47:43 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. 
:-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Zachary Giles Reply-To: gpfsug main discussion list Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. 
Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke >> wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" >> To: gpfsug main discussion list >> Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Dec 1 13:13:31 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 1 Dec 2016 08:13:31 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> References: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Message-ID: Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. Liberty, -- Stephen > On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert wrote: > > Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: > > I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. > > 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) > 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks > 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) > 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down > 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. > > Option (4) seems the best of the ?no great options? I have in front of me. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > From: > on behalf of Zachary Giles > > Reply-To: gpfsug main discussion list > > Date: Wednesday, November 30, 2016 at 10:27 PM > To: gpfsug main discussion list > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. 
> > It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. > > On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. > > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/ ) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > >> wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). 
> > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > > Mobile: +49 175 575 2877 > > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > >> > To: gpfsug main discussion list > > >> > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > -- > Zach Giles > zgiles at gmail.com > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Zach Giles > zgiles at gmail.com _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 1 13:20:46 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Dec 2016 13:20:46 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> Yep, I should have added those requirements :-) 1) Yes I care about the data. 
It?s not scratch but a permanent repository of older, less frequently accessed data. 2) Yes, it will be backed up 3) I expect it to grow over time 4) Data integrity requirement: high Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Stephen Ulmer Reply-To: gpfsug main discussion list Date: Thursday, December 1, 2016 at 7:13 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. Liberty, -- Stephen On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert > wrote: Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Zachary Giles > Reply-To: gpfsug main discussion list > Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. 
There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke >> wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" >> To: gpfsug main discussion list >> Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Thu Dec 1 18:22:36 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Thu, 1 Dec 2016 10:22:36 -0800 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> References: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Message-ID: Hi Bob, If you mean #4 with 2x data replication...then I would be very wary as the chance of data loss would be very high given local disk failure rates. So I think its really #4 with 3x replication vs #3 with 2x replication (and raid5/6 in node) (with maybe 3x for metadata). The space overhead is somewhat similar, but the rebuild times should be much faster for #3 given that a failed disk will not place any load on the storage network (as well there will be less data placed on network). Dean From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 12/01/2016 04:48 AM Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Some interesting discussion here. 
Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Zachary Giles Reply-To: gpfsug main discussion list Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog ( https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. 
There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From eric.wonderley at vt.edu Thu Dec 1 19:10:08 2016 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 1 Dec 2016 14:10:08 -0500 Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk Message-ID: I have a few misconfigured disk groups and I have a few same size correctly configured disk groups. Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk? Everytime I have ever run mmdeldisk...it been somewhat painful(even with qos) process. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 1 20:28:36 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 1 Dec 2016 14:28:36 -0600 Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk In-Reply-To: References: Message-ID: I always suspend the disk then use mmrestripefs -m to remove the data. Then delete the disk with mmdeldisk. ?m Migrates all critical data off of any suspended disk in this file system. Critical data is all data that would be lost if currently suspended disks were removed. Can do multiple that why and us the entire cluster to move data if you want. On 12/1/16 1:10 PM, J. Eric Wonderley wrote: I have a few misconfigured disk groups and I have a few same size correctly configured disk groups. Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk? Everytime I have ever run mmdeldisk...it been somewhat painful(even with qos) process. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark.bergman at uphs.upenn.edu Thu Dec 1 23:50:16 2016 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 01 Dec 2016 18:50:16 -0500 Subject: [gpfsug-discuss] Upgrading kernel on RHEL In-Reply-To: Your message of "Tue, 29 Nov 2016 20:56:25 +0000." References: <904EEBB5-E1DD-4606-993F-7E91ADA1FC37@cfms.org.uk>, Message-ID: <2253-1480636216.904015@Srjh.LZ4V.h1Mt> In the message dated: Tue, 29 Nov 2016 20:56:25 +0000, The pithy ruminations from Luis Bolinches on were: => Its been around in certain cases, some kernel <-> storage combination get => hit some not => => Scott referenced it here https://www.ibm.com/developerworks/community/wikis => /home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/ => Storage+with+GPFS+on+Linux => => https://access.redhat.com/solutions/2437991 => => It happens also on 7.2 and 7.3 ppc64 (not yet on the list of "supported") => it does not on 7.1. I can confirm this at least for XIV storage, that it => can go up to 1024 only. => => I know the FAQ will get updated about this, at least there is a CMVC that => states so. => => Long short, you create a FS, and you see all your paths die and recover and => die and receover and ..., one after another. And it never really gets done. => Also if you boot from SAN ... well you can figure it out ;) Wow, that sounds extremely similar to a kernel bug/incompatibility with GPFS that I reported in May: https://patchwork.kernel.org/patch/9140337/ https://bugs.centos.org/view.php?id=10997 My conclusion is not to apply kernel updates, unless strictly necessary (Dirty COW, anyone) or tested & validated with GPFS. Mark => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland => Phone: +358 503112585 => => "If you continually give you will continually have." Anonymous => => => => ----- Original message ----- => From: Nathan Harper > Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug main discussion list => Cc: => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 10:44 PM => => This is the first I've heard of this max_sectors_kb issue, has it => already been discussed on the list? Can you point me to any more info? => => => => On 29 Nov 2016, at 19:08, Luis Bolinches => wrote: => => => Seen that one on 6.8 too => => teh 4096 does NOT work if storage is XIV then is 1024 => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / => Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland => Phone: +358 503112585 => => "If you continually give you will continually have." Anonymous => => => => ----- Original message ----- => From: "Kevin D Johnson" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug-discuss at spectrumscale.org => Cc: gpfsug-discuss at spectrumscale.org => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 8:48 PM => => I have run into the max_sectors_kb issue and creating a file => system when moving beyond 3.10.0-327 on RH 7.2 as well. You => either have to reinstall the OS or walk the kernel back to 327 => via: => => https://access.redhat.com/solutions/186763 => => Kevin D. 
Johnson, MBA, MAFM => Spectrum Computing, Senior Managing Consultant => => IBM Certified Deployment Professional - Spectrum Scale V4.1.1 => IBM Certified Deployment Professional - Cloud Object Storage => V3.8 => IBM Certified Solution Advisor - Spectrum Computing V1 => => 720.349.6199 - kevindjo at us.ibm.com => => => => => ----- Original message ----- => From: "Luis Bolinches" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug-discuss at spectrumscale.org => Cc: gpfsug-discuss at spectrumscale.org => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 5:20 AM => => My 2 cents => => And I am sure different people have different opinions. => => New kernels might be problematic. => => Now got my fun with RHEL 7.3 kernel and max_sectors_kb for => new FS. Is something will come to the FAQ soon. It is => already on draft not public. => => I guess whatever you do .... get a TEST cluster and do it => there first, that is better the best advice I could give. => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / => Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 => Finland => Phone: +358 503112585 => => "If you continually give you will continually have." => Anonymous => => => => ----- Original message ----- => From: "Sobey, Richard A" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: "'gpfsug-discuss at spectrumscale.org'" < => gpfsug-discuss at spectrumscale.org> => Cc: => Subject: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 11:59 AM => => => All, => => => => As a general rule, when updating GPFS to a newer => release, would you perform a full OS update at the same => time, and/or update the kernel too? => => => => Just trying to gauge what other people do in this => respect. Personally I?ve always upgraded everything at => once ? including kernel. Am I looking for trouble? => => => => Cheers => => Richard => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? ole toisin mainittu: / Unless stated otherwise => above: => Oy IBM Finland Ab => PL 265, 00101 Helsinki, Finland => Business ID, Y-tunnus: 0195876-3 => Registered in Finland => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? ole toisin mainittu: / Unless stated otherwise above: => Oy IBM Finland Ab => PL 265, 00101 Helsinki, Finland => Business ID, Y-tunnus: 0195876-3 => Registered in Finland => => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above:
=> Oy IBM Finland Ab
=> PL 265, 00101 Helsinki, Finland
=> Business ID, Y-tunnus: 0195876-3
=> Registered in Finland
=>
=> _______________________________________________
=> gpfsug-discuss mailing list
=> gpfsug-discuss at spectrumscale.org
=> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
=>

From Robert.Oesterlin at nuance.com Fri Dec 2 13:31:26 2016
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Fri, 2 Dec 2016 13:31:26 +0000
Subject: [gpfsug-discuss] Follow-up: Storage Rich Server options
Message-ID: <13B8F551-BCA2-4690-B45A-736BA549D2FC@nuance.com>

Some follow-up to the discussion I kicked off a few days ago. Using simple GPFS replication on two sites looked like a good option, until you consider it's really RAID5, and if the replica copy of the data fails during the restripe, you lose data. It's not as bad as RAID5, because the data blocks for a file are spread across multiple servers versus reconstruction of a single array.

Raid 6 + Metadata replication isn't a bad option, but you are vulnerable to server failure. Its relatively low expansion factor makes it attractive. My personal recommendation is going to be to use Raid 6 + Metadata Replication (with the 'unmountOnDiskFail=meta' option), and keep a spare server around to minimize downtime if one fails. Array rebuild times will impact performance, but it's the price of having integrity.

Comments?

Data Distribution                             | Expansion Factor | Data Availability (Disk Failure) | Data Availability (Server Failure) | Data Integrity | Comments
Raid 6 (6+2) + Metadata replication           | 1.25+            | High                             | Low                                | High           | Single server or single LUN failure results in some data being unavailable. Single Drive failure - lower performance during array rebuild.
2 site replication (GPFS)                     | 2                | High                             | High                               | Low            | Similar to RAID 5 - vulnerable to multiple disk failures. Rebuild done via GPFS restripe. URE vulnerable during restripe, but data distribution may mitigate this.
Raid 6 (6+2) + Full 2 site replication (GPFS) | 2.5              | High                             | High                               | High           | Protected against single server and double drive failures. Single Drive failure - lower performance during array rebuild.
Full 3 site replication (GPFS)                | 3                | High                             | High                               | High           | Similar to RAID 6. Protected against single server and double drive failures. Rebuild done via GPFS restripe.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eric.wonderley at vt.edu Fri Dec 2 15:03:59 2016
From: eric.wonderley at vt.edu (J. Eric Wonderley)
Date: Fri, 2 Dec 2016 10:03:59 -0500
Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk
In-Reply-To:
References:
Message-ID:

Ah...rpldisk is used to fix a single problem, and typically you don't want to take a long trip thru md for just one small problem. Likely why it is seldom if ever used.

On Thu, Dec 1, 2016 at 3:28 PM, Matt Weil wrote:
> I always suspend the disk then use mmrestripefs -m to remove the data.
> Then delete the disk with mmdeldisk.
>
> -m
> Migrates all critical data off of any suspended
> disk in this file system. Critical data is all
> data that would be lost if currently suspended
> disks were removed.
> Can do multiple that way and use the entire cluster to move data if you
> want.
>
> On 12/1/16 1:10 PM, J. Eric Wonderley wrote:
>
> I have a few misconfigured disk groups and I have a few same size
> correctly configured disk groups.
>
> Is there any (dis)advantage to running mmrpldisk over mmdeldisk and
> mmadddisk?
> Everytime I have ever run mmdeldisk...it been somewhat
> painful(even with qos) process.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> ------------------------------
>
> The materials in this message are private and may contain Protected
> Healthcare Information or other information of a sensitive nature. If you
> are not the intended recipient, be advised that any unauthorized use,
> disclosure, copying or the taking of any action in reliance on the contents
> of this information is strictly prohibited. If you have received this email
> in error, please immediately notify the sender via telephone or return mail.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eric.wonderley at vt.edu Fri Dec 2 20:51:14 2016
From: eric.wonderley at vt.edu (J. Eric Wonderley)
Date: Fri, 2 Dec 2016 15:51:14 -0500
Subject: [gpfsug-discuss] Quotas on Multiple Filesets
In-Reply-To:
References:
Message-ID:

Hi Michael:

I was about to ask a similar question about nested filesets. I have this setup:

[root at cl001 ~]# mmlsfileset home
Filesets in file system 'home':
Name        Status  Path
root        Linked  /gpfs/home
group       Linked  /gpfs/home/group
predictHPC  Linked  /gpfs/home/group/predictHPC

and I see this:

[root at cl001 ~]# mmlsfileset home -L -d
Collecting fileset usage information ...
Filesets in file system 'home':
Name        Id  RootInode  ParentId  Created                   InodeSpace  MaxInodes  AllocInodes  Data (in KB)  Comment
root         0          3        --  Tue Jun 30 07:54:09 2015           0  134217728    123805696   63306355456  root fileset
group        1   67409030         0  Tue Nov  1 13:22:24 2016           0          0            0             0
predictHPC   2  111318203         1  Fri Dec  2 14:05:56 2016           0          0            0     212206080

I would have thought that usage in fileset predictHPC would also go against the group fileset.

On Tue, Nov 15, 2016 at 4:47 AM, Michael Holliday <michael.holliday at crick.ac.uk> wrote:
> Hey Everyone,
>
> I have a GPFS system which contains several groups of filesets.
>
> Each group has a root fileset, along with a number of other filesets.
> All of the filesets share the inode space with the root fileset.
>
> The filesets are linked to create a tree structure as shown:
>
> Fileset Root -> /root
> Fileset a -> /root/a
> Fileset B -> /root/b
> Fileset C -> /root/c
>
> I have applied a quota of 5TB to the root fileset.
>
> Could someone tell me if the quota will only take into account the files
> in the root fileset, or if it would include the sub filesets as well, e.g.
> if I have 3TB in A and 2TB in B - would that hit the 5TB quota on root?
>
> Thanks
> Michael
>
> The Francis Crick Institute Limited is a registered charity in England and
> Wales no. 1140062 and a company registered in England and Wales no.
> 06885462, with its registered office at 215 Euston Road, London NW1 2BE.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From Heiner.Billich at psi.ch Mon Dec 5 10:26:47 2016 From: Heiner.Billich at psi.ch (Heiner Billich) Date: Mon, 5 Dec 2016 11:26:47 +0100 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: Hello, I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by asking google. Can please somebody point me to the source? I wonder whether it allows incremental copies as rsync does. We need to copy a few 100TB of data and simple rsync provides just about 100MB/s. I know about the possible workarounds - write a wrapper script, run several rsyncs in parallel, distribute the rsync jobs on several nodes, use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, which requires me to write a custom wrapper for cp .... I really would prefer some ready-to-use script or program. Thank you and kind regards, Heiner Billich From peserocka at gmail.com Mon Dec 5 11:25:38 2016 From: peserocka at gmail.com (P Serocka) Date: Mon, 5 Dec 2016 19:25:38 +0800 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> References: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> Message-ID: <6911BC0E-89DE-4C42-A46C-5DADB31E415A@gmail.com> It would be helpful to make a strict priority list of points like these: - use existing hw at no additional cost (kind of the starting point of this project) - data integrity requirement: high as you wrote - Performance (r/w/random): assumed low? - Flexibility of file tree layout: low? because: static content, "just" growing In case I got the priorities in the right order by pure chance, having ZFS as part of the solution would come to my mind (first two points). Then, with performance and flexibility on the lower ranks, I might consider... not to... deploy.... GPFS at all, but stick with with 12 separate archive servers. You actual priority list might be different. I was trying to illustrate how a strict ranking, and not cheating on yourself, simplifies drawing conclusions in a top-down approach. hth -- Peter On 2016 Dec 1. md, at 21:20 st, Oesterlin, Robert wrote: > Yep, I should have added those requirements :-) > > 1) Yes I care about the data. It?s not scratch but a permanent repository of older, less frequently accessed data. > 2) Yes, it will be backed up > 3) I expect it to grow over time > 4) Data integrity requirement: high > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > From: on behalf of Stephen Ulmer > Reply-To: gpfsug main discussion list > Date: Thursday, December 1, 2016 at 7:13 AM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? > > Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? > > That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. > > Liberty, > > -- > Stephen > > > > On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert wrote: > > Some interesting discussion here. 
Perhaps I should have been a bit clearer on what I?m looking at here: > > I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. > > 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) > 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks > 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) > 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down > 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. > > Option (4) seems the best of the ?no great options? I have in front of me. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > From: on behalf of Zachary Giles > Reply-To: gpfsug main discussion list > Date: Wednesday, November 30, 2016 at 10:27 PM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. > > It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. > > On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. > > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. 
Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). > > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > > To: gpfsug main discussion list > > > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. 
> > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Zach Giles > zgiles at gmail.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Zach Giles > zgiles at gmail.com > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mimarsh2 at vt.edu Mon Dec 5 14:09:56 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 5 Dec 2016 09:09:56 -0500 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: All, I am in the same boat. I'd like to copy ~500 TB from one filesystem to another. Both are being served by the same NSD servers. We've done the multiple rsync script method in the past (and yes it's a bit of a pain). Would love to have an easier utility. Best, Brian Marshall On Mon, Dec 5, 2016 at 5:26 AM, Heiner Billich wrote: > Hello, > > I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or > 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by > asking google. Can please somebody point me to the source? I wonder whether > it allows incremental copies as rsync does. > > We need to copy a few 100TB of data and simple rsync provides just about > 100MB/s. I know about the possible workarounds - write a wrapper script, > run several rsyncs in parallel, distribute the rsync jobs on several nodes, > use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, > which requires me to write a custom wrapper for cp .... > > I really would prefer some ready-to-use script or program. > > Thank you and kind regards, > Heiner Billich > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sander.kuusemets at ut.ee Mon Dec 5 14:26:21 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Mon, 5 Dec 2016 16:26:21 +0200 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster Message-ID: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Hello, I have been thinking about setting up a CES cluster on my GPFS custer for easier data distribution. The cluster is quite an old one - since 3.4, but we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, Infiniband interconnected. 
The problem is this little line in Spectrum Scale documentation: > The CES shared root directory cannot be changed when the cluster is up > and running. If you want to modify the shared root configuration, you > must bring the entire cluster down. Does this mean that even the first time I'm setting CES up, I have to pull down the whole cluster? I would understand this level of service disruption when I already had set the directory before and now I was changing it, but on an initial setup it's quite an inconvenience. Maybe there's a less painful way for this? Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 5 14:34:27 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 05 Dec 2016 14:34:27 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Message-ID: No, the first time you define it, I'm pretty sure can be done online. But when changing it later, it will require the stopping the full cluster first. -jf man. 5. des. 2016 kl. 15.26 skrev Sander Kuusemets : > Hello, > > I have been thinking about setting up a CES cluster on my GPFS custer for > easier data distribution. The cluster is quite an old one - since 3.4, but > we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, > Infiniband interconnected. > The problem is this little line in Spectrum Scale documentation: > > The CES shared root directory cannot be changed when the cluster is up and > running. If you want to modify the shared root configuration, you must > bring the entire cluster down. > > > Does this mean that even the first time I'm setting CES up, I have to pull > down the whole cluster? I would understand this level of service disruption > when I already had set the directory before and now I was changing it, but > on an initial setup it's quite an inconvenience. Maybe there's a less > painful way for this? > > Best regards, > > -- > Sander Kuusemets > University of Tartu, High Performance Computing > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Mon Dec 5 15:51:14 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 5 Dec 2016 10:51:14 -0500 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: <58AC01C5-3B4B-43C0-9F62-F5B38D90EC50@ulmer.org> This is not the answer to not writing it yourself: However, be aware that GNU xargs has the -P x option, which will try to keep x batches running. It?s a good way to optimize the number of threads for anything you?re multiprocessing in the shell. So you can build a list and have xargs fork x copies of rsync or cp at a time (with -n y items in each batch). Not waiting to start the next batch when one finishes can add up to lots of MB*s very quickly. This is not the answer to anything, and is probably a waste of your time: I started to comment that if GPFS did provide such a ?data path shortcut?, I think I?d want it to work between any two allocation areas ? even two independent filesets in the same file system. 
Then I started working though the possibilities for just doing that? and it devolved into the realization that we?ve got to copy the data most of the time (unless it?s in the same filesystem *and* the same storage pool ? and maybe even then depending on how the allocator works). Realizing that I decide that sometimes it just sucks to have data in the wrong (old) place. :) Maybe what we want is to be able to split an independent fileset (if it is 1:1 with a storage pool) from a filesystem and graft it onto another one ? that?s probably easier and it almost mirrors vgsplit, et al. I should go do actual work... Liberty, > On Dec 5, 2016, at 9:09 AM, Brian Marshall > wrote: > > All, > > I am in the same boat. I'd like to copy ~500 TB from one filesystem to another. Both are being served by the same NSD servers. > > We've done the multiple rsync script method in the past (and yes it's a bit of a pain). Would love to have an easier utility. > > Best, > Brian Marshall > > On Mon, Dec 5, 2016 at 5:26 AM, Heiner Billich > wrote: > Hello, > > I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by asking google. Can please somebody point me to the source? I wonder whether it allows incremental copies as rsync does. > > We need to copy a few 100TB of data and simple rsync provides just about 100MB/s. I know about the possible workarounds - write a wrapper script, run several rsyncs in parallel, distribute the rsync jobs on several nodes, use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, which requires me to write a custom wrapper for cp .... > > I really would prefer some ready-to-use script or program. > > Thank you and kind regards, > Heiner Billich > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Dec 5 16:01:33 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 5 Dec 2016 11:01:33 -0500 Subject: [gpfsug-discuss] waiting for exclusive use of connection for sending msg Message-ID: Bob (and all), I see in this post that you were tracking down a problem I am currently seeing. Lots of waiters in deadlock with "waiting for exclusive use of connection for sending msg". Did you ever determine a fix / cause for that? I see your previous comments below. We are still on 4.2.0 https://www.ibm.com/developerworks/community/forums/html/topic?id=c25e31ad-a2ae-408e-84e5-90f412806463 Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 5 16:14:06 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Dec 2016 16:14:06 +0000 Subject: [gpfsug-discuss] waiting for exclusive use of connection for sending msg Message-ID: Hi Brian This boils down to a network contention issue ? that you are maxing out the network resources and GPFS is waiting. Now- digging deeper into why, that?s a larger issue. I?m still struggling with this myself. It takes a lot of digging into network stats, utilization, dropped packets, etc. It could be at the server/client or elsewhere in the network. 
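One way to start that digging - a rough sketch only, and it assumes mmdsh (an unsupported helper in /usr/lpp/mmfs/bin) can reach every node over ssh - is to snapshot the waiters cluster-wide and check whether the "exclusive use of connection" waits pile up against a few nodes or are spread evenly:

  #!/bin/bash
  # Sketch: summarize current waiters across the cluster; temp file name is arbitrary.
  export PATH=$PATH:/usr/lpp/mmfs/bin
  mmdsh -N all "mmdiag --waiters" > /tmp/waiters.$$ 2>/dev/null

  # mmdsh prefixes each output line with "nodename:", so count the connection waits per node.
  grep "exclusive use of connection" /tmp/waiters.$$ | cut -d: -f1 | sort | uniq -c | sort -rn

  # Rough histogram of all waiter reasons; the exact waiter text varies a bit by release,
  # so this may need adjusting to your output.
  grep -o "reason '[^']*'" /tmp/waiters.$$ | sort | uniq -c | sort -rn
  rm -f /tmp/waiters.$$

If the counts cluster on one or two peers, that tends to point at congestion or errors on the links near those nodes rather than at GPFS itself.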
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Monday, December 5, 2016 at 10:01 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] waiting for exclusive use of connection for sending msg Bob (and all), I see in this post that you were tracking down a problem I am currently seeing. Lots of waiters in deadlock with "waiting for exclusive use of connection for sending msg". Did you ever determine a fix / cause for that? I see your previous comments below. We are still on 4.2.0 https://www.ibm.com/developerworks/community/forums/html/topic?id=c25e31ad-a2ae-408e-84e5-90f412806463 Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Mon Dec 5 16:33:24 2016 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 5 Dec 2016 17:33:24 +0100 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe Message-ID: FYI ... in case not seen .... benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Dec 5 20:49:44 2016 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 5 Dec 2016 20:49:44 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Dec 5 21:31:55 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 5 Dec 2016 16:31:55 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Hi Everyone, In the GPFS documentation (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) it has this to say about the duration of an upgrade from 3.5 to 4.1: > Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS > on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists >because some GPFS 4.1 features become available on each node as soon as the node is upgraded, while >other features will not become available until you upgrade all participating nodes. Does anyone have a feel for what "a short time" means? I'm looking to upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the size of our system it might take several weeks to complete. Seeing this language concerns me that after some period of time something bad is going to happen, but I don't know what that period of time is. Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any anecdotes they'd like to share, I would like to hear them. Thanks! 
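A small sketch like the following (assuming Linux nodes with the gpfs.base RPM installed and working mmdsh to every node) can at least show, at any point during a long rolling window, which nodes are still on the old code and what compatibility level the cluster is running at:

  #!/bin/bash
  # Sketch: track progress of a rolling upgrade.
  export PATH=$PATH:/usr/lpp/mmfs/bin

  # Installed package level per node (mmdsh prefixes each line with the node name).
  mmdsh -N all "rpm -q gpfs.base" | sort

  # Version the running daemon reports, in case a node was upgraded but mmfsd not yet restarted.
  mmdsh -N all "mmdiag --version" | sort

  # The cluster keeps running at the old compatibility level until 'mmchconfig release=LATEST'
  # is issued after the last node is upgraded, so the new 4.1 on-disk features stay off until then.
  mmlsconfig minReleaseLevel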
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From kevindjo at us.ibm.com Mon Dec 5 21:35:54 2016 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Mon, 5 Dec 2016 21:35:54 +0000 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 5 22:52:58 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 05 Dec 2016 22:52:58 +0000 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: I read it as "do your best". I doubt there can be problems that shows up after 3 weeks, that wouldn't also be triggerable after 1 day. -jf man. 5. des. 2016 kl. 22.32 skrev Aaron Knister : > Hi Everyone, > > In the GPFS documentation > ( > http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm > ) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a time > without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short time. > The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Dec 5 23:00:43 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 5 Dec 2016 18:00:43 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Thanks Jan-Frode! If you don't mind sharing, over what period of time did you upgrade from 3.5 to 4.1 and roughly how many clients/servers do you have in your cluster? -Aaron On 12/5/16 5:52 PM, Jan-Frode Myklebust wrote: > I read it as "do your best". I doubt there can be problems that shows up > after 3 weeks, that wouldn't also be triggerable after 1 day. > > > -jf > > man. 5. des. 2016 kl. 
22.32 skrev Aaron Knister > >: > > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From sander.kuusemets at ut.ee Tue Dec 6 07:25:13 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Tue, 6 Dec 2016 09:25:13 +0200 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> Hello Aaron, I thought I'd share my two cents, as I just went through the process. I thought I'd do the same, start upgrading from where I can and wait until machines come available. It took me around 5 weeks to complete the process, but the last two were because I was super careful. At first nothing happened, but at one point, a week into the upgrade cycle, when I tried to mess around (create, delete, test) a fileset, suddenly I got the weirdest of error messages while trying to delete a fileset for the third time from a client node - I sadly cannot exactly remember what it said, but I can describe what happened. After the error message, the current manager of our cluster fell into arbitrating state, it's metadata disks were put to down state, manager status was given to our other server node and it's log was spammed with a lot of error messages, something like this: > mmfsd: > /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) > + 0)' failed. > Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 > in process 15113, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= > (sizeof(Pad32) + 0)) in line 1411 of file > /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h > Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: > Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 > logAssertFailed + 0x2D6 at ??:0 > Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 > PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 > Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 > tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 > Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 > RcvWorker::RcvMain() + 0x107 at ??:0 > Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B > RcvWorker::thread(void*) + 0x5B at ??:0 > Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 > Thread::callBody(Thread*) + 0x46 at ??:0 > Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 > start_thread + 0xD1 at ??:0 > Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + > 0x6D at ??:0 After this I tried to put disks up again, which failed half-way through and did the same with the other server node (current master). So after this my cluster had effectively failed, because all the metadata disks were down and there was no path to the data disks. When I tried to put all the metadata disks up with one start command, then it worked on third try and the cluster got into working state again. Downtime about an hour. I created a PMR with this information and they said that it's a bug, but it's a tricky one so it's going to take a while, but during that it's not recommended to use any commands from this list: > Our apologies for the delayed response. Based on the debug data we > have and looking at the source code, we believe the assert is due to > incompatibility is arising from the feature level version for the > RPCs. In this case the culprit is the PIT "interesting inode" code. > > Several user commands employ PIT (Parallel Inode Traversal) code to > traverse each data block of every file: > >> >> mmdelfileset >> mmdelsnapshot >> mmdefragfs >> mmfileid >> mmrestripefs >> mmdeldisk >> mmrpldisk >> mmchdisk >> mmadddisk > The problematic one is the 'PitInodeListPacket' subrpc which is a part > of an "interesting inode" code change. Looking at the dumps its > evident that node 'node3' which sent the RPC is not capable of > supporting interesting inode (max feature level is 1340) and node > server11 which is receiving it is trying to interpret the RPC beyond > the valid region (as its feature level 1502 supports PIT interesting > inodes). And apparently any of the fileset commands either, as I failed with those. After I finished the upgrade, everything has been working wonderfully. But during this upgrade time I'd recommend to tread really carefully. Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing, IT Specialist On 12/05/2016 11:31 PM, Aaron Knister wrote: > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > >> Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> on other nodes. However, you must upgrade all nodes within a short >> time. 
The time dependency exists >> because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while >> other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing > this language concerns me that after some period of time something bad > is going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > From janfrode at tanso.net Tue Dec 6 08:04:04 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 6 Dec 2016 09:04:04 +0100 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Currently I'm with IBM Lab Services, and only have small test clusters myself. I'm not sure I've done v3.5->4.1 upgrades, but this warning about upgrading all nodes within a "short time" is something that's always been in the upgrade instructions, and I've been through many of these (I've been a gpfs sysadmin since 2002 :-) http://www.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs300.doc/bl1ins_migratl.htm https://www.scribd.com/document/51036833/GPFS-V3-4-Concepts-Planning-and-Installation-Guide BTW: One relevant issue I saw recently was a rolling upgrade from 4.1.0 to 4.1.1.7 where we had some nodes in the cluster running 4.1.0.0. Apparently there had been some CCR message format changes in a later release that made 4.1.0.0-nodes not being able to properly communicate with 4.1.1.4 -- even though they should be able to co-exist in the same cluster according to the upgrade instructions. So I guess the more versions you mix in a cluster, the more likely you're to hit a version mismatch bug. Best to feel a tiny bit uneasy about not running same version on all nodes, and hurry to get them all upgraded to the same level. And also, should you hit a bug during this process, the likely answer will be to upgrade everything to same level. -jf On Tue, Dec 6, 2016 at 12:00 AM, Aaron Knister wrote: > Thanks Jan-Frode! If you don't mind sharing, over what period of time did > you upgrade from 3.5 to 4.1 and roughly how many clients/servers do you > have in your cluster? > > -Aaron > > On 12/5/16 5:52 PM, Jan-Frode Myklebust wrote: > >> I read it as "do your best". I doubt there can be problems that shows up >> after 3 weeks, that wouldn't also be triggerable after 1 day. >> >> >> -jf >> >> man. 5. des. 2016 kl. 22.32 skrev Aaron Knister >> >: >> >> >> Hi Everyone, >> >> In the GPFS documentation >> (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com >> .ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) >> it has this to say about the duration of an upgrade from 3.5 to 4.1: >> >> > Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> > on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> >because some GPFS 4.1 features become available on each node as soon >> as >> the node is upgraded, while >> >other features will not become available until you upgrade all >> participating nodes. >> >> Does anyone have a feel for what "a short time" means? 
I'm looking to >> upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the >> size of our system it might take several weeks to complete. Seeing >> this >> language concerns me that after some period of time something bad is >> going to happen, but I don't know what that period of time is. >> >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any >> anecdotes they'd like to share, I would like to hear them. >> >> Thanks! >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Dec 6 08:17:37 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 6 Dec 2016 08:17:37 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee>, Message-ID: I'm sure we changed this recently, I think all the CES nodes need to be down, but I don't think the whole cluster. We certainly set it for the first time "live". Maybe it depends on the code version. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 05 December 2016 14:34 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES services on an existing GPFS cluster No, the first time you define it, I'm pretty sure it can be done online. But when changing it later, it will require stopping the full cluster first. -jf man. 5. des. 2016 kl. 15.26 skrev Sander Kuusemets >: Hello, I have been thinking about setting up a CES cluster on my GPFS cluster for easier data distribution. The cluster is quite an old one - since 3.4, but we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, Infiniband interconnected. The problem is this little line in Spectrum Scale documentation: The CES shared root directory cannot be changed when the cluster is up and running. If you want to modify the shared root configuration, you must bring the entire cluster down. Does this mean that even the first time I'm setting CES up, I have to pull down the whole cluster? I would understand this level of service disruption when I already had set the directory before and now I was changing it, but on an initial setup it's quite an inconvenience. Maybe there's a less painful way for this?
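For reference, the first-time sequence the replies above describe is roughly the following. This is a minimal sketch only: the shared-root path, node names and CES IP are placeholders, and the exact steps should be checked against the Spectrum Scale documentation for your release.

  # set the CES shared root once, before any protocol nodes are enabled
  mmchconfig cesSharedRoot=/gpfs/fs0/ces-root

  # enable the protocol (CES) nodes
  mmchnode --ces-enable -N ces1,ces2

  # add at least one CES address and switch on the services you need
  mmces address add --ces-ip 10.10.10.100
  mmces service enable SMB
  mmces service list -a

Changing cesSharedRoot afterwards is the operation that needs the outage discussed above.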
Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From duersch at us.ibm.com Tue Dec 6 13:20:20 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Tue, 6 Dec 2016 08:20:20 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: Message-ID: You fit within the "short time". The purpose of this remark is to make it clear that this should not be a permanent stopping place. Getting all nodes up to the same version is safer and allows for the use of new features. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 12/06/2016 02:25:18 AM: > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 5 Dec 2016 16:31:55 -0500 > From: Aaron Knister > To: gpfsug main discussion list > Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question > Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269 at nasa.gov> > Content-Type: text/plain; charset="utf-8"; format=flowed > > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/ > com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a > time without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short > time. The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Dec 6 16:40:25 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 6 Dec 2016 10:40:25 -0600 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: Hello all, Thanks for sharing that. I am setting this up on our CES nodes. In this example the nvme devices are not persistent. RHEL's default udev rules put them in /dev/disk/by-id/ persistently by serial number so I modified mmdevdiscover to look for them there. What are others doing? custom udev rules for the nvme devices? Also I have used LVM in the past to stitch multiple nvme together for better performance. I am wondering in the use case with GPFS that it may hurt performance by hindering the ability of GPFS to do direct IO or directly accessing memory. Any opinions there? Thanks Matt On 12/5/16 10:33 AM, Ulf Troppens wrote: FYI ... in case not seen .... 
benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue Dec 6 17:36:11 2016 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 06 Dec 2016 17:36:11 +0000 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: i am not sure i understand your comment with 'persistent' do you mean when you create a nsddevice on a nvme device it won't get recognized after a restart ? if thats what you mean there are 2 answers , short term you need to add a /var/mmfs/etc/nsddevices script to your node that simply adds an echo for the nvme device like : echo nvme0n1 generic this will tell the daemon to include that device on top of all other discovered devices that we include by default (like dm-* , sd*, etc) the longer term answer is that we have a tracking item to ad nvme* to the automatically discovered devices. on your second question, given that GPFS does workload balancing across devices you don't want to add extra complexity and path length to anything , so stick with raw devices . sven On Tue, Dec 6, 2016 at 8:40 AM Matt Weil wrote: > Hello all, > > Thanks for sharing that. I am setting this up on our CES nodes. In this > example the nvme devices are not persistent. RHEL's default udev rules put > them in /dev/disk/by-id/ persistently by serial number so I modified > mmdevdiscover to look for them there. What are others doing? custom udev > rules for the nvme devices? > > Also I have used LVM in the past to stitch multiple nvme together for > better performance. I am wondering in the use case with GPFS that it may > hurt performance by hindering the ability of GPFS to do direct IO or > directly accessing memory. Any opinions there? > > Thanks > > Matt > On 12/5/16 10:33 AM, Ulf Troppens wrote: > > FYI ... in case not seen .... 
benchmark for LROC with NVMe > > http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf > > > -- > IBM Spectrum Scale Development - Client Engagements & Solutions Delivery > Consulting IT Specialist > Author "Storage Networks Explained" > > IBM Deutschland Research & Development GmbH > Vorsitzende des Aufsichtsrats: Martina Koederitz > Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Wed Dec 7 03:47:00 2016 From: Valdis.Kletnieks at vt.edu (Valdis Kletnieks) Date: Tue, 06 Dec 2016 22:47:00 -0500 Subject: [gpfsug-discuss] ltfsee fsopt question... Message-ID: <114349.1481082420@turing-police.cc.vt.edu> Is it possible to use 'ltfsee fsopt' to set stub and preview sizes on a per-fileset basis, or is it fixed across an entire filesystem? From r.sobey at imperial.ac.uk Wed Dec 7 06:29:27 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 7 Dec 2016 06:29:27 +0000 Subject: [gpfsug-discuss] CES ON RHEL7.3 Message-ID: A word of wisdom: do not try and run CES on RHEL 7.3 :) Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn't intend to run 7.3 of course as I knew it wasn't supported. Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkomandu at in.ibm.com Wed Dec 7 06:45:50 2016 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Wed, 7 Dec 2016 12:15:50 +0530 Subject: [gpfsug-discuss] CES ON RHEL7.3 In-Reply-To: References: Message-ID: Sobey, Could you mention the problems that you have faced on CES env for RH 7.3. Is it related to the Kernel or in Ganesha environment ? Your thoughts/inputs would help us in fixing the same. Currently working on the CES environment on RH 7.3 support side. With Regards, Ravi K Komanduri GPFS team IBM From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 12/07/2016 11:59 AM Subject: [gpfsug-discuss] CES ON RHEL7.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org A word of wisdom: do not try and run CES on RHEL 7.3 J Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn?t intend to run 7.3 of course as I knew it wasn?t supported. Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Wed Dec 7 09:13:23 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 7 Dec 2016 09:13:23 +0000 Subject: [gpfsug-discuss] CES ON RHEL7.3 In-Reply-To: References: Message-ID: I admit I didn't do a whole lot of troubleshooting. We don't run NFS so I can't speak about that. Initially the server looked like it came back ok, albeit "Node starting up.." was observed in the output of mmlscluster --ces. At that time I was not sure if that was a) expected behaviour and/or b) related to GPFS 4.2.1-2. Once the node went back into service I had no complaints from customers that they faced any connectivity issues. The next morning I shut down a second CES node in order to upgrade it, but I observed that the first one went into a failed state (might have been a nasty coincidence!):

[root at icgpfs-ces1 yum.repos.d]# mmces state show -a
NODE          AUTH     AUTH_OBJ  NETWORK    NFS       OBJ       SMB      CES
icgpfs-ces1   FAILED   DISABLED  HEALTHY    DISABLED  DISABLED  DEPEND   STARTING
icgpfs-ces2   DEPEND   DISABLED  SUSPENDED  DEPEND    DEPEND    DEPEND   DEPEND
icgpfs-ces3   HEALTHY  DISABLED  HEALTHY    DISABLED  DISABLED  HEALTHY  HEALTHY
icgpfs-ces4   HEALTHY  DISABLED  HEALTHY    DISABLED  DISABLED  HEALTHY  HEALTHY

(Where ICGPFS-CES1 was the node running 7.3). Also in mmces event show -N icgpfs-ces1 --time day the following error was logged about twice per minute:

icgpfs-ces1   2016-12-06 06:32:04.968269 GMT   wnbd_restart   INFO   WINBINDD process was not running. Trying to start it

I moved the CES IP from icgpfs-ces2 to icgpfs-ces3 prior to suspending ces2. It was about that point I decided to abandon the planned upgrade of ces2, resume the node and then suspend ces1. Attempts to downgrade the Kernel/OS/redhat-release RPM back to 7.2 worked well, except when I tried to start CES again and the node reported "Node failed". I then rebuilt it completely, restored it to the cluster and it appears to be fine. Sorry I can't be any more specific than that but I hope it helps. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ravi K Komanduri Sent: 07 December 2016 06:46 To: r.sobey at imperial.ac.uk Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES ON RHEL7.3 Sobey, Could you mention the problems that you have faced on CES env for RH 7.3. Is it related to the Kernel or in Ganesha environment ? Your thoughts/inputs would help us in fixing the same. Currently working on the CES environment on RH 7.3 support side. With Regards, Ravi K Komanduri GPFS team IBM From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 12/07/2016 11:59 AM Subject: [gpfsug-discuss] CES ON RHEL7.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ A word of wisdom: do not try and run CES on RHEL 7.3 :) Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn't intend to run 7.3 of course as I knew it wasn't supported. Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: From peserocka at gmail.com Wed Dec 7 09:54:05 2016 From: peserocka at gmail.com (P Serocka) Date: Wed, 7 Dec 2016 17:54:05 +0800 Subject: [gpfsug-discuss] Quotas on Multiple Filesets In-Reply-To: References: Message-ID: <1FBA5DC2-DD14-4606-9B5A-A4373191B461@gmail.com> > > I would have though that usage in fileset predictHPC would also go against the group fileset quota-wise these filesets are "siblings", don't be fooled by the hierarchy formed by namespace linking. hth -- Peter On 2016 Dec 3. md, at 04:51 st, J. Eric Wonderley wrote: > Hi Michael: > > I was about to ask a similar question about nested filesets. > > I have this setup: > [root at cl001 ~]# mmlsfileset home > Filesets in file system 'home': > Name Status Path > root Linked /gpfs/home > group Linked /gpfs/home/group > predictHPC Linked /gpfs/home/group/predictHPC > > > and I see this: > [root at cl001 ~]# mmlsfileset home -L -d > Collecting fileset usage information ... > Filesets in file system 'home': > Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Data (in KB) Comment > root 0 3 -- Tue Jun 30 07:54:09 2015 0 134217728 123805696 63306355456 root fileset > group 1 67409030 0 Tue Nov 1 13:22:24 2016 0 0 0 0 > predictHPC 2 111318203 1 Fri Dec 2 14:05:56 2016 0 0 0 212206080 > > I would have though that usage in fileset predictHPC would also go against the group fileset > > On Tue, Nov 15, 2016 at 4:47 AM, Michael Holliday wrote: > Hey Everyone, > > > > I have a GPFS system which contain several groups of filesets. > > > > Each group has a root fileset, along with a number of other files sets. All of the filesets share the inode space with the root fileset. > > > > The file sets are linked to create a tree structure as shown: > > > > Fileset Root -> /root > > Fileset a -> /root/a > > Fileset B -> /root/b > > Fileset C -> /root/c > > > > > > I have applied a quota of 5TB to the root fileset. > > > > Could someone tell me if the quota will only take into account the files in the root fileset, or if it would include the sub filesets aswell. eg If have 3TB in A and 2TB in B - would that hit the 5TB quota on root? > > > > Thanks > > Michael > > > > > > The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Dec 7 10:34:27 2016 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 7 Dec 2016 05:34:27 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Message-ID: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? 
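For reference, a few commands that are commonly used to gather data on expels of this kind. This is only a sketch: it assumes a standard install (log path as shipped) and makes no claim about the root cause on the Phi nodes.

  # on the expelled node (and on the cluster manager) around the time of the expel
  mmdiag --network        # connection state as GPFS sees it, including whether RDMA is in use
  mmdiag --waiters        # long waiters that often precede an expel
  grep -i expel /var/adm/ras/mmfs.log.latest
  mmlsconfig verbsRdma    # confirm whether verbs RDMA is actually enabled for the node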
-- ddj Dave Johnson From daniel.kidger at uk.ibm.com Wed Dec 7 12:36:56 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 7 Dec 2016 12:36:56 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: , <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com><3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Wed Dec 7 14:24:38 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Wed, 07 Dec 2016 14:24:38 +0000 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: > IBM says it should work ok, we are not so sure. We had node expels that > stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Dec 7 14:37:15 2016 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 7 Dec 2016 09:37:15 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: All, The SMAP issue has been addressed in GPFS in 4.2.1.1. See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Q2.4. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Aaron Knister To: gpfsug main discussion list Date: 12/07/2016 09:25 AM Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Sent by: gpfsug-discuss-bounces at spectrumscale.org I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? -- ddj Dave Johnson _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Wed Dec 7 14:47:46 2016 From: david_johnson at brown.edu (David D. Johnson) Date: Wed, 7 Dec 2016 09:47:46 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: <5FBAC3AE-39F2-453D-8A9D-5FDE90BADD38@brown.edu> Yes, we saw the SMAP issue on earlier releases, added the kernel command line option to disable it. That is not the issue for this node. The Phi processors do not support that cpu feature. ? ddj > On Dec 7, 2016, at 9:37 AM, Felipe Knop wrote: > > All, > > The SMAP issue has been addressed in GPFS in 4.2.1.1. > > See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html > > Q2.4. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 12/07/2016 09:25 AM > Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. > > -Aaron > > On Wed, Dec 7, 2016 at 5:34 AM > wrote: > IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Dec 7 14:58:39 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 7 Dec 2016 14:58:39 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: I was going to ask about this, I recall it being mentioned about "grandfathering" and also having mixed deployments. Would that mean you could per TB license one set of NSD servers (hosting only 1 FS) that co-existed in a cluster with other traditionally licensed systems? I would see having NSDs with different license models hosting the same FS being problematic, but if it were a different file-system? 
Simon From: > on behalf of Daniel Kidger > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 7 December 2016 at 12:36 To: "gpfsug-discuss at spectrumscale.org" > Cc: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks The new volume based licensing option is I agree quite pricey per TB at first sight, but it could make some configuration choice, a lot cheaper than they used to be under the Client:FPO:Server model. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 7 15:59:50 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 7 Dec 2016 09:59:50 -0600 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: <05e77cc6-e3f6-7b06-521c-1d30606e02e0@wustl.edu> On 12/6/16 11:36 AM, Sven Oehme wrote: i am not sure i understand your comment with 'persistent' do you mean when you create a nsddevice on a nvme device it won't get recognized after a restart ? yes /dev/sdX may change after a reboot especially if you add devices. using udev rules makes sure the device is always the same. if thats what you mean there are 2 answers , short term you need to add a /var/mmfs/etc/nsddevices script to your node that simply adds an echo for the nvme device like : echo nvme0n1 generic this will tell the daemon to include that device on top of all other discovered devices that we include by default (like dm-* , sd*, etc) the longer term answer is that we have a tracking item to ad nvme* to the automatically discovered devices. yes that is what I meant by modifying mmdevdiscover on your second question, given that GPFS does workload balancing across devices you don't want to add extra complexity and path length to anything , so stick with raw devices . K that is what I was thinking. sven On Tue, Dec 6, 2016 at 8:40 AM Matt Weil > wrote: Hello all, Thanks for sharing that. I am setting this up on our CES nodes. In this example the nvme devices are not persistent. RHEL's default udev rules put them in /dev/disk/by-id/ persistently by serial number so I modified mmdevdiscover to look for them there. What are others doing? custom udev rules for the nvme devices? Also I have used LVM in the past to stitch multiple nvme together for better performance. I am wondering in the use case with GPFS that it may hurt performance by hindering the ability of GPFS to do direct IO or directly accessing memory. Any opinions there? Thanks Matt On 12/5/16 10:33 AM, Ulf Troppens wrote: FYI ... in case not seen .... benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Dec 7 16:00:46 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 7 Dec 2016 16:00:46 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: , <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com><3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Dec 7 16:31:23 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Dec 2016 11:31:23 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> Message-ID: <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Thanks Sander. That's disconcerting...yikes! Sorry for your trouble but thank you for sharing. I'm surprised this didn't shake out during testing of gpfs 3.5 and 4.1. I wonder if in light of this it's wise to do the clients first? My logic being that there's clearly an example here of 4.1 servers expecting behavior that only 4.1 clients provide. I suppose, though, that there's just as likely a chance that there could be a yet to be discovered bug in a situation where a 4.1 client expects something not provided by a 3.5 server. Our current plan is still to take servers first but I suspect we'll do a fair bit of testing with the PIT commands in our test environment just out of curiosity. Also out of curiosity, how long ago did you open that PMR? I'm wondering if there's a chance they've fixed this issue. I'm also perplexed and cocnerned that there's no documentation of the PIT commands to avoid during upgrades that I can find in any of the GPFS upgrade documentation. -Aaron On 12/6/16 2:25 AM, Sander Kuusemets wrote: > Hello Aaron, > > I thought I'd share my two cents, as I just went through the process. I > thought I'd do the same, start upgrading from where I can and wait until > machines come available. It took me around 5 weeks to complete the > process, but the last two were because I was super careful. 
> > At first nothing happened, but at one point, a week into the upgrade > cycle, when I tried to mess around (create, delete, test) a fileset, > suddenly I got the weirdest of error messages while trying to delete a > fileset for the third time from a client node - I sadly cannot exactly > remember what it said, but I can describe what happened. > > After the error message, the current manager of our cluster fell into > arbitrating state, it's metadata disks were put to down state, manager > status was given to our other server node and it's log was spammed with > a lot of error messages, something like this: > >> mmfsd: >> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: >> void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >> UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) >> + 0)' failed. >> Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 >> in process 15113, link reg 0xFFFFFFFFFFFFFFFF. >> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= >> (sizeof(Pad32) + 0)) in line 1411 of file >> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h >> Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: >> Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 >> logAssertFailed + 0x2D6 at ??:0 >> Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 >> PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 >> Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 >> tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 >> Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 >> RcvWorker::RcvMain() + 0x107 at ??:0 >> Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B >> RcvWorker::thread(void*) + 0x5B at ??:0 >> Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 >> Thread::callBody(Thread*) + 0x46 at ??:0 >> Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 >> Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >> Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 >> start_thread + 0xD1 at ??:0 >> Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + >> 0x6D at ??:0 > After this I tried to put disks up again, which failed half-way through > and did the same with the other server node (current master). So after > this my cluster had effectively failed, because all the metadata disks > were down and there was no path to the data disks. When I tried to put > all the metadata disks up with one start command, then it worked on > third try and the cluster got into working state again. Downtime about > an hour. > > I created a PMR with this information and they said that it's a bug, but > it's a tricky one so it's going to take a while, but during that it's > not recommended to use any commands from this list: > >> Our apologies for the delayed response. Based on the debug data we >> have and looking at the source code, we believe the assert is due to >> incompatibility is arising from the feature level version for the >> RPCs. In this case the culprit is the PIT "interesting inode" code. >> >> Several user commands employ PIT (Parallel Inode Traversal) code to >> traverse each data block of every file: >> >>> >>> mmdelfileset >>> mmdelsnapshot >>> mmdefragfs >>> mmfileid >>> mmrestripefs >>> mmdeldisk >>> mmrpldisk >>> mmchdisk >>> mmadddisk >> The problematic one is the 'PitInodeListPacket' subrpc which is a part >> of an "interesting inode" code change. 
Looking at the dumps its >> evident that node 'node3' which sent the RPC is not capable of >> supporting interesting inode (max feature level is 1340) and node >> server11 which is receiving it is trying to interpret the RPC beyond >> the valid region (as its feature level 1502 supports PIT interesting >> inodes). > > And apparently any of the fileset commands either, as I failed with those. > > After I finished the upgrade, everything has been working wonderfully. > But during this upgrade time I'd recommend to tread really carefully. > > Best regards, > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From sander.kuusemets at ut.ee Wed Dec 7 16:56:52 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Wed, 7 Dec 2016 18:56:52 +0200 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Message-ID: It might have been some kind of a bug only we got, but I thought I'd share, just in case. The email when they said they opened a ticket for this bug's fix was quite exactly a month ago, so I doubt they've fixed it, as they said it might take a while. I don't know if this is of any help, but a paragraph from the explanation: > The assert "msgLen >= (sizeof(Pad32) + 0)" is from routine > PIT_HelperGetWorkMH(). There are two RPC structures used in this routine > - PitHelperWorkReport > - PitInodeListPacket > > The problematic one is the 'PitInodeListPacket' subrpc which is a part > of an "interesting inode" code change. Looking at the dumps its > evident that node 'stage3' which sent the RPC is not capable of > supporting interesting inode (max feature level is 1340) and node > tank1 which is receiving it is trying to interpret the RPC beyond the > valid region (as its feature level 1502 supports PIT interesting > inodes). This is resulting in the assert you see. As a short term > measure bringing all the nodes to the same feature level should make > the problem go away. But since we support backward compatibility, we > are opening an APAR to create a code fix. It's unfortunately going to > be a tricky fix, which is going to take a significant amount of time. > Therefore I don't expect the team will be able to provide an efix > anytime soon. We recommend you bring all nodes in all clusters up the > latest level 4.2.0.4 and run the "mmchconfig release=latest" and > "mmchfs -V full" commands that will ensure all daemon levels and fs > levels are at the necessary level that supports the 1502 RPC feature > level. Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing, IT Specialist On 12/07/2016 06:31 PM, Aaron Knister wrote: > Thanks Sander. That's disconcerting...yikes! Sorry for your trouble > but thank you for sharing. > > I'm surprised this didn't shake out during testing of gpfs 3.5 and > 4.1. I wonder if in light of this it's wise to do the clients first? > My logic being that there's clearly an example here of 4.1 servers > expecting behavior that only 4.1 clients provide. I suppose, though, > that there's just as likely a chance that there could be a yet to be > discovered bug in a situation where a 4.1 client expects something not > provided by a 3.5 server. 
Our current plan is still to take servers > first but I suspect we'll do a fair bit of testing with the PIT > commands in our test environment just out of curiosity. > > Also out of curiosity, how long ago did you open that PMR? I'm > wondering if there's a chance they've fixed this issue. I'm also > perplexed and cocnerned that there's no documentation of the PIT > commands to avoid during upgrades that I can find in any of the GPFS > upgrade documentation. > > -Aaron > > On 12/6/16 2:25 AM, Sander Kuusemets wrote: >> Hello Aaron, >> >> I thought I'd share my two cents, as I just went through the process. I >> thought I'd do the same, start upgrading from where I can and wait until >> machines come available. It took me around 5 weeks to complete the >> process, but the last two were because I was super careful. >> >> At first nothing happened, but at one point, a week into the upgrade >> cycle, when I tried to mess around (create, delete, test) a fileset, >> suddenly I got the weirdest of error messages while trying to delete a >> fileset for the third time from a client node - I sadly cannot exactly >> remember what it said, but I can describe what happened. >> >> After the error message, the current manager of our cluster fell into >> arbitrating state, it's metadata disks were put to down state, manager >> status was given to our other server node and it's log was spammed with >> a lot of error messages, something like this: >> >>> mmfsd: >>> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: >>> >>> void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >>> UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) >>> + 0)' failed. >>> Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 >>> in process 15113, link reg 0xFFFFFFFFFFFFFFFF. >>> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= >>> (sizeof(Pad32) + 0)) in line 1411 of file >>> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h >>> Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: >>> Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 >>> logAssertFailed + 0x2D6 at ??:0 >>> Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 >>> PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 >>> Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 >>> tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 >>> Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 >>> RcvWorker::RcvMain() + 0x107 at ??:0 >>> Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B >>> RcvWorker::thread(void*) + 0x5B at ??:0 >>> Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 >>> Thread::callBody(Thread*) + 0x46 at ??:0 >>> Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 >>> Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >>> Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 >>> start_thread + 0xD1 at ??:0 >>> Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + >>> 0x6D at ??:0 >> After this I tried to put disks up again, which failed half-way through >> and did the same with the other server node (current master). So after >> this my cluster had effectively failed, because all the metadata disks >> were down and there was no path to the data disks. When I tried to put >> all the metadata disks up with one start command, then it worked on >> third try and the cluster got into working state again. Downtime about >> an hour. 
>> >> I created a PMR with this information and they said that it's a bug, but >> it's a tricky one so it's going to take a while, but during that it's >> not recommended to use any commands from this list: >> >>> Our apologies for the delayed response. Based on the debug data we >>> have and looking at the source code, we believe the assert is due to >>> incompatibility is arising from the feature level version for the >>> RPCs. In this case the culprit is the PIT "interesting inode" code. >>> >>> Several user commands employ PIT (Parallel Inode Traversal) code to >>> traverse each data block of every file: >>> >>>> >>>> mmdelfileset >>>> mmdelsnapshot >>>> mmdefragfs >>>> mmfileid >>>> mmrestripefs >>>> mmdeldisk >>>> mmrpldisk >>>> mmchdisk >>>> mmadddisk >>> The problematic one is the 'PitInodeListPacket' subrpc which is a part >>> of an "interesting inode" code change. Looking at the dumps its >>> evident that node 'node3' which sent the RPC is not capable of >>> supporting interesting inode (max feature level is 1340) and node >>> server11 which is receiving it is trying to interpret the RPC beyond >>> the valid region (as its feature level 1502 supports PIT interesting >>> inodes). >> >> And apparently any of the fileset commands either, as I failed with >> those. >> >> After I finished the upgrade, everything has been working wonderfully. >> But during this upgrade time I'd recommend to tread really carefully. >> >> Best regards, >> > From aaron.s.knister at nasa.gov Wed Dec 7 17:31:28 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Dec 2016 12:31:28 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Message-ID: Thanks! I do have a question, though. Feature level 1340 I believe is equivalent to GPFS version 3.5.0.11. Feature level 1502 is GPFS 4.2 if I understand correctly. That suggests to me there are 3.5 and 4.2 nodes in the same cluster? Or at least 4.2 nodes in a cluster where the max feature level is 1340. I didn't think either of those are supported configurations? Am I missing something? -Aaron On 12/7/16 11:56 AM, Sander Kuusemets wrote: > It might have been some kind of a bug only we got, but I thought I'd > share, just in case. > > The email when they said they opened a ticket for this bug's fix was > quite exactly a month ago, so I doubt they've fixed it, as they said it > might take a while. > > I don't know if this is of any help, but a paragraph from the explanation: > >> The assert "msgLen >= (sizeof(Pad32) + 0)" is from routine >> PIT_HelperGetWorkMH(). There are two RPC structures used in this routine >> - PitHelperWorkReport >> - PitInodeListPacket >> >> The problematic one is the 'PitInodeListPacket' subrpc which is a part >> of an "interesting inode" code change. Looking at the dumps its >> evident that node 'stage3' which sent the RPC is not capable of >> supporting interesting inode (max feature level is 1340) and node >> tank1 which is receiving it is trying to interpret the RPC beyond the >> valid region (as its feature level 1502 supports PIT interesting >> inodes). This is resulting in the assert you see. As a short term >> measure bringing all the nodes to the same feature level should make >> the problem go away. But since we support backward compatibility, we >> are opening an APAR to create a code fix. 
It's unfortunately going to >> be a tricky fix, which is going to take a significant amount of time. >> Therefore I don't expect the team will be able to provide an efix >> anytime soon. We recommend you bring all nodes in all clusters up the >> latest level 4.2.0.4 and run the "mmchconfig release=latest" and >> "mmchfs -V full" commands that will ensure all daemon levels and fs >> levels are at the necessary level that supports the 1502 RPC feature >> level. > Best regards, > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From carlz at us.ibm.com Wed Dec 7 17:47:52 2016 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 7 Dec 2016 12:47:52 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: Message-ID: We don't allow mixing of different licensing models (i.e. socket and capacity) within a single cluster*. As we worked through the implications, we realized it would be just too complicated to determine how to license any non-NSD nodes (management, CES, clients, etc.). In the socket model they are chargeable, in the capacity model they are not, and while we could have made up some rules, they would have added even more complexity to Scale licensing. This in turn is why we "grandfathered in" those customers already on Advanced Edition, so that they don't have to convert existing clusters to the new metric unless or until they want to. They can continue to buy Advanced Edition. The other thing we wanted to do with the capacity metric was to make the licensing more friendly to architectural best practices or design choices. So now you can have whatever management, gateway, etc. servers you need without paying for additional server licenses. In particular, client-only clusters cost nothing, and you don't have to keep track of clients if you have a virtual environment where clients come and go rapidly. I'm always happy to answer other questions about licensing. regards, Carl Zetie *OK, there is one exception involving future ESS models and existing clusters. If this is you, please have a conversation with your account team. Carl Zetie Program Director, OM for Spectrum Scale, IBM (540) 882 9353 ][ 15750 Brookhill Ct, Waterford VA 20197 carlz at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 12/07/2016 09:59 AM Subject: gpfsug-discuss Digest, Vol 59, Issue 20 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? (Felipe Knop) 2. Re: Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? (David D. Johnson) 3. 
Re: Strategies - servers with local SAS disks (Simon Thompson (Research Computing - IT Services)) ---------------------------------------------------------------------- Message: 1 Date: Wed, 7 Dec 2016 09:37:15 -0500 From: "Felipe Knop" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Message-ID: Content-Type: text/plain; charset="us-ascii" All, The SMAP issue has been addressed in GPFS in 4.2.1.1. See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Q2.4. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Aaron Knister To: gpfsug main discussion list Date: 12/07/2016 09:25 AM Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Sent by: gpfsug-discuss-bounces at spectrumscale.org I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? -- ddj Dave Johnson _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161207/48aa0319/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 7 Dec 2016 09:47:46 -0500 From: "David D. Johnson" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Message-ID: <5FBAC3AE-39F2-453D-8A9D-5FDE90BADD38 at brown.edu> Content-Type: text/plain; charset="utf-8" Yes, we saw the SMAP issue on earlier releases, added the kernel command line option to disable it. That is not the issue for this node. The Phi processors do not support that cpu feature. ? ddj > On Dec 7, 2016, at 9:37 AM, Felipe Knop wrote: > > All, > > The SMAP issue has been addressed in GPFS in 4.2.1.1. > > See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html < http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html> > > Q2.4. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 12/07/2016 09:25 AM > Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? 
> Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. > > -Aaron > > On Wed, Dec 7, 2016 at 5:34 AM > wrote: > IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161207/92819f21/attachment-0001.html > ------------------------------ Message: 3 Date: Wed, 7 Dec 2016 14:58:39 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: Content-Type: text/plain; charset="us-ascii" I was going to ask about this, I recall it being mentioned about "grandfathering" and also having mixed deployments. Would that mean you could per TB license one set of NSD servers (hosting only 1 FS) that co-existed in a cluster with other traditionally licensed systems? I would see having NSDs with different license models hosting the same FS being problematic, but if it were a different file-system? Simon From: > on behalf of Daniel Kidger > Reply-To: "gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>" > Date: Wednesday, 7 December 2016 at 12:36 To: "gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>" > Cc: "gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>" > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks The new volume based licensing option is I agree quite pricey per TB at first sight, but it could make some configuration choice, a lot cheaper than they used to be under the Client:FPO:Server model. -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161207/51c1a2ea/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 20 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
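For reference, on the SMAP workaround discussed in the first two messages of the digest above (Broadwell plus newer kernels on pre-4.2.1.1 code): a minimal sketch of adding the boot parameter on a RHEL/CentOS 7 node with grubby. The FAQ entry Felipe points at is the authoritative fix list, so treat this only as the stop-gap Aaron describes.

    # add nosmap to every installed kernel entry, then reboot the node
    grubby --update-kernel=ALL --args="nosmap"

    # after the reboot, confirm the running kernel picked it up
    grep -o nosmap /proc/cmdline

As David notes, this is moot for the Xeon Phi nodes themselves, since they lack the SMAP CPU feature.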
URL: From r.sobey at imperial.ac.uk Thu Dec 8 13:33:40 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 8 Dec 2016 13:33:40 +0000 Subject: [gpfsug-discuss] Flash Storage wiki entry incorrect Message-ID: To whom it may concern, I've just set up an LROC disk in one of my CES nodes and going from the example in: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage I used the following as a guide: cat lroc-stanza.txt %nsd: nsd=lroc-nsd1 device=/dev/faio server=gpfs-client1 <-- is not a NSD server, but client with Fusion i/o or SSD install as target for LROC usage=localCache The only problems are that 1) hyphens aren't allowed in NSD names and 2) the server parameter should be servers (plural). Once I worked that out I was good to go but perhaps someone could update the page with a (working) example? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Dec 8 19:27:08 2016 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 8 Dec 2016 14:27:08 -0500 Subject: [gpfsug-discuss] GPFS fails to use VERBS RDMA because link is not up yet Message-ID: Under RHEL/CentOS 6, I had hacked an ?ibready? script for the SysV style init system that waits for link to come up on the infiniband port before allowing GPFS to start. Now that we?re moving to CentOS/RHEL 7.2, I need to reimplement this workaround for the fact that GPFS only tries once to start VERBS RDMA, and gives up if there is no link. I think it can be done by making a systemd unit that asks to run Before gpfs. Wondering if anyone has already done this to avoid reinventing the wheel?. Thanks, ? ddj Dave Johnson Brown University From r.sobey at imperial.ac.uk Fri Dec 9 11:52:12 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 11:52:12 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access Message-ID: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Dec 9 13:21:12 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 9 Dec 2016 08:21:12 -0500 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: Message-ID: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone > On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: > > Hi all, > > Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). > > Cheers > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
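On the syslog-based Samba auditing Aaron links to just above: with stock Samba the usual mechanism is the vfs_full_audit module. A minimal sketch of the share options is below. These are standard Samba parameters, but whether a CES-managed smb.conf will accept and preserve them is exactly the open support question, so verify before relying on it:

    [share]
        vfs objects = full_audit
        # tag each record with user, client IP and share name
        full_audit:prefix = %u|%I|%S
        # operations to log when they succeed / fail
        full_audit:success = mkdir rmdir rename unlink open pwrite
        full_audit:failure = connect
        # ship records to syslog facility local5 at NOTICE
        full_audit:facility = local5
        full_audit:priority = NOTICE

Point local5 at a file via rsyslog and you get one line per recorded operation, which is usually enough for "who deleted what" questions.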
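And on Dave Johnson's VERBS RDMA / "ibready" question earlier in this batch: one way to reproduce the old SysV hack under systemd is an ordering-only unit that polls the IB port state. A rough sketch, with the unit name, script path and HCA name all made up for illustration, and assuming the Scale packages register their init as gpfs.service on EL7 (worth confirming on your build):

    # /etc/systemd/system/ibready.service   (hypothetical name)
    [Unit]
    Description=Wait for the InfiniBand link before GPFS starts
    Before=gpfs.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/local/sbin/wait-for-ib

    [Install]
    WantedBy=multi-user.target

    # /usr/local/sbin/wait-for-ib   (adjust HCA and port for your hardware)
    #!/bin/bash
    state=/sys/class/infiniband/mlx4_0/ports/1/state
    for i in $(seq 1 300); do
        grep -q ACTIVE "$state" && exit 0
        sleep 1
    done
    echo "IB link never came up; mmfsd will start without verbsRdma" >&2
    exit 1

Enable it with systemctl enable ibready.service. Because Before= only orders the two units and does not make GPFS require this one, mmfsd still starts if the link never appears -- it just loses its single verbs attempt, same as today.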
URL: From jtolson at us.ibm.com Fri Dec 9 14:32:45 2016 From: jtolson at us.ibm.com (John T Olson) Date: Fri, 9 Dec 2016 07:32:45 -0700 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From billowen at us.ibm.com Fri Dec 9 15:44:28 2016 From: billowen at us.ibm.com (Bill Owen) Date: Fri, 9 Dec 2016 08:44:28 -0700 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Hi John, Nice paper! Regarding object auditing: - Does Varonis have an API that could be used to tell it when object operations complete from normal object interface? If so, a middleware module could be used to send interesting events to Varonis (this is already done in openstack auditing using CADF) - With Varonis, can you monitor operations just on ".data" files? (these are the real objects) Can you also include file metadata values in the logging of these operations? 
If so, the object url could be pulled whenever a .data file is created, renamed (delete), or read Thanks, Bill Owen billowen at us.ibm.com Spectrum Scale Object Storage 520-799-4829 From: John T Olson/Tucson/IBM at IBMUS To: gpfsug main discussion list Date: 12/09/2016 07:33 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Dec 9 20:14:14 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 20:14:14 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> References: , <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Thanks Aaron. I will take a look on Moday. Now I think about it, I did something like this on the old Samba/CTDB cluster before we deployed CES, so it must be possible, just to what level IBM will support it. 
Have a great weekend, Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister Sent: 09 December 2016 13:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Dec 9 20:15:03 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 20:15:03 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com>, Message-ID: Thanks John, As I said to Aaron I will also take a look at this on Monday. Regards Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of John T Olson Sent: 09 December 2016 14:32 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. [Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?]Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp [https://s0.wp.com/i/blank.jpg] Samba: Logging User Activity moiristo.wordpress.com Ever wondered why Samba seems to log so many things, except what you're interested in? So did I, and it took me a while to find out that 1) there actually is a solution and 2) how to configur... I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. 
Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From aaron.s.knister at nasa.gov Sat Dec 10 03:53:06 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 9 Dec 2016 22:53:06 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: Message-ID: <38d056ad-833f-1582-58fd-0e65a52ded6c@nasa.gov> Thanks Steve, that was exactly the answer I was looking for. On 12/6/16 8:20 AM, Steve Duersch wrote: > You fit within the "short time". The purpose of this remark is to make > it clear that this should not be a permanent stopping place. > Getting all nodes up to the same version is safer and allows for the use > of new features. > > > Steve Duersch > Spectrum Scale > 845-433-7902 > IBM Poughkeepsie, New York > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 12/06/2016 02:25:18 AM: > > >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Mon, 5 Dec 2016 16:31:55 -0500 >> From: Aaron Knister >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question >> Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269 at nasa.gov> >> Content-Type: text/plain; charset="utf-8"; format=flowed >> >> Hi Everyone, >> >> In the GPFS documentation >> (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/ >> com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) >> it has this to say about the duration of an upgrade from 3.5 to 4.1: >> >> > Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> > on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> >because some GPFS 4.1 features become available on each node as soon as >> the node is upgraded, while >> >other features will not become available until you upgrade all >> participating nodes. >> >> Does anyone have a feel for what "a short time" means? I'm looking to >> upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the >> size of our system it might take several weeks to complete. Seeing this >> language concerns me that after some period of time something bad is >> going to happen, but I don't know what that period of time is. >> >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any >> anecdotes they'd like to share, I would like to hear them. >> >> Thanks! 
>> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> >> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sat Dec 10 05:31:39 2016 From: erich at uw.edu (Eric Horst) Date: Fri, 9 Dec 2016 21:31:39 -0800 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: On Mon, Dec 5, 2016 at 1:31 PM, Aaron Knister wrote: > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different clusters. Two things: Upgrading from 3.5 to 4.1 I did node at a time and then at the end mmchconfig release=LATEST. Minutes after flipping to latest the cluster became non-responsive, with node mmfs panics and everything had to be restarted. Logs indicated it was a quota problem. In 4.1 the quota files move from externally visible files to internal hidden files. I suspect the quota file transition can't be done without a cluster restart. When I did the second cluster I upgraded all nodes and then very quickly stopped and started the entire cluster, issuing the mmchconfig in the middle. No quota panic problems on that one. Upgrading from 4.1 to 4.2 I did node at a time and then at the end mmchconfig release=LATEST. No cluster restart. Everything seemed to work okay. Later, restarting a node I got weird fstab errors on gpfs startup and using certain commands, notably mmfind, the command would fail with something like "can't find /dev/uwfs" (our filesystem.) I restarted the whole cluster and everything began working normally. In this case 4.2 got rid of /dev/fsname. Just like in the quota case it seems that this transition can't be seamless. Doing the second cluster I upgraded all nodes and then again quickly restarted gpfs to avoid the same problem. Other than these two quirks, I heartily thank IBM for making a very complex product with a very easy upgrade procedure. I could imagine many ways that an upgrade hop of two major versions in two weeks could go very wrong but the quality of the product and team makes my job very easy. -Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sat Dec 10 12:35:15 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 10 Dec 2016 07:35:15 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Thanks Eric! I have a few follow up questions for you-- Do you recall the exact versions of 3.5 and 4.1 your cluster went from/to? I'm curious to know what version of 4.1 you were at when you ran the mmchconfig. Would you mind sharing any log messages related to the errors you saw when you ran the mmchconfig? Thanks! Sent from my iPhone > On Dec 10, 2016, at 12:31 AM, Eric Horst wrote: > > >> On Mon, Dec 5, 2016 at 1:31 PM, Aaron Knister wrote: >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any anecdotes they'd like to share, I would like to hear them. 
> > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different clusters. Two things: > > Upgrading from 3.5 to 4.1 I did node at a time and then at the end mmchconfig release=LATEST. Minutes after flipping to latest the cluster became non-responsive, with node mmfs panics and everything had to be restarted. Logs indicated it was a quota problem. In 4.1 the quota files move from externally visible files to internal hidden files. I suspect the quota file transition can't be done without a cluster restart. When I did the second cluster I upgraded all nodes and then very quickly stopped and started the entire cluster, issuing the mmchconfig in the middle. No quota panic problems on that one. > > Upgrading from 4.1 to 4.2 I did node at a time and then at the end mmchconfig release=LATEST. No cluster restart. Everything seemed to work okay. Later, restarting a node I got weird fstab errors on gpfs startup and using certain commands, notably mmfind, the command would fail with something like "can't find /dev/uwfs" (our filesystem.) I restarted the whole cluster and everything began working normally. In this case 4.2 got rid of /dev/fsname. Just like in the quota case it seems that this transition can't be seamless. Doing the second cluster I upgraded all nodes and then again quickly restarted gpfs to avoid the same problem. > > Other than these two quirks, I heartily thank IBM for making a very complex product with a very easy upgrade procedure. I could imagine many ways that an upgrade hop of two major versions in two weeks could go very wrong but the quality of the product and team makes my job very easy. > > -Eric > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Dec 11 15:07:09 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 11 Dec 2016 10:07:09 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: I thought I'd share this with folks. I saw some log asserts in our test environment (~1050 client nodes and 12 manager/server nodes). I'm going from 3.5.0.31 (well, 2 clients are still at 3.5.0.19) -> 4.1.1.10. I've been running filebench in a loop for the past several days. It's sustaining about 60k write iops and about 15k read iops to the metadata disks for the filesystem I'm testing with, so I'd say it's getting pushed reasonably hard. 
The test cluster had 4.1 clients before it had 4.1 servers but after flipping 420 clients from 3.5.0.31 to 4.1.1.10 and starting up filebench I'm now seeing periodic logasserts from the manager/server nodes: Dec 11 08:57:39 loremds12 mmfs: Generic error in /project/sprelfks2/build/rfks2s010a/src/avs/fs/mmfs/ts/tm/HandleReq.C line 304 retCode 0, reasonCode 0 Dec 11 08:57:39 loremds12 mmfs: mmfsd: Error=MMFS_GENERIC, ID=0x30D9195E, Tag=4908715 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 (!"downgrade to mode which is not StrictlyWeaker") Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 node 584 old mode ro new mode (A: D: A) Dec 11 08:57:39 loremds12 mmfs: [X] logAssertFailed: (!"downgrade to mode which is not StrictlyWeaker") Dec 11 08:57:39 loremds12 mmfs: [X] return code 0, reason code 0, log record tag 0 Dec 11 08:57:42 loremds12 mmfs: [E] 10:0xA1BD5B RcvWorker::thread(void*).A1BD00 + 0x5B at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 11:0x622126 Thread::callBody(Thread*).6220E0 + 0x46 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 12:0x61220F Thread::callBodyWrapper(Thread*).612180 + 0x8F at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 13:0x7FF4E6BE66B6 start_thread + 0xE6 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 14:0x7FF4E5FEE06D clone + 0x6D at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 2:0x9F95E9 logAssertFailed.9F9440 + 0x1A9 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 3:0x1232836 TokenClass::fixClientMode(Token*, int, int, int, CopysetRevoke*).1232350 + 0x4E6 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 4:0x1235593 TokenClass::HandleTellRequest(RpcContext*, Request*, char**, int).1232AD0 + 0x2AC3 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 5:0x123A23C HandleTellRequestInterface(RpcContext*, Request*, char**, int).123A0D0 + 0x16C at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 6:0x125C6B0 queuedTellServer(RpcContext*, Request*, int, unsigned int).125C670 + 0x40 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 7:0x125EF72 tmHandleTellServer(RpcContext*, char*).125EEC0 + 0xB2 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 8:0xA12668 tscHandleMsg(RpcContext*, MsgDataBuf*).A120D0 + 0x598 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 9:0xA1BC4E RcvWorker::RcvMain().A1BB50 + 0xFE at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] *** Traceback: Dec 11 08:57:42 loremds12 mmfs: [N] Signal 6 at location 0x7FF4E5F456D5 in process 12188, link reg 0xFFFFFFFFFFFFFFFF. Dec 11 08:57:42 loremds12 mmfs: [X] *** Assert exp((!"downgrade to mode which is not StrictlyWeaker") node 584 old mode ro new mode (A: D: A) ) in line 304 of file /project/sprelfks2/bui ld/rfks2s010a/src/avs/fs/mmfs/ts/tm/HandleReq.C I've seen different messages on that third line of the "Tag=" message: Dec 11 00:16:40 loremds11 mmfs: Tag=5012168 node 825 old mode ro new mode 0x31 Dec 11 01:52:53 loremds10 mmfs: Tag=5016618 node 655 old mode ro new mode (A: MA D: ) Dec 11 02:15:57 loremds10 mmfs: Tag=5045549 node 994 old mode ro new mode (A: A D: A) Dec 11 08:14:22 loremds10 mmfs: Tag=5067054 node 237 old mode ro new mode 0x08 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 node 584 old mode ro new mode (A: D: A) Dec 11 00:47:39 loremds09 mmfs: Tag=4998635 node 461 old mode ro new mode (A:R D: ) It's interesting to note that all of these node indexes are still running 3.5. I'm going to open up a PMR but thought I'd share the gory details here and see if folks had any insight. I'm starting to wonder if 4.1 clients are more tolerant of 3.5 servers than 4.1 servers are of 3.5 clients. 
-Aaron On 12/5/16 4:31 PM, Aaron Knister wrote: > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > >> Rolling upgrades allow you to install new GPFS code one node at a time >> without shutting down GPFS >> on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while >> other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sun Dec 11 21:28:39 2016 From: erich at uw.edu (Eric Horst) Date: Sun, 11 Dec 2016 13:28:39 -0800 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: On Sat, Dec 10, 2016 at 4:35 AM, Aaron Knister wrote: > Thanks Eric! > > I have a few follow up questions for you-- > > Do you recall the exact versions of 3.5 and 4.1 your cluster went from/to? > I'm curious to know what version of 4.1 you were at when you ran the > mmchconfig. > I went from 3.5.0-28 to 4.1.0-8 to 4.2.1-1. > > Would you mind sharing any log messages related to the errors you saw when > you ran the mmchconfig? > > Unfortunately I didn't save any actual logs from the update. I did the first cluster in early July so nothing remains. The only note I have is: "On update, after finalizing gpfs 4.1 the quota file format apparently changed and caused a mmrepquota hang/deadlock. Had to shutdown and restart the whole cluster." Sorry to not be very helpful on that front. -Eric > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different > clusters. Two things: > > Upgrading from 3.5 to 4.1 I did node at a time and then at the end > mmchconfig release=LATEST. Minutes after flipping to latest the cluster > became non-responsive, with node mmfs panics and everything had to be > restarted. Logs indicated it was a quota problem. In 4.1 the quota files > move from externally visible files to internal hidden files. I suspect the > quota file transition can't be done without a cluster restart. When I did > the second cluster I upgraded all nodes and then very quickly stopped and > started the entire cluster, issuing the mmchconfig in the middle. No quota > panic problems on that one. > > Upgrading from 4.1 to 4.2 I did node at a time and then at the end > mmchconfig release=LATEST. No cluster restart. Everything seemed to work > okay. Later, restarting a node I got weird fstab errors on gpfs startup and > using certain commands, notably mmfind, the command would fail with > something like "can't find /dev/uwfs" (our filesystem.) I restarted the > whole cluster and everything began working normally. 
In this case 4.2 got > rid of /dev/fsname. Just like in the quota case it seems that this > transition can't be seamless. Doing the second cluster I upgraded all nodes > and then again quickly restarted gpfs to avoid the same problem. > > Other than these two quirks, I heartily thank IBM for making a very > complex product with a very easy upgrade procedure. I could imagine many > ways that an upgrade hop of two major versions in two weeks could go very > wrong but the quality of the product and team makes my job very easy. > > -Eric > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 12 13:55:52 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Dec 2016 13:55:52 +0000 Subject: [gpfsug-discuss] Ceph RBD Volumes and GPFS? Message-ID: Has anyone tried using Ceph RBD volumes with GPFS? I?m guessing that it will work, but I?m not sure if IBM would support it. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Dec 13 04:05:08 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 12 Dec 2016 23:05:08 -0500 Subject: [gpfsug-discuss] Ceph RBD Volumes and GPFS? In-Reply-To: References: Message-ID: Hi Bob, I have not, although I started to go down that path. I had wanted erasure coded pools but in order to front an erasure coded pool with an RBD volume you apparently need a cache tier? Seems that doesn't give one the performance they might want for this type of workload (http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#a-word-of-caution). If you're OK replicating the data I suspect it might work well. I did try sheepdog (https://sheepdog.github.io/sheepdog/) and that did work the way I wanted it to with erasure coding and gave me pretty good performance to boot. -Aaron On 12/12/16 8:55 AM, Oesterlin, Robert wrote: > Has anyone tried using Ceph RBD volumes with GPFS? I?m guessing that it > will work, but I?m not sure if IBM would support it. > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From r.sobey at imperial.ac.uk Thu Dec 15 13:13:43 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 15 Dec 2016 13:13:43 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Ah. I stopped reading when I read that the service account needs Domain Admin rights. I doubt that will fly unfortunately. Thanks though John. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John T Olson Sent: 09 December 2016 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. [Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?]Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister > To: gpfsug main discussion list > Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Mark.Bush at siriuscom.com Thu Dec 15 20:32:11 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 15 Dec 2016 20:32:11 +0000 Subject: [gpfsug-discuss] Tiers Message-ID: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). 
It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 15 20:47:12 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Dec 2016 20:47:12 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Message-ID: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Dec 15 20:52:17 2016 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 15 Dec 2016 20:52:17 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> References: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu>, <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Dec 15 21:19:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 15 Dec 2016 21:19:20 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> Message-ID: <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. 
In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 15 21:25:21 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Dec 2016 21:25:21 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> Message-ID: <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? 
Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sat Dec 17 04:24:34 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 16 Dec 2016 23:24:34 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name Message-ID: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Hi Everyone, I'm curious about the most straightforward and fastest way to identify what NSD a given /dev device is. The best I can come up with is "tspreparedisk -D device_name" which gives me something like: tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: that I can then parse and map the nsd id to the nsd name. I hesitate calling ts* commands directly and I admit it's perhaps an irrational fear, but I associate the -D flag with "delete" in my head and am afraid that some day -D may be just that and *poof* there go my NSD descriptors. Is there a cleaner way? 
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sat Dec 17 04:55:00 2016 From: erich at uw.edu (Eric Horst) Date: Fri, 16 Dec 2016 20:55:00 -0800 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: Perhaps this: mmlsnsd -m -Eric On Fri, Dec 16, 2016 at 8:24 PM, Aaron Knister wrote: > Hi Everyone, > > I'm curious about the most straightforward and fastest way to identify > what NSD a given /dev device is. The best I can come up with is > "tspreparedisk -D device_name" which gives me something like: > > tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: > > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational fear, > but I associate the -D flag with "delete" in my head and am afraid that > some day -D may be just that and *poof* there go my NSD descriptors. > > Is there a cleaner way? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Dec 17 07:04:08 2016 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 17 Dec 2016 07:04:08 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Sat Dec 17 08:35:05 2016 From: jtucker at pixitmedia.com (Jez Tucker) Date: Sat, 17 Dec 2016 08:35:05 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <303ba835-5844-765f-c34d-a62c226498c5@arcastream.com> References: <303ba835-5844-765f-c34d-a62c226498c5@arcastream.com> Message-ID: <6ebdd77b-c576-fbee-903c-c365e101cbb4@pixitmedia.com> Hi Aaron An alternative method for you is: from arcapix.fs.gpfs import Nsds >>> from arcapix.fs.gpfs import Nsds >>> nsd = Nsds() >>> for n in nsd.values(): ... print n.device, n.id ... /gpfsblock/mmfs1-md1 md3200_001_L000 /gpfsblock/mmfs1-md2 md3200_001_L001 /gpfsblock/mmfs1-data1 md3200_001_L002 /gpfsblock/mmfs1-data2 md3200_001_L003 /gpfsblock/mmfs1-data3 md3200_001_L004 /gpfsblock/mmfs2-md1 md3200_002_L000 Ref: http://arcapix.com/gpfsapi/nsds.html Obviously you can filter a specific device by the usual Pythonic string comparators. Jez On 17/12/16 07:04, Luis Bolinches wrote: > Hi > THe ts* is a good fear, they are internal commands bla bla bla you > know that > Have you tried mmlsnsd -X > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 > > "If you continually give you will continually have." 
Anonymous > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] translating /dev device into nsd name > Date: Sat, Dec 17, 2016 6:24 AM > Hi Everyone, > > I'm curious about the most straightforward and fastest way to identify > what NSD a given /dev device is. The best I can come up with is > "tspreparedisk -D device_name" which gives me something like: > > tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: > > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am > afraid > that some day -D may be just that and *poof* there go my NSD > descriptors. > > Is there a cleaner way? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* VP of Research and Development, ArcaStream jtucker at arcastream.com www.arcastream.com | Tw:@arcastream.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Sat Dec 17 21:42:39 2016 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Sat, 17 Dec 2016 16:42:39 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: <54420.1482010959@turing-police.cc.vt.edu> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From daniel.kidger at uk.ibm.com Mon Dec 19 11:42:03 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Mon, 19 Dec 2016 11:42:03 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discussUnless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Mon Dec 19 14:53:27 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Mon, 19 Dec 2016 14:53:27 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance Message-ID: We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of the IO servers phoned home with memory error. IBM is coming out today to replace the faulty DIMM. What is the correct way of taking this system out for maintenance? Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When we needed to do maintenance on the old system, we would migrate manager role and also move primary and secondary server roles if one of those systems had to be taken down. With ESS and resource pool manager roles etc. is there a correct way of shutting down one of the IO serves for maintenance? 
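As an aside on the lookup being discussed here: a minimal sketch of going from a local /dev path to its NSD name using the commands people have already suggested. The device path is only an example (it is the one from Aaron's tspreparedisk output), and it is worth glancing at the header of your own mmlsnsd output before trusting column positions.

DEV=/dev/dm-134
/usr/lpp/mmfs/bin/mmlsnsd -m | grep -w "$DEV"
# the NSD name is the first column of the matching line;
# "mmlsnsd -X" adds the device type and server list if you need more detail

Run it on the node that actually sees the device; as Valdis points out, the same LUN can show up under different /dev names on different servers and after reboots, so the mapping is only meaningful on the node where you ran it.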
Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Dec 19 15:15:45 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 19 Dec 2016 10:15:45 -0500 Subject: [gpfsug-discuss] Tiers In-Reply-To: <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> Message-ID: We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Mark, > > We just use an 8 Gb FC SAN. For the data pool we typically have a dual > active-active controller storage array fronting two big RAID 6 LUNs and 1 > RAID 1 (for /home). For the capacity pool, it might be the same exact > model of controller, but the two controllers are now fronting that whole > 60-bay array. > > But our users tend to have more modest performance needs than most? > > Kevin > > On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: > > Kevin, out of curiosity, what type of disk does your data pool use? SAS > or just some SAN attached system? > > *From: * on behalf of > "Buterbaugh, Kevin L" > *Reply-To: *gpfsug main discussion list > *Date: *Thursday, December 15, 2016 at 2:47 PM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] Tiers > > Hi Mark, > > We?re a ?traditional? university HPC center with a very untraditional > policy on our scratch filesystem ? we don?t purge it and we sell quota > there. Ultimately, a lot of that disk space is taken up by stuff that, > let?s just say, isn?t exactly in active use. > > So what we?ve done, for example, is buy a 60-bay storage array and stuff > it with 8 TB drives. It wouldn?t offer good enough performance for > actively used files, but we use GPFS policies to migrate files to the > ?capacity? pool based on file atime. So we have 3 pools: > > 1. the system pool with metadata only (on SSDs) > 2. the data pool, which is where actively used files are stored and which > offers decent performance > 3. the capacity pool, for data which hasn?t been accessed ?recently?, and > which is on slower storage > > I would imagine others do similar things. HTHAL? > > Kevin > > > On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: > > Just curious how many of you out there deploy SS with various tiers? It > seems like a lot are doing the system pool with SSD?s but do you routinely > have clusters that have more than system pool and one more tier? > > I know if you are doing Archive in connection that?s an obvious choice for > another tier but I?m struggling with knowing why someone needs more than > two tiers really. > > I?ve read all the fine manuals as to how to do such a thing and some of > the marketing as to maybe why. I?m still scratching my head on this > though. 
In fact, my understanding is in the ESS there isn?t any different > pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). > > It does make sense to me know with TCT and I could create an ILM policy to > get some of my data into the cloud. > > But in the real world I would like to know what yall do in this regard. > > > Thanks > > Mark > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > *Sirius Computer Solutions * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 19 15:25:52 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 19 Dec 2016 15:25:52 +0000 Subject: [gpfsug-discuss] Tiers Message-ID: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? 
Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kenh at us.ibm.com Mon Dec 19 15:30:58 2016 From: kenh at us.ibm.com (Ken Hill) Date: Mon, 19 Dec 2016 10:30:58 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" To: "gpfsug main discussion list" Cc: "gpfsug main discussion list" Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Dec 19 15:36:50 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 19 Dec 2016 15:36:50 +0000 Subject: [gpfsug-discuss] SMB issues Message-ID: Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? 
Thanks Simon From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 15:40:50 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 15:40:50 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: References: Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E@vanderbilt.edu> Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. 
So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 19 15:53:12 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 19 Dec 2016 15:53:12 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: Move its recoverygrops to the other node by putting the other node as primary server for it: mmchrecoverygroup rgname --servers otherServer,thisServer And verify that it's now active on the other node by "mmlsrecoverygroup rgname -L". Move away any filesystem managers or cluster manager role if that's active on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. 
Then you can run mmshutdown on it (assuming you also have enough quorum nodes in the remaining cluster). -jf man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? > > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 15:58:16 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 15:58:16 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Hi Ken, Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. Or am I completely misunderstanding what you?re saying? Thanks... Kevin On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" > To: "gpfsug main discussion list" > Cc: "gpfsug main discussion list" > Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. 
Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse ________________________________ On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpappas at dstonline.com Mon Dec 19 15:59:12 2016 From: bpappas at dstonline.com (Bill Pappas) Date: Mon, 19 Dec 2016 15:59:12 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: What I would do is when you identify this issue again, determine which IP address (which samba server) is serving up the CIFS share. Then as root, log on to that samna node and typr "id " for the user which has this issue. Are they in all the security groups you'd expect, in particular, the group required to access the folder in question? Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] [http://www.prweb.com/releases/2016/06/prweb13504050.htm] http://www.prweb.com/releases/2016/06/prweb13504050.htm ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Monday, December 19, 2016 9:41 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 59, Issue 40 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
SMB issues (Simon Thompson (Research Computing - IT Services)) 2. Re: Tiers (Buterbaugh, Kevin L) ---------------------------------------------------------------------- Message: 1 Date: Mon, 19 Dec 2016 15:36:50 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] SMB issues Message-ID: Content-Type: text/plain; charset="us-ascii" Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon ------------------------------ Message: 2 Date: Mon, 19 Dec 2016 15:40:50 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Tiers Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. 
try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 40 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-1466780990050_DSTlogo.png.png Type: image/png Size: 6282 bytes Desc: OutlookEmoji-1466780990050_DSTlogo.png.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-http://www.prweb.com/releases/2016/06/prweb13504050.htm.jpg Type: image/jpeg Size: 14887 bytes Desc: OutlookEmoji-http://www.prweb.com/releases/2016/06/prweb13504050.htm.jpg URL: From S.J.Thompson at bham.ac.uk Mon Dec 19 16:06:08 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 19 Dec 2016 16:06:08 +0000 Subject: [gpfsug-discuss] SMB issues Message-ID: We see it on all four of the nodes, and yet we did some getent passwd/getent group stuff on them to verify that identity is working OK. Simon From: > on behalf of Bill Pappas > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 19 December 2016 at 15:59 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SMB issues What I would do is when you identify this issue again, determine which IP address (which samba server) is serving up the CIFS share. Then as root, log on to that samna node and typr "id " for the user which has this issue. Are they in all the security groups you'd expect, in particular, the group required to access the folder in question? Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] [http://www.prweb.com/releases/2016/06/prweb13504050.htm] http://www.prweb.com/releases/2016/06/prweb13504050.htm ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Monday, December 19, 2016 9:41 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 59, Issue 40 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
SMB issues (Simon Thompson (Research Computing - IT Services)) 2. Re: Tiers (Buterbaugh, Kevin L) ---------------------------------------------------------------------- Message: 1 Date: Mon, 19 Dec 2016 15:36:50 +0000 From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SMB issues Message-ID: > Content-Type: text/plain; charset="us-ascii" Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon ------------------------------ Message: 2 Date: Mon, 19 Dec 2016 15:40:50 +0000 From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. 
try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 40 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-1466780990050_DSTlogo.png.png Type: image/png Size: 6282 bytes Desc: OutlookEmoji-1466780990050_DSTlogo.png.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-httpwww.prweb.comreleases201606prweb13504050.htm.jpg Type: image/jpeg Size: 14887 bytes Desc: OutlookEmoji-httpwww.prweb.comreleases201606prweb13504050.htm.jpg URL: From ulmer at ulmer.org Mon Dec 19 16:16:56 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 19 Dec 2016 11:16:56 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> Your observation is correct! There?s usually another step, though: mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. -- Stephen > On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: > > Hi Ken, > > Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. > > Or am I completely misunderstanding what you?re saying? Thanks... > > Kevin > >> On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: >> >> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >> >> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. 
>> >> >> Ken Hill >> Technical Sales Specialist | Software Defined Solution Sales >> IBM Systems >> Phone:1-540-207-7270 >> E-mail: kenh at us.ibm.com >> >> >> 2300 Dulles Station Blvd >> Herndon, VA 20171-6133 >> United States >> >> >> >> >> >> >> >> >> >> >> From: "Daniel Kidger" > >> To: "gpfsug main discussion list" > >> Cc: "gpfsug main discussion list" > >> Date: 12/19/2016 06:42 AM >> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Valdis wrote: >> Keep in mind that if you have multiple NSD servers in the cluster, there >> is *no* guarantee that the names for a device will be consistent across >> the servers, or across reboots. And when multipath is involved, you may >> have 4 or 8 or even more names for the same device.... >> >> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >> >> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >> >> Daniel >> >> IBM Spectrum Storage Software >> +44 (0)7818 522266 >> Sent from my iPad using IBM Verse >> >> >> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >> >> From: Valdis.Kletnieks at vt.edu >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Date: 17 Dec 2016 21:43:00 >> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >> >> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >> > that I can then parse and map the nsd id to the nsd name. I hesitate >> > calling ts* commands directly and I admit it's perhaps an irrational >> > fear, but I associate the -D flag with "delete" in my head and am afraid >> > that some day -D may be just that and *poof* there go my NSD descriptors. >> Others have mentioned mmlsdnsd -m and -X >> Keep in mind that if you have multiple NSD servers in the cluster, there >> is *no* guarantee that the names for a device will be consistent across >> the servers, or across reboots. And when multipath is involved, you may >> have 4 or 8 or even more names for the same device.... >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> Unless stated otherwise above: >> IBM United Kingdom Limited - Registered in England and Wales with number 741598. >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From janfrode at tanso.net Mon Dec 19 16:25:50 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 19 Dec 2016 16:25:50 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: I normally do mmcrnsd without specifying any servers=, and point at the local /dev entry. Afterwards I add the servers= line and do mmchnsd. -jf man. 19. des. 2016 kl. 16.58 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: > Hi Ken, > > Umm, wouldn?t that make that server the primary NSD server for all those > NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen > server, but as long as you have the proper device name for the NSD from the > NSD server you want to be primary for it, I?ve never had a problem > specifying many different servers first in the list. > > Or am I completely misunderstanding what you?re saying? Thanks... > > Kevin > > On Dec 19, 2016, at 9:30 AM, Ken Hill wrote: > > Indeed. It only matters when deploying NSDs. Post-deployment, all luns > (NSDs) are labeled - and they are assembled by GPFS. > > Keep in mind: If you are deploying multiple NSDs (with multiple servers) - > you'll need to pick one server to work with... Use that server to label the > luns (mmcrnsd)... In the nsd stanza file - the server you choose will need > to be the first server in the "servers" list. > > > *Ken Hill* > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > > ------------------------------ > *Phone:*1-540-207-7270 > * E-mail:* *kenh at us.ibm.com* > > > > > > > > > > > > > > > > > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > > > > > > From: "Daniel Kidger" > To: "gpfsug main discussion list" > > Cc: "gpfsug main discussion list" > > Date: 12/19/2016 06:42 AM > Subject: Re: [gpfsug-discuss] translating /dev device into nsd name > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > *Valdis wrote:* > > > > *Keep in mind that if you have multiple NSD servers in the cluster, there > is *no* guarantee that the names for a device will be consistent across the > servers, or across reboots. And when multipath is involved, you may have 4 > or 8 or even more names for the same device....* > > Indeed the is whole greatness about NSDs (and in passing why Lustre can be > much more tricky to safely manage.) > Once a lun is "labelled" as an NSD then that NSD name is all you need to > care about as the /dev entries can now freely change on reboot or differ > across nodes. Indeed if you connect an arbitrary node to an NSD disk via a > SAN cable, gpfs will recognise it and use it as a shortcut to that lun. > > Finally recall that in the NSD stanza file the /dev entry is only matched > for on the first of the listed NSD servers; the other NSD servers will > discover and learn which NSD this is, ignoring the /dev value in this > stanza. > > Daniel > > IBM Spectrum Storage Software > *+44 (0)7818 522266* <+44%207818%20522266> > Sent from my iPad using IBM Verse > > > ------------------------------ > On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: > > From: Valdis.Kletnieks at vt.edu > To: gpfsug-discuss at spectrumscale.org > Cc: > Date: 17 Dec 2016 21:43:00 > Subject: Re: [gpfsug-discuss] translating /dev device into nsd name > > On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > > that I can then parse and map the nsd id to the nsd name. 
I hesitate > > calling ts* commands directly and I admit it's perhaps an irrational > > fear, but I associate the -D flag with "delete" in my head and am afraid > > that some day -D may be just that and *poof* there go my NSD descriptors. > Others have mentioned mmlsdnsd -m and -X > Keep in mind that if you have multiple NSD servers in the cluster, there > is *no* guarantee that the names for a device will be consistent across > the servers, or across reboots. And when multipath is involved, you may > have 4 or 8 or even more names for the same device.... > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 16:43:50 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 16:43:50 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> Message-ID: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Hi Stephen, Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: 1. go down to the data center and sit in front of the storage arrays. 2. log on to the NSD server I want to be primary for a given NSD. 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. 3. for the remaining disks, run ?dd if=/dev/> wrote: Your observation is correct! There?s usually another step, though: mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. -- Stephen On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: Hi Ken, Umm, wouldn?t that make that server the primary NSD server for all those NSDs? 
Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. Or am I completely misunderstanding what you?re saying? Thanks... Kevin On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" > To: "gpfsug main discussion list" > Cc: "gpfsug main discussion list" > Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse ________________________________ On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Dec 19 16:45:38 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 19 Dec 2016 16:45:38 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: Can you create an export with "admin user" and see if the issue is reproducible that way: Mmsmb export add exportname /path/to/folder Mmsmb export change exportname -option "admin users=username at domain" And for good measure remove the SID of Domain Users from the ACL: mmsmb exportacl remove exportname --SID S-1-1-0 I can't quite think in my head how this will help but I'd be interested to know if you see similar behaviour. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 19 December 2016 15:37 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] SMB issues Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Mon Dec 19 17:08:27 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 19 Dec 2016 12:08:27 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: <14903A9D-B051-4B1A-AF83-31140FC7666D@ulmer.org> Depending on the hardware?. ;) Sometimes you can use the drivers to tell you the ?volume name? of a LUN on the storage server. 
You could do that the DS{3,4,5}xx systems. I think you can also do it for Storwize-type systems, but I?m blocking on how and I don?t have one in front of me at the moment. Either that or use the volume UUID or some such. I?m basically never where I can see the blinky lights. :( -- Stephen > On Dec 19, 2016, at 11:43 AM, Buterbaugh, Kevin L > wrote: > > Hi Stephen, > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > > Kevin > >> On Dec 19, 2016, at 10:16 AM, Stephen Ulmer > wrote: >> >> Your observation is correct! There?s usually another step, though: >> >> mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. >> >> -- >> Stephen >> >> >> >>> On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: >>> >>> Hi Ken, >>> >>> Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. >>> >>> Or am I completely misunderstanding what you?re saying? Thanks... >>> >>> Kevin >>> >>>> On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: >>>> >>>> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >>>> >>>> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. >>>> >>>> >>>> Ken Hill >>>> Technical Sales Specialist | Software Defined Solution Sales >>>> IBM Systems >>>> Phone:1-540-207-7270 >>>> E-mail: kenh at us.ibm.com >>>> >>>> >>>> 2300 Dulles Station Blvd >>>> Herndon, VA 20171-6133 >>>> United States >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From: "Daniel Kidger" > >>>> To: "gpfsug main discussion list" > >>>> Cc: "gpfsug main discussion list" > >>>> Date: 12/19/2016 06:42 AM >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Valdis wrote: >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. 
And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> >>>> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >>>> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >>>> >>>> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >>>> >>>> Daniel >>>> >>>> IBM Spectrum Storage Software >>>> +44 (0)7818 522266 >>>> Sent from my iPad using IBM Verse >>>> >>>> >>>> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >>>> >>>> From: Valdis.Kletnieks at vt.edu >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: >>>> Date: 17 Dec 2016 21:43:00 >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> >>>> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >>>> > that I can then parse and map the nsd id to the nsd name. I hesitate >>>> > calling ts* commands directly and I admit it's perhaps an irrational >>>> > fear, but I associate the -D flag with "delete" in my head and am afraid >>>> > that some day -D may be just that and *poof* there go my NSD descriptors. >>>> Others have mentioned mmlsdnsd -m and -X >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> Unless stated otherwise above: >>>> IBM United Kingdom Limited - Registered in England and Wales with number 741598. >>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Mon Dec 19 17:16:07 2016 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 19 Dec 2016 09:16:07 -0800 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: We have each of our NSDs on boxes shared between two servers, with one server primary for each raid unit. When I create Logical drives and map them, I make sure there is no overlap in the logical unit numbers between the two boxes. Then I use /proc/partitions and lsscsi to see if they all show up. When it is time to write the stanza files, I use multipath -ll to get a list with the device name and LUN info, and sort it to round robin over all the NSD servers. It's still tedious, but it doesn't require a trip to the machine room. Note that the multipath -ll command needs to be run separately on each NSD server to get the device name specific to that host -- the first server name in the list. Also realize that leaving the host name off when creating NSDs only works if all the drives are visible from the node where you run the command. Regards, -- ddj Dave Johnson > On Dec 19, 2016, at 8:43 AM, Buterbaugh, Kevin L wrote: > > Hi Stephen, > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > > Kevin > >> On Dec 19, 2016, at 10:16 AM, Stephen Ulmer wrote: >> >> Your observation is correct! There?s usually another step, though: >> >> mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. >> >> -- >> Stephen >> >> >> >>> On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L wrote: >>> >>> Hi Ken, >>> >>> Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. >>> >>> Or am I completely misunderstanding what you?re saying? Thanks... >>> >>> Kevin >>> >>>> On Dec 19, 2016, at 9:30 AM, Ken Hill wrote: >>>> >>>> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >>>> >>>> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... 
Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. >>>> >>>> >>>> Ken Hill >>>> Technical Sales Specialist | Software Defined Solution Sales >>>> IBM Systems >>>> Phone:1-540-207-7270 >>>> E-mail: kenh at us.ibm.com >>>> >>>> >>>> 2300 Dulles Station Blvd >>>> Herndon, VA 20171-6133 >>>> United States >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From: "Daniel Kidger" >>>> To: "gpfsug main discussion list" >>>> Cc: "gpfsug main discussion list" >>>> Date: 12/19/2016 06:42 AM >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Valdis wrote: >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> >>>> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >>>> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >>>> >>>> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >>>> >>>> Daniel >>>> >>>> IBM Spectrum Storage Software >>>> +44 (0)7818 522266 >>>> Sent from my iPad using IBM Verse >>>> >>>> >>>> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >>>> >>>> From: Valdis.Kletnieks at vt.edu >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: >>>> Date: 17 Dec 2016 21:43:00 >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> >>>> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >>>> > that I can then parse and map the nsd id to the nsd name. I hesitate >>>> > calling ts* commands directly and I admit it's perhaps an irrational >>>> > fear, but I associate the -D flag with "delete" in my head and am afraid >>>> > that some day -D may be just that and *poof* there go my NSD descriptors. >>>> Others have mentioned mmlsdnsd -m and -X >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> Unless stated otherwise above: >>>> IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
>>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Mon Dec 19 17:31:38 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 19 Dec 2016 10:31:38 -0700 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From tortay at cc.in2p3.fr Mon Dec 19 17:49:05 2016 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Mon, 19 Dec 2016 18:49:05 +0100 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: <5cca2ea8-b098-c1e4-ab03-9542837287ab@cc.in2p3.fr> On 12/19/2016 05:43 PM, Buterbaugh, Kevin L wrote: > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > Hello, We use device mapper/multipath to assign meaningful names to devices based on the WWN (or the storage system "volume" name) of the LUNs. We use a simple naming scheme ("nsdDDNN", where DD is the primary server number and NN the NSD number for that node, of course all NSDs are served by at least 2 nodes). When possible, these names are also used by the storage systems (nowadays mostly LSI/Netapp units). We have scripts to automate the configuration of the LUNs on the storage systems with the proper names as well as for creating the relevant section of "multipath.conf". There is no ambiguity during "mmcrnsd" (or no need to use "mmchnsd" later on) and it's also easy to know which filesystem or pool is at risk when some hardware fails (CMDB, etc.) Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From mimarsh2 at vt.edu Tue Dec 20 13:57:31 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 08:57:31 -0500 Subject: [gpfsug-discuss] mmlsdisk performance impact Message-ID: All, Does the mmlsdisk command generate a lot of admin traffic or take up a lot of GPFS resources? In our case, we have it in some of our monitoring routines that run on all nodes. It is kind of nice info to have, but I am wondering if hitting the filesystem with a bunch of mmlsdisk commands is bad for performance. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 14:03:07 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 14:03:07 +0000 Subject: [gpfsug-discuss] mmlsdisk performance impact In-Reply-To: References: Message-ID: Hi Brian, If I?m not mistaken, once you run the mmlsdisk command on one client any other client running it will produce the exact same output. Therefore, what we do is run it once, output that to a file, and propagate that file to any node that needs it. HTHAL? 
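A minimal sketch of that run-once-and-distribute approach (the file system name, paths and node list file are only examples):

   # on a single node, e.g. from cron every few minutes
   /usr/lpp/mmfs/bin/mmlsdisk gpfs0 -L > /tmp/mmlsdisk.gpfs0.$$ && \
       mv /tmp/mmlsdisk.gpfs0.$$ /var/tmp/mmlsdisk.gpfs0

   # copy the cached output to the nodes that monitor it
   while read node; do
       scp -q /var/tmp/mmlsdisk.gpfs0 "${node}:/var/tmp/mmlsdisk.gpfs0"
   done < /etc/monitor-nodes.list

Monitoring scripts on the other nodes then read the local copy instead of each invoking mmlsdisk themselves.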
Kevin On Dec 20, 2016, at 7:57 AM, Brian Marshall > wrote: All, Does the mmlsdisk command generate a lot of admin traffic or take up a lot of GPFS resources? In our case, we have it in some of our monitoring routines that run on all nodes. It is kind of nice info to have, but I am wondering if hitting the filesystem with a bunch of mmlsdisk commands is bad for performance. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Dec 20 16:25:04 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 11:25:04 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process Message-ID: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at u.washington.edu Tue Dec 20 16:27:32 2016 From: skylar2 at u.washington.edu (Skylar Thompson) Date: Tue, 20 Dec 2016 08:27:32 -0800 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: Message-ID: <20161220162732.GB20276@illiuin> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > All, > > What is your favorite method for stopping a user process from eating up all > the system memory and saving 1 GB (or more) for the GPFS / system > processes? We have always kicked around the idea of cgroups but never > moved on it. > > The problem: A user launches a job which uses all the memory on a node, > which causes the node to be expelled, which causes brief filesystem > slowness everywhere. > > I bet this problem has already been solved and I am just googling the wrong > search terms. > > > Thanks, > Brian > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From mweil at wustl.edu Tue Dec 20 16:35:44 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 10:35:44 -0600 Subject: [gpfsug-discuss] LROC Message-ID: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? 
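Before tuning, it can be useful to see what the cache device is actually holding and which related settings are in effect; on recent 4.x levels something along these lines should work (the grep pattern is just an example):

   # LROC statistics on the node that owns the cache device
   /usr/lpp/mmfs/bin/mmdiag --lroc

   # settings that influence what gets cached
   /usr/lpp/mmfs/bin/mmlsconfig | grep -i -E 'lroc|maxFilesToCache|maxStatCache'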
Thanks Matt From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 16:37:54 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 16:37:54 +0000 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: <20161220162732.GB20276@illiuin> References: <20161220162732.GB20276@illiuin> Message-ID: <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Hi Brian, It would be helpful to know what scheduling software, if any, you use. We were a PBS / Moab shop for a number of years but switched to SLURM two years ago. With both you can configure the maximum amount of memory available to all jobs on a node. So we just simply ?reserve? however much we need for GPFS and other ?system? processes. I can tell you that SLURM is *much* more efficient at killing processes as soon as they exceed the amount of memory they?ve requested than PBS / Moab ever dreamed of being. Kevin On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue Dec 20 17:03:28 2016 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 20 Dec 2016 17:03:28 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mimarsh2 at vt.edu Tue Dec 20 17:07:17 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 12:07:17 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: We use adaptive - Moab torque right now but are thinking about going to Skyrim Brian On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Brian, > > It would be helpful to know what scheduling software, if any, you use. > > We were a PBS / Moab shop for a number of years but switched to SLURM two > years ago. With both you can configure the maximum amount of memory > available to all jobs on a node. So we just simply ?reserve? however much > we need for GPFS and other ?system? processes. > > I can tell you that SLURM is *much* more efficient at killing processes as > soon as they exceed the amount of memory they?ve requested than PBS / Moab > ever dreamed of being. > > Kevin > > On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: > > We're a Grid Engine shop, and use cgroups (m_mem_free) to control user > process memory > usage. In the GE exec host configuration, we reserve 4GB for the OS > (including GPFS) so jobs are not able to consume all the physical memory on > the system. > > On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > > All, > > What is your favorite method for stopping a user process from eating up all > the system memory and saving 1 GB (or more) for the GPFS / system > processes? We have always kicked around the idea of cgroups but never > moved on it. > > The problem: A user launches a job which uses all the memory on a node, > which causes the node to be expelled, which causes brief filesystem > slowness everywhere. > > I bet this problem has already been solved and I am just googling the wrong > search terms. > > > Thanks, > Brian > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Dec 20 17:13:48 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 20 Dec 2016 17:13:48 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: , Message-ID: Nope, just lots of messages with the same error, but different folders. I've opened a pmr with IBM and supplied the usual logs. 
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt [christof.schmitt at us.ibm.com] Sent: 19 December 2016 17:31 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 17:15:02 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 17:15:02 +0000 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: <818353BF-18AC-4931-8890-35D6ECC4DF04@vanderbilt.edu> Hi Brian, I don?t *think* you can entirely solve this problem with Moab ? as I mentioned, it?s not nearly as efficient as SLURM is at killing jobs when they exceed requested memory. We had situations where a user would be able to run a node out of memory before Moab would kill it. Hasn?t happened once with SLURM, AFAIK. But with either Moab or SLURM what we?ve done is taken the amount of physical RAM in the box and subtracted from that the amount of memory we want to ?reserve? for the system (OS, GPFS, etc.) and then told Moab / SLURM that this is how much RAM the box has. That way they at least won?t schedule jobs on the node that would exceed available memory. HTH? Kevin On Dec 20, 2016, at 11:07 AM, Brian Marshall > wrote: We use adaptive - Moab torque right now but are thinking about going to Skyrim Brian On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" > wrote: Hi Brian, It would be helpful to know what scheduling software, if any, you use. 
We were a PBS / Moab shop for a number of years but switched to SLURM two years ago. With both you can configure the maximum amount of memory available to all jobs on a node. So we just simply ?reserve? however much we need for GPFS and other ?system? processes. I can tell you that SLURM is *much* more efficient at killing processes as soon as they exceed the amount of memory they?ve requested than PBS / Moab ever dreamed of being. Kevin On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Dec 20 17:15:23 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 12:15:23 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: Skyrim equals Slurm. Mobile shenanigans. Brian On Dec 20, 2016 12:07 PM, "Brian Marshall" wrote: > We use adaptive - Moab torque right now but are thinking about going to > Skyrim > > Brian > > On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < > Kevin.Buterbaugh at vanderbilt.edu> wrote: > >> Hi Brian, >> >> It would be helpful to know what scheduling software, if any, you use. >> >> We were a PBS / Moab shop for a number of years but switched to SLURM two >> years ago. With both you can configure the maximum amount of memory >> available to all jobs on a node. So we just simply ?reserve? however much >> we need for GPFS and other ?system? processes. 
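For SLURM, that "advertise less memory than the node really has" approach can be expressed directly in the configuration; the node names and numbers below are only examples, with RealMemory set well under the physical total and a further system reservation via MemSpecLimit:

   # slurm.conf
   NodeName=compute[001-100] CPUs=28 RealMemory=120000 MemSpecLimit=8192 State=UNKNOWN
   TaskPlugin=task/cgroup

   # cgroup.conf
   ConstrainRAMSpace=yes

With cgroup enforcement turned on, a job that exceeds its request is contained and killed inside its own cgroup rather than driving the whole node, mmfsd included, into swap or the OOM killer.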
>> >> I can tell you that SLURM is *much* more efficient at killing processes >> as soon as they exceed the amount of memory they?ve requested than PBS / >> Moab ever dreamed of being. >> >> Kevin >> >> On Dec 20, 2016, at 10:27 AM, Skylar Thompson >> wrote: >> >> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user >> process memory >> usage. In the GE exec host configuration, we reserve 4GB for the OS >> (including GPFS) so jobs are not able to consume all the physical memory >> on >> the system. >> >> On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: >> >> All, >> >> What is your favorite method for stopping a user process from eating up >> all >> the system memory and saving 1 GB (or more) for the GPFS / system >> processes? We have always kicked around the idea of cgroups but never >> moved on it. >> >> The problem: A user launches a job which uses all the memory on a node, >> which causes the node to be expelled, which causes brief filesystem >> slowness everywhere. >> >> I bet this problem has already been solved and I am just googling the >> wrong >> search terms. >> >> >> Thanks, >> Brian >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> -- >> -- Skylar Thompson (skylar2 at u.washington.edu) >> -- Genome Sciences Department, System Administrator >> -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> >> -- University of Washington School of Medicine >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and >> Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Tue Dec 20 17:19:48 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 20 Dec 2016 17:19:48 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: For sake of everyone else on this listserv, I'll highlight the appropriate procedure here. It turns out, changing recovery group on an active system is not recommended by IBM. We tried following Jan's recommendation this morning, and the system became unresponsive for about 30 minutes. It only became responsive (and recovery group change finished) after we killed couple of processes (ssh and scp) going to couple of clients. I got a Sev. 1 with IBM opened and they tell me that appropriate steps for IO maintenance are as follows: 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) 2. unmount gpfs on io node that is going down 3. shutdown gpfs on io node that is going down 4. shutdown os That's it - recovery groups should not be changed. If there is a need to change recovery group, use --active option (not permanent change). We are now stuck in situation that io2 server is owner of both recovery groups. The way IBM tells us to fix this is to unmount the filesystem on all clients and change recovery groups then. 
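Expressed as commands, that sequence looks roughly like the following (the file system, recovery group and node names are placeholders; io2 is the server going down and io1 the one staying up):

   # 1. move manager roles onto the node staying up
   mmlsmgr
   mmchmgr fs0 io1          # file system manager
   mmchmgr -c io1           # cluster manager, if it was on io2

   # 2. and 3. unmount and shut down GPFS on the node going down
   mmumount fs0 -N io2
   mmshutdown -N io2

   # only if needed: serve its recovery group from the partner temporarily
   mmchrecoverygroup rg_io2 --active io1

   # 4. shut down the OS and do the hardware maintenance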
We can't do it now and will have to schedule maintenance sometime in 2017. For now, we have switched recovery groups using --active flag and things (filesystem performance) seems to be OK. Load average on both io servers is quite high (250avg) and does not seem to be going down. I really wish that maintenance procedures were documented somewhere on IBM website. This experience this morning has really shaken my confidence in ESS. Damir On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust wrote: > > Move its recoverygrops to the other node by putting the other node as > primary server for it: > > mmchrecoverygroup rgname --servers otherServer,thisServer > > And verify that it's now active on the other node by "mmlsrecoverygroup > rgname -L". > > Move away any filesystem managers or cluster manager role if that's active > on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. > > Then you can run mmshutdown on it (assuming you also have enough quorum > nodes in the remaining cluster). > > > -jf > > man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? > > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at u.washington.edu Tue Dec 20 17:18:35 2016 From: skylar2 at u.washington.edu (Skylar Thompson) Date: Tue, 20 Dec 2016 09:18:35 -0800 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: <20161220171834.GE20276@illiuin> When using m_mem_free on GE with cgroup=true, GE just depends on the kernel OOM killer. There's one killer per cgroup so when a job goes off the rails, only its processes are eligible for OOM killing. I'm not sure how Slurm does it but anything that uses cgroups should have the above behavior. On Tue, Dec 20, 2016 at 12:15:23PM -0500, Brian Marshall wrote: > Skyrim equals Slurm. Mobile shenanigans. > > Brian > > On Dec 20, 2016 12:07 PM, "Brian Marshall" wrote: > > > We use adaptive - Moab torque right now but are thinking about going to > > Skyrim > > > > Brian > > > > On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < > > Kevin.Buterbaugh at vanderbilt.edu> wrote: > > > >> Hi Brian, > >> > >> It would be helpful to know what scheduling software, if any, you use. > >> > >> We were a PBS / Moab shop for a number of years but switched to SLURM two > >> years ago. With both you can configure the maximum amount of memory > >> available to all jobs on a node. So we just simply ???reserve??? 
however much > >> we need for GPFS and other ???system??? processes. > >> > >> I can tell you that SLURM is *much* more efficient at killing processes > >> as soon as they exceed the amount of memory they???ve requested than PBS / > >> Moab ever dreamed of being. > >> > >> Kevin > >> > >> On Dec 20, 2016, at 10:27 AM, Skylar Thompson > >> wrote: > >> > >> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user > >> process memory > >> usage. In the GE exec host configuration, we reserve 4GB for the OS > >> (including GPFS) so jobs are not able to consume all the physical memory > >> on > >> the system. > >> > >> On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > >> > >> All, > >> > >> What is your favorite method for stopping a user process from eating up > >> all > >> the system memory and saving 1 GB (or more) for the GPFS / system > >> processes? We have always kicked around the idea of cgroups but never > >> moved on it. > >> > >> The problem: A user launches a job which uses all the memory on a node, > >> which causes the node to be expelled, which causes brief filesystem > >> slowness everywhere. > >> > >> I bet this problem has already been solved and I am just googling the > >> wrong > >> search terms. > >> > >> > >> Thanks, > >> Brian > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> -- > >> -- Skylar Thompson (skylar2 at u.washington.edu) > >> -- Genome Sciences Department, System Administrator > >> -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> > >> -- University of Washington School of Medicine > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> ??? > >> Kevin Buterbaugh - Senior System Administrator > >> Vanderbilt University - Advanced Computing Center for Research and > >> Education > >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From mweil at wustl.edu Tue Dec 20 19:13:46 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 13:13:46 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? 
> > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more > metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Dec 20 19:18:47 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 20 Dec 2016 19:18:47 +0000 Subject: [gpfsug-discuss] LROC Message-ID: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> We?re currently deploying LROC in many of our compute nodes ? results so far have been excellent. We?re putting in 240gb SSDs, because we have mostly small files. As far as I know, the amount of inodes and directories in LROC are not limited, except by the size of the cache disk. Look at these config options for LROC: lrocData Controls whether user data is populated into the local read-only cache. Other configuration options can be used to select the data that is eligible for the local read-only cache. When using more than one such configuration option, data that matches any of the specified criteria is eligible to be saved. Valid values are yes or no. The default value is yes. If lrocData is set to yes, by default the data that was not already in the cache when accessed by a user is subsequently saved to the local read-only cache. The default behavior can be overridden using thelrocDataMaxFileSize and lrocDataStubFileSize configuration options to save all data from small files or all data from the initial portion of large files. lrocDataMaxFileSize Limits the data that may be saved in the local read-only cache to only the data from small files. A value of -1 indicates that all data is eligible to be saved. A value of 0 indicates that small files are not to be saved. A positive value indicates the maximum size of a file to be considered for the local read-only cache. For example, a value of 32768 indicates that files with 32 KB of data or less are eligible to be saved in the local read-only cache. The default value is 0. lrocDataStubFileSize Limits the data that may be saved in the local read-only cache to only the data from the first portion of all files. A value of -1 indicates that all file data is eligible to be saved. A value of 0 indicates that stub data is not eligible to be saved. A positive value indicates that the initial portion of each file that is eligible is to be saved. For example, a value of 32768 indicates that the first 32 KB of data from each file is eligible to be saved in the local read-only cache. The default value is 0. lrocDirectories Controls whether directory blocks is populated into the local read-only cache. The option also controls other file system metadata such as indirect blocks, symbolic links, and extended attribute overflow blocks. Valid values are yes or no. The default value is yes. lrocInodes Controls whether inodes from open files is populated into the local read-only cache; the cache contains the full inode, including all disk pointers, extended attributes, and data. 
Valid values are yes or no. The default value is yes. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Matt Weil Reply-To: gpfsug main discussion list Date: Tuesday, December 20, 2016 at 1:13 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Dec 20 19:36:08 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 20 Dec 2016 20:36:08 +0100 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: I'm sorry for your trouble, but those 4 steps you got from IBM support does not seem correct. IBM support might not always realize that it's an ESS, and not plain GPFS... If you take down an ESS IO-node without moving its RG to the other node using "--servers othernode,thisnode", or by using --active (which I've never used), you'll take down the whole recoverygroup and need to suffer an uncontrolled failover. Such an uncontrolled failover takes a few minutes of filesystem hang, while a controlled failover should not hang the system. I don't see why it's a problem that you now have an IO server that is owning both recoverygroups. Once your maintenance of the first IO servers is done, I would just revert the --servers order of that recovergroup, and it should move back. The procedure to move RGs around during IO node maintenance is documented on page 10 the quick deployment guide (step 1-3): http://www.ibm.com/support/knowledgecenter/en/SSYSP8_4.5.0/c2785801.pdf?view=kc -jf On Tue, Dec 20, 2016 at 6:19 PM, Damir Krstic wrote: > For sake of everyone else on this listserv, I'll highlight the appropriate > procedure here. It turns out, changing recovery group on an active system > is not recommended by IBM. We tried following Jan's recommendation this > morning, and the system became unresponsive for about 30 minutes. It only > became responsive (and recovery group change finished) after we killed > couple of processes (ssh and scp) going to couple of clients. > > I got a Sev. 1 with IBM opened and they tell me that appropriate steps for > IO maintenance are as follows: > > 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) > 2. unmount gpfs on io node that is going down > 3. shutdown gpfs on io node that is going down > 4. shutdown os > > That's it - recovery groups should not be changed. If there is a need to > change recovery group, use --active option (not permanent change). 
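Putting Jan-Frode's correction and the quick deployment guide steps together, the controlled sequence looks roughly like the sketch below. The node names (essio1/essio2), recovery group name (rg_essio1) and file system name (gpfs0) are placeholders, and the exact steps should be checked against the ESS deployment guide for your release:

  # 1. move cluster and file system manager roles off the node going down
  mmlsmgr
  mmchmgr -c essio2
  mmchmgr gpfs0 essio2

  # 2. move its recovery group to the partner node, either temporarily ...
  mmchrecoverygroup rg_essio1 --active essio2
  #    ... or by reordering the server list (revert it after maintenance)
  mmchrecoverygroup rg_essio1 --servers essio2,essio1
  mmlsrecoverygroup rg_essio1 -L      # confirm the active server has moved

  # 3. unmount and stop GPFS on the node, then do the hardware work
  mmumount all -N essio1
  mmshutdown -N essio1

  # 4. afterwards, bring the node back and restore the original server order
  mmstartup -N essio1
  mmchrecoverygroup rg_essio1 --servers essio1,essio2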
> > We are now stuck in situation that io2 server is owner of both recovery > groups. The way IBM tells us to fix this is to unmount the filesystem on > all clients and change recovery groups then. We can't do it now and will > have to schedule maintenance sometime in 2017. For now, we have switched > recovery groups using --active flag and things (filesystem performance) > seems to be OK. Load average on both io servers is quite high (250avg) and > does not seem to be going down. > > I really wish that maintenance procedures were documented somewhere on IBM > website. This experience this morning has really shaken my confidence in > ESS. > > Damir > > On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust > wrote: > >> >> Move its recoverygrops to the other node by putting the other node as >> primary server for it: >> >> mmchrecoverygroup rgname --servers otherServer,thisServer >> >> And verify that it's now active on the other node by "mmlsrecoverygroup >> rgname -L". >> >> Move away any filesystem managers or cluster manager role if that's >> active on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. >> >> Then you can run mmshutdown on it (assuming you also have enough quorum >> nodes in the remaining cluster). >> >> >> -jf >> >> man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : >> >> We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of >> the IO servers phoned home with memory error. IBM is coming out today to >> replace the faulty DIMM. >> >> What is the correct way of taking this system out for maintenance? >> >> Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When >> we needed to do maintenance on the old system, we would migrate manager >> role and also move primary and secondary server roles if one of those >> systems had to be taken down. >> >> With ESS and resource pool manager roles etc. is there a correct way of >> shutting down one of the IO serves for maintenance? >> >> Thanks, >> Damir >> >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Dec 20 20:30:04 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 20 Dec 2016 21:30:04 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> References: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Dec 20 20:44:44 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 14:44:44 -0600 Subject: [gpfsug-discuss] CES ifs-ganashe Message-ID: Does ganashe have a default read and write max size? if so what is it? Thanks Matt From olaf.weiser at de.ibm.com Tue Dec 20 21:06:44 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 20 Dec 2016 22:06:44 +0100 Subject: [gpfsug-discuss] CES ifs-ganashe In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From MKEIGO at jp.ibm.com Tue Dec 20 23:25:41 2016 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Wed, 21 Dec 2016 08:25:41 +0900 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> Message-ID: I still see the following statement* regarding with the use of LROC in FAQ (URL #1). Are there any issues anticipated to use LROC on protocol nodes? Q8.3: What are some configuration considerations when deploying the protocol functionality? A8.3: Configuration considerations include: (... many lines are snipped ...) Several GPFS configuration aspects have not been explicitly tested with the protocol function: (... many lines are snipped ...) Local Read Only Cache* (... many lines are snipped ...) Q2.25: What are the current requirements when using local read-only cache? A2.25: The current requirements/limitations for using local read-only cache include: - A minimum of IBM Spectrum Scale V4.1.0.1. - Local read-only cache is only supported on Linux x86 and Power. - The minimum size of a local read-only cache device is 4 GB. - The local read-only cache requires memory equal to 1% of the local read-only device's capacity. Note: Use of local read-only cache does not require a server license [1] IBM Spectrum Scale? Frequently Asked Questions and Answers (November 2016) https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html --- Keigo Matsubara, Industry Architect, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 From: "Olaf Weiser" To: gpfsug main discussion list Date: 2016/12/21 05:31 Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org it's all true and right, but please have in mind.. with MFTC and the number of nodes in the ( remote and local ) cluster, you 'll need token mem since R42 token Mem is allocated automatically .. so the old tokenMEMLimit is more or less obsolete.. but you should have your overall configuration in mind, when raising MFTC clusterwide... just a hint.. have fun... Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 12/20/2016 08:19 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re currently deploying LROC in many of our compute nodes ? results so far have been excellent. We?re putting in 240gb SSDs, because we have mostly small files. As far as I know, the amount of inodes and directories in LROC are not limited, except by the size of the cache disk. Look at these config options for LROC: lrocData Controls whether user data is populated into the local read-only cache. Other configuration options can be used to select the data that is eligible for the local read-only cache. 
When using more than one such configuration option, data that matches any of the specified criteria is eligible to be saved. Valid values are yes or no. The default value is yes. If lrocData is set to yes, by default the data that was not already in the cache when accessed by a user is subsequently saved to the local read-only cache. The default behavior can be overridden using thelrocDataMaxFileSize and lrocDataStubFileSizeconfiguration options to save all data from small files or all data from the initial portion of large files. lrocDataMaxFileSize Limits the data that may be saved in the local read-only cache to only the data from small files. A value of -1 indicates that all data is eligible to be saved. A value of 0 indicates that small files are not to be saved. A positive value indicates the maximum size of a file to be considered for the local read-only cache. For example, a value of 32768 indicates that files with 32 KB of data or less are eligible to be saved in the local read-only cache. The default value is 0. lrocDataStubFileSize Limits the data that may be saved in the local read-only cache to only the data from the first portion of all files. A value of -1 indicates that all file data is eligible to be saved. A value of 0 indicates that stub data is not eligible to be saved. A positive value indicates that the initial portion of each file that is eligible is to be saved. For example, a value of 32768 indicates that the first 32 KB of data from each file is eligible to be saved in the local read-only cache. The default value is 0. lrocDirectories Controls whether directory blocks is populated into the local read-only cache. The option also controls other file system metadata such as indirect blocks, symbolic links, and extended attribute overflow blocks. Valid values are yes or no. The default value is yes. lrocInodes Controls whether inodes from open files is populated into the local read-only cache; the cache contains the full inode, including all disk pointers, extended attributes, and data. Valid values are yes or no. The default value is yes. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Matt Weil Reply-To: gpfsug main discussion list Date: Tuesday, December 20, 2016 at 1:13 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? 
Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Dec 21 09:23:16 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 09:23:16 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil wrote: > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: > > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Wed Dec 21 09:42:36 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 21 Dec 2016 09:42:36 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Ooh, LROC sensors for Zimon? must look into that. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sven Oehme Sent: 21 December 2016 09:23 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil > wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Wed Dec 21 11:29:04 2016 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 21 Dec 2016 11:29:04 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: , Message-ID: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. 
Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil > wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Dec 21 11:37:46 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 11:37:46 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs wrote: > My understanding was the maxStatCache was only used on AIX and should be > set low on Linux, as raising it did't help and wasted resources. 
Are we > saying that LROC now uses it and setting it low if you raise > maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File > object (maxFilesToCache) to a StatCache Object when it moves the content to > the LROC device. > therefore the only thing you really need to increase is maxStatCache on > the LROC node, but you still need maxFiles Objects, so leave that untouched > and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have > enough memory to hold tokens for all the objects you want to cache, but if > the memory is there and you have enough its well worth spend a lot of > memory on it and bump maxStatCache to a high number. i have tested > maxStatCache up to 16 million at some point per node, but if nodes with > this large amount of inodes crash or you try to shut them down you have > some delays , therefore i suggest you stay within a 1 or 2 million per > node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get > comparable stats, i suggest you setup Zimon and enable the Lroc sensors to > have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil mweil at wustl.edu>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil mweil at wustl.edu>> wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > < > https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.childs at qmul.ac.uk Wed Dec 21 11:48:24 2016 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 21 Dec 2016 11:48:24 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: , Message-ID: So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. Fine just good to know, nice and easy now with nodeclasses.... Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Wednesday, December 21, 2016 11:37:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs > wrote: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Sven Oehme > Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >> wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? 
sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Dec 21 11:57:39 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 11:57:39 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . Sven On Wed, Dec 21, 2016 at 12:48 PM Peter Childs wrote: > So your saying maxStatCache should be raised on LROC enabled nodes only as > its the only place under Linux its used and should be set low on non-LROC > enabled nodes. > > Fine just good to know, nice and easy now with nodeclasses.... > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 11:37:46 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > StatCache is not useful on Linux, that hasn't changed if you don't use > LROC on the same node. LROC uses the compact object (StatCache) to store > its pointer to the full file Object which is stored on the LROC device. so > on a call for attributes that are not in the StatCache the object gets > recalled from LROC and converted back into a full File Object, which is why > you still need to have a reasonable maxFiles setting even you use LROC as > you otherwise constantly move file infos in and out of LROC and put the > device under heavy load. 
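For anyone following the thread who has not set LROC up yet: a local read-only cache device is just an NSD on the client that is flagged as localCache, roughly as in the sketch below. The device path, NSD name and node name are placeholders, and per the FAQ quoted earlier the device needs to be at least 4 GB, with GPFS using memory equal to about 1% of its capacity.

  # stanza file (lroc.stanza) on the client that owns the SSD -- names are examples
  %nsd: device=/dev/sdb1 nsd=lroc_client1 servers=client1 usage=localCache

  # create the cache NSD; a daemon restart on the node may be needed before it shows up
  mmcrnsd -F lroc.stanza
  mmdiag --lroc        # run on client1 to see LROC status and inode/directory/data stats

mmdiag --lroc is also a quick way to confirm the cache is actually being hit before spending time on Zimon sensors.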
> > sven > > > > On Wed, Dec 21, 2016 at 12:29 PM Peter Childs p.childs at qmul.ac.uk>> wrote: > My understanding was the maxStatCache was only used on AIX and should be > set low on Linux, as raising it did't help and wasted resources. Are we > saying that LROC now uses it and setting it low if you raise > maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at spectrumscale.org> < > gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at spectrumscale.org>> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File > object (maxFilesToCache) to a StatCache Object when it moves the content to > the LROC device. > therefore the only thing you really need to increase is maxStatCache on > the LROC node, but you still need maxFiles Objects, so leave that untouched > and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have > enough memory to hold tokens for all the objects you want to cache, but if > the memory is there and you have enough its well worth spend a lot of > memory on it and bump maxStatCache to a high number. i have tested > maxStatCache up to 16 million at some point per node, but if nodes with > this large amount of inodes crash or you try to shut them down you have > some delays , therefore i suggest you stay within a 1 or 2 million per > node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get > comparable stats, i suggest you setup Zimon and enable the Lroc sensors to > have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil mweil at wustl.edu>>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil mweil at wustl.edu>>> wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > < > https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? 
> > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Dec 21 12:12:22 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 12:12:22 +0000 Subject: [gpfsug-discuss] Presentations from last UG Message-ID: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From jez.tucker at gpfsug.org Wed Dec 21 12:16:03 2016 From: jez.tucker at gpfsug.org (Jez Tucker) Date: Wed, 21 Dec 2016 12:16:03 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> Hi Are you referring to the UG at Salt Lake? If so I should be uploading these today/tomorrow. I'll send a ping out when done. We do not have the presentations from the mini-UG at Computing Insights as yet. (peeps, please send them in) Best, Jez On 21/12/16 12:12, Mark.Bush at siriuscom.com wrote: > > Does anyone know when the presentations from the last users group > meeting will be posted. I checked last night but there doesn?t seem > to be any new ones out there (summaries of talks yet). 
> > Thanks > > Mark > > This message (including any attachments) is intended only for the use > of the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, > and exempt from disclosure under applicable law. If you are not the > intended recipient, you are hereby notified that any use, > dissemination, distribution, or copying of this communication is > strictly prohibited. This message may be viewed by parties at Sirius > Computer Solutions other than those named in the message header. This > message does not contain an official representation of Sirius Computer > Solutions. If you have received this communication in error, notify > Sirius Computer Solutions immediately and (i) destroy this message if > a facsimile or (ii) delete this message immediately if this is an > electronic communication. Thank you. > > Sirius Computer Solutions > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Dec 21 12:24:34 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 12:24:34 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> Message-ID: Yes From: Jez Tucker Reply-To: "jez.tucker at gpfsug.org" , gpfsug main discussion list Date: Wednesday, December 21, 2016 at 6:16 AM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Presentations from last UG Hi Are you referring to the UG at Salt Lake? If so I should be uploading these today/tomorrow. I'll send a ping out when done. We do not have the presentations from the mini-UG at Computing Insights as yet. (peeps, please send them in) Best, Jez On 21/12/16 12:12, Mark.Bush at siriuscom.com wrote: Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kallbac at iu.edu Wed Dec 21 12:46:42 2016 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Wed, 21 Dec 2016 12:46:42 +0000 Subject: [gpfsug-discuss] Presentations from last UG Message-ID: Checking... Kristy On Dec 21, 2016 7:12 AM, Mark.Bush at siriuscom.com wrote: Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Dec 21 13:42:02 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 21 Dec 2016 13:42:02 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: Sorry, my bad, it was on my todo list. The ones we have are now up online. http://www.spectrumscale.org/presentations/ Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 21 December 2016 12:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Presentations from last UG Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Mark.Bush at siriuscom.com Wed Dec 21 14:37:58 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 14:37:58 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: Thanks much, Simon. From: on behalf of "Simon Thompson (Research Computing - IT Services)" Reply-To: gpfsug main discussion list Date: Wednesday, December 21, 2016 at 7:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Presentations from last UG Sorry, my bad, it was on my todo list. The ones we have are now up online. http://www.spectrumscale.org/presentations/ Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 21 December 2016 12:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Presentations from last UG Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Wed Dec 21 15:17:27 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 21 Dec 2016 10:17:27 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Sven, I?ve read this several times, and it will help me to re-state it. Please tell me if this is not what you meant: You often see even common operations (like ls) blow out the StatCache, and things are inefficient when the StatCache is in use but constantly overrun. Because of this, you normally recommend disabling the StatCache with maxStatCache=0, and instead spend the memory normally used for StatCache on the FileCache. In the case of LROC, there *must* be a StatCache entry for every file that is held in the LROC. In this case, we want to set maxStatCache at least as large as the number of files whose data or metadata we?d like to be in the LROC. Close? 
-- Stephen > On Dec 21, 2016, at 6:57 AM, Sven Oehme > wrote: > > its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). > on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . > > Sven > > On Wed, Dec 21, 2016 at 12:48 PM Peter Childs > wrote: > So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. > > Fine just good to know, nice and easy now with nodeclasses.... > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Sven Oehme > > Sent: Wednesday, December 21, 2016 11:37:46 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. > > sven > > > > On Wed, Dec 21, 2016 at 12:29 PM Peter Childs >> wrote: > My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > >> on behalf of Sven Oehme >> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. > therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. 
i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >>>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >>>> wrote: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Dec 21 15:39:16 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 21 Dec 2016 16:39:16 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: close, but not 100% :-) LROC only needs a StatCache Object for files that don't have a Full OpenFile (maxFilestoCache) Object and you still want to be able to hold Metadata and/or Data in LROC. e.g. you can have a OpenFile instance that has Data blocks in LROC, but no Metadata (as everything is in the OpenFile Object itself), then you don't need a maxStatCache Object for this one. but you would need a StatCache object if we have to throw this file metadata or data out of the FileCache and/or Pagepool as we would otherwise loose all references to that file in LROC. the MaxStat Object is the most compact form to hold only references to the real data. 
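As back-of-the-envelope arithmetic (made-up numbers, only the relationship matters):

   # files LROC can reference  ~=  maxFilesToCache  +  maxStatCache
   #   target for this node          : ~1,000,000 files cached in LROC
   #   maxFilesToCache (full objects): 128,000
   #   maxStatCache (compact objects): >= ~872,000
   # with maxStatCache left at a few ten-thousand, LROC starts evicting
   # long before the device itself is full.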
if its still unclear we might have to do a small writeup in form of a paper with a diagram to better explain it, but that would take a while due to a lot of other work ahead of that :-) sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Stephen Ulmer To: gpfsug main discussion list Date: 12/21/2016 04:17 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, I?ve read this several times, and it will help me to re-state it. Please tell me if this is not what you meant: You often see even common operations (like ls) blow out the StatCache, and things are inefficient when the StatCache is in use but constantly overrun. Because of this, you normally recommend disabling the StatCache with maxStatCache=0, and instead spend the memory normally used for StatCache on the FileCache. In the case of LROC, there *must* be a StatCache entry for every file that is held in the LROC. In this case, we want to set maxStatCache at least as large as the number of files whose data or metadata we?d like to be in the LROC. Close? -- Stephen On Dec 21, 2016, at 6:57 AM, Sven Oehme wrote: its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . Sven On Wed, Dec 21, 2016 at 12:48 PM Peter Childs wrote: So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. Fine just good to know, nice and easy now with nodeclasses.... Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < oehmes at gmail.com> Sent: Wednesday, December 21, 2016 11:37:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. 
sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs > wrote: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme > Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >> wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20 (GPFS)/page/Flash%20Storage < https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > Hello all, Are there any tuning recommendations to get these to cache more metadata? 
Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From damir.krstic at gmail.com Wed Dec 21 16:03:44 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 21 Dec 2016 16:03:44 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: Hi Jan, I am sorry if my post sounded accusatory - I did not mean it that way. We had a very frustrating experience trying to change recoverygroup yesterday morning. I've read the manual you have linked and indeed, you have outlined the correct procedure. I am left wondering why the level 2 gpfs support instructed us not to do that in the future. Their support instructions are contradicting what's in the manual. We are running now with the --active recovery group in place and will change it permanently back to the default setting early in the new year. Anyway, thanks for your help. Damir On Tue, Dec 20, 2016 at 1:36 PM Jan-Frode Myklebust wrote: > I'm sorry for your trouble, but those 4 steps you got from IBM support > does not seem correct. IBM support might not always realize that it's an > ESS, and not plain GPFS... If you take down an ESS IO-node without moving > its RG to the other node using "--servers othernode,thisnode", or by using > --active (which I've never used), you'll take down the whole recoverygroup > and need to suffer an uncontrolled failover. Such an uncontrolled failover > takes a few minutes of filesystem hang, while a controlled failover should > not hang the system. > > I don't see why it's a problem that you now have an IO server that is > owning both recoverygroups. Once your maintenance of the first IO servers > is done, I would just revert the --servers order of that recovergroup, and > it should move back. 
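Condensed into commands, the controlled hand-off described here might look like the following (recovery-group, file-system and node names are placeholders):

   # move manager roles off the node going down for service (io1)
   mmlsmgr
   mmchmgr -c io2                        # cluster manager
   mmchmgr gpfs0 io2                     # file system manager, if io1 holds it

   # hand the recovery group to the partner node, then verify
   mmchrecoverygroup rg_io1 --servers io2,io1
   mmlsrecoverygroup rg_io1 -L

   # only now stop GPFS on io1 and do the maintenance
   mmshutdown -N io1

   # afterwards: start GPFS again and revert the server order
   mmstartup -N io1
   mmchrecoverygroup rg_io1 --servers io1,io2

(mmchrecoverygroup rg_io1 --active io2 achieves the same hand-off temporarily, without changing the configured server order.)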
> > The procedure to move RGs around during IO node maintenance is documented > on page 10 the quick deployment guide (step 1-3): > > > http://www.ibm.com/support/knowledgecenter/en/SSYSP8_4.5.0/c2785801.pdf?view=kc > > > -jf > > > On Tue, Dec 20, 2016 at 6:19 PM, Damir Krstic > wrote: > > For sake of everyone else on this listserv, I'll highlight the appropriate > procedure here. It turns out, changing recovery group on an active system > is not recommended by IBM. We tried following Jan's recommendation this > morning, and the system became unresponsive for about 30 minutes. It only > became responsive (and recovery group change finished) after we killed > couple of processes (ssh and scp) going to couple of clients. > > I got a Sev. 1 with IBM opened and they tell me that appropriate steps for > IO maintenance are as follows: > > 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) > 2. unmount gpfs on io node that is going down > 3. shutdown gpfs on io node that is going down > 4. shutdown os > > That's it - recovery groups should not be changed. If there is a need to > change recovery group, use --active option (not permanent change). > > We are now stuck in situation that io2 server is owner of both recovery > groups. The way IBM tells us to fix this is to unmount the filesystem on > all clients and change recovery groups then. We can't do it now and will > have to schedule maintenance sometime in 2017. For now, we have switched > recovery groups using --active flag and things (filesystem performance) > seems to be OK. Load average on both io servers is quite high (250avg) and > does not seem to be going down. > > I really wish that maintenance procedures were documented somewhere on IBM > website. This experience this morning has really shaken my confidence in > ESS. > > Damir > > On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust > wrote: > > > Move its recoverygrops to the other node by putting the other node as > primary server for it: > > mmchrecoverygroup rgname --servers otherServer,thisServer > > And verify that it's now active on the other node by "mmlsrecoverygroup > rgname -L". > > Move away any filesystem managers or cluster manager role if that's active > on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. > > Then you can run mmshutdown on it (assuming you also have enough quorum > nodes in the remaining cluster). > > > -jf > > man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? 
> > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Dec 21 21:55:51 2016 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 21 Dec 2016 21:55:51 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down formaintenance In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 16:44:26 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 10:44:26 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: This is enabled on this node but mmdiag it does not seem to show it caching. Did I miss something? I do have one file system in the cluster that is running 3.5.0.7 wondering if that is causing this. > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): 'NULL' status Idle > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 0 MB, currently in use: 0 MB > Statistics from: Tue Dec 27 11:21:14 2016 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) From aaron.s.knister at nasa.gov Wed Dec 28 17:50:35 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 28 Dec 2016 12:50:35 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> Hey Matt, We ran into a similar thing and if I recall correctly a mmchconfig --release=LATEST was required to get LROC working which, of course, would boot your 3.5.0.7 client from the cluster. -Aaron On 12/28/16 11:44 AM, Matt Weil wrote: > This is enabled on this node but mmdiag it does not seem to show it > caching. Did I miss something? I do have one file system in the > cluster that is running 3.5.0.7 wondering if that is causing this. 
>> [root at ces1 ~]# mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): 'NULL' status Idle >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 0 MB, currently in use: 0 MB >> Statistics from: Tue Dec 27 11:21:14 2016 >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Wed Dec 28 18:02:27 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 12:02:27 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> Message-ID: <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> So I have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From oehmes at us.ibm.com Wed Dec 28 19:06:19 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 28 Dec 2016 20:06:19 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> Message-ID: you have no device configured that's why it doesn't show any stats : >>> LROC Device(s): 'NULL' status Idle run mmsnsd -X to see if gpfs can see the path to the device. most likely it doesn't show up there and you need to adjust your nsddevices list to include it , especially if it is a NVME device. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Matt Weil To: Date: 12/28/2016 07:02 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org So I have minReleaseLevel 4.1.1.0 Is that to old? 
On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at wustl.edu Wed Dec 28 19:52:24 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 13:52:24 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> Message-ID: <8653c4fc-d882-d13f-040c-042118830de3@wustl.edu> k got that fixed now shows as status shutdown > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): > '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' > status Shutdown > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 0 MB, currently in use: 0 MB > Statistics from: Wed Dec 28 13:49:27 2016 On 12/28/16 1:06 PM, Sven Oehme wrote: > > you have no device configured that's why it doesn't show any stats : > > >>> LROC Device(s): 'NULL' status Idle > > run mmsnsd -X to see if gpfs can see the path to the device. most > likely it doesn't show up there and you need to adjust your nsddevices > list to include it , especially if it is a NVME device. > > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Matt Weil ---12/28/2016 07:02:57 PM---So I > have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 1Matt Weil > ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that > to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > > From: Matt Weil > To: > Date: 12/28/2016 07:02 PM > Subject: Re: [gpfsug-discuss] LROC > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > So I have minReleaseLevel 4.1.1.0 Is that to old? 
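For reference, the discovery check being suggested here is mmlsnsd (a sketch; device names will differ):

   # does GPFS see a local path to the LROC NSD on this node?
   mmlsnsd -X | grep -i nvme

   # after fixing nsddevices and restarting the daemon, re-check;
   # a working device should no longer report 'NULL' and 0 MB capacity
   mmdiag --lroc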
> > > On 12/28/16 11:50 AM, Aaron Knister wrote: > > Hey Matt, > > > > We ran into a similar thing and if I recall correctly a mmchconfig > > --release=LATEST was required to get LROC working which, of course, > > would boot your 3.5.0.7 client from the cluster. > > > > -Aaron > > > > On 12/28/16 11:44 AM, Matt Weil wrote: > >> This is enabled on this node but mmdiag it does not seem to show it > >> caching. Did I miss something? I do have one file system in the > >> cluster that is running 3.5.0.7 wondering if that is causing this. > >>> [root at ces1 ~]# mmdiag --lroc > >>> > >>> === mmdiag: lroc === > >>> LROC Device(s): 'NULL' status Idle > >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > >>> 1073741824 > >>> Max capacity: 0 MB, currently in use: 0 MB > >>> Statistics from: Tue Dec 27 11:21:14 2016 > >>> > >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) > >>> objects failed to store 0 failed to recall 0 failed to inval 0 > >>> objects queried 0 (0 MB) not found 0 = 0.00 % > >>> objects invalidated 0 (0 MB) > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at us.ibm.com Wed Dec 28 19:55:18 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 28 Dec 2016 19:55:18 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <8653c4fc-d882-d13f-040c-042118830de3@wustl.edu> Message-ID: Did you restart the daemon on that node after you fixed it ? Sent from IBM Verse Matt Weil --- Re: [gpfsug-discuss] LROC --- From:"Matt Weil" To:gpfsug-discuss at spectrumscale.orgDate:Wed, Dec 28, 2016 8:52 PMSubject:Re: [gpfsug-discuss] LROC k got that fixed now shows as status shutdown [root at ces1 ~]# mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' status Shutdown Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile 1073741824 Max capacity: 0 MB, currently in use: 0 MB Statistics from: Wed Dec 28 13:49:27 2016 On 12/28/16 1:06 PM, Sven Oehme wrote: you have no device configured that's why it doesn't show any stats : >>> LROC Device(s): 'NULL' status Idle run mmsnsd -X to see if gpfs can see the path to the device. most likely it doesn't show up there and you need to adjust your nsddevices list to include it , especially if it is a NVME device. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Matt Weil ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that to old? 
On 12/28/16 11:50 AM, Aaron Knister wrote: From: Matt Weil To: Date: 12/28/2016 07:02 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org So I have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 19:57:18 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 13:57:18 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> no I will do that next. On 12/28/16 1:55 PM, Sven Oehme wrote: > Did you restart the daemon on that node after you fixed it ? Sent from > IBM Verse > > Matt Weil --- Re: [gpfsug-discuss] LROC --- > > From: "Matt Weil" > To: gpfsug-discuss at spectrumscale.org > Date: Wed, Dec 28, 2016 8:52 PM > Subject: Re: [gpfsug-discuss] LROC > > ------------------------------------------------------------------------ > > k got that fixed now shows as status shutdown > >> [root at ces1 ~]# mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): >> '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' >> status Shutdown >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 0 MB, currently in use: 0 MB >> Statistics from: Wed Dec 28 13:49:27 2016 > > > > On 12/28/16 1:06 PM, Sven Oehme wrote: > > you have no device configured that's why it doesn't show any stats : > > >>> LROC Device(s): 'NULL' status Idle > > run mmsnsd -X to see if gpfs can see the path to the device. most > likely it doesn't show up there and you need to adjust your nsddevices > list to include it , especially if it is a NVME device. 
> > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Matt Weil ---12/28/2016 07:02:57 PM---So I > have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 1Matt Weil > ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that > to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > > From: Matt Weil > To: > Date: 12/28/2016 07:02 PM > Subject: Re: [gpfsug-discuss] LROC > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > So I have minReleaseLevel 4.1.1.0 Is that to old? > > > On 12/28/16 11:50 AM, Aaron Knister wrote: > > Hey Matt, > > > > We ran into a similar thing and if I recall correctly a mmchconfig > > --release=LATEST was required to get LROC working which, of course, > > would boot your 3.5.0.7 client from the cluster. > > > > -Aaron > > > > On 12/28/16 11:44 AM, Matt Weil wrote: > >> This is enabled on this node but mmdiag it does not seem to show it > >> caching. Did I miss something? I do have one file system in the > >> cluster that is running 3.5.0.7 wondering if that is causing this. > >>> [root at ces1 ~]# mmdiag --lroc > >>> > >>> === mmdiag: lroc === > >>> LROC Device(s): 'NULL' status Idle > >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > >>> 1073741824 > >>> Max capacity: 0 MB, currently in use: 0 MB > >>> Statistics from: Tue Dec 27 11:21:14 2016 > >>> > >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) > >>> objects failed to store 0 failed to recall 0 failed to inval 0 > >>> objects queried 0 (0 MB) not found 0 = 0.00 % > >>> objects invalidated 0 (0 MB) > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 20:15:14 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 14:15:14 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> References: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> Message-ID: <5127934a-b6b6-c542-f50a-67c47fe6d6db@wustl.edu> still in a 'status Shutdown' even after gpfs was stopped and started. From aaron.s.knister at nasa.gov Wed Dec 28 22:16:00 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 28 Dec 2016 22:16:00 +0000 Subject: [gpfsug-discuss] LROC References: [gpfsug-discuss] LROC Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> Anything interesting in the mmfs log? On a related note I'm curious how a 3.5 client is able to join a cluster with a minreleaselevel of 4.1.1.0. 
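Two quick checks along those lines (the file system device name below is a placeholder):

   # any LROC- or assert-related messages from the daemon?
   grep -iE 'lroc|assert' /var/adm/ras/mmfs.log.latest

   # the 3.5.0.7 mentioned earlier is the file system *format* version,
   # not a client level; it shows up as e.g. '-V 13.23 (3.5.0.7)' in:
   mmlsfs gpfs0 -V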
From: Matt Weil Sent: 12/28/16, 3:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC still in a 'status Shutdown' even after gpfs was stopped and started. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 22:21:21 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 16:21:21 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> Message-ID: <59fa3ab8-a666-d29c-117d-9db515f566e8@wustl.edu> yes > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > ssdActive) in line 427 of file > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > logAssertFailed + 0x2D5 at ??:0 > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > fs_config_ssds(fs_config*) + 0x867 at ??:0 > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > SFSConfigLROC() + 0x189 at ??:0 > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > runTSControl(int, int, char**) + 0x80E at ??:0 > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > HandleCmdMsg(void*) + 0x1216 at ??:0 > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > Thread::callBody(Thread*) + 0x1E2 at ??:0 > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > start_thread + 0xC5 at ??:0 > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > 0x6D at ??:0 > mmfsd: > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > failed. > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > 0x00007FF15FD71000 > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > 0x0000000000000006 > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > 0x00007FF15E8D03A8 > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > 0x000000000001E9A1 > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > 0xFF092D63646B6860 > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > 0x0000000000000202 > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > 0x00007FF161032EC0 > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > 0x0000000000000000 > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > 0x0000000000000202 > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > 0x0000000000000000 > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > 0x0000000010017807 > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > at ??:0 > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > __assert_fail_base + 126 at ??:0 > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > __GI___assert_fail + 42 at ??:0 > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > 2F9 at ??:0 > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > fs_config_ssds(fs_config*) + 867 at ??:0 > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > 189 at ??:0 > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > NsdDiskConfig::reReadConfig() + 771 at ??:0 > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > int, char**) + 80E at ??:0 > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > HandleCmdMsg(void*) + 1216 at ??:0 > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > Thread::callBody(Thread*) + 1E2 at ??:0 > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > C5 at ??:0 > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D at ??:0 On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > related note I'm curious how a 3.5 client is able to join a cluster > with a minreleaselevel of 4.1.1.0. I was referring to the fs version not the gpfs client version sorry for that confusion -V 13.23 (3.5.0.7) File system version From aaron.s.knister at nasa.gov Wed Dec 28 22:26:46 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 28 Dec 2016 22:26:46 +0000 Subject: [gpfsug-discuss] LROC References: [gpfsug-discuss] LROC Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> Ouch...to quote Adam Savage "well there's yer problem". Are you perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like there was an LROC related assert fixed in 4.1.1.9 but I can't find details on it. 
From: Matt Weil Sent: 12/28/16, 5:21 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC yes > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > ssdActive) in line 427 of file > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > logAssertFailed + 0x2D5 at ??:0 > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > fs_config_ssds(fs_config*) + 0x867 at ??:0 > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > SFSConfigLROC() + 0x189 at ??:0 > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > runTSControl(int, int, char**) + 0x80E at ??:0 > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > HandleCmdMsg(void*) + 0x1216 at ??:0 > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > Thread::callBody(Thread*) + 0x1E2 at ??:0 > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > start_thread + 0xC5 at ??:0 > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > 0x6D at ??:0 > mmfsd: > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > failed. > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > 0x00007FF15FD71000 > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > 0x0000000000000006 > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > 0x00007FF15E8D03A8 > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > 0x000000000001E9A1 > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > 0xFF092D63646B6860 > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > 0x0000000000000202 > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > 0x00007FF161032EC0 > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > 0x0000000000000000 > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > 0x0000000000000202 > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > 0x0000000000000000 > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > 0x0000000010017807 > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > at ??:0 > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > __assert_fail_base + 126 at ??:0 > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > __GI___assert_fail + 42 at ??:0 > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > 2F9 at ??:0 > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > fs_config_ssds(fs_config*) + 867 at ??:0 > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > 189 at ??:0 > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > NsdDiskConfig::reReadConfig() + 771 at ??:0 > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > int, char**) + 80E at ??:0 > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > HandleCmdMsg(void*) + 1216 at ??:0 > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > Thread::callBody(Thread*) + 1E2 at ??:0 > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > C5 at ??:0 > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D at ??:0 On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > related note I'm curious how a 3.5 client is able to join a cluster > with a minreleaselevel of 4.1.1.0. I was referring to the fs version not the gpfs client version sorry for that confusion -V 13.23 (3.5.0.7) File system version _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Wed Dec 28 22:39:19 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 16:39:19 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> Message-ID: <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> > mmdiag --version > > === mmdiag: version === > Current GPFS build: "4.2.1.2 ". > Built on Oct 27 2016 at 10:52:12 > Running 13 minutes 54 secs, pid 13229 On 12/28/16 4:26 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Ouch...to quote Adam Savage "well there's yer problem". Are you > perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like > there was an LROC related assert fixed in 4.1.1.9 but I can't find > details on it. > > > > *From:*Matt Weil > *Sent:* 12/28/16, 5:21 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] LROC > > yes > > > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > > ssdActive) in line 427 of file > > > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > > logAssertFailed + 0x2D5 at ??:0 > > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > > fs_config_ssds(fs_config*) + 0x867 at ??:0 > > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > > SFSConfigLROC() + 0x189 at ??:0 > > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > > runTSControl(int, int, char**) + 0x80E at ??:0 > > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > > HandleCmdMsg(void*) + 0x1216 at ??:0 > > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > > Thread::callBody(Thread*) + 0x1E2 at ??:0 > > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > > start_thread + 0xC5 at ??:0 > > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > > 0x6D at ??:0 > > mmfsd: > > > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > > failed. > > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> > Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > > 0x00007FF15FD71000 > > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > > 0x0000000000000006 > > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > > 0x00007FF15E8D03A8 > > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > > 0x000000000001E9A1 > > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > > 0xFF092D63646B6860 > > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > > 0x0000000000000202 > > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > > 0x00007FF161032EC0 > > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > > 0x0000000000000000 > > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > > 0x0000000000000202 > > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > > 0x0000000000000000 > > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > > 0x0000000010017807 > > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > > at ??:0 > > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > > __assert_fail_base + 126 at ??:0 > > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > > __GI___assert_fail + 42 at ??:0 > > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > > 2F9 at ??:0 > > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > > fs_config_ssds(fs_config*) + 867 at ??:0 > > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > > 189 at ??:0 > > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > > NsdDiskConfig::reReadConfig() + 771 at ??:0 > > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > > int, char**) + 80E at ??:0 > > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > > HandleCmdMsg(void*) + 1216 at ??:0 > > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > > Thread::callBody(Thread*) + 1E2 at ??:0 > > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > > C5 at ??:0 > > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D > at ??:0 > > > On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > > related note I'm curious how a 3.5 client is able to join a cluster > > with a minreleaselevel of 4.1.1.0. > I was referring to the fs version not the gpfs client version sorry for > that confusion > -V 13.23 (3.5.0.7) File system version > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Wed Dec 28 23:19:52 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 28 Dec 2016 18:19:52 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> Message-ID: <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Interesting. Would you be willing to post the output of "mmlssnsd -X | grep 0A6403AA58641546" from the troublesome node as suggested by Sven? On 12/28/16 5:39 PM, Matt Weil wrote: > >> mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "4.2.1.2 ". >> Built on Oct 27 2016 at 10:52:12 >> Running 13 minutes 54 secs, pid 13229 > > On 12/28/16 4:26 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: >> Ouch...to quote Adam Savage "well there's yer problem". Are you >> perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like >> there was an LROC related assert fixed in 4.1.1.9 but I can't find >> details on it. >> >> >> >> *From:*Matt Weil >> *Sent:* 12/28/16, 5:21 PM >> *To:* gpfsug main discussion list >> *Subject:* Re: [gpfsug-discuss] LROC >> >> yes >> >> > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != >> > ssdActive) in line 427 of file >> > >> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C >> > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: >> > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 >> > logAssertFailed + 0x2D5 at ??:0 >> > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 >> > fs_config_ssds(fs_config*) + 0x867 at ??:0 >> > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 >> > SFSConfigLROC() + 0x189 at ??:0 >> > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB >> > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 >> > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 >> > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 >> > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E >> > runTSControl(int, int, char**) + 0x80E at ??:0 >> > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 >> > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, >> > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 >> > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 >> > HandleCmdMsg(void*) + 0x1216 at ??:0 >> > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 >> > Thread::callBody(Thread*) + 0x1E2 at ??:0 >> > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 >> > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >> > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 >> > start_thread + 0xC5 at ??:0 >> > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + >> > 0x6D at ??:0 >> > mmfsd: >> > >> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: >> > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >> > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' >> > failed. >> > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 >> > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
>> > Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx >> > 0x00007FF15FD71000 >> > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx >> > 0x0000000000000006 >> > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp >> > 0x00007FF15E8D03A8 >> > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi >> > 0x000000000001E9A1 >> > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 >> > 0xFF092D63646B6860 >> > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 >> > 0x0000000000000202 >> > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 >> > 0x00007FF161032EC0 >> > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 >> > 0x0000000000000000 >> > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags >> > 0x0000000000000202 >> > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err >> > 0x0000000000000000 >> > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk >> > 0x0000000010017807 >> > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 >> > Wed Dec 28 16:17:09.022 2016: [D] Traceback: >> > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 >> > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 >> > at ??:0 >> > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 >> > __assert_fail_base + 126 at ??:0 >> > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 >> > __GI___assert_fail + 42 at ??:0 >> > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + >> > 2F9 at ??:0 >> > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 >> > fs_config_ssds(fs_config*) + 867 at ??:0 >> > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + >> > 189 at ??:0 >> > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB >> > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 >> > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 >> > NsdDiskConfig::reReadConfig() + 771 at ??:0 >> > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, >> > int, char**) + 80E at ??:0 >> > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 >> > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, >> > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 >> > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 >> > HandleCmdMsg(void*) + 1216 at ??:0 >> > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 >> > Thread::callBody(Thread*) + 1E2 at ??:0 >> > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 >> > Thread::callBodyWrapper(Thread*) + A2 at ??:0 >> > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + >> > C5 at ??:0 >> > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D >> at ??:0 >> >> >> On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE >> CORP] wrote: >> > related note I'm curious how a 3.5 client is able to join a cluster >> > with a minreleaselevel of 4.1.1.0. 
>> I was referring to the fs version not the gpfs client version sorry for >> that confusion >> -V 13.23 (3.5.0.7) File system version >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Thu Dec 29 15:57:40 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 09:57:40 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Message-ID: > ro_cache_S29GNYAH200016 0A6403AA586531E1 > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > dmm ces1.gsc.wustl.edu server node On 12/28/16 5:19 PM, Aaron Knister wrote: > mmlssnsd -X | grep 0A6403AA58641546 From aaron.s.knister at nasa.gov Thu Dec 29 16:02:44 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 29 Dec 2016 11:02:44 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Message-ID: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. That's a *really* long device path (and nested too), I wonder if that's causing issues. What does a "tspreparedisk -S" show on that node? Also, what does your nsddevices script look like? I'm wondering if you could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" paths if that would help things here. -Aaron On 12/29/16 10:57 AM, Matt Weil wrote: > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >> dmm ces1.gsc.wustl.edu server node > > > On 12/28/16 5:19 PM, Aaron Knister wrote: >> mmlssnsd -X | grep 0A6403AA58641546 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Thu Dec 29 16:09:58 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Dec 2016 16:09:58 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: i agree that is a very long name , given this is a nvme device it should show up as /dev/nvmeXYZ i suggest to report exactly that in nsddevices and retry. 
i vaguely remember we have some fixed length device name limitation , but i don't remember what the length is, so this would be my first guess too that the long name is causing trouble. On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister wrote: > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 29 16:10:24 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:10:24 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: On 12/29/16 10:02 AM, Aaron Knister wrote: > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if > that's causing issues. was thinking of trying just /dev/sdxx > > What does a "tspreparedisk -S" show on that node? tspreparedisk:0::::0:0:: > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of > "/dev/disk/by-id" paths if that would help things here. > if [[ $osName = Linux ]] > then > : # Add function to discover disks in the Linux environment. > for luns in `ls /dev/disk/by-id | grep nvme` > do > all_luns=disk/by-id/$luns > echo $all_luns dmm > done > > fi > I will try that. 
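For reference, a minimal sketch of what that nsddevices user exit could look like if it hands back the short kernel names instead of the long by-id symlinks, as Sven suggests. It assumes the cards enumerate as /dev/nvme0n1, /dev/nvme1n1, ... on this node, and it reuses the "dmm" device type from the script above:

    # /var/mmfs/etc/nsddevices -- sketch only, adapted from the script above
    osName=$(/bin/uname -s)
    if [[ $osName = Linux ]]
    then
      # Emit "<device name relative to /dev> <device type>" pairs, one per line.
      for dev in /dev/nvme*n1
      do
        [ -b "$dev" ] || continue        # skips the literal glob if no NVMe is present
        echo "${dev#/dev/} dmm"          # e.g. "nvme0n1 dmm"
      done
    fi
    # Check /usr/lpp/mmfs/samples/nsddevices.sample for the return-code
    # convention (whether the built-in discovery should still run afterwards).
    return 0

The short names sidestep whatever length limit the daemon is tripping over; whether the nvmeXn1 numbering is stable enough across reboots on this hardware is worth checking separately.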
> > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: >> >> >>> ro_cache_S29GNYAH200016 0A6403AA586531E1 >>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> >>> dmm ces1.gsc.wustl.edu server node >> >> >> On 12/28/16 5:19 PM, Aaron Knister wrote: >>> mmlssnsd -X | grep 0A6403AA58641546 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From mweil at wustl.edu Thu Dec 29 16:18:30 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:18:30 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: On 12/29/16 10:09 AM, Sven Oehme wrote: > i agree that is a very long name , given this is a nvme device it > should show up as /dev/nvmeXYZ > i suggest to report exactly that in nsddevices and retry. > i vaguely remember we have some fixed length device name limitation , > but i don't remember what the length is, so this would be my first > guess too that the long name is causing trouble. I will try that. I was attempting to not need to write a custom udev rule for those. Also to keep the names persistent. Rhel 7 has a default rule that makes a sym link in /dev/disk/by-id. 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> ../../nvme0n1 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> ../../nvme1n1 > > > On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister > > wrote: > > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws > here. > > That's a *really* long device path (and nested too), I wonder if > that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of > "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu > server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
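If short but still persistent names ever do become necessary, a one-line udev rule per card is about all it takes. A sketch, with the serial numbers taken from the listing above and the gpfs_lroc* alias names invented purely for illustration:

    # /etc/udev/rules.d/99-gpfs-lroc.rules -- sketch only
    # Adds a short alias next to the stock by-id link, keyed off the controller
    # serial (visible in /sys/block/nvme0n1/device/serial on RHEL 7; watch for
    # trailing padding spaces in that attribute).
    KERNEL=="nvme*n1", SUBSYSTEM=="block", ATTRS{serial}=="S29GNYAH200016", SYMLINK+="gpfs_lroc0"
    KERNEL=="nvme*n1", SUBSYSTEM=="block", ATTRS{serial}=="S29GNYAH300161", SYMLINK+="gpfs_lroc1"

    # then reload and retrigger: udevadm control --reload && udevadm trigger

That is only worth the bother if the plain kernel names turn out not to be stable enough between reboots.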
URL: From mweil at wustl.edu Thu Dec 29 16:28:32 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:28:32 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> wow that was it. > mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:08:58 2016 It is not caching however. I will restart gpfs to see if that makes it start working. On 12/29/16 10:18 AM, Matt Weil wrote: > > > > On 12/29/16 10:09 AM, Sven Oehme wrote: >> i agree that is a very long name , given this is a nvme device it >> should show up as /dev/nvmeXYZ >> i suggest to report exactly that in nsddevices and retry. >> i vaguely remember we have some fixed length device name limitation , >> but i don't remember what the length is, so this would be my first >> guess too that the long name is causing trouble. > I will try that. I was attempting to not need to write a custom udev > rule for those. Also to keep the names persistent. Rhel 7 has a > default rule that makes a sym link in /dev/disk/by-id. > 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> > ../../nvme0n1 > 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> > ../../nvme1n1 >> >> >> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >> > wrote: >> >> Interesting. Thanks Matt. I admit I'm somewhat grasping at straws >> here. >> >> That's a *really* long device path (and nested too), I wonder if >> that's >> causing issues. >> >> What does a "tspreparedisk -S" show on that node? >> >> Also, what does your nsddevices script look like? I'm wondering >> if you >> could have it give back "/dev/dm-XXX" paths instead of >> "/dev/disk/by-id" >> paths if that would help things here. >> >> -Aaron >> >> On 12/29/16 10:57 AM, Matt Weil wrote: >> > >> > >> >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >> >> >> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >> >> dmm ces1.gsc.wustl.edu >> server node >> > >> > >> > On 12/28/16 5:19 PM, Aaron Knister wrote: >> >> mmlssnsd -X | grep 0A6403AA58641546 >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Thu Dec 29 16:41:38 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:41:38 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> Message-ID: <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> after restart. still doesn't seem to be in use. > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:35:32 2016 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) On 12/29/16 10:28 AM, Matt Weil wrote: > > wow that was it. > >> mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 1526184 MB, currently in use: 0 MB >> Statistics from: Thu Dec 29 10:08:58 2016 > It is not caching however. I will restart gpfs to see if that makes > it start working. > > On 12/29/16 10:18 AM, Matt Weil wrote: >> >> >> >> On 12/29/16 10:09 AM, Sven Oehme wrote: >>> i agree that is a very long name , given this is a nvme device it >>> should show up as /dev/nvmeXYZ >>> i suggest to report exactly that in nsddevices and retry. >>> i vaguely remember we have some fixed length device name limitation >>> , but i don't remember what the length is, so this would be my first >>> guess too that the long name is causing trouble. >> I will try that. I was attempting to not need to write a custom udev >> rule for those. Also to keep the names persistent. Rhel 7 has a >> default rule that makes a sym link in /dev/disk/by-id. >> 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 >> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> >> ../../nvme0n1 >> 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 >> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> >> ../../nvme1n1 >>> >>> >>> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >>> > wrote: >>> >>> Interesting. Thanks Matt. I admit I'm somewhat grasping at >>> straws here. >>> >>> That's a *really* long device path (and nested too), I wonder if >>> that's >>> causing issues. >>> >>> What does a "tspreparedisk -S" show on that node? >>> >>> Also, what does your nsddevices script look like? I'm wondering >>> if you >>> could have it give back "/dev/dm-XXX" paths instead of >>> "/dev/disk/by-id" >>> paths if that would help things here. 
>>> >>> -Aaron >>> >>> On 12/29/16 10:57 AM, Matt Weil wrote: >>> > >>> > >>> >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >>> >> >>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> >> dmm ces1.gsc.wustl.edu >>> server node >>> > >>> > >>> > On 12/28/16 5:19 PM, Aaron Knister wrote: >>> >> mmlssnsd -X | grep 0A6403AA58641546 >>> > >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > >>> >>> -- >>> Aaron Knister >>> NASA Center for Climate Simulation (Code 606.2) >>> Goddard Space Flight Center >>> (301) 286-2776 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Dec 29 17:06:40 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Dec 2016 17:06:40 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> Message-ID: first good that the problem at least is solved, it would be great if you could open a PMR so this gets properly fixed, the daemon shouldn't segfault, but rather print a message that the device is too big. on the caching , it only gets used when you run out of pagepool or when you run out of full file objects . so what benchmark, test did you run to push data into LROC ? sven On Thu, Dec 29, 2016 at 5:41 PM Matt Weil wrote: > after restart. still doesn't seem to be in use. > > [root at ces1 ~]# mmdiag --lroc > > > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > > Statistics from: Thu Dec 29 10:35:32 2016 > > > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > > On 12/29/16 10:28 AM, Matt Weil wrote: > > wow that was it. > > mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:08:58 2016 > > It is not caching however. I will restart gpfs to see if that makes it > start working. 
> > On 12/29/16 10:18 AM, Matt Weil wrote: > > > > On 12/29/16 10:09 AM, Sven Oehme wrote: > > i agree that is a very long name , given this is a nvme device it should > show up as /dev/nvmeXYZ > i suggest to report exactly that in nsddevices and retry. > i vaguely remember we have some fixed length device name limitation , but > i don't remember what the length is, so this would be my first guess too > that the long name is causing trouble. > > I will try that. I was attempting to not need to write a custom udev rule > for those. Also to keep the names persistent. Rhel 7 has a default rule > that makes a sym link in /dev/disk/by-id. > 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> > ../../nvme0n1 > 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> > ../../nvme1n1 > > > > On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister > wrote: > > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 <%28301%29%20286-2776> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 29 17:23:11 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 11:23:11 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> Message-ID: <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> -k thanks all I see it using the lroc now. 
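To see the cache actually take traffic, per Sven's note below that LROC only kicks in once the pagepool (and the full-file object limits) are exhausted, something as crude as the following is usually enough. The path and file set are placeholders for whatever test data is handy:

    # Sketch: stream more data through the node than the pagepool can hold,
    # then re-read it and watch the LROC counters move.
    mmlsconfig pagepool                  # how much RAM cache there is to overflow
    for pass in 1 2; do                  # pass 1 populates LROC, pass 2 should recall from it
        for f in /gpfs/somefs/benchdata/file*; do
            dd if="$f" of=/dev/null bs=1M
        done
    done
    mmdiag --lroc                        # "Total objects stored ... recalled" should now be non-zero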
On 12/29/16 11:06 AM, Sven Oehme wrote: > first good that the problem at least is solved, it would be great if > you could open a PMR so this gets properly fixed, the daemon shouldn't > segfault, but rather print a message that the device is too big. > > on the caching , it only gets used when you run out of pagepool or > when you run out of full file objects . so what benchmark, test did > you run to push data into LROC ? > > sven > > > On Thu, Dec 29, 2016 at 5:41 PM Matt Weil > wrote: > > after restart. still doesn't seem to be in use. > >> [root at ces1 ~]# mmdiag --lroc >> >> >> === mmdiag: lroc === >> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 1526184 MB, currently in use: 0 MB >> Statistics from: Thu Dec 29 10:35:32 2016 >> >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) > > On 12/29/16 10:28 AM, Matt Weil wrote: >> >> wow that was it. >> >>> mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 >>> stubFile 1073741824 >>> Max capacity: 1526184 MB, currently in use: 0 MB >>> Statistics from: Thu Dec 29 10:08:58 2016 >> It is not caching however. I will restart gpfs to see if that >> makes it start working. >> >> On 12/29/16 10:18 AM, Matt Weil wrote: >>> >>> >>> >>> On 12/29/16 10:09 AM, Sven Oehme wrote: >>>> i agree that is a very long name , given this is a nvme device >>>> it should show up as /dev/nvmeXYZ >>>> i suggest to report exactly that in nsddevices and retry. >>>> i vaguely remember we have some fixed length device name >>>> limitation , but i don't remember what the length is, so this >>>> would be my first guess too that the long name is causing trouble. >>> I will try that. I was attempting to not need to write a custom >>> udev rule for those. Also to keep the names persistent. Rhel 7 >>> has a default rule that makes a sym link in /dev/disk/by-id. >>> 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 >>> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> -> ../../nvme0n1 >>> 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 >>> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 >>> -> ../../nvme1n1 >>>> >>>> >>>> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >>>> > wrote: >>>> >>>> Interesting. Thanks Matt. I admit I'm somewhat grasping at >>>> straws here. >>>> >>>> That's a *really* long device path (and nested too), I >>>> wonder if that's >>>> causing issues. >>>> >>>> What does a "tspreparedisk -S" show on that node? >>>> >>>> Also, what does your nsddevices script look like? I'm >>>> wondering if you >>>> could have it give back "/dev/dm-XXX" paths instead of >>>> "/dev/disk/by-id" >>>> paths if that would help things here. 
>>>>
>>>> -Aaron
>>>>
>>>> On 12/29/16 10:57 AM, Matt Weil wrote:
>>>> >
>>>> >
>>>> >> ro_cache_S29GNYAH200016 0A6403AA586531E1
>>>> >>
>>>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016
>>>> >> dmm ces1.gsc.wustl.edu
>>>> server node
>>>> >
>>>> >
>>>> > On 12/28/16 5:19 PM, Aaron Knister wrote:
>>>> >> mmlssnsd -X | grep 0A6403AA58641546
>>>> >
>>>> > _______________________________________________
>>>> > gpfsug-discuss mailing list
>>>> > gpfsug-discuss at spectrumscale.org
>>>>
>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>> >
>>>>
>>>> --
>>>> Aaron Knister
>>>> NASA Center for Climate Simulation (Code 606.2)
>>>> Goddard Space Flight Center
>>>> (301) 286-2776
>>>> _______________________________________________
>>>> gpfsug-discuss mailing list
>>>> gpfsug-discuss at spectrumscale.org
>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> gpfsug-discuss mailing list
>>>> gpfsug-discuss at spectrumscale.org
>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>
>>>
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From usa-principal at gpfsug.org  Sat Dec 31 20:05:35 2016
From: usa-principal at gpfsug.org (usa-principal-gpfsug.org)
Date: Sat, 31 Dec 2016 15:05:35 -0500
Subject: [gpfsug-discuss] Save the Date April 4-5 2017 Users Group Meeting at NERSC
Message-ID: 

Hello all and happy new year (depending upon where you are right now :-) ).

We'll have more details in 2017, but for now please save the date for a
two-day users group meeting at NERSC in Berkeley, California.

April 4-5, 2017
National Energy Research Scientific Computing Center (nersc.gov)
Berkeley, California

We look forward to offering our first two-day event in the US.

Best,
Kristy & Bob

From zgiles at gmail.com  Thu Dec  1 03:59:40 2016
From: zgiles at gmail.com (Zachary Giles)
Date: Wed, 30 Nov 2016 22:59:40 -0500
Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
In-Reply-To: 
References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com>
 
Message-ID: 

Just remember that replication protects against data availability, not
integrity. GPFS still requires the underlying block device to return
good data.

If you're using it on plain disks (SAS or SSD), and the drive returns
corrupt data, GPFS won't know any better and just deliver it to the
client. Further, if you do a partial read followed by a write, both
replicas could be destroyed. There's also no efficient way to force use
of a second replica if you realize the first is bad, short of taking the
first entirely offline. In that case while migrating data, there's no
good way to prevent read-rewrite of other corrupt data on your drive
that has the "good copy" while restriping off a faulty drive.

Ideally RAID would have a goal of only returning data that passed the
RAID algorithm, so shouldn't be corrupt, or made good by recreating from
parity. However, as we all know RAID controllers are definitely prone to
failures as well for many reasons, but at least a drive can go bad in
various ways (bad sectors, slow, just dead, poor SSD cell wear, etc)
without (hopefully) silent corruption..

Just something to think about while considering replication ..



On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke wrote:

> I have once set up a small system with just a few SSDs in two NSD servers,
> providin a scratch file system in a computing cluster.
> No RAID, two replica.
> works, as long the admins do not do silly things (like rebooting servers
> in sequence without checking for disks being up in between).
> Going for RAIDs without GPFS replication protects you against single disk
> failures, but you're lost if just one of your NSD servers goes off.
>
> FPO makes sense only sense IMHO if your NSD servers are also processing
> the data (and then you need to control that somehow).
>
> Other ideas? what else can you do with GPFS and local disks than what you
> considered? I suppose nothing reasonable ...
>
>
> Mit freundlichen Gr??en / Kind regards
>
>
> Dr. Uwe Falke
>
> IT Specialist
> High Performance Computing Services / Integrated Technology Services /
> Data Center Services
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> Rathausstr. 
7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Dec 1 04:03:52 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 30 Nov 2016 23:03:52 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org> Message-ID: <7F68B673-EA06-4E99-BE51-B76C06FE416E@ulmer.org> The licensing model was my last point ? if the OP uses FPO just to create data resiliency they increase their cost (or curtail their access). I was really asking if there was a real, technical positive for using FPO in this example, as I could only come up with equivalences and negatives. -- Stephen > On Nov 30, 2016, at 10:55 PM, Ken Hill wrote: > > Hello Stephen, > > There are three licensing models for Spectrum Scale | GPFS: > > Server > FPO > Client > > I think the thing you might be missing is the associated cost per function. > > Regards, > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > From: Stephen Ulmer > To: gpfsug main discussion list > Date: 11/30/2016 09:46 PM > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > I don?t understand what FPO provides here that mirroring doesn?t: > You can still use failure domains ? one for each node. > Both still have redundancy for the data; you can lose a disk or a node. 
> The data has to be re-striped in the event of a disk failure ? no matter what. > > Also, the FPO license doesn?t allow for regular clients to access the data -- only server and FPO nodes. > > What am I missing? > > Liberty, > > -- > Stephen > > > > On Nov 30, 2016, at 3:51 PM, Andrew Beattie > wrote: > > Bob, > > If your not going to use integrated Raid controllers in the servers, then FPO would seem to be the most resilient scenario. > yes it has its own overheads, but with that many drives to manage, a JOBD architecture and manual restriping doesn't sound like fun > > If you are going down the path of integrated raid controllers then any form of distributed raid is probably the best scenario, Raid 6 obviously. > > How many Nodes are you planning on building? The more nodes the more value FPO is likely to bring as you can be more specific in how the data is written to the nodes. > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Oesterlin, Robert" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: [gpfsug-discuss] Strategies - servers with local SAS disks > Date: Thu, Dec 1, 2016 12:34 AM > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. > > > Options I?m considering: > > > - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) > > - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe > > - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically > > > Comments or other ideas welcome. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Dec 1 04:15:17 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 30 Nov 2016 23:15:17 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> Message-ID: <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. 
There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). > > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 
7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > > To: gpfsug main discussion list > > > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Zach Giles > zgiles at gmail.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From zgiles at gmail.com Thu Dec 1 04:27:27 2016 From: zgiles at gmail.com (Zachary Giles) Date: Wed, 30 Nov 2016 23:27:27 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton > of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet > available, but if one scours the interwebs they can find mention of > something called Mestor. 
> > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/at > tachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x- > gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog ( > https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a > similar situation. It's perhaps at a very high conceptually level similar > to Mestor. You erasure code your data across the nodes w/ the SAS disks and > then present those block devices to your NSD servers. I proved it could > work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then > replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > >> Just remember that replication protects against data availability, not >> integrity. GPFS still requires the underlying block device to return >> good data. >> >> If you're using it on plain disks (SAS or SSD), and the drive returns >> corrupt data, GPFS won't know any better and just deliver it to the >> client. Further, if you do a partial read followed by a write, both >> replicas could be destroyed. There's also no efficient way to force use >> of a second replica if you realize the first is bad, short of taking the >> first entirely offline. In that case while migrating data, there's no >> good way to prevent read-rewrite of other corrupt data on your drive >> that has the "good copy" while restriping off a faulty drive. >> >> Ideally RAID would have a goal of only returning data that passed the >> RAID algorithm, so shouldn't be corrupt, or made good by recreating from >> parity. However, as we all know RAID controllers are definitely prone to >> failures as well for many reasons, but at least a drive can go bad in >> various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) >> without (hopefully) silent corruption.. >> >> Just something to think about while considering replication .. >> >> >> >> On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > > wrote: >> >> I have once set up a small system with just a few SSDs in two NSD >> servers, >> providin a scratch file system in a computing cluster. >> No RAID, two replica. >> works, as long the admins do not do silly things (like rebooting >> servers >> in sequence without checking for disks being up in between). >> Going for RAIDs without GPFS replication protects you against single >> disk >> failures, but you're lost if just one of your NSD servers goes off. >> >> FPO makes sense only sense IMHO if your NSD servers are also >> processing >> the data (and then you need to control that somehow). >> >> Other ideas? what else can you do with GPFS and local disks than >> what you >> considered? I suppose nothing reasonable ... >> >> >> Mit freundlichen Gr??en / Kind regards >> >> >> Dr. Uwe Falke >> >> IT Specialist >> High Performance Computing Services / Integrated Technology Services / >> Data Center Services >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> ------------------- >> IBM Deutschland >> Rathausstr. 
7 >> 09111 Chemnitz >> Phone: +49 371 6978 2165 >> Mobile: +49 175 575 2877 >> E-Mail: uwefalke at de.ibm.com >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> ------------------- >> IBM Deutschland Business & Technology Services GmbH / >> Gesch?ftsf?hrung: >> Frank Hammer, Thorsten Moehring >> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht >> Stuttgart, >> HRB 17122 >> >> >> >> >> From: "Oesterlin, Robert" > > >> To: gpfsug main discussion list >> > > >> Date: 11/30/2016 03:34 PM >> Subject: [gpfsug-discuss] Strategies - servers with local SAS >> disks >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> >> Looking for feedback/strategies in setting up several GPFS servers >> with >> local SAS. They would all be part of the same file system. The >> systems are >> all similar in configuration - 70 4TB drives. >> >> Options I?m considering: >> >> - Create RAID arrays of the disks on each server (worried about the >> RAID >> rebuild time when a drive fails with 4, 6, 8TB drives) >> - No RAID with 2 replicas, single drive per NSD. When a drive fails, >> recreate the NSD ? but then I need to fix up the data replication via >> restripe >> - FPO ? with multiple failure groups - letting the system manage >> replica >> placement and then have GPFS due the restripe on disk failure >> automatically >> >> Comments or other ideas welcome. >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> 507-269-0413 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> -- >> Zach Giles >> zgiles at gmail.com >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 1 12:47:43 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Dec 2016 12:47:43 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? 
but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Zachary Giles Reply-To: gpfsug main discussion list Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. 
However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke >> wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" >> To: gpfsug main discussion list >> Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. 
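For concreteness, the "No RAID with 2 replicas, single drive per NSD" option quoted above is normally expressed with an NSD stanza file plus a file system created with two data and two metadata replicas. A minimal sketch follows; the node names, device names and file system name are placeholders, not anything taken from this thread:

    # disks.stanza - one NSD per physical drive, one failure group per server
    %nsd: device=/dev/sdb nsd=nsd_srv01_sdb servers=srv01 usage=dataAndMetadata failureGroup=1 pool=system
    %nsd: device=/dev/sdc nsd=nsd_srv01_sdc servers=srv01 usage=dataAndMetadata failureGroup=1 pool=system
    %nsd: device=/dev/sdb nsd=nsd_srv02_sdb servers=srv02 usage=dataAndMetadata failureGroup=2 pool=system

    mmcrnsd -F disks.stanza
    mmcrfs archfs -F disks.stanza -m 2 -M 2 -r 2 -R 2 -A yes

Because each server is its own failure group, the two copies of every block land on different servers, which is what lets this layout ride out a whole node outage as well as a single drive loss.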
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Dec 1 13:13:31 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 1 Dec 2016 08:13:31 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> References: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Message-ID: Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. Liberty, -- Stephen > On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert wrote: > > Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: > > I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. > > 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) > 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks > 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) > 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down > 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. > > Option (4) seems the best of the ?no great options? I have in front of me. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > From: > on behalf of Zachary Giles > > Reply-To: gpfsug main discussion list > > Date: Wednesday, November 30, 2016 at 10:27 PM > To: gpfsug main discussion list > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. 
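On the QoS remark quoted above ("Might be OK using QoS to get the IO impact down"): from Spectrum Scale 4.2 onward the long-running maintenance commands, including mmrestripefs and mmdeldisk, are charged to the QoS maintenance class, so a re-replication after a drive loss can be capped rather than left to starve normal I/O. A rough sketch, with the file system name and the IOPS ceiling purely illustrative:

    mmchqos archfs --enable pool=*,maintenance=5000IOPS,other=unlimited
    mmrestripefs archfs -r
    mmlsqos archfs

The ceiling is the tunable here: set it low enough that user workload stays responsive, and accept that the restripe takes correspondingly longer.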
> > It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. > > On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. > > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/ ) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > >> wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). 
> > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > > Mobile: +49 175 575 2877 > > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > >> > To: gpfsug main discussion list > > >> > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > -- > Zach Giles > zgiles at gmail.com > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Zach Giles > zgiles at gmail.com _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 1 13:20:46 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Dec 2016 13:20:46 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> Yep, I should have added those requirements :-) 1) Yes I care about the data. 
It?s not scratch but a permanent repository of older, less frequently accessed data. 2) Yes, it will be backed up 3) I expect it to grow over time 4) Data integrity requirement: high Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Stephen Ulmer Reply-To: gpfsug main discussion list Date: Thursday, December 1, 2016 at 7:13 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. Liberty, -- Stephen On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert > wrote: Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Zachary Giles > Reply-To: gpfsug main discussion list > Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. 
There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke >> wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" >> To: gpfsug main discussion list >> Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Thu Dec 1 18:22:36 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Thu, 1 Dec 2016 10:22:36 -0800 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> References: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Message-ID: Hi Bob, If you mean #4 with 2x data replication...then I would be very wary as the chance of data loss would be very high given local disk failure rates. So I think its really #4 with 3x replication vs #3 with 2x replication (and raid5/6 in node) (with maybe 3x for metadata). The space overhead is somewhat similar, but the rebuild times should be much faster for #3 given that a failed disk will not place any load on the storage network (as well there will be less data placed on network). Dean From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 12/01/2016 04:48 AM Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Some interesting discussion here. 
Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Zachary Giles Reply-To: gpfsug main discussion list Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog ( https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. 
There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. 
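As a rough sense of scale for the options being weighed here, using the 12 servers x 70 drives x 4 TB figure and the overheads quoted in this thread (illustrative arithmetic only, ignoring metadata and file system overhead):

    Raw:                                  12 x 70 x 4 TB  ~ 3.4 PB
    RAID 6 only (about 20% parity):       ~ 2.7 PB usable
    2-way GPFS replication, no RAID:      ~ 1.7 PB usable
    RAID 6 + 2-way GPFS replication:      ~ 1.3 PB usable
    3-way GPFS replication, no RAID:      ~ 1.1 PB usable

The spread between those lines is the capacity cost of the different protection levels discussed above.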
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
507-269-0413

> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> --
> Zach Giles
> zgiles at gmail.com
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Zach Giles
zgiles at gmail.com
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL:

From eric.wonderley at vt.edu  Thu Dec  1 19:10:08 2016
From: eric.wonderley at vt.edu (J. Eric Wonderley)
Date: Thu, 1 Dec 2016 14:10:08 -0500
Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk
Message-ID:

I have a few misconfigured disk groups and I have a few same size correctly configured disk groups.

Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk? Every time I have ever run mmdeldisk it has been a somewhat painful (even with QoS) process.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mweil at wustl.edu  Thu Dec  1 20:28:36 2016
From: mweil at wustl.edu (Matt Weil)
Date: Thu, 1 Dec 2016 14:28:36 -0600
Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk
In-Reply-To:
References:
Message-ID:

I always suspend the disk, then use mmrestripefs -m to remove the data, then delete the disk with mmdeldisk.

   -m
      Migrates all critical data off of any suspended disk in this file
      system. Critical data is all data that would be lost if currently
      suspended disks were removed.

You can do multiples that way and use the entire cluster to move data if you want.

On 12/1/16 1:10 PM, J. Eric Wonderley wrote:

I have a few misconfigured disk groups and I have a few same size correctly configured disk groups.

Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk? Every time I have ever run mmdeldisk it has been a somewhat painful (even with QoS) process.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

________________________________

The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail.

-------------- next part --------------
An HTML attachment was scrubbed...
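Putting the sequence Matt describes into one place, the usual drain-and-remove flow for a failing or misconfigured NSD looks roughly like the following; the disk, stanza and file system names are placeholders:

    mmchdisk archfs suspend -d "nsd_srv01_sdb"   # stop new allocations on the disk
    mmrestripefs archfs -m                       # move critical data off all suspended disks
    mmdeldisk archfs "nsd_srv01_sdb"             # remove the drained disk from the file system
    mmadddisk archfs -F newdisk.stanza           # add the replacement, rebalance later if desired

mmrpldisk collapses the delete-and-add into a single one-for-one replacement; the suspend-and-restripe route is what allows several disks to be drained in one pass across the whole cluster, which is the "multiples" point above.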
URL: From mark.bergman at uphs.upenn.edu Thu Dec 1 23:50:16 2016 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 01 Dec 2016 18:50:16 -0500 Subject: [gpfsug-discuss] Upgrading kernel on RHEL In-Reply-To: Your message of "Tue, 29 Nov 2016 20:56:25 +0000." References: <904EEBB5-E1DD-4606-993F-7E91ADA1FC37@cfms.org.uk>, Message-ID: <2253-1480636216.904015@Srjh.LZ4V.h1Mt> In the message dated: Tue, 29 Nov 2016 20:56:25 +0000, The pithy ruminations from Luis Bolinches on were: => Its been around in certain cases, some kernel <-> storage combination get => hit some not => => Scott referenced it here https://www.ibm.com/developerworks/community/wikis => /home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/ => Storage+with+GPFS+on+Linux => => https://access.redhat.com/solutions/2437991 => => It happens also on 7.2 and 7.3 ppc64 (not yet on the list of "supported") => it does not on 7.1. I can confirm this at least for XIV storage, that it => can go up to 1024 only. => => I know the FAQ will get updated about this, at least there is a CMVC that => states so. => => Long short, you create a FS, and you see all your paths die and recover and => die and receover and ..., one after another. And it never really gets done. => Also if you boot from SAN ... well you can figure it out ;) Wow, that sounds extremely similar to a kernel bug/incompatibility with GPFS that I reported in May: https://patchwork.kernel.org/patch/9140337/ https://bugs.centos.org/view.php?id=10997 My conclusion is not to apply kernel updates, unless strictly necessary (Dirty COW, anyone) or tested & validated with GPFS. Mark => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland => Phone: +358 503112585 => => "If you continually give you will continually have." Anonymous => => => => ----- Original message ----- => From: Nathan Harper > Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug main discussion list => Cc: => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 10:44 PM => => This is the first I've heard of this max_sectors_kb issue, has it => already been discussed on the list? Can you point me to any more info? => => => => On 29 Nov 2016, at 19:08, Luis Bolinches => wrote: => => => Seen that one on 6.8 too => => teh 4096 does NOT work if storage is XIV then is 1024 => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / => Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland => Phone: +358 503112585 => => "If you continually give you will continually have." Anonymous => => => => ----- Original message ----- => From: "Kevin D Johnson" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug-discuss at spectrumscale.org => Cc: gpfsug-discuss at spectrumscale.org => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 8:48 PM => => I have run into the max_sectors_kb issue and creating a file => system when moving beyond 3.10.0-327 on RH 7.2 as well. You => either have to reinstall the OS or walk the kernel back to 327 => via: => => https://access.redhat.com/solutions/186763 => => Kevin D. 
Johnson, MBA, MAFM => Spectrum Computing, Senior Managing Consultant => => IBM Certified Deployment Professional - Spectrum Scale V4.1.1 => IBM Certified Deployment Professional - Cloud Object Storage => V3.8 => IBM Certified Solution Advisor - Spectrum Computing V1 => => 720.349.6199 - kevindjo at us.ibm.com => => => => => ----- Original message ----- => From: "Luis Bolinches" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug-discuss at spectrumscale.org => Cc: gpfsug-discuss at spectrumscale.org => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 5:20 AM => => My 2 cents => => And I am sure different people have different opinions. => => New kernels might be problematic. => => Now got my fun with RHEL 7.3 kernel and max_sectors_kb for => new FS. Is something will come to the FAQ soon. It is => already on draft not public. => => I guess whatever you do .... get a TEST cluster and do it => there first, that is better the best advice I could give. => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / => Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 => Finland => Phone: +358 503112585 => => "If you continually give you will continually have." => Anonymous => => => => ----- Original message ----- => From: "Sobey, Richard A" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: "'gpfsug-discuss at spectrumscale.org'" < => gpfsug-discuss at spectrumscale.org> => Cc: => Subject: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 11:59 AM => => => All, => => => => As a general rule, when updating GPFS to a newer => release, would you perform a full OS update at the same => time, and/or update the kernel too? => => => => Just trying to gauge what other people do in this => respect. Personally I?ve always upgraded everything at => once ? including kernel. Am I looking for trouble? => => => => Cheers => => Richard => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? ole toisin mainittu: / Unless stated otherwise => above: => Oy IBM Finland Ab => PL 265, 00101 Helsinki, Finland => Business ID, Y-tunnus: 0195876-3 => Registered in Finland => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? ole toisin mainittu: / Unless stated otherwise above: => Oy IBM Finland Ab => PL 265, 00101 Helsinki, Finland => Business ID, Y-tunnus: 0195876-3 => Registered in Finland => => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above:
=> Oy IBM Finland Ab
=> PL 265, 00101 Helsinki, Finland
=> Business ID, Y-tunnus: 0195876-3
=> Registered in Finland
=>
=> _______________________________________________
=> gpfsug-discuss mailing list
=> gpfsug-discuss at spectrumscale.org
=> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
=>

From Robert.Oesterlin at nuance.com  Fri Dec  2 13:31:26 2016
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Fri, 2 Dec 2016 13:31:26 +0000
Subject: [gpfsug-discuss] Follow-up: Storage Rich Server options
Message-ID: <13B8F551-BCA2-4690-B45A-736BA549D2FC@nuance.com>

Some follow-up to the discussion I kicked off a few days ago. Using simple GPFS replication on two sites looked like a good option, until you consider that it's really RAID 5: if the replica copy of the data fails during the restripe, you lose data. It's not as bad as RAID 5, because the data blocks for a file are spread across multiple servers versus reconstruction of a single array. Raid 6 + metadata replication isn't a bad option, but you are vulnerable to server failure. Its relatively low expansion factor makes it attractive.

My personal recommendation is going to be to use Raid 6 + Metadata Replication (with the "unmountOnDiskFail=meta" option) and keep a spare server around to minimize downtime if one fails. Array rebuild times will impact performance, but it's the price of having integrity.

Comments?

Data Distribution                               Expansion   Avail.          Avail.            Data
                                                Factor      (Disk Failure)  (Server Failure)  Integrity
Raid 6 (6+2) + Metadata replication             1.25+       High            Low               High
  Single server or single LUN failure results in some data being unavailable.
  Single drive failure - lower performance during array rebuild.
2 site replication (GPFS)                       2           High            High              Low
  Similar to RAID 5 - vulnerable to multiple disk failures. Rebuild done via GPFS restripe.
  URE vulnerable during restripe, but data distribution may mitigate this.
Raid 6 (6+2) + Full 2 site replication (GPFS)   2.5         High            High              High
  Protected against single server and double drive failures.
  Single drive failure - lower performance during array rebuild.
Full 3 site replication (GPFS)                  3           High            High              High
  Similar to RAID 6. Protected against single server and double drive failures.
  Rebuild done via GPFS restripe.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eric.wonderley at vt.edu  Fri Dec  2 15:03:59 2016
From: eric.wonderley at vt.edu (J. Eric Wonderley)
Date: Fri, 2 Dec 2016 10:03:59 -0500
Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk
In-Reply-To:
References:
Message-ID:

Ah... rpldisk is used to fix a single problem, and typically you don't want to take a long trip thru md for just one small problem. Likely why it is seldom if ever used.

On Thu, Dec 1, 2016 at 3:28 PM, Matt Weil wrote:

> I always suspend the disk, then use mmrestripefs -m to remove the data,
> then delete the disk with mmdeldisk.
>
>    -m
>       Migrates all critical data off of any suspended disk in this file
>       system. Critical data is all data that would be lost if currently
>       suspended disks were removed.
>
> You can do multiples that way and use the entire cluster to move data if
> you want.
>
> On 12/1/16 1:10 PM, J. Eric Wonderley wrote:
>
> I have a few misconfigured disk groups and I have a few same size
> correctly configured disk groups.
>
> Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk?
> Every time I have ever run mmdeldisk it has been a somewhat painful (even
> with QoS) process.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> ------------------------------
>
> The materials in this message are private and may contain Protected
> Healthcare Information or other information of a sensitive nature. If you
> are not the intended recipient, be advised that any unauthorized use,
> disclosure, copying or the taking of any action in reliance on the contents
> of this information is strictly prohibited. If you have received this email
> in error, please immediately notify the sender via telephone or return mail.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eric.wonderley at vt.edu  Fri Dec  2 20:51:14 2016
From: eric.wonderley at vt.edu (J. Eric Wonderley)
Date: Fri, 2 Dec 2016 15:51:14 -0500
Subject: [gpfsug-discuss] Quotas on Multiple Filesets
In-Reply-To:
References:
Message-ID:

Hi Michael:

I was about to ask a similar question about nested filesets. I have this setup:

[root at cl001 ~]# mmlsfileset home
Filesets in file system 'home':
Name          Status    Path
root          Linked    /gpfs/home
group         Linked    /gpfs/home/group
predictHPC    Linked    /gpfs/home/group/predictHPC

and I see this:

[root at cl001 ~]# mmlsfileset home -L -d
Collecting fileset usage information ...
Filesets in file system 'home':
Name         Id  RootInode  ParentId  Created                   InodeSpace  MaxInodes  AllocInodes  Data (in KB)  Comment
root          0          3        --  Tue Jun 30 07:54:09 2015           0  134217728    123805696   63306355456  root fileset
group         1   67409030         0  Tue Nov  1 13:22:24 2016           0          0            0             0
predictHPC    2  111318203         1  Fri Dec  2 14:05:56 2016           0          0            0     212206080

I would have thought that usage in fileset predictHPC would also go against the group fileset.

On Tue, Nov 15, 2016 at 4:47 AM, Michael Holliday <michael.holliday at crick.ac.uk> wrote:

> Hey Everyone,
>
> I have a GPFS system which contains several groups of filesets.
>
> Each group has a root fileset, along with a number of other filesets.
> All of the filesets share the inode space with the root fileset.
>
> The filesets are linked to create a tree structure as shown:
>
> Fileset Root -> /root
> Fileset a    -> /root/a
> Fileset B    -> /root/b
> Fileset C    -> /root/c
>
> I have applied a quota of 5TB to the root fileset.
>
> Could someone tell me if the quota will only take into account the files
> in the root fileset, or if it would include the sub filesets as well? E.g.
> if I have 3TB in A and 2TB in B - would that hit the 5TB quota on root?
>
> Thanks
> Michael
>
> The Francis Crick Institute Limited is a registered charity in England and
> Wales no. 1140062 and a company registered in England and Wales no.
> 06885462, with its registered office at 215 Euston Road, London NW1 2BE.
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

-------------- next part --------------
An HTML attachment was scrubbed...
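A quick way to see how the accounting behaves for nested filesets like these is to compare the per-fileset numbers directly. A short check, assuming fileset quotas are enabled on the file system and reusing the fileset names from the listing above:

    mmrepquota -j home                # per-fileset block and inode usage for the whole file system
    mmlsquota -j predictHPC home      # usage charged to the predictHPC fileset
    mmlsquota -j group home           # usage charged to the group fileset itself

If data written under /gpfs/home/group/predictHPC is reported only against predictHPC, then the nesting is purely a namespace link and the parent fileset's quota is not being consumed by it.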
URL: From Heiner.Billich at psi.ch Mon Dec 5 10:26:47 2016 From: Heiner.Billich at psi.ch (Heiner Billich) Date: Mon, 5 Dec 2016 11:26:47 +0100 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: Hello, I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by asking google. Can please somebody point me to the source? I wonder whether it allows incremental copies as rsync does. We need to copy a few 100TB of data and simple rsync provides just about 100MB/s. I know about the possible workarounds - write a wrapper script, run several rsyncs in parallel, distribute the rsync jobs on several nodes, use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, which requires me to write a custom wrapper for cp .... I really would prefer some ready-to-use script or program. Thank you and kind regards, Heiner Billich From peserocka at gmail.com Mon Dec 5 11:25:38 2016 From: peserocka at gmail.com (P Serocka) Date: Mon, 5 Dec 2016 19:25:38 +0800 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> References: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> Message-ID: <6911BC0E-89DE-4C42-A46C-5DADB31E415A@gmail.com> It would be helpful to make a strict priority list of points like these: - use existing hw at no additional cost (kind of the starting point of this project) - data integrity requirement: high as you wrote - Performance (r/w/random): assumed low? - Flexibility of file tree layout: low? because: static content, "just" growing In case I got the priorities in the right order by pure chance, having ZFS as part of the solution would come to my mind (first two points). Then, with performance and flexibility on the lower ranks, I might consider... not to... deploy.... GPFS at all, but stick with with 12 separate archive servers. You actual priority list might be different. I was trying to illustrate how a strict ranking, and not cheating on yourself, simplifies drawing conclusions in a top-down approach. hth -- Peter On 2016 Dec 1. md, at 21:20 st, Oesterlin, Robert wrote: > Yep, I should have added those requirements :-) > > 1) Yes I care about the data. It?s not scratch but a permanent repository of older, less frequently accessed data. > 2) Yes, it will be backed up > 3) I expect it to grow over time > 4) Data integrity requirement: high > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > From: on behalf of Stephen Ulmer > Reply-To: gpfsug main discussion list > Date: Thursday, December 1, 2016 at 7:13 AM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? > > Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? > > That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. > > Liberty, > > -- > Stephen > > > > On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert wrote: > > Some interesting discussion here. 
Perhaps I should have been a bit clearer on what I?m looking at here: > > I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. > > 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) > 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks > 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) > 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down > 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. > > Option (4) seems the best of the ?no great options? I have in front of me. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > From: on behalf of Zachary Giles > Reply-To: gpfsug main discussion list > Date: Wednesday, November 30, 2016 at 10:27 PM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. > > It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. > > On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. > > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. 
> Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case, while migrating data, there's no good way to prevent read-rewrite of other corrupt data on the drive that has the "good copy" while restriping off a faulty drive.
>
> Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so it shouldn't be corrupt, or it is made good by recreating from parity. However, as we all know, RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc.) without (hopefully) silent corruption.
>
> Just something to think about while considering replication.
>
>
> On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke wrote:
>
> I once set up a small system with just a few SSDs in two NSD servers, providing a scratch file system in a computing cluster.
> No RAID, two replicas.
> It works, as long as the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between).
> Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off.
>
> FPO only makes sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow).
>
> Other ideas? What else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ...
>
>
> Mit freundlichen Grüßen / Kind regards
>
>
> Dr. Uwe Falke
>
> IT Specialist
> High Performance Computing Services / Integrated Technology Services / Data Center Services
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> Rathausstr. 7
> 09111 Chemnitz
> Phone: +49 371 6978 2165
> Mobile: +49 175 575 2877
> E-Mail: uwefalke at de.ibm.com
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Frank Hammer, Thorsten Moehring
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122
>
>
>
> From: "Oesterlin, Robert"
> To: gpfsug main discussion list
> Date: 11/30/2016 03:34 PM
> Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
>
> Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives.
>
> Options I'm considering:
>
> - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives)
> - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD - but then I need to fix up the data replication via restripe
> - FPO - with multiple failure groups - letting the system manage replica placement and then have GPFS do the restripe on disk failure automatically
>
> Comments or other ideas welcome.
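Since several of the options above end with "recreate the NSD and restripe", here is a minimal, hedged sketch of what that recovery step can look like when throttled with the 4.2-level QoS feature. The file system name, stanza file, node names and the IOPS figure are placeholders rather than anything from this thread:

    # Cap background maintenance I/O (mmrestripefs, mmdeldisk, ...) so the
    # restripe does not flatten client performance; the values are examples only.
    mmchqos gpfs0 --enable pool=*,maintenance=300IOPS,other=unlimited
    mmlsqos gpfs0

    # After swapping the failed drive: recreate the NSD, add it back, then
    # restore the replication level, optionally restricted to a few nodes.
    mmcrnsd -F new_disk.stanza
    mmadddisk gpfs0 -F new_disk.stanza
    mmrestripefs gpfs0 -r -N nsd01,nsd02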
> > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Zach Giles > zgiles at gmail.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Zach Giles > zgiles at gmail.com > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mimarsh2 at vt.edu Mon Dec 5 14:09:56 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 5 Dec 2016 09:09:56 -0500 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: All, I am in the same boat. I'd like to copy ~500 TB from one filesystem to another. Both are being served by the same NSD servers. We've done the multiple rsync script method in the past (and yes it's a bit of a pain). Would love to have an easier utility. Best, Brian Marshall On Mon, Dec 5, 2016 at 5:26 AM, Heiner Billich wrote: > Hello, > > I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or > 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by > asking google. Can please somebody point me to the source? I wonder whether > it allows incremental copies as rsync does. > > We need to copy a few 100TB of data and simple rsync provides just about > 100MB/s. I know about the possible workarounds - write a wrapper script, > run several rsyncs in parallel, distribute the rsync jobs on several nodes, > use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, > which requires me to write a custom wrapper for cp .... > > I really would prefer some ready-to-use script or program. > > Thank you and kind regards, > Heiner Billich > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sander.kuusemets at ut.ee Mon Dec 5 14:26:21 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Mon, 5 Dec 2016 16:26:21 +0200 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster Message-ID: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Hello, I have been thinking about setting up a CES cluster on my GPFS custer for easier data distribution. The cluster is quite an old one - since 3.4, but we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, Infiniband interconnected. 
The problem is this little line in Spectrum Scale documentation: > The CES shared root directory cannot be changed when the cluster is up > and running. If you want to modify the shared root configuration, you > must bring the entire cluster down. Does this mean that even the first time I'm setting CES up, I have to pull down the whole cluster? I would understand this level of service disruption when I already had set the directory before and now I was changing it, but on an initial setup it's quite an inconvenience. Maybe there's a less painful way for this? Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 5 14:34:27 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 05 Dec 2016 14:34:27 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Message-ID: No, the first time you define it, I'm pretty sure can be done online. But when changing it later, it will require the stopping the full cluster first. -jf man. 5. des. 2016 kl. 15.26 skrev Sander Kuusemets : > Hello, > > I have been thinking about setting up a CES cluster on my GPFS custer for > easier data distribution. The cluster is quite an old one - since 3.4, but > we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, > Infiniband interconnected. > The problem is this little line in Spectrum Scale documentation: > > The CES shared root directory cannot be changed when the cluster is up and > running. If you want to modify the shared root configuration, you must > bring the entire cluster down. > > > Does this mean that even the first time I'm setting CES up, I have to pull > down the whole cluster? I would understand this level of service disruption > when I already had set the directory before and now I was changing it, but > on an initial setup it's quite an inconvenience. Maybe there's a less > painful way for this? > > Best regards, > > -- > Sander Kuusemets > University of Tartu, High Performance Computing > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Mon Dec 5 15:51:14 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 5 Dec 2016 10:51:14 -0500 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: <58AC01C5-3B4B-43C0-9F62-F5B38D90EC50@ulmer.org> This is not the answer to not writing it yourself: However, be aware that GNU xargs has the -P x option, which will try to keep x batches running. It?s a good way to optimize the number of threads for anything you?re multiprocessing in the shell. So you can build a list and have xargs fork x copies of rsync or cp at a time (with -n y items in each batch). Not waiting to start the next batch when one finishes can add up to lots of MB*s very quickly. This is not the answer to anything, and is probably a waste of your time: I started to comment that if GPFS did provide such a ?data path shortcut?, I think I?d want it to work between any two allocation areas ? even two independent filesets in the same file system. 
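To make the xargs -P idea from earlier in this message concrete, a small hedged sketch; the paths, the parallelism of 8 and the per-directory split are made-up examples, and a GPFS-ACL-aware rsync build (or a follow-up ACL copy) is still needed if ACLs matter:

    # Fan one rsync out per top-level directory, eight at a time.
    # Top-level plain files still need one extra rsync pass afterwards.
    cd /gpfs/oldfs
    find . -mindepth 1 -maxdepth 1 -type d -print0 | \
        xargs -0 -P 8 -I{} rsync -a {} /gpfs/newfs/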
Then I started working though the possibilities for just doing that? and it devolved into the realization that we?ve got to copy the data most of the time (unless it?s in the same filesystem *and* the same storage pool ? and maybe even then depending on how the allocator works). Realizing that I decide that sometimes it just sucks to have data in the wrong (old) place. :) Maybe what we want is to be able to split an independent fileset (if it is 1:1 with a storage pool) from a filesystem and graft it onto another one ? that?s probably easier and it almost mirrors vgsplit, et al. I should go do actual work... Liberty, > On Dec 5, 2016, at 9:09 AM, Brian Marshall > wrote: > > All, > > I am in the same boat. I'd like to copy ~500 TB from one filesystem to another. Both are being served by the same NSD servers. > > We've done the multiple rsync script method in the past (and yes it's a bit of a pain). Would love to have an easier utility. > > Best, > Brian Marshall > > On Mon, Dec 5, 2016 at 5:26 AM, Heiner Billich > wrote: > Hello, > > I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by asking google. Can please somebody point me to the source? I wonder whether it allows incremental copies as rsync does. > > We need to copy a few 100TB of data and simple rsync provides just about 100MB/s. I know about the possible workarounds - write a wrapper script, run several rsyncs in parallel, distribute the rsync jobs on several nodes, use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, which requires me to write a custom wrapper for cp .... > > I really would prefer some ready-to-use script or program. > > Thank you and kind regards, > Heiner Billich > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Dec 5 16:01:33 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 5 Dec 2016 11:01:33 -0500 Subject: [gpfsug-discuss] waiting for exclusive use of connection for sending msg Message-ID: Bob (and all), I see in this post that you were tracking down a problem I am currently seeing. Lots of waiters in deadlock with "waiting for exclusive use of connection for sending msg". Did you ever determine a fix / cause for that? I see your previous comments below. We are still on 4.2.0 https://www.ibm.com/developerworks/community/forums/html/topic?id=c25e31ad-a2ae-408e-84e5-90f412806463 Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 5 16:14:06 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Dec 2016 16:14:06 +0000 Subject: [gpfsug-discuss] waiting for exclusive use of connection for sending msg Message-ID: Hi Brian This boils down to a network contention issue ? that you are maxing out the network resources and GPFS is waiting. Now- digging deeper into why, that?s a larger issue. I?m still struggling with this myself. It takes a lot of digging into network stats, utilization, dropped packets, etc. It could be at the server/client or elsewhere in the network. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Monday, December 5, 2016 at 10:01 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] waiting for exclusive use of connection for sending msg Bob (and all), I see in this post that you were tracking down a problem I am currently seeing. Lots of waiters in deadlock with "waiting for exclusive use of connection for sending msg". Did you ever determine a fix / cause for that? I see your previous comments below. We are still on 4.2.0 https://www.ibm.com/developerworks/community/forums/html/topic?id=c25e31ad-a2ae-408e-84e5-90f412806463 Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Mon Dec 5 16:33:24 2016 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 5 Dec 2016 17:33:24 +0100 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe Message-ID: FYI ... in case not seen .... benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Dec 5 20:49:44 2016 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 5 Dec 2016 20:49:44 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Dec 5 21:31:55 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 5 Dec 2016 16:31:55 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Hi Everyone, In the GPFS documentation (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) it has this to say about the duration of an upgrade from 3.5 to 4.1: > Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS > on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists >because some GPFS 4.1 features become available on each node as soon as the node is upgraded, while >other features will not become available until you upgrade all participating nodes. Does anyone have a feel for what "a short time" means? I'm looking to upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the size of our system it might take several weeks to complete. Seeing this language concerns me that after some period of time something bad is going to happen, but I don't know what that period of time is. Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any anecdotes they'd like to share, I would like to hear them. Thanks! 
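For reference, the usual wrap-up once every node in a rolling 3.5 -> 4.1 pass has been upgraded looks roughly like the hedged sketch below. The file system name is a placeholder, mmdsh is just a convenience (any parallel shell works), and the last two steps must wait until no back-level node remains:

    mmdsh -N all "rpm -q gpfs.base"   # confirm a single code level everywhere
    mmlsconfig minReleaseLevel        # stays at the old level until finalized
    mmchconfig release=LATEST         # commit the cluster to the new function level
    mmchfs gpfs0 -V full              # enable the new on-disk format (not reversible)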
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From kevindjo at us.ibm.com Mon Dec 5 21:35:54 2016 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Mon, 5 Dec 2016 21:35:54 +0000 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 5 22:52:58 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 05 Dec 2016 22:52:58 +0000 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: I read it as "do your best". I doubt there can be problems that shows up after 3 weeks, that wouldn't also be triggerable after 1 day. -jf man. 5. des. 2016 kl. 22.32 skrev Aaron Knister : > Hi Everyone, > > In the GPFS documentation > ( > http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm > ) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a time > without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short time. > The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Dec 5 23:00:43 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 5 Dec 2016 18:00:43 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Thanks Jan-Frode! If you don't mind sharing, over what period of time did you upgrade from 3.5 to 4.1 and roughly how many clients/servers do you have in your cluster? -Aaron On 12/5/16 5:52 PM, Jan-Frode Myklebust wrote: > I read it as "do your best". I doubt there can be problems that shows up > after 3 weeks, that wouldn't also be triggerable after 1 day. > > > -jf > > man. 5. des. 2016 kl. 
22.32 skrev Aaron Knister > >: > > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From sander.kuusemets at ut.ee Tue Dec 6 07:25:13 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Tue, 6 Dec 2016 09:25:13 +0200 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> Hello Aaron, I thought I'd share my two cents, as I just went through the process. I thought I'd do the same, start upgrading from where I can and wait until machines come available. It took me around 5 weeks to complete the process, but the last two were because I was super careful. At first nothing happened, but at one point, a week into the upgrade cycle, when I tried to mess around (create, delete, test) a fileset, suddenly I got the weirdest of error messages while trying to delete a fileset for the third time from a client node - I sadly cannot exactly remember what it said, but I can describe what happened. After the error message, the current manager of our cluster fell into arbitrating state, it's metadata disks were put to down state, manager status was given to our other server node and it's log was spammed with a lot of error messages, something like this: > mmfsd: > /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) > + 0)' failed. > Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 > in process 15113, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= > (sizeof(Pad32) + 0)) in line 1411 of file > /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h > Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: > Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 > logAssertFailed + 0x2D6 at ??:0 > Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 > PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 > Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 > tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 > Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 > RcvWorker::RcvMain() + 0x107 at ??:0 > Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B > RcvWorker::thread(void*) + 0x5B at ??:0 > Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 > Thread::callBody(Thread*) + 0x46 at ??:0 > Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 > start_thread + 0xD1 at ??:0 > Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + > 0x6D at ??:0 After this I tried to put disks up again, which failed half-way through and did the same with the other server node (current master). So after this my cluster had effectively failed, because all the metadata disks were down and there was no path to the data disks. When I tried to put all the metadata disks up with one start command, then it worked on third try and the cluster got into working state again. Downtime about an hour. I created a PMR with this information and they said that it's a bug, but it's a tricky one so it's going to take a while, but during that it's not recommended to use any commands from this list: > Our apologies for the delayed response. Based on the debug data we > have and looking at the source code, we believe the assert is due to > incompatibility is arising from the feature level version for the > RPCs. In this case the culprit is the PIT "interesting inode" code. > > Several user commands employ PIT (Parallel Inode Traversal) code to > traverse each data block of every file: > >> >> mmdelfileset >> mmdelsnapshot >> mmdefragfs >> mmfileid >> mmrestripefs >> mmdeldisk >> mmrpldisk >> mmchdisk >> mmadddisk > The problematic one is the 'PitInodeListPacket' subrpc which is a part > of an "interesting inode" code change. Looking at the dumps its > evident that node 'node3' which sent the RPC is not capable of > supporting interesting inode (max feature level is 1340) and node > server11 which is receiving it is trying to interpret the RPC beyond > the valid region (as its feature level 1502 supports PIT interesting > inodes). And apparently any of the fileset commands either, as I failed with those. After I finished the upgrade, everything has been working wonderfully. But during this upgrade time I'd recommend to tread really carefully. Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing, IT Specialist On 12/05/2016 11:31 PM, Aaron Knister wrote: > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > >> Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> on other nodes. However, you must upgrade all nodes within a short >> time. 
The time dependency exists >> because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while >> other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing > this language concerns me that after some period of time something bad > is going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > From janfrode at tanso.net Tue Dec 6 08:04:04 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 6 Dec 2016 09:04:04 +0100 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Currently I'm with IBM Lab Services, and only have small test clusters myself. I'm not sure I've done v3.5->4.1 upgrades, but this warning about upgrading all nodes within a "short time" is something that's always been in the upgrade instructions, and I've been through many of these (I've been a gpfs sysadmin since 2002 :-) http://www.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs300.doc/bl1ins_migratl.htm https://www.scribd.com/document/51036833/GPFS-V3-4-Concepts-Planning-and-Installation-Guide BTW: One relevant issue I saw recently was a rolling upgrade from 4.1.0 to 4.1.1.7 where we had some nodes in the cluster running 4.1.0.0. Apparently there had been some CCR message format changes in a later release that made 4.1.0.0-nodes not being able to properly communicate with 4.1.1.4 -- even though they should be able to co-exist in the same cluster according to the upgrade instructions. So I guess the more versions you mix in a cluster, the more likely you're to hit a version mismatch bug. Best to feel a tiny bit uneasy about not running same version on all nodes, and hurry to get them all upgraded to the same level. And also, should you hit a bug during this process, the likely answer will be to upgrade everything to same level. -jf On Tue, Dec 6, 2016 at 12:00 AM, Aaron Knister wrote: > Thanks Jan-Frode! If you don't mind sharing, over what period of time did > you upgrade from 3.5 to 4.1 and roughly how many clients/servers do you > have in your cluster? > > -Aaron > > On 12/5/16 5:52 PM, Jan-Frode Myklebust wrote: > >> I read it as "do your best". I doubt there can be problems that shows up >> after 3 weeks, that wouldn't also be triggerable after 1 day. >> >> >> -jf >> >> man. 5. des. 2016 kl. 22.32 skrev Aaron Knister >> >: >> >> >> Hi Everyone, >> >> In the GPFS documentation >> (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com >> .ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) >> it has this to say about the duration of an upgrade from 3.5 to 4.1: >> >> > Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> > on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> >because some GPFS 4.1 features become available on each node as soon >> as >> the node is upgraded, while >> >other features will not become available until you upgrade all >> participating nodes. >> >> Does anyone have a feel for what "a short time" means? 
I'm looking to >> upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the >> size of our system it might take several weeks to complete. Seeing >> this >> language concerns me that after some period of time something bad is >> going to happen, but I don't know what that period of time is. >> >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any >> anecdotes they'd like to share, I would like to hear them. >> >> Thanks! >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Dec 6 08:17:37 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 6 Dec 2016 08:17:37 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee>, Message-ID: I'm sure we changed this recently, I think all the CES nodes nerd to be down, but I don't think the whole cluster. We certainly set it for the first tine "live". Maybe I depends on the code version. Simi ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 05 December 2016 14:34 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES services on an existing GPFS cluster No, the first time you define it, I'm pretty sure can be done online. But when changing it later, it will require the stopping the full cluster first. -jf man. 5. des. 2016 kl. 15.26 skrev Sander Kuusemets >: Hello, I have been thinking about setting up a CES cluster on my GPFS custer for easier data distribution. The cluster is quite an old one - since 3.4, but we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, Infiniband interconnected. The problem is this little line in Spectrum Scale documentation: The CES shared root directory cannot be changed when the cluster is up and running. If you want to modify the shared root configuration, you must bring the entire cluster down. Does this mean that even the first time I'm setting CES up, I have to pull down the whole cluster? I would understand this level of service disruption when I already had set the directory before and now I was changing it, but on an initial setup it's quite an inconvenience. Maybe there's a less painful way for this? 
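For completeness, a first-time CES enablement is roughly the hedged sketch below; the node names, path and IP address are placeholders. Per the replies above, only a later change of cesSharedRoot needs an outage (the replies differ on whether that means the CES nodes or the whole cluster); defining it for the first time, before any node is CES-enabled, can be done with the cluster up:

    mmchconfig cesSharedRoot=/gpfs/fs0/ces   # a small, replicated directory is typical
    mmchnode --ces-enable -N ces1,ces2
    mmces address add --ces-ip 192.168.1.100
    mmces service enable SMB
    mmces state show -a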
Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From duersch at us.ibm.com Tue Dec 6 13:20:20 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Tue, 6 Dec 2016 08:20:20 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: Message-ID: You fit within the "short time". The purpose of this remark is to make it clear that this should not be a permanent stopping place. Getting all nodes up to the same version is safer and allows for the use of new features. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 12/06/2016 02:25:18 AM: > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 5 Dec 2016 16:31:55 -0500 > From: Aaron Knister > To: gpfsug main discussion list > Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question > Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269 at nasa.gov> > Content-Type: text/plain; charset="utf-8"; format=flowed > > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/ > com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a > time without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short > time. The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Dec 6 16:40:25 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 6 Dec 2016 10:40:25 -0600 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: Hello all, Thanks for sharing that. I am setting this up on our CES nodes. In this example the nvme devices are not persistent. RHEL's default udev rules put them in /dev/disk/by-id/ persistently by serial number so I modified mmdevdiscover to look for them there. What are others doing? custom udev rules for the nvme devices? Also I have used LVM in the past to stitch multiple nvme together for better performance. I am wondering in the use case with GPFS that it may hurt performance by hindering the ability of GPFS to do direct IO or directly accessing memory. Any opinions there? Thanks Matt On 12/5/16 10:33 AM, Ulf Troppens wrote: FYI ... in case not seen .... 
benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue Dec 6 17:36:11 2016 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 06 Dec 2016 17:36:11 +0000 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: i am not sure i understand your comment with 'persistent' do you mean when you create a nsddevice on a nvme device it won't get recognized after a restart ? if thats what you mean there are 2 answers , short term you need to add a /var/mmfs/etc/nsddevices script to your node that simply adds an echo for the nvme device like : echo nvme0n1 generic this will tell the daemon to include that device on top of all other discovered devices that we include by default (like dm-* , sd*, etc) the longer term answer is that we have a tracking item to ad nvme* to the automatically discovered devices. on your second question, given that GPFS does workload balancing across devices you don't want to add extra complexity and path length to anything , so stick with raw devices . sven On Tue, Dec 6, 2016 at 8:40 AM Matt Weil wrote: > Hello all, > > Thanks for sharing that. I am setting this up on our CES nodes. In this > example the nvme devices are not persistent. RHEL's default udev rules put > them in /dev/disk/by-id/ persistently by serial number so I modified > mmdevdiscover to look for them there. What are others doing? custom udev > rules for the nvme devices? > > Also I have used LVM in the past to stitch multiple nvme together for > better performance. I am wondering in the use case with GPFS that it may > hurt performance by hindering the ability of GPFS to do direct IO or > directly accessing memory. Any opinions there? > > Thanks > > Matt > On 12/5/16 10:33 AM, Ulf Troppens wrote: > > FYI ... in case not seen .... 
benchmark for LROC with NVMe > > http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf > > > -- > IBM Spectrum Scale Development - Client Engagements & Solutions Delivery > Consulting IT Specialist > Author "Storage Networks Explained" > > IBM Deutschland Research & Development GmbH > Vorsitzende des Aufsichtsrats: Martina Koederitz > Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Wed Dec 7 03:47:00 2016 From: Valdis.Kletnieks at vt.edu (Valdis Kletnieks) Date: Tue, 06 Dec 2016 22:47:00 -0500 Subject: [gpfsug-discuss] ltfsee fsopt question... Message-ID: <114349.1481082420@turing-police.cc.vt.edu> Is it possible to use 'ltfsee fsopt' to set stub and preview sizes on a per-fileset basis, or is it fixed across an entire filesystem? From r.sobey at imperial.ac.uk Wed Dec 7 06:29:27 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 7 Dec 2016 06:29:27 +0000 Subject: [gpfsug-discuss] CES ON RHEL7.3 Message-ID: A word of wisdom: do not try and run CES on RHEL 7.3 :) Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn't intend to run 7.3 of course as I knew it wasn't supported. Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkomandu at in.ibm.com Wed Dec 7 06:45:50 2016 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Wed, 7 Dec 2016 12:15:50 +0530 Subject: [gpfsug-discuss] CES ON RHEL7.3 In-Reply-To: References: Message-ID: Sobey, Could you mention the problems that you have faced on CES env for RH 7.3. Is it related to the Kernel or in Ganesha environment ? Your thoughts/inputs would help us in fixing the same. Currently working on the CES environment on RH 7.3 support side. With Regards, Ravi K Komanduri GPFS team IBM From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 12/07/2016 11:59 AM Subject: [gpfsug-discuss] CES ON RHEL7.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org A word of wisdom: do not try and run CES on RHEL 7.3 J Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn?t intend to run 7.3 of course as I knew it wasn?t supported. Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From r.sobey at imperial.ac.uk Wed Dec 7 09:13:23 2016
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Wed, 7 Dec 2016 09:13:23 +0000
Subject: [gpfsug-discuss] CES ON RHEL7.3
In-Reply-To:
References:
Message-ID:

I admit I didn't do a whole lot of troubleshooting. We don't run NFS so I can't speak about that.

Initially the server looked like it came back ok, albeit "Node starting up.." was observed in the output of mmlscluster --ces. At that time I was not sure if that was a) expected behaviour and/or b) related to GPFS 4.2.1-2. Once the node went back into service I had no complaints from customers that they faced any connectivity issues.

The next morning I shut down a second CES node in order to upgrade it, but I observed that the first one went into a failed state (might have been a nasty coincidence!):

[root at icgpfs-ces1 yum.repos.d]# mmces state show -a
NODE          AUTH     AUTH_OBJ  NETWORK    NFS       OBJ       SMB      CES
icgpfs-ces1   FAILED   DISABLED  HEALTHY    DISABLED  DISABLED  DEPEND   STARTING
icgpfs-ces2   DEPEND   DISABLED  SUSPENDED  DEPEND    DEPEND    DEPEND   DEPEND
icgpfs-ces3   HEALTHY  DISABLED  HEALTHY    DISABLED  DISABLED  HEALTHY  HEALTHY
icgpfs-ces4   HEALTHY  DISABLED  HEALTHY    DISABLED  DISABLED  HEALTHY  HEALTHY

(Where ICGPFS-CES1 was the node running 7.3). Also, in mmces event show -N icgpfs-ces1 --time day the following error was logged about twice per minute:

icgpfs-ces1  2016-12-06 06:32:04.968269 GMT  wnbd_restart  INFO  WINBINDD process was not running. Trying to start it

I moved the CES IP from icgpfs-ces2 to icgpfs-ces3 prior to suspending -ces2. It was about that point I decided to abandon the planned upgrade of -ces2, resume the node and then suspend -ces1. Attempts to downgrade the Kernel/OS/redhat-release RPM back to 7.2 worked well, except that when I tried to start CES again the node reported "Node failed". I then rebuilt it completely, restored it to the cluster and it appears to be fine.

Sorry I can't be any more specific than that but I hope it helps.

Thanks
Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ravi K Komanduri
Sent: 07 December 2016 06:46
To: r.sobey at inperial.ac.uk
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] CES ON RHEL7.3

Sobey,

Could you mention the problems that you have faced in the CES environment on RH 7.3? Is it related to the kernel or to the Ganesha environment? Your thoughts/inputs would help us in fixing the same. Currently working on the CES environment on the RH 7.3 support side.

With Regards,
Ravi K Komanduri
GPFS team
IBM

From: "Sobey, Richard A"
To: "'gpfsug-discuss at spectrumscale.org'"
Date: 12/07/2016 11:59 AM
Subject: [gpfsug-discuss] CES ON RHEL7.3
Sent by: gpfsug-discuss-bounces at spectrumscale.org

________________________________

A word of wisdom: do not try and run CES on RHEL 7.3 :) Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn't intend to run 7.3 of course as I knew it wasn't supported.

Richard
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From peserocka at gmail.com Wed Dec 7 09:54:05 2016 From: peserocka at gmail.com (P Serocka) Date: Wed, 7 Dec 2016 17:54:05 +0800 Subject: [gpfsug-discuss] Quotas on Multiple Filesets In-Reply-To: References: Message-ID: <1FBA5DC2-DD14-4606-9B5A-A4373191B461@gmail.com> > > I would have though that usage in fileset predictHPC would also go against the group fileset quota-wise these filesets are "siblings", don't be fooled by the hierarchy formed by namespace linking. hth -- Peter On 2016 Dec 3. md, at 04:51 st, J. Eric Wonderley wrote: > Hi Michael: > > I was about to ask a similar question about nested filesets. > > I have this setup: > [root at cl001 ~]# mmlsfileset home > Filesets in file system 'home': > Name Status Path > root Linked /gpfs/home > group Linked /gpfs/home/group > predictHPC Linked /gpfs/home/group/predictHPC > > > and I see this: > [root at cl001 ~]# mmlsfileset home -L -d > Collecting fileset usage information ... > Filesets in file system 'home': > Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Data (in KB) Comment > root 0 3 -- Tue Jun 30 07:54:09 2015 0 134217728 123805696 63306355456 root fileset > group 1 67409030 0 Tue Nov 1 13:22:24 2016 0 0 0 0 > predictHPC 2 111318203 1 Fri Dec 2 14:05:56 2016 0 0 0 212206080 > > I would have though that usage in fileset predictHPC would also go against the group fileset > > On Tue, Nov 15, 2016 at 4:47 AM, Michael Holliday wrote: > Hey Everyone, > > > > I have a GPFS system which contain several groups of filesets. > > > > Each group has a root fileset, along with a number of other files sets. All of the filesets share the inode space with the root fileset. > > > > The file sets are linked to create a tree structure as shown: > > > > Fileset Root -> /root > > Fileset a -> /root/a > > Fileset B -> /root/b > > Fileset C -> /root/c > > > > > > I have applied a quota of 5TB to the root fileset. > > > > Could someone tell me if the quota will only take into account the files in the root fileset, or if it would include the sub filesets aswell. eg If have 3TB in A and 2TB in B - would that hit the 5TB quota on root? > > > > Thanks > > Michael > > > > > > The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Dec 7 10:34:27 2016 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 7 Dec 2016 05:34:27 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Message-ID: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? 
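A few hedged first checks for expel hunting like this, before blaming the Phi itself; the log path is the GPFS default and the node in question plus the cluster manager are the interesting places to look:

    grep -i expel /var/adm/ras/mmfs.log.latest   # on the expelled node and on the cluster manager
    mmlsmgr                                      # shows which node is currently the cluster manager
    /usr/lpp/mmfs/bin/mmdiag --network           # connection state between the Phi node and its peers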
-- ddj Dave Johnson From daniel.kidger at uk.ibm.com Wed Dec 7 12:36:56 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 7 Dec 2016 12:36:56 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: , <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com><3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Wed Dec 7 14:24:38 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Wed, 07 Dec 2016 14:24:38 +0000 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: > IBM says it should work ok, we are not so sure. We had node expels that > stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Dec 7 14:37:15 2016 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 7 Dec 2016 09:37:15 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: All, The SMAP issue has been addressed in GPFS in 4.2.1.1. See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Q2.4. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Aaron Knister To: gpfsug main discussion list Date: 12/07/2016 09:25 AM Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Sent by: gpfsug-discuss-bounces at spectrumscale.org I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? -- ddj Dave Johnson _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Wed Dec 7 14:47:46 2016 From: david_johnson at brown.edu (David D. Johnson) Date: Wed, 7 Dec 2016 09:47:46 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: <5FBAC3AE-39F2-453D-8A9D-5FDE90BADD38@brown.edu> Yes, we saw the SMAP issue on earlier releases, added the kernel command line option to disable it. That is not the issue for this node. The Phi processors do not support that cpu feature. ? ddj > On Dec 7, 2016, at 9:37 AM, Felipe Knop wrote: > > All, > > The SMAP issue has been addressed in GPFS in 4.2.1.1. > > See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html > > Q2.4. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 12/07/2016 09:25 AM > Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. > > -Aaron > > On Wed, Dec 7, 2016 at 5:34 AM > wrote: > IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Dec 7 14:58:39 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 7 Dec 2016 14:58:39 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: I was going to ask about this, I recall it being mentioned about "grandfathering" and also having mixed deployments. Would that mean you could per TB license one set of NSD servers (hosting only 1 FS) that co-existed in a cluster with other traditionally licensed systems? I would see having NSDs with different license models hosting the same FS being problematic, but if it were a different file-system? 
Simon From: > on behalf of Daniel Kidger > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 7 December 2016 at 12:36 To: "gpfsug-discuss at spectrumscale.org" > Cc: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks The new volume based licensing option is I agree quite pricey per TB at first sight, but it could make some configuration choice, a lot cheaper than they used to be under the Client:FPO:Server model. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 7 15:59:50 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 7 Dec 2016 09:59:50 -0600 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: <05e77cc6-e3f6-7b06-521c-1d30606e02e0@wustl.edu> On 12/6/16 11:36 AM, Sven Oehme wrote: i am not sure i understand your comment with 'persistent' do you mean when you create a nsddevice on a nvme device it won't get recognized after a restart ? yes /dev/sdX may change after a reboot especially if you add devices. using udev rules makes sure the device is always the same. if thats what you mean there are 2 answers , short term you need to add a /var/mmfs/etc/nsddevices script to your node that simply adds an echo for the nvme device like : echo nvme0n1 generic this will tell the daemon to include that device on top of all other discovered devices that we include by default (like dm-* , sd*, etc) the longer term answer is that we have a tracking item to ad nvme* to the automatically discovered devices. yes that is what I meant by modifying mmdevdiscover on your second question, given that GPFS does workload balancing across devices you don't want to add extra complexity and path length to anything , so stick with raw devices . K that is what I was thinking. sven On Tue, Dec 6, 2016 at 8:40 AM Matt Weil > wrote: Hello all, Thanks for sharing that. I am setting this up on our CES nodes. In this example the nvme devices are not persistent. RHEL's default udev rules put them in /dev/disk/by-id/ persistently by serial number so I modified mmdevdiscover to look for them there. What are others doing? custom udev rules for the nvme devices? Also I have used LVM in the past to stitch multiple nvme together for better performance. I am wondering in the use case with GPFS that it may hurt performance by hindering the ability of GPFS to do direct IO or directly accessing memory. Any opinions there? Thanks Matt On 12/5/16 10:33 AM, Ulf Troppens wrote: FYI ... in case not seen .... benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
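To make the nsddevices suggestion above concrete, here is a minimal sketch assuming a single LROC device that the kernel names nvme0n1. The device name is only an example; a stable /dev/disk/by-id path set up by a udev rule can be echoed instead, and the shipped sample in /usr/lpp/mmfs/samples/nsddevices.sample documents the return-code convention that decides whether GPFS still runs its built-in discovery afterwards.

cat > /var/mmfs/etc/nsddevices <<'EOF'
#!/bin/ksh
# Emit one "deviceName deviceType" pair per device that GPFS should
# consider in addition to the devices it discovers on its own.
echo "nvme0n1 generic"
EOF
chmod +x /var/mmfs/etc/nsddevices
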
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Dec 7 16:00:46 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 7 Dec 2016 16:00:46 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: , <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com><3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Dec 7 16:31:23 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Dec 2016 11:31:23 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> Message-ID: <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Thanks Sander. That's disconcerting...yikes! Sorry for your trouble but thank you for sharing. I'm surprised this didn't shake out during testing of gpfs 3.5 and 4.1. I wonder if in light of this it's wise to do the clients first? My logic being that there's clearly an example here of 4.1 servers expecting behavior that only 4.1 clients provide. I suppose, though, that there's just as likely a chance that there could be a yet to be discovered bug in a situation where a 4.1 client expects something not provided by a 3.5 server. Our current plan is still to take servers first but I suspect we'll do a fair bit of testing with the PIT commands in our test environment just out of curiosity. Also out of curiosity, how long ago did you open that PMR? I'm wondering if there's a chance they've fixed this issue. I'm also perplexed and cocnerned that there's no documentation of the PIT commands to avoid during upgrades that I can find in any of the GPFS upgrade documentation. -Aaron On 12/6/16 2:25 AM, Sander Kuusemets wrote: > Hello Aaron, > > I thought I'd share my two cents, as I just went through the process. I > thought I'd do the same, start upgrading from where I can and wait until > machines come available. It took me around 5 weeks to complete the > process, but the last two were because I was super careful. 
> > At first nothing happened, but at one point, a week into the upgrade > cycle, when I tried to mess around (create, delete, test) a fileset, > suddenly I got the weirdest of error messages while trying to delete a > fileset for the third time from a client node - I sadly cannot exactly > remember what it said, but I can describe what happened. > > After the error message, the current manager of our cluster fell into > arbitrating state, it's metadata disks were put to down state, manager > status was given to our other server node and it's log was spammed with > a lot of error messages, something like this: > >> mmfsd: >> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: >> void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >> UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) >> + 0)' failed. >> Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 >> in process 15113, link reg 0xFFFFFFFFFFFFFFFF. >> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= >> (sizeof(Pad32) + 0)) in line 1411 of file >> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h >> Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: >> Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 >> logAssertFailed + 0x2D6 at ??:0 >> Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 >> PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 >> Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 >> tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 >> Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 >> RcvWorker::RcvMain() + 0x107 at ??:0 >> Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B >> RcvWorker::thread(void*) + 0x5B at ??:0 >> Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 >> Thread::callBody(Thread*) + 0x46 at ??:0 >> Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 >> Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >> Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 >> start_thread + 0xD1 at ??:0 >> Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + >> 0x6D at ??:0 > After this I tried to put disks up again, which failed half-way through > and did the same with the other server node (current master). So after > this my cluster had effectively failed, because all the metadata disks > were down and there was no path to the data disks. When I tried to put > all the metadata disks up with one start command, then it worked on > third try and the cluster got into working state again. Downtime about > an hour. > > I created a PMR with this information and they said that it's a bug, but > it's a tricky one so it's going to take a while, but during that it's > not recommended to use any commands from this list: > >> Our apologies for the delayed response. Based on the debug data we >> have and looking at the source code, we believe the assert is due to >> incompatibility is arising from the feature level version for the >> RPCs. In this case the culprit is the PIT "interesting inode" code. >> >> Several user commands employ PIT (Parallel Inode Traversal) code to >> traverse each data block of every file: >> >>> >>> mmdelfileset >>> mmdelsnapshot >>> mmdefragfs >>> mmfileid >>> mmrestripefs >>> mmdeldisk >>> mmrpldisk >>> mmchdisk >>> mmadddisk >> The problematic one is the 'PitInodeListPacket' subrpc which is a part >> of an "interesting inode" code change. 
Looking at the dumps its >> evident that node 'node3' which sent the RPC is not capable of >> supporting interesting inode (max feature level is 1340) and node >> server11 which is receiving it is trying to interpret the RPC beyond >> the valid region (as its feature level 1502 supports PIT interesting >> inodes). > > And apparently any of the fileset commands either, as I failed with those. > > After I finished the upgrade, everything has been working wonderfully. > But during this upgrade time I'd recommend to tread really carefully. > > Best regards, > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From sander.kuusemets at ut.ee Wed Dec 7 16:56:52 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Wed, 7 Dec 2016 18:56:52 +0200 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Message-ID: It might have been some kind of a bug only we got, but I thought I'd share, just in case. The email when they said they opened a ticket for this bug's fix was quite exactly a month ago, so I doubt they've fixed it, as they said it might take a while. I don't know if this is of any help, but a paragraph from the explanation: > The assert "msgLen >= (sizeof(Pad32) + 0)" is from routine > PIT_HelperGetWorkMH(). There are two RPC structures used in this routine > - PitHelperWorkReport > - PitInodeListPacket > > The problematic one is the 'PitInodeListPacket' subrpc which is a part > of an "interesting inode" code change. Looking at the dumps its > evident that node 'stage3' which sent the RPC is not capable of > supporting interesting inode (max feature level is 1340) and node > tank1 which is receiving it is trying to interpret the RPC beyond the > valid region (as its feature level 1502 supports PIT interesting > inodes). This is resulting in the assert you see. As a short term > measure bringing all the nodes to the same feature level should make > the problem go away. But since we support backward compatibility, we > are opening an APAR to create a code fix. It's unfortunately going to > be a tricky fix, which is going to take a significant amount of time. > Therefore I don't expect the team will be able to provide an efix > anytime soon. We recommend you bring all nodes in all clusters up the > latest level 4.2.0.4 and run the "mmchconfig release=latest" and > "mmchfs -V full" commands that will ensure all daemon levels and fs > levels are at the necessary level that supports the 1502 RPC feature > level. Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing, IT Specialist On 12/07/2016 06:31 PM, Aaron Knister wrote: > Thanks Sander. That's disconcerting...yikes! Sorry for your trouble > but thank you for sharing. > > I'm surprised this didn't shake out during testing of gpfs 3.5 and > 4.1. I wonder if in light of this it's wise to do the clients first? > My logic being that there's clearly an example here of 4.1 servers > expecting behavior that only 4.1 clients provide. I suppose, though, > that there's just as likely a chance that there could be a yet to be > discovered bug in a situation where a 4.1 client expects something not > provided by a 3.5 server. 
Our current plan is still to take servers > first but I suspect we'll do a fair bit of testing with the PIT > commands in our test environment just out of curiosity. > > Also out of curiosity, how long ago did you open that PMR? I'm > wondering if there's a chance they've fixed this issue. I'm also > perplexed and cocnerned that there's no documentation of the PIT > commands to avoid during upgrades that I can find in any of the GPFS > upgrade documentation. > > -Aaron > > On 12/6/16 2:25 AM, Sander Kuusemets wrote: >> Hello Aaron, >> >> I thought I'd share my two cents, as I just went through the process. I >> thought I'd do the same, start upgrading from where I can and wait until >> machines come available. It took me around 5 weeks to complete the >> process, but the last two were because I was super careful. >> >> At first nothing happened, but at one point, a week into the upgrade >> cycle, when I tried to mess around (create, delete, test) a fileset, >> suddenly I got the weirdest of error messages while trying to delete a >> fileset for the third time from a client node - I sadly cannot exactly >> remember what it said, but I can describe what happened. >> >> After the error message, the current manager of our cluster fell into >> arbitrating state, it's metadata disks were put to down state, manager >> status was given to our other server node and it's log was spammed with >> a lot of error messages, something like this: >> >>> mmfsd: >>> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: >>> >>> void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >>> UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) >>> + 0)' failed. >>> Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 >>> in process 15113, link reg 0xFFFFFFFFFFFFFFFF. >>> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= >>> (sizeof(Pad32) + 0)) in line 1411 of file >>> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h >>> Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: >>> Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 >>> logAssertFailed + 0x2D6 at ??:0 >>> Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 >>> PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 >>> Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 >>> tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 >>> Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 >>> RcvWorker::RcvMain() + 0x107 at ??:0 >>> Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B >>> RcvWorker::thread(void*) + 0x5B at ??:0 >>> Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 >>> Thread::callBody(Thread*) + 0x46 at ??:0 >>> Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 >>> Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >>> Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 >>> start_thread + 0xD1 at ??:0 >>> Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + >>> 0x6D at ??:0 >> After this I tried to put disks up again, which failed half-way through >> and did the same with the other server node (current master). So after >> this my cluster had effectively failed, because all the metadata disks >> were down and there was no path to the data disks. When I tried to put >> all the metadata disks up with one start command, then it worked on >> third try and the cluster got into working state again. Downtime about >> an hour. 
>> >> I created a PMR with this information and they said that it's a bug, but >> it's a tricky one so it's going to take a while, but during that it's >> not recommended to use any commands from this list: >> >>> Our apologies for the delayed response. Based on the debug data we >>> have and looking at the source code, we believe the assert is due to >>> incompatibility is arising from the feature level version for the >>> RPCs. In this case the culprit is the PIT "interesting inode" code. >>> >>> Several user commands employ PIT (Parallel Inode Traversal) code to >>> traverse each data block of every file: >>> >>>> >>>> mmdelfileset >>>> mmdelsnapshot >>>> mmdefragfs >>>> mmfileid >>>> mmrestripefs >>>> mmdeldisk >>>> mmrpldisk >>>> mmchdisk >>>> mmadddisk >>> The problematic one is the 'PitInodeListPacket' subrpc which is a part >>> of an "interesting inode" code change. Looking at the dumps its >>> evident that node 'node3' which sent the RPC is not capable of >>> supporting interesting inode (max feature level is 1340) and node >>> server11 which is receiving it is trying to interpret the RPC beyond >>> the valid region (as its feature level 1502 supports PIT interesting >>> inodes). >> >> And apparently any of the fileset commands either, as I failed with >> those. >> >> After I finished the upgrade, everything has been working wonderfully. >> But during this upgrade time I'd recommend to tread really carefully. >> >> Best regards, >> > From aaron.s.knister at nasa.gov Wed Dec 7 17:31:28 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Dec 2016 12:31:28 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Message-ID: Thanks! I do have a question, though. Feature level 1340 I believe is equivalent to GPFS version 3.5.0.11. Feature level 1502 is GPFS 4.2 if I understand correctly. That suggests to me there are 3.5 and 4.2 nodes in the same cluster? Or at least 4.2 nodes in a cluster where the max feature level is 1340. I didn't think either of those are supported configurations? Am I missing something? -Aaron On 12/7/16 11:56 AM, Sander Kuusemets wrote: > It might have been some kind of a bug only we got, but I thought I'd > share, just in case. > > The email when they said they opened a ticket for this bug's fix was > quite exactly a month ago, so I doubt they've fixed it, as they said it > might take a while. > > I don't know if this is of any help, but a paragraph from the explanation: > >> The assert "msgLen >= (sizeof(Pad32) + 0)" is from routine >> PIT_HelperGetWorkMH(). There are two RPC structures used in this routine >> - PitHelperWorkReport >> - PitInodeListPacket >> >> The problematic one is the 'PitInodeListPacket' subrpc which is a part >> of an "interesting inode" code change. Looking at the dumps its >> evident that node 'stage3' which sent the RPC is not capable of >> supporting interesting inode (max feature level is 1340) and node >> tank1 which is receiving it is trying to interpret the RPC beyond the >> valid region (as its feature level 1502 supports PIT interesting >> inodes). This is resulting in the assert you see. As a short term >> measure bringing all the nodes to the same feature level should make >> the problem go away. But since we support backward compatibility, we >> are opening an APAR to create a code fix. 
It's unfortunately going to >> be a tricky fix, which is going to take a significant amount of time. >> Therefore I don't expect the team will be able to provide an efix >> anytime soon. We recommend you bring all nodes in all clusters up the >> latest level 4.2.0.4 and run the "mmchconfig release=latest" and >> "mmchfs -V full" commands that will ensure all daemon levels and fs >> levels are at the necessary level that supports the 1502 RPC feature >> level. > Best regards, > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From carlz at us.ibm.com Wed Dec 7 17:47:52 2016 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 7 Dec 2016 12:47:52 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: Message-ID: We don't allow mixing of different licensing models (i.e. socket and capacity) within a single cluster*. As we worked through the implications, we realized it would be just too complicated to determine how to license any non-NSD nodes (management, CES, clients, etc.). In the socket model they are chargeable, in the capacity model they are not, and while we could have made up some rules, they would have added even more complexity to Scale licensing. This in turn is why we "grandfathered in" those customers already on Advanced Edition, so that they don't have to convert existing clusters to the new metric unless or until they want to. They can continue to buy Advanced Edition. The other thing we wanted to do with the capacity metric was to make the licensing more friendly to architectural best practices or design choices. So now you can have whatever management, gateway, etc. servers you need without paying for additional server licenses. In particular, client-only clusters cost nothing, and you don't have to keep track of clients if you have a virtual environment where clients come and go rapidly. I'm always happy to answer other questions about licensing. regards, Carl Zetie *OK, there is one exception involving future ESS models and existing clusters. If this is you, please have a conversation with your account team. Carl Zetie Program Director, OM for Spectrum Scale, IBM (540) 882 9353 ][ 15750 Brookhill Ct, Waterford VA 20197 carlz at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 12/07/2016 09:59 AM Subject: gpfsug-discuss Digest, Vol 59, Issue 20 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? (Felipe Knop) 2. Re: Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? (David D. Johnson) 3. 
Re: Strategies - servers with local SAS disks (Simon Thompson (Research Computing - IT Services))

----------------------------------------------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

End of gpfsug-discuss Digest, Vol 59, Issue 20
**********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From r.sobey at imperial.ac.uk Thu Dec 8 13:33:40 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 8 Dec 2016 13:33:40 +0000 Subject: [gpfsug-discuss] Flash Storage wiki entry incorrect Message-ID: To whom it may concern, I've just set up an LROC disk in one of my CES nodes and going from the example in: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage I used the following as a guide: cat lroc-stanza.txt %nsd: nsd=lroc-nsd1 device=/dev/faio server=gpfs-client1 <-- is not a NSD server, but client with Fusion i/o or SSD install as target for LROC usage=localCache The only problems are that 1) hyphens aren't allowed in NSD names and 2) the server parameter should be servers (plural). Once I worked that out I was good to go but perhaps someone could update the page with a (working) example? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Dec 8 19:27:08 2016 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 8 Dec 2016 14:27:08 -0500 Subject: [gpfsug-discuss] GPFS fails to use VERBS RDMA because link is not up yet Message-ID: Under RHEL/CentOS 6, I had hacked an ?ibready? script for the SysV style init system that waits for link to come up on the infiniband port before allowing GPFS to start. Now that we?re moving to CentOS/RHEL 7.2, I need to reimplement this workaround for the fact that GPFS only tries once to start VERBS RDMA, and gives up if there is no link. I think it can be done by making a systemd unit that asks to run Before gpfs. Wondering if anyone has already done this to avoid reinventing the wheel?. Thanks, ? ddj Dave Johnson Brown University From r.sobey at imperial.ac.uk Fri Dec 9 11:52:12 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 11:52:12 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access Message-ID: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Dec 9 13:21:12 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 9 Dec 2016 08:21:12 -0500 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: Message-ID: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone > On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: > > Hi all, > > Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). > > Cheers > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
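On Dave Johnson's ibready question above, one way to express the same wait under systemd is a small ordering unit plus a polling script. A sketch, assuming GPFS is started through a gpfs.service unit on RHEL/CentOS 7 and a Mellanox HCA seen as mlx4_0; the unit name, script path, HCA and port number are all illustrative:

# /etc/systemd/system/ibready.service
[Unit]
Description=Wait for InfiniBand link before GPFS starts
Before=gpfs.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/wait-for-ib-link

[Install]
WantedBy=multi-user.target

# /usr/local/sbin/wait-for-ib-link
#!/bin/bash
# poll the verbs port state; the file reads "4: ACTIVE" once the link is up
STATE=/sys/class/infiniband/mlx4_0/ports/1/state
for i in $(seq 1 60); do
    grep -q ACTIVE "$STATE" 2>/dev/null && exit 0
    sleep 2
done
echo "InfiniBand link still down after 120 seconds" >&2
exit 1

# activate it
chmod +x /usr/local/sbin/wait-for-ib-link
systemctl daemon-reload
systemctl enable ibready.service

Because Before= only orders the two units, a failed check does not stop GPFS from starting; a Requires=ibready.service drop-in for gpfs.service would be needed to make it a hard dependency.
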
URL: From jtolson at us.ibm.com Fri Dec 9 14:32:45 2016 From: jtolson at us.ibm.com (John T Olson) Date: Fri, 9 Dec 2016 07:32:45 -0700 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From billowen at us.ibm.com Fri Dec 9 15:44:28 2016 From: billowen at us.ibm.com (Bill Owen) Date: Fri, 9 Dec 2016 08:44:28 -0700 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Hi John, Nice paper! Regarding object auditing: - Does Varonis have an API that could be used to tell it when object operations complete from normal object interface? If so, a middleware module could be used to send interesting events to Varonis (this is already done in openstack auditing using CADF) - With Varonis, can you monitor operations just on ".data" files? (these are the real objects) Can you also include file metadata values in the logging of these operations? 
If so, the object url could be pulled whenever a .data file is created, renamed (delete), or read Thanks, Bill Owen billowen at us.ibm.com Spectrum Scale Object Storage 520-799-4829 From: John T Olson/Tucson/IBM at IBMUS To: gpfsug main discussion list Date: 12/09/2016 07:33 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Dec 9 20:14:14 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 20:14:14 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> References: , <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Thanks Aaron. I will take a look on Moday. Now I think about it, I did something like this on the old Samba/CTDB cluster before we deployed CES, so it must be possible, just to what level IBM will support it. 
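For the Samba/CTDB-style logging Richard refers to, the usual mechanism is the vfs_full_audit module, which sends selected operations to syslog. A sketch of the smb.conf-level settings involved; under CES these would have to be applied through the mmsmb/registry tooling rather than by hand, and as Richard says the support position should be confirmed with IBM first:

# append full_audit to whatever module list the export already has
# (the existing modules, e.g. gpfs and fileid, must stay in place)
vfs objects = gpfs fileid full_audit
full_audit:prefix = %u|%I|%m|%S
full_audit:success = mkdir rmdir rename unlink write pwrite
full_audit:failure = connect
full_audit:facility = local5
full_audit:priority = NOTICE

# /etc/rsyslog.d/10-smb-audit.conf -- give the audit records their own file
local5.notice    /var/log/samba/audit.log

The success list drives the log volume, so starting with only the mutating operations keeps it manageable.
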
Have a great weekend, Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister Sent: 09 December 2016 13:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Dec 9 20:15:03 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 20:15:03 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com>, Message-ID: Thanks John, As I said to Aaron I will also take a look at this on Monday. Regards Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of John T Olson Sent: 09 December 2016 14:32 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. [Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?]Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp [https://s0.wp.com/i/blank.jpg] Samba: Logging User Activity moiristo.wordpress.com Ever wondered why Samba seems to log so many things, except what you're interested in? So did I, and it took me a while to find out that 1) there actually is a solution and 2) how to configur... I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. 
Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From aaron.s.knister at nasa.gov Sat Dec 10 03:53:06 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 9 Dec 2016 22:53:06 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: Message-ID: <38d056ad-833f-1582-58fd-0e65a52ded6c@nasa.gov> Thanks Steve, that was exactly the answer I was looking for. On 12/6/16 8:20 AM, Steve Duersch wrote: > You fit within the "short time". The purpose of this remark is to make > it clear that this should not be a permanent stopping place. > Getting all nodes up to the same version is safer and allows for the use > of new features. > > > Steve Duersch > Spectrum Scale > 845-433-7902 > IBM Poughkeepsie, New York > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 12/06/2016 02:25:18 AM: > > >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Mon, 5 Dec 2016 16:31:55 -0500 >> From: Aaron Knister >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question >> Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269 at nasa.gov> >> Content-Type: text/plain; charset="utf-8"; format=flowed >> >> Hi Everyone, >> >> In the GPFS documentation >> (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/ >> com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) >> it has this to say about the duration of an upgrade from 3.5 to 4.1: >> >> > Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> > on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> >because some GPFS 4.1 features become available on each node as soon as >> the node is upgraded, while >> >other features will not become available until you upgrade all >> participating nodes. >> >> Does anyone have a feel for what "a short time" means? I'm looking to >> upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the >> size of our system it might take several weeks to complete. Seeing this >> language concerns me that after some period of time something bad is >> going to happen, but I don't know what that period of time is. >> >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any >> anecdotes they'd like to share, I would like to hear them. >> >> Thanks! 
>> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> >> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sat Dec 10 05:31:39 2016 From: erich at uw.edu (Eric Horst) Date: Fri, 9 Dec 2016 21:31:39 -0800 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: On Mon, Dec 5, 2016 at 1:31 PM, Aaron Knister wrote: > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different clusters. Two things: Upgrading from 3.5 to 4.1 I did node at a time and then at the end mmchconfig release=LATEST. Minutes after flipping to latest the cluster became non-responsive, with node mmfs panics and everything had to be restarted. Logs indicated it was a quota problem. In 4.1 the quota files move from externally visible files to internal hidden files. I suspect the quota file transition can't be done without a cluster restart. When I did the second cluster I upgraded all nodes and then very quickly stopped and started the entire cluster, issuing the mmchconfig in the middle. No quota panic problems on that one. Upgrading from 4.1 to 4.2 I did node at a time and then at the end mmchconfig release=LATEST. No cluster restart. Everything seemed to work okay. Later, restarting a node I got weird fstab errors on gpfs startup and using certain commands, notably mmfind, the command would fail with something like "can't find /dev/uwfs" (our filesystem.) I restarted the whole cluster and everything began working normally. In this case 4.2 got rid of /dev/fsname. Just like in the quota case it seems that this transition can't be seamless. Doing the second cluster I upgraded all nodes and then again quickly restarted gpfs to avoid the same problem. Other than these two quirks, I heartily thank IBM for making a very complex product with a very easy upgrade procedure. I could imagine many ways that an upgrade hop of two major versions in two weeks could go very wrong but the quality of the product and team makes my job very easy. -Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sat Dec 10 12:35:15 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 10 Dec 2016 07:35:15 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Thanks Eric! I have a few follow up questions for you-- Do you recall the exact versions of 3.5 and 4.1 your cluster went from/to? I'm curious to know what version of 4.1 you were at when you ran the mmchconfig. Would you mind sharing any log messages related to the errors you saw when you ran the mmchconfig? Thanks! Sent from my iPhone > On Dec 10, 2016, at 12:31 AM, Eric Horst wrote: > > >> On Mon, Dec 5, 2016 at 1:31 PM, Aaron Knister wrote: >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any anecdotes they'd like to share, I would like to hear them. 
> > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different clusters. Two things: > > Upgrading from 3.5 to 4.1 I did node at a time and then at the end mmchconfig release=LATEST. Minutes after flipping to latest the cluster became non-responsive, with node mmfs panics and everything had to be restarted. Logs indicated it was a quota problem. In 4.1 the quota files move from externally visible files to internal hidden files. I suspect the quota file transition can't be done without a cluster restart. When I did the second cluster I upgraded all nodes and then very quickly stopped and started the entire cluster, issuing the mmchconfig in the middle. No quota panic problems on that one. > > Upgrading from 4.1 to 4.2 I did node at a time and then at the end mmchconfig release=LATEST. No cluster restart. Everything seemed to work okay. Later, restarting a node I got weird fstab errors on gpfs startup and using certain commands, notably mmfind, the command would fail with something like "can't find /dev/uwfs" (our filesystem.) I restarted the whole cluster and everything began working normally. In this case 4.2 got rid of /dev/fsname. Just like in the quota case it seems that this transition can't be seamless. Doing the second cluster I upgraded all nodes and then again quickly restarted gpfs to avoid the same problem. > > Other than these two quirks, I heartily thank IBM for making a very complex product with a very easy upgrade procedure. I could imagine many ways that an upgrade hop of two major versions in two weeks could go very wrong but the quality of the product and team makes my job very easy. > > -Eric > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Dec 11 15:07:09 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 11 Dec 2016 10:07:09 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: I thought I'd share this with folks. I saw some log asserts in our test environment (~1050 client nodes and 12 manager/server nodes). I'm going from 3.5.0.31 (well, 2 clients are still at 3.5.0.19) -> 4.1.1.10. I've been running filebench in a loop for the past several days. It's sustaining about 60k write iops and about 15k read iops to the metadata disks for the filesystem I'm testing with, so I'd say it's getting pushed reasonably hard. 
The test cluster had 4.1 clients before it had 4.1 servers but after flipping 420 clients from 3.5.0.31 to 4.1.1.10 and starting up filebench I'm now seeing periodic logasserts from the manager/server nodes: Dec 11 08:57:39 loremds12 mmfs: Generic error in /project/sprelfks2/build/rfks2s010a/src/avs/fs/mmfs/ts/tm/HandleReq.C line 304 retCode 0, reasonCode 0 Dec 11 08:57:39 loremds12 mmfs: mmfsd: Error=MMFS_GENERIC, ID=0x30D9195E, Tag=4908715 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 (!"downgrade to mode which is not StrictlyWeaker") Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 node 584 old mode ro new mode (A: D: A) Dec 11 08:57:39 loremds12 mmfs: [X] logAssertFailed: (!"downgrade to mode which is not StrictlyWeaker") Dec 11 08:57:39 loremds12 mmfs: [X] return code 0, reason code 0, log record tag 0 Dec 11 08:57:42 loremds12 mmfs: [E] 10:0xA1BD5B RcvWorker::thread(void*).A1BD00 + 0x5B at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 11:0x622126 Thread::callBody(Thread*).6220E0 + 0x46 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 12:0x61220F Thread::callBodyWrapper(Thread*).612180 + 0x8F at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 13:0x7FF4E6BE66B6 start_thread + 0xE6 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 14:0x7FF4E5FEE06D clone + 0x6D at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 2:0x9F95E9 logAssertFailed.9F9440 + 0x1A9 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 3:0x1232836 TokenClass::fixClientMode(Token*, int, int, int, CopysetRevoke*).1232350 + 0x4E6 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 4:0x1235593 TokenClass::HandleTellRequest(RpcContext*, Request*, char**, int).1232AD0 + 0x2AC3 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 5:0x123A23C HandleTellRequestInterface(RpcContext*, Request*, char**, int).123A0D0 + 0x16C at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 6:0x125C6B0 queuedTellServer(RpcContext*, Request*, int, unsigned int).125C670 + 0x40 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 7:0x125EF72 tmHandleTellServer(RpcContext*, char*).125EEC0 + 0xB2 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 8:0xA12668 tscHandleMsg(RpcContext*, MsgDataBuf*).A120D0 + 0x598 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 9:0xA1BC4E RcvWorker::RcvMain().A1BB50 + 0xFE at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] *** Traceback: Dec 11 08:57:42 loremds12 mmfs: [N] Signal 6 at location 0x7FF4E5F456D5 in process 12188, link reg 0xFFFFFFFFFFFFFFFF. Dec 11 08:57:42 loremds12 mmfs: [X] *** Assert exp((!"downgrade to mode which is not StrictlyWeaker") node 584 old mode ro new mode (A: D: A) ) in line 304 of file /project/sprelfks2/bui ld/rfks2s010a/src/avs/fs/mmfs/ts/tm/HandleReq.C I've seen different messages on that third line of the "Tag=" message: Dec 11 00:16:40 loremds11 mmfs: Tag=5012168 node 825 old mode ro new mode 0x31 Dec 11 01:52:53 loremds10 mmfs: Tag=5016618 node 655 old mode ro new mode (A: MA D: ) Dec 11 02:15:57 loremds10 mmfs: Tag=5045549 node 994 old mode ro new mode (A: A D: A) Dec 11 08:14:22 loremds10 mmfs: Tag=5067054 node 237 old mode ro new mode 0x08 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 node 584 old mode ro new mode (A: D: A) Dec 11 00:47:39 loremds09 mmfs: Tag=4998635 node 461 old mode ro new mode (A:R D: ) It's interesting to note that all of these node indexes are still running 3.5. I'm going to open up a PMR but thought I'd share the gory details here and see if folks had any insight. I'm starting to wonder if 4.1 clients are more tolerant of 3.5 servers than 4.1 servers are of 3.5 clients. 
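For reference, the finalising sequence the PMR text earlier in the thread points at looks roughly like this once every node, including any remote cluster that mounts the filesystem, is on the new code; the filesystem name below is a placeholder:

# on each node, confirm the installed and running daemon level
rpm -q gpfs.base
mmdiag --version

# cluster-wide view of the committed level
mmlsconfig minReleaseLevel

# finalise only when nothing is left on the old code; this step is one-way
mmchconfig release=LATEST
mmchfs gpfs1 -V full        # "gpfs1" is a placeholder filesystem name

Given the quota-file behaviour Eric described, scheduling a short full-cluster restart around the mmchconfig step looks like the safer pattern.
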
-Aaron On 12/5/16 4:31 PM, Aaron Knister wrote: > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > >> Rolling upgrades allow you to install new GPFS code one node at a time >> without shutting down GPFS >> on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while >> other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sun Dec 11 21:28:39 2016 From: erich at uw.edu (Eric Horst) Date: Sun, 11 Dec 2016 13:28:39 -0800 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: On Sat, Dec 10, 2016 at 4:35 AM, Aaron Knister wrote: > Thanks Eric! > > I have a few follow up questions for you-- > > Do you recall the exact versions of 3.5 and 4.1 your cluster went from/to? > I'm curious to know what version of 4.1 you were at when you ran the > mmchconfig. > I went from 3.5.0-28 to 4.1.0-8 to 4.2.1-1. > > Would you mind sharing any log messages related to the errors you saw when > you ran the mmchconfig? > > Unfortunately I didn't save any actual logs from the update. I did the first cluster in early July so nothing remains. The only note I have is: "On update, after finalizing gpfs 4.1 the quota file format apparently changed and caused a mmrepquota hang/deadlock. Had to shutdown and restart the whole cluster." Sorry to not be very helpful on that front. -Eric > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different > clusters. Two things: > > Upgrading from 3.5 to 4.1 I did node at a time and then at the end > mmchconfig release=LATEST. Minutes after flipping to latest the cluster > became non-responsive, with node mmfs panics and everything had to be > restarted. Logs indicated it was a quota problem. In 4.1 the quota files > move from externally visible files to internal hidden files. I suspect the > quota file transition can't be done without a cluster restart. When I did > the second cluster I upgraded all nodes and then very quickly stopped and > started the entire cluster, issuing the mmchconfig in the middle. No quota > panic problems on that one. > > Upgrading from 4.1 to 4.2 I did node at a time and then at the end > mmchconfig release=LATEST. No cluster restart. Everything seemed to work > okay. Later, restarting a node I got weird fstab errors on gpfs startup and > using certain commands, notably mmfind, the command would fail with > something like "can't find /dev/uwfs" (our filesystem.) I restarted the > whole cluster and everything began working normally. 
In this case 4.2 got > rid of /dev/fsname. Just like in the quota case it seems that this > transition can't be seamless. Doing the second cluster I upgraded all nodes > and then again quickly restarted gpfs to avoid the same problem. > > Other than these two quirks, I heartily thank IBM for making a very > complex product with a very easy upgrade procedure. I could imagine many > ways that an upgrade hop of two major versions in two weeks could go very > wrong but the quality of the product and team makes my job very easy. > > -Eric > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 12 13:55:52 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Dec 2016 13:55:52 +0000 Subject: [gpfsug-discuss] Ceph RBD Volumes and GPFS? Message-ID: Has anyone tried using Ceph RBD volumes with GPFS? I?m guessing that it will work, but I?m not sure if IBM would support it. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Dec 13 04:05:08 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 12 Dec 2016 23:05:08 -0500 Subject: [gpfsug-discuss] Ceph RBD Volumes and GPFS? In-Reply-To: References: Message-ID: Hi Bob, I have not, although I started to go down that path. I had wanted erasure coded pools but in order to front an erasure coded pool with an RBD volume you apparently need a cache tier? Seems that doesn't give one the performance they might want for this type of workload (http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#a-word-of-caution). If you're OK replicating the data I suspect it might work well. I did try sheepdog (https://sheepdog.github.io/sheepdog/) and that did work the way I wanted it to with erasure coding and gave me pretty good performance to boot. -Aaron On 12/12/16 8:55 AM, Oesterlin, Robert wrote: > Has anyone tried using Ceph RBD volumes with GPFS? I?m guessing that it > will work, but I?m not sure if IBM would support it. > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From r.sobey at imperial.ac.uk Thu Dec 15 13:13:43 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 15 Dec 2016 13:13:43 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Ah. I stopped reading when I read that the service account needs Domain Admin rights. I doubt that will fly unfortunately. Thanks though John. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John T Olson Sent: 09 December 2016 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. [Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?]Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister > To: gpfsug main discussion list > Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Mark.Bush at siriuscom.com Thu Dec 15 20:32:11 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 15 Dec 2016 20:32:11 +0000 Subject: [gpfsug-discuss] Tiers Message-ID: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). 
It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 15 20:47:12 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Dec 2016 20:47:12 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Message-ID: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Dec 15 20:52:17 2016 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 15 Dec 2016 20:52:17 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> References: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu>, <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Dec 15 21:19:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 15 Dec 2016 21:19:20 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> Message-ID: <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. 
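A minimal sketch of the kind of atime-driven migration being described (pool names, the 180-day cutoff and the file system name are just examples):

    /* migrate.pol */
    RULE 'to_capacity' MIGRATE FROM POOL 'data' TO POOL 'capacity'
      WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '180' DAYS
    RULE 'default' SET POOL 'data'

    # run it periodically, e.g. weekly from cron
    mmapplypolicy gpfs0 -P migrate.pol -I yes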
In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 15 21:25:21 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Dec 2016 21:25:21 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> Message-ID: <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? 
Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sat Dec 17 04:24:34 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 16 Dec 2016 23:24:34 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name Message-ID: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Hi Everyone, I'm curious about the most straightforward and fastest way to identify what NSD a given /dev device is. The best I can come up with is "tspreparedisk -D device_name" which gives me something like: tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: that I can then parse and map the nsd id to the nsd name. I hesitate calling ts* commands directly and I admit it's perhaps an irrational fear, but I associate the -D flag with "delete" in my head and am afraid that some day -D may be just that and *poof* there go my NSD descriptors. Is there a cleaner way? 
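Spelled out, that parse looks roughly like this (the device argument is whatever gets fed to tspreparedisk -D; field 3 of the colon-separated output is the NSD volume ID):

    id=$(tspreparedisk -D "$dev" | cut -d: -f3)
    # match that volume ID against the NSD table; mmlsnsd -m shows the
    # volume ID alongside the NSD name and local device
    mmlsnsd -m | grep -i "$id"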
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sat Dec 17 04:55:00 2016 From: erich at uw.edu (Eric Horst) Date: Fri, 16 Dec 2016 20:55:00 -0800 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: Perhaps this: mmlsnsd -m -Eric On Fri, Dec 16, 2016 at 8:24 PM, Aaron Knister wrote: > Hi Everyone, > > I'm curious about the most straightforward and fastest way to identify > what NSD a given /dev device is. The best I can come up with is > "tspreparedisk -D device_name" which gives me something like: > > tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: > > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational fear, > but I associate the -D flag with "delete" in my head and am afraid that > some day -D may be just that and *poof* there go my NSD descriptors. > > Is there a cleaner way? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Dec 17 07:04:08 2016 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 17 Dec 2016 07:04:08 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Sat Dec 17 08:35:05 2016 From: jtucker at pixitmedia.com (Jez Tucker) Date: Sat, 17 Dec 2016 08:35:05 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <303ba835-5844-765f-c34d-a62c226498c5@arcastream.com> References: <303ba835-5844-765f-c34d-a62c226498c5@arcastream.com> Message-ID: <6ebdd77b-c576-fbee-903c-c365e101cbb4@pixitmedia.com> Hi Aaron An alternative method for you is: from arcapix.fs.gpfs import Nsds >>> from arcapix.fs.gpfs import Nsds >>> nsd = Nsds() >>> for n in nsd.values(): ... print n.device, n.id ... /gpfsblock/mmfs1-md1 md3200_001_L000 /gpfsblock/mmfs1-md2 md3200_001_L001 /gpfsblock/mmfs1-data1 md3200_001_L002 /gpfsblock/mmfs1-data2 md3200_001_L003 /gpfsblock/mmfs1-data3 md3200_001_L004 /gpfsblock/mmfs2-md1 md3200_002_L000 Ref: http://arcapix.com/gpfsapi/nsds.html Obviously you can filter a specific device by the usual Pythonic string comparators. Jez On 17/12/16 07:04, Luis Bolinches wrote: > Hi > THe ts* is a good fear, they are internal commands bla bla bla you > know that > Have you tried mmlsnsd -X > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 > > "If you continually give you will continually have." 
Anonymous > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] translating /dev device into nsd name > Date: Sat, Dec 17, 2016 6:24 AM > Hi Everyone, > > I'm curious about the most straightforward and fastest way to identify > what NSD a given /dev device is. The best I can come up with is > "tspreparedisk -D device_name" which gives me something like: > > tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: > > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am > afraid > that some day -D may be just that and *poof* there go my NSD > descriptors. > > Is there a cleaner way? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* VP of Research and Development, ArcaStream jtucker at arcastream.com www.arcastream.com | Tw:@arcastream.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Sat Dec 17 21:42:39 2016 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Sat, 17 Dec 2016 16:42:39 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: <54420.1482010959@turing-police.cc.vt.edu> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From daniel.kidger at uk.ibm.com Mon Dec 19 11:42:03 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Mon, 19 Dec 2016 11:42:03 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discussUnless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Mon Dec 19 14:53:27 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Mon, 19 Dec 2016 14:53:27 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance Message-ID: We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of the IO servers phoned home with memory error. IBM is coming out today to replace the faulty DIMM. What is the correct way of taking this system out for maintenance? Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When we needed to do maintenance on the old system, we would migrate manager role and also move primary and secondary server roles if one of those systems had to be taken down. With ESS and resource pool manager roles etc. is there a correct way of shutting down one of the IO serves for maintenance? 
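Presumably it is something along the lines of the sketch below, i.e. move the recovery group(s) and any manager roles off the node before stopping it (recovery group, file system and server names are placeholders):

    # make the partner IO node the primary server for the recovery group
    mmchrecoverygroup rgL --servers ioserver2,ioserver1
    mmlsrecoverygroup rgL -L        # confirm the active server has moved

    # move any file system / cluster manager roles off the node
    mmlsmgr
    mmchmgr gpfs0 ioserver2
    mmchmgr -c ioserver2

    # then stop GPFS on the node being serviced
    mmshutdown -N ioserver1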
Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Dec 19 15:15:45 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 19 Dec 2016 10:15:45 -0500 Subject: [gpfsug-discuss] Tiers In-Reply-To: <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> Message-ID: We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Mark, > > We just use an 8 Gb FC SAN. For the data pool we typically have a dual > active-active controller storage array fronting two big RAID 6 LUNs and 1 > RAID 1 (for /home). For the capacity pool, it might be the same exact > model of controller, but the two controllers are now fronting that whole > 60-bay array. > > But our users tend to have more modest performance needs than most? > > Kevin > > On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: > > Kevin, out of curiosity, what type of disk does your data pool use? SAS > or just some SAN attached system? > > *From: * on behalf of > "Buterbaugh, Kevin L" > *Reply-To: *gpfsug main discussion list > *Date: *Thursday, December 15, 2016 at 2:47 PM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] Tiers > > Hi Mark, > > We?re a ?traditional? university HPC center with a very untraditional > policy on our scratch filesystem ? we don?t purge it and we sell quota > there. Ultimately, a lot of that disk space is taken up by stuff that, > let?s just say, isn?t exactly in active use. > > So what we?ve done, for example, is buy a 60-bay storage array and stuff > it with 8 TB drives. It wouldn?t offer good enough performance for > actively used files, but we use GPFS policies to migrate files to the > ?capacity? pool based on file atime. So we have 3 pools: > > 1. the system pool with metadata only (on SSDs) > 2. the data pool, which is where actively used files are stored and which > offers decent performance > 3. the capacity pool, for data which hasn?t been accessed ?recently?, and > which is on slower storage > > I would imagine others do similar things. HTHAL? > > Kevin > > > On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: > > Just curious how many of you out there deploy SS with various tiers? It > seems like a lot are doing the system pool with SSD?s but do you routinely > have clusters that have more than system pool and one more tier? > > I know if you are doing Archive in connection that?s an obvious choice for > another tier but I?m struggling with knowing why someone needs more than > two tiers really. > > I?ve read all the fine manuals as to how to do such a thing and some of > the marketing as to maybe why. I?m still scratching my head on this > though. 
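On the QOS question above, a small sketch of capping migration/restripe traffic so it cannot swamp the disks (the file system name and IOPS figure are made up):

    # long-running maintenance work (mmapplypolicy, restripes, ...) runs in the
    # 'maintenance' QOS class by default; cap it and leave normal I/O unlimited
    mmchqos gpfs0 --enable pool=*,maintenance=300IOPS,other=unlimited

    # watch what each class is actually consuming
    mmlsqos gpfs0 --seconds 60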
In fact, my understanding is in the ESS there isn?t any different > pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). > > It does make sense to me know with TCT and I could create an ILM policy to > get some of my data into the cloud. > > But in the real world I would like to know what yall do in this regard. > > > Thanks > > Mark > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > *Sirius Computer Solutions * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 19 15:25:52 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 19 Dec 2016 15:25:52 +0000 Subject: [gpfsug-discuss] Tiers Message-ID: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? 
Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kenh at us.ibm.com Mon Dec 19 15:30:58 2016 From: kenh at us.ibm.com (Ken Hill) Date: Mon, 19 Dec 2016 10:30:58 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" To: "gpfsug main discussion list" Cc: "gpfsug main discussion list" Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Dec 19 15:36:50 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 19 Dec 2016 15:36:50 +0000 Subject: [gpfsug-discuss] SMB issues Message-ID: Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? 
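In case it helps anyone hitting the same thing, this is roughly the information worth grabbing from a protocol node before the PMR goes in (a sketch; adjust node and share names to taste):

    rpm -q gpfs.smb            # exact Samba build in play
    mmces service list -a      # per-node state of the CES services
    mmsmb export list          # share definitions as CES sees them
    # plus log.smbd (typically under /var/adm/ras on the CES nodes) from the
    # node that served the failing connection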
Thanks Simon From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 15:40:50 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 15:40:50 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: References: Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E@vanderbilt.edu> Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. 
So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 19 15:53:12 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 19 Dec 2016 15:53:12 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: Move its recoverygrops to the other node by putting the other node as primary server for it: mmchrecoverygroup rgname --servers otherServer,thisServer And verify that it's now active on the other node by "mmlsrecoverygroup rgname -L". Move away any filesystem managers or cluster manager role if that's active on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. 
Then you can run mmshutdown on it (assuming you also have enough quorum nodes in the remaining cluster). -jf man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? > > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 15:58:16 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 15:58:16 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Hi Ken, Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. Or am I completely misunderstanding what you?re saying? Thanks... Kevin On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" > To: "gpfsug main discussion list" > Cc: "gpfsug main discussion list" > Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. 
Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse ________________________________ On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpappas at dstonline.com Mon Dec 19 15:59:12 2016 From: bpappas at dstonline.com (Bill Pappas) Date: Mon, 19 Dec 2016 15:59:12 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: What I would do is when you identify this issue again, determine which IP address (which samba server) is serving up the CIFS share. Then as root, log on to that samna node and typr "id " for the user which has this issue. Are they in all the security groups you'd expect, in particular, the group required to access the folder in question? Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] [http://www.prweb.com/releases/2016/06/prweb13504050.htm] http://www.prweb.com/releases/2016/06/prweb13504050.htm ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Monday, December 19, 2016 9:41 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 59, Issue 40 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
SMB issues (Simon Thompson (Research Computing - IT Services)) 2. Re: Tiers (Buterbaugh, Kevin L) ---------------------------------------------------------------------- Message: 1 Date: Mon, 19 Dec 2016 15:36:50 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] SMB issues Message-ID: Content-Type: text/plain; charset="us-ascii" Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon ------------------------------ Message: 2 Date: Mon, 19 Dec 2016 15:40:50 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Tiers Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. 
try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 40 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-1466780990050_DSTlogo.png.png Type: image/png Size: 6282 bytes Desc: OutlookEmoji-1466780990050_DSTlogo.png.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-http://www.prweb.com/releases/2016/06/prweb13504050.htm.jpg Type: image/jpeg Size: 14887 bytes Desc: OutlookEmoji-http://www.prweb.com/releases/2016/06/prweb13504050.htm.jpg URL: From S.J.Thompson at bham.ac.uk Mon Dec 19 16:06:08 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 19 Dec 2016 16:06:08 +0000 Subject: [gpfsug-discuss] SMB issues Message-ID: We see it on all four of the nodes, and yet we did some getent passwd/getent group stuff on them to verify that identity is working OK. Simon From: > on behalf of Bill Pappas > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 19 December 2016 at 15:59 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SMB issues What I would do is when you identify this issue again, determine which IP address (which samba server) is serving up the CIFS share. Then as root, log on to that samna node and typr "id " for the user which has this issue. Are they in all the security groups you'd expect, in particular, the group required to access the folder in question? Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] [http://www.prweb.com/releases/2016/06/prweb13504050.htm] http://www.prweb.com/releases/2016/06/prweb13504050.htm ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Monday, December 19, 2016 9:41 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 59, Issue 40 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
SMB issues (Simon Thompson (Research Computing - IT Services)) 2. Re: Tiers (Buterbaugh, Kevin L) ---------------------------------------------------------------------- Message: 1 Date: Mon, 19 Dec 2016 15:36:50 +0000 From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SMB issues Message-ID: > Content-Type: text/plain; charset="us-ascii" Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon ------------------------------ Message: 2 Date: Mon, 19 Dec 2016 15:40:50 +0000 From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. 
try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 40 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-1466780990050_DSTlogo.png.png Type: image/png Size: 6282 bytes Desc: OutlookEmoji-1466780990050_DSTlogo.png.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-httpwww.prweb.comreleases201606prweb13504050.htm.jpg Type: image/jpeg Size: 14887 bytes Desc: OutlookEmoji-httpwww.prweb.comreleases201606prweb13504050.htm.jpg URL: From ulmer at ulmer.org Mon Dec 19 16:16:56 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 19 Dec 2016 11:16:56 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> Your observation is correct! There?s usually another step, though: mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. -- Stephen > On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: > > Hi Ken, > > Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. > > Or am I completely misunderstanding what you?re saying? Thanks... > > Kevin > >> On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: >> >> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >> >> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. 
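For anyone who hasn't written one, a minimal stanza along the lines being described here would look roughly like this (the NSD, device and server names are all made up):

  %nsd: nsd=nsd01
    device=/dev/mapper/lun01
    servers=nsdserverA,nsdserverB
    usage=dataAndMetadata
    failureGroup=1

mmcrnsd -F nsd.stanza only has to find the device= path on the first server in the servers= list (nsdserverA here); to change the order afterwards, run mmchnsd -F with a stanza containing just the nsd= and servers= lines in the new order.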
>> >> >> Ken Hill >> Technical Sales Specialist | Software Defined Solution Sales >> IBM Systems >> Phone:1-540-207-7270 >> E-mail: kenh at us.ibm.com >> >> >> 2300 Dulles Station Blvd >> Herndon, VA 20171-6133 >> United States >> >> >> >> >> >> >> >> >> >> >> From: "Daniel Kidger" > >> To: "gpfsug main discussion list" > >> Cc: "gpfsug main discussion list" > >> Date: 12/19/2016 06:42 AM >> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Valdis wrote: >> Keep in mind that if you have multiple NSD servers in the cluster, there >> is *no* guarantee that the names for a device will be consistent across >> the servers, or across reboots. And when multipath is involved, you may >> have 4 or 8 or even more names for the same device.... >> >> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >> >> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >> >> Daniel >> >> IBM Spectrum Storage Software >> +44 (0)7818 522266 >> Sent from my iPad using IBM Verse >> >> >> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >> >> From: Valdis.Kletnieks at vt.edu >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Date: 17 Dec 2016 21:43:00 >> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >> >> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >> > that I can then parse and map the nsd id to the nsd name. I hesitate >> > calling ts* commands directly and I admit it's perhaps an irrational >> > fear, but I associate the -D flag with "delete" in my head and am afraid >> > that some day -D may be just that and *poof* there go my NSD descriptors. >> Others have mentioned mmlsdnsd -m and -X >> Keep in mind that if you have multiple NSD servers in the cluster, there >> is *no* guarantee that the names for a device will be consistent across >> the servers, or across reboots. And when multipath is involved, you may >> have 4 or 8 or even more names for the same device.... >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> Unless stated otherwise above: >> IBM United Kingdom Limited - Registered in England and Wales with number 741598. >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From janfrode at tanso.net Mon Dec 19 16:25:50 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 19 Dec 2016 16:25:50 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: I normally do mmcrnsd without specifying any servers=, and point at the local /dev entry. Afterwards I add the servers= line and do mmchnsd. -jf man. 19. des. 2016 kl. 16.58 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: > Hi Ken, > > Umm, wouldn?t that make that server the primary NSD server for all those > NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen > server, but as long as you have the proper device name for the NSD from the > NSD server you want to be primary for it, I?ve never had a problem > specifying many different servers first in the list. > > Or am I completely misunderstanding what you?re saying? Thanks... > > Kevin > > On Dec 19, 2016, at 9:30 AM, Ken Hill wrote: > > Indeed. It only matters when deploying NSDs. Post-deployment, all luns > (NSDs) are labeled - and they are assembled by GPFS. > > Keep in mind: If you are deploying multiple NSDs (with multiple servers) - > you'll need to pick one server to work with... Use that server to label the > luns (mmcrnsd)... In the nsd stanza file - the server you choose will need > to be the first server in the "servers" list. > > > *Ken Hill* > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > > ------------------------------ > *Phone:*1-540-207-7270 > * E-mail:* *kenh at us.ibm.com* > > > > > > > > > > > > > > > > > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > > > > > > From: "Daniel Kidger" > To: "gpfsug main discussion list" > > Cc: "gpfsug main discussion list" > > Date: 12/19/2016 06:42 AM > Subject: Re: [gpfsug-discuss] translating /dev device into nsd name > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > *Valdis wrote:* > > > > *Keep in mind that if you have multiple NSD servers in the cluster, there > is *no* guarantee that the names for a device will be consistent across the > servers, or across reboots. And when multipath is involved, you may have 4 > or 8 or even more names for the same device....* > > Indeed the is whole greatness about NSDs (and in passing why Lustre can be > much more tricky to safely manage.) > Once a lun is "labelled" as an NSD then that NSD name is all you need to > care about as the /dev entries can now freely change on reboot or differ > across nodes. Indeed if you connect an arbitrary node to an NSD disk via a > SAN cable, gpfs will recognise it and use it as a shortcut to that lun. > > Finally recall that in the NSD stanza file the /dev entry is only matched > for on the first of the listed NSD servers; the other NSD servers will > discover and learn which NSD this is, ignoring the /dev value in this > stanza. > > Daniel > > IBM Spectrum Storage Software > *+44 (0)7818 522266* <+44%207818%20522266> > Sent from my iPad using IBM Verse > > > ------------------------------ > On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: > > From: Valdis.Kletnieks at vt.edu > To: gpfsug-discuss at spectrumscale.org > Cc: > Date: 17 Dec 2016 21:43:00 > Subject: Re: [gpfsug-discuss] translating /dev device into nsd name > > On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > > that I can then parse and map the nsd id to the nsd name. 
I hesitate > > calling ts* commands directly and I admit it's perhaps an irrational > > fear, but I associate the -D flag with "delete" in my head and am afraid > > that some day -D may be just that and *poof* there go my NSD descriptors. > Others have mentioned mmlsdnsd -m and -X > Keep in mind that if you have multiple NSD servers in the cluster, there > is *no* guarantee that the names for a device will be consistent across > the servers, or across reboots. And when multipath is involved, you may > have 4 or 8 or even more names for the same device.... > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 16:43:50 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 16:43:50 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> Message-ID: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Hi Stephen, Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: 1. go down to the data center and sit in front of the storage arrays. 2. log on to the NSD server I want to be primary for a given NSD. 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. 3. for the remaining disks, run ?dd if=/dev/> wrote: Your observation is correct! There?s usually another step, though: mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. -- Stephen On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: Hi Ken, Umm, wouldn?t that make that server the primary NSD server for all those NSDs? 
Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. Or am I completely misunderstanding what you?re saying? Thanks... Kevin On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" > To: "gpfsug main discussion list" > Cc: "gpfsug main discussion list" > Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse ________________________________ On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Dec 19 16:45:38 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 19 Dec 2016 16:45:38 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: Can you create an export with "admin user" and see if the issue is reproducible that way: Mmsmb export add exportname /path/to/folder Mmsmb export change exportname -option "admin users=username at domain" And for good measure remove the SID of Domain Users from the ACL: mmsmb exportacl remove exportname --SID S-1-1-0 I can't quite think in my head how this will help but I'd be interested to know if you see similar behaviour. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 19 December 2016 15:37 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] SMB issues Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Mon Dec 19 17:08:27 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 19 Dec 2016 12:08:27 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: <14903A9D-B051-4B1A-AF83-31140FC7666D@ulmer.org> Depending on the hardware?. ;) Sometimes you can use the drivers to tell you the ?volume name? of a LUN on the storage server. 
You could do that the DS{3,4,5}xx systems. I think you can also do it for Storwize-type systems, but I?m blocking on how and I don?t have one in front of me at the moment. Either that or use the volume UUID or some such. I?m basically never where I can see the blinky lights. :( -- Stephen > On Dec 19, 2016, at 11:43 AM, Buterbaugh, Kevin L > wrote: > > Hi Stephen, > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > > Kevin > >> On Dec 19, 2016, at 10:16 AM, Stephen Ulmer > wrote: >> >> Your observation is correct! There?s usually another step, though: >> >> mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. >> >> -- >> Stephen >> >> >> >>> On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: >>> >>> Hi Ken, >>> >>> Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. >>> >>> Or am I completely misunderstanding what you?re saying? Thanks... >>> >>> Kevin >>> >>>> On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: >>>> >>>> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >>>> >>>> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. >>>> >>>> >>>> Ken Hill >>>> Technical Sales Specialist | Software Defined Solution Sales >>>> IBM Systems >>>> Phone:1-540-207-7270 >>>> E-mail: kenh at us.ibm.com >>>> >>>> >>>> 2300 Dulles Station Blvd >>>> Herndon, VA 20171-6133 >>>> United States >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From: "Daniel Kidger" > >>>> To: "gpfsug main discussion list" > >>>> Cc: "gpfsug main discussion list" > >>>> Date: 12/19/2016 06:42 AM >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Valdis wrote: >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. 
And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> >>>> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >>>> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >>>> >>>> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >>>> >>>> Daniel >>>> >>>> IBM Spectrum Storage Software >>>> +44 (0)7818 522266 >>>> Sent from my iPad using IBM Verse >>>> >>>> >>>> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >>>> >>>> From: Valdis.Kletnieks at vt.edu >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: >>>> Date: 17 Dec 2016 21:43:00 >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> >>>> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >>>> > that I can then parse and map the nsd id to the nsd name. I hesitate >>>> > calling ts* commands directly and I admit it's perhaps an irrational >>>> > fear, but I associate the -D flag with "delete" in my head and am afraid >>>> > that some day -D may be just that and *poof* there go my NSD descriptors. >>>> Others have mentioned mmlsdnsd -m and -X >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> Unless stated otherwise above: >>>> IBM United Kingdom Limited - Registered in England and Wales with number 741598. >>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Mon Dec 19 17:16:07 2016 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 19 Dec 2016 09:16:07 -0800 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: We have each of our NSDs on boxes shared between two servers, with one server primary for each raid unit. When I create Logical drives and map them, I make sure there is no overlap in the logical unit numbers between the two boxes. Then I use /proc/partitions and lsscsi to see if they all show up. When it is time to write the stanza files, I use multipath -ll to get a list with the device name and LUN info, and sort it to round robin over all the NSD servers. It's still tedious, but it doesn't require a trip to the machine room. Note that the multipath -ll command needs to be run separately on each NSD server to get the device name specific to that host -- the first server name in the list. Also realize that leaving the host name off when creating NSDs only works if all the drives are visible from the node where you run the command. Regards, -- ddj Dave Johnson > On Dec 19, 2016, at 8:43 AM, Buterbaugh, Kevin L wrote: > > Hi Stephen, > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > > Kevin > >> On Dec 19, 2016, at 10:16 AM, Stephen Ulmer wrote: >> >> Your observation is correct! There?s usually another step, though: >> >> mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. >> >> -- >> Stephen >> >> >> >>> On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L wrote: >>> >>> Hi Ken, >>> >>> Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. >>> >>> Or am I completely misunderstanding what you?re saying? Thanks... >>> >>> Kevin >>> >>>> On Dec 19, 2016, at 9:30 AM, Ken Hill wrote: >>>> >>>> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >>>> >>>> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... 
Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. >>>> >>>> >>>> Ken Hill >>>> Technical Sales Specialist | Software Defined Solution Sales >>>> IBM Systems >>>> Phone:1-540-207-7270 >>>> E-mail: kenh at us.ibm.com >>>> >>>> >>>> 2300 Dulles Station Blvd >>>> Herndon, VA 20171-6133 >>>> United States >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From: "Daniel Kidger" >>>> To: "gpfsug main discussion list" >>>> Cc: "gpfsug main discussion list" >>>> Date: 12/19/2016 06:42 AM >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Valdis wrote: >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> >>>> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >>>> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >>>> >>>> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >>>> >>>> Daniel >>>> >>>> IBM Spectrum Storage Software >>>> +44 (0)7818 522266 >>>> Sent from my iPad using IBM Verse >>>> >>>> >>>> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >>>> >>>> From: Valdis.Kletnieks at vt.edu >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: >>>> Date: 17 Dec 2016 21:43:00 >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> >>>> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >>>> > that I can then parse and map the nsd id to the nsd name. I hesitate >>>> > calling ts* commands directly and I admit it's perhaps an irrational >>>> > fear, but I associate the -D flag with "delete" in my head and am afraid >>>> > that some day -D may be just that and *poof* there go my NSD descriptors. >>>> Others have mentioned mmlsdnsd -m and -X >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> Unless stated otherwise above: >>>> IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
>>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Mon Dec 19 17:31:38 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 19 Dec 2016 10:31:38 -0700 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From tortay at cc.in2p3.fr Mon Dec 19 17:49:05 2016 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Mon, 19 Dec 2016 18:49:05 +0100 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: <5cca2ea8-b098-c1e4-ab03-9542837287ab@cc.in2p3.fr> On 12/19/2016 05:43 PM, Buterbaugh, Kevin L wrote: > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > Hello, We use device mapper/multipath to assign meaningful names to devices based on the WWN (or the storage system "volume" name) of the LUNs. We use a simple naming scheme ("nsdDDNN", where DD is the primary server number and NN the NSD number for that node, of course all NSDs are served by at least 2 nodes). When possible, these names are also used by the storage systems (nowadays mostly LSI/Netapp units). We have scripts to automate the configuration of the LUNs on the storage systems with the proper names as well as for creating the relevant section of "multipath.conf". There is no ambiguity during "mmcrnsd" (or no need to use "mmchnsd" later on) and it's also easy to know which filesystem or pool is at risk when some hardware fails (CMDB, etc.) Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From mimarsh2 at vt.edu Tue Dec 20 13:57:31 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 08:57:31 -0500 Subject: [gpfsug-discuss] mmlsdisk performance impact Message-ID: All, Does the mmlsdisk command generate a lot of admin traffic or take up a lot of GPFS resources? In our case, we have it in some of our monitoring routines that run on all nodes. It is kind of nice info to have, but I am wondering if hitting the filesystem with a bunch of mmlsdisk commands is bad for performance. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 14:03:07 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 14:03:07 +0000 Subject: [gpfsug-discuss] mmlsdisk performance impact In-Reply-To: References: Message-ID: Hi Brian, If I?m not mistaken, once you run the mmlsdisk command on one client any other client running it will produce the exact same output. Therefore, what we do is run it once, output that to a file, and propagate that file to any node that needs it. HTHAL? 
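A minimal sketch of that approach, assuming an admin node with passwordless ssh to the clients and a filesystem device named gpfs0 (node list, paths and names are illustrative, not from this thread):

    # run mmlsdisk once on the admin node and snapshot the output
    mmlsdisk gpfs0 > /tmp/mmlsdisk.gpfs0.txt

    # push the snapshot to the nodes whose monitoring wants it;
    # nodes.list is a plain file with one hostname per line
    while read -r node; do
        scp -q /tmp/mmlsdisk.gpfs0.txt "${node}:/tmp/mmlsdisk.gpfs0.txt"
    done < nodes.list

The monitoring scripts on each node then parse the local copy instead of calling mmlsdisk themselves; the snapshot can be refreshed from cron at whatever interval the monitoring actually needs.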
Kevin On Dec 20, 2016, at 7:57 AM, Brian Marshall > wrote: All, Does the mmlsdisk command generate a lot of admin traffic or take up a lot of GPFS resources? In our case, we have it in some of our monitoring routines that run on all nodes. It is kind of nice info to have, but I am wondering if hitting the filesystem with a bunch of mmlsdisk commands is bad for performance. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Dec 20 16:25:04 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 11:25:04 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process Message-ID: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at u.washington.edu Tue Dec 20 16:27:32 2016 From: skylar2 at u.washington.edu (Skylar Thompson) Date: Tue, 20 Dec 2016 08:27:32 -0800 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: Message-ID: <20161220162732.GB20276@illiuin> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > All, > > What is your favorite method for stopping a user process from eating up all > the system memory and saving 1 GB (or more) for the GPFS / system > processes? We have always kicked around the idea of cgroups but never > moved on it. > > The problem: A user launches a job which uses all the memory on a node, > which causes the node to be expelled, which causes brief filesystem > slowness everywhere. > > I bet this problem has already been solved and I am just googling the wrong > search terms. > > > Thanks, > Brian > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From mweil at wustl.edu Tue Dec 20 16:35:44 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 10:35:44 -0600 Subject: [gpfsug-discuss] LROC Message-ID: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? 
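For context, a rough sketch of how an LROC device is typically put in place and pointed at metadata, assuming a client node1 with a spare local SSD at /dev/sdb (device, NSD and node names are illustrative); the lroc* options themselves are described further down this thread:

    # define the local SSD as an NSD used only as a read-only cache on node1;
    # it is not added to any filesystem
    echo "%nsd: device=/dev/sdb nsd=lroc_node1 servers=node1 usage=localCache" > /tmp/lroc.stanza
    mmcrnsd -F /tmp/lroc.stanza

    # cache inodes and directory blocks, but not file data, on that node
    mmchconfig lrocInodes=yes,lrocDirectories=yes,lrocData=no -N node1

    # check LROC usage/statistics on the node once it is active
    mmdiag --lroc

Worth keeping in mind the FAQ statement quoted later in this thread: LROC wants memory equal to roughly 1% of the device capacity, so a 240 GB SSD costs about 2.4 GB of RAM on that node.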
Thanks Matt From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 16:37:54 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 16:37:54 +0000 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: <20161220162732.GB20276@illiuin> References: <20161220162732.GB20276@illiuin> Message-ID: <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Hi Brian, It would be helpful to know what scheduling software, if any, you use. We were a PBS / Moab shop for a number of years but switched to SLURM two years ago. With both you can configure the maximum amount of memory available to all jobs on a node. So we just simply ?reserve? however much we need for GPFS and other ?system? processes. I can tell you that SLURM is *much* more efficient at killing processes as soon as they exceed the amount of memory they?ve requested than PBS / Moab ever dreamed of being. Kevin On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue Dec 20 17:03:28 2016 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 20 Dec 2016 17:03:28 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mimarsh2 at vt.edu Tue Dec 20 17:07:17 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 12:07:17 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: We use adaptive - Moab torque right now but are thinking about going to Skyrim Brian On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Brian, > > It would be helpful to know what scheduling software, if any, you use. > > We were a PBS / Moab shop for a number of years but switched to SLURM two > years ago. With both you can configure the maximum amount of memory > available to all jobs on a node. So we just simply ?reserve? however much > we need for GPFS and other ?system? processes. > > I can tell you that SLURM is *much* more efficient at killing processes as > soon as they exceed the amount of memory they?ve requested than PBS / Moab > ever dreamed of being. > > Kevin > > On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: > > We're a Grid Engine shop, and use cgroups (m_mem_free) to control user > process memory > usage. In the GE exec host configuration, we reserve 4GB for the OS > (including GPFS) so jobs are not able to consume all the physical memory on > the system. > > On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > > All, > > What is your favorite method for stopping a user process from eating up all > the system memory and saving 1 GB (or more) for the GPFS / system > processes? We have always kicked around the idea of cgroups but never > moved on it. > > The problem: A user launches a job which uses all the memory on a node, > which causes the node to be expelled, which causes brief filesystem > slowness everywhere. > > I bet this problem has already been solved and I am just googling the wrong > search terms. > > > Thanks, > Brian > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Dec 20 17:13:48 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 20 Dec 2016 17:13:48 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: , Message-ID: Nope, just lots of messages with the same error, but different folders. I've opened a pmr with IBM and supplied the usual logs. 
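For reference, a quick way to gauge how widespread the error is per folder, assuming log.smbd is in the usual /var/adm/ras location on the protocol nodes (adjust the path and pattern to your install):

    # count the sys_path_to_bdev failures per folder
    grep 'sys_path_to_bdev() failed for path' /var/adm/ras/log.smbd \
        | sed 's/.*failed for path \[\(.*\)\]!.*/\1/' \
        | sort | uniq -c | sort -rn | head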
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt [christof.schmitt at us.ibm.com] Sent: 19 December 2016 17:31 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 17:15:02 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 17:15:02 +0000 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: <818353BF-18AC-4931-8890-35D6ECC4DF04@vanderbilt.edu> Hi Brian, I don?t *think* you can entirely solve this problem with Moab ? as I mentioned, it?s not nearly as efficient as SLURM is at killing jobs when they exceed requested memory. We had situations where a user would be able to run a node out of memory before Moab would kill it. Hasn?t happened once with SLURM, AFAIK. But with either Moab or SLURM what we?ve done is taken the amount of physical RAM in the box and subtracted from that the amount of memory we want to ?reserve? for the system (OS, GPFS, etc.) and then told Moab / SLURM that this is how much RAM the box has. That way they at least won?t schedule jobs on the node that would exceed available memory. HTH? Kevin On Dec 20, 2016, at 11:07 AM, Brian Marshall > wrote: We use adaptive - Moab torque right now but are thinking about going to Skyrim Brian On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" > wrote: Hi Brian, It would be helpful to know what scheduling software, if any, you use. 
We were a PBS / Moab shop for a number of years but switched to SLURM two years ago. With both you can configure the maximum amount of memory available to all jobs on a node. So we just simply ?reserve? however much we need for GPFS and other ?system? processes. I can tell you that SLURM is *much* more efficient at killing processes as soon as they exceed the amount of memory they?ve requested than PBS / Moab ever dreamed of being. Kevin On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Dec 20 17:15:23 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 12:15:23 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: Skyrim equals Slurm. Mobile shenanigans. Brian On Dec 20, 2016 12:07 PM, "Brian Marshall" wrote: > We use adaptive - Moab torque right now but are thinking about going to > Skyrim > > Brian > > On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < > Kevin.Buterbaugh at vanderbilt.edu> wrote: > >> Hi Brian, >> >> It would be helpful to know what scheduling software, if any, you use. >> >> We were a PBS / Moab shop for a number of years but switched to SLURM two >> years ago. With both you can configure the maximum amount of memory >> available to all jobs on a node. So we just simply ?reserve? however much >> we need for GPFS and other ?system? processes. 
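A minimal sketch of what that reservation can look like under SLURM, with illustrative node names and sizes (MemSpecLimit holds memory back for the OS and GPFS, and the cgroup plugin enforces the per-job limits):

    # slurm.conf: either advertise less RealMemory than the node physically has,
    # or reserve an explicit chunk with MemSpecLimit (both values are in MB)
    NodeName=cn[001-100] RealMemory=131072 MemSpecLimit=4096
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    TaskPlugin=task/cgroup

    # cgroup.conf: have slurmd actually fence jobs in with cgroups
    ConstrainRAMSpace=yes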
>> >> I can tell you that SLURM is *much* more efficient at killing processes >> as soon as they exceed the amount of memory they?ve requested than PBS / >> Moab ever dreamed of being. >> >> Kevin >> >> On Dec 20, 2016, at 10:27 AM, Skylar Thompson >> wrote: >> >> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user >> process memory >> usage. In the GE exec host configuration, we reserve 4GB for the OS >> (including GPFS) so jobs are not able to consume all the physical memory >> on >> the system. >> >> On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: >> >> All, >> >> What is your favorite method for stopping a user process from eating up >> all >> the system memory and saving 1 GB (or more) for the GPFS / system >> processes? We have always kicked around the idea of cgroups but never >> moved on it. >> >> The problem: A user launches a job which uses all the memory on a node, >> which causes the node to be expelled, which causes brief filesystem >> slowness everywhere. >> >> I bet this problem has already been solved and I am just googling the >> wrong >> search terms. >> >> >> Thanks, >> Brian >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> -- >> -- Skylar Thompson (skylar2 at u.washington.edu) >> -- Genome Sciences Department, System Administrator >> -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> >> -- University of Washington School of Medicine >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and >> Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Tue Dec 20 17:19:48 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 20 Dec 2016 17:19:48 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: For sake of everyone else on this listserv, I'll highlight the appropriate procedure here. It turns out, changing recovery group on an active system is not recommended by IBM. We tried following Jan's recommendation this morning, and the system became unresponsive for about 30 minutes. It only became responsive (and recovery group change finished) after we killed couple of processes (ssh and scp) going to couple of clients. I got a Sev. 1 with IBM opened and they tell me that appropriate steps for IO maintenance are as follows: 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) 2. unmount gpfs on io node that is going down 3. shutdown gpfs on io node that is going down 4. shutdown os That's it - recovery groups should not be changed. If there is a need to change recovery group, use --active option (not permanent change). We are now stuck in situation that io2 server is owner of both recovery groups. The way IBM tells us to fix this is to unmount the filesystem on all clients and change recovery groups then. 
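Spelled out as commands, that sequence looks roughly like the following, assuming the node going down is essio1, its partner is essio2 and the filesystem is gpfs0 (all names illustrative), run from a node that stays up:

    mmlsmgr                     # see which node currently holds the cluster and fs manager roles
    mmchmgr gpfs0 essio2        # move the filesystem manager off essio1
    mmchmgr -c essio2           # move the cluster manager off essio1
    mmumount gpfs0 -N essio1    # unmount on the node going down
    mmshutdown -N essio1        # stop GPFS there, then shut down the OS for the hardware work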
We can't do it now and will have to schedule maintenance sometime in 2017. For now, we have switched recovery groups using --active flag and things (filesystem performance) seems to be OK. Load average on both io servers is quite high (250avg) and does not seem to be going down. I really wish that maintenance procedures were documented somewhere on IBM website. This experience this morning has really shaken my confidence in ESS. Damir On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust wrote: > > Move its recoverygrops to the other node by putting the other node as > primary server for it: > > mmchrecoverygroup rgname --servers otherServer,thisServer > > And verify that it's now active on the other node by "mmlsrecoverygroup > rgname -L". > > Move away any filesystem managers or cluster manager role if that's active > on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. > > Then you can run mmshutdown on it (assuming you also have enough quorum > nodes in the remaining cluster). > > > -jf > > man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? > > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at u.washington.edu Tue Dec 20 17:18:35 2016 From: skylar2 at u.washington.edu (Skylar Thompson) Date: Tue, 20 Dec 2016 09:18:35 -0800 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: <20161220171834.GE20276@illiuin> When using m_mem_free on GE with cgroup=true, GE just depends on the kernel OOM killer. There's one killer per cgroup so when a job goes off the rails, only its processes are eligible for OOM killing. I'm not sure how Slurm does it but anything that uses cgroups should have the above behavior. On Tue, Dec 20, 2016 at 12:15:23PM -0500, Brian Marshall wrote: > Skyrim equals Slurm. Mobile shenanigans. > > Brian > > On Dec 20, 2016 12:07 PM, "Brian Marshall" wrote: > > > We use adaptive - Moab torque right now but are thinking about going to > > Skyrim > > > > Brian > > > > On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < > > Kevin.Buterbaugh at vanderbilt.edu> wrote: > > > >> Hi Brian, > >> > >> It would be helpful to know what scheduling software, if any, you use. > >> > >> We were a PBS / Moab shop for a number of years but switched to SLURM two > >> years ago. With both you can configure the maximum amount of memory > >> available to all jobs on a node. So we just simply ???reserve??? 
however much > >> we need for GPFS and other ???system??? processes. > >> > >> I can tell you that SLURM is *much* more efficient at killing processes > >> as soon as they exceed the amount of memory they???ve requested than PBS / > >> Moab ever dreamed of being. > >> > >> Kevin > >> > >> On Dec 20, 2016, at 10:27 AM, Skylar Thompson > >> wrote: > >> > >> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user > >> process memory > >> usage. In the GE exec host configuration, we reserve 4GB for the OS > >> (including GPFS) so jobs are not able to consume all the physical memory > >> on > >> the system. > >> > >> On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > >> > >> All, > >> > >> What is your favorite method for stopping a user process from eating up > >> all > >> the system memory and saving 1 GB (or more) for the GPFS / system > >> processes? We have always kicked around the idea of cgroups but never > >> moved on it. > >> > >> The problem: A user launches a job which uses all the memory on a node, > >> which causes the node to be expelled, which causes brief filesystem > >> slowness everywhere. > >> > >> I bet this problem has already been solved and I am just googling the > >> wrong > >> search terms. > >> > >> > >> Thanks, > >> Brian > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> -- > >> -- Skylar Thompson (skylar2 at u.washington.edu) > >> -- Genome Sciences Department, System Administrator > >> -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> > >> -- University of Washington School of Medicine > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> ??? > >> Kevin Buterbaugh - Senior System Administrator > >> Vanderbilt University - Advanced Computing Center for Research and > >> Education > >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From mweil at wustl.edu Tue Dec 20 19:13:46 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 13:13:46 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? 
> > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more > metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Dec 20 19:18:47 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 20 Dec 2016 19:18:47 +0000 Subject: [gpfsug-discuss] LROC Message-ID: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> We?re currently deploying LROC in many of our compute nodes ? results so far have been excellent. We?re putting in 240gb SSDs, because we have mostly small files. As far as I know, the amount of inodes and directories in LROC are not limited, except by the size of the cache disk. Look at these config options for LROC: lrocData Controls whether user data is populated into the local read-only cache. Other configuration options can be used to select the data that is eligible for the local read-only cache. When using more than one such configuration option, data that matches any of the specified criteria is eligible to be saved. Valid values are yes or no. The default value is yes. If lrocData is set to yes, by default the data that was not already in the cache when accessed by a user is subsequently saved to the local read-only cache. The default behavior can be overridden using thelrocDataMaxFileSize and lrocDataStubFileSize configuration options to save all data from small files or all data from the initial portion of large files. lrocDataMaxFileSize Limits the data that may be saved in the local read-only cache to only the data from small files. A value of -1 indicates that all data is eligible to be saved. A value of 0 indicates that small files are not to be saved. A positive value indicates the maximum size of a file to be considered for the local read-only cache. For example, a value of 32768 indicates that files with 32 KB of data or less are eligible to be saved in the local read-only cache. The default value is 0. lrocDataStubFileSize Limits the data that may be saved in the local read-only cache to only the data from the first portion of all files. A value of -1 indicates that all file data is eligible to be saved. A value of 0 indicates that stub data is not eligible to be saved. A positive value indicates that the initial portion of each file that is eligible is to be saved. For example, a value of 32768 indicates that the first 32 KB of data from each file is eligible to be saved in the local read-only cache. The default value is 0. lrocDirectories Controls whether directory blocks is populated into the local read-only cache. The option also controls other file system metadata such as indirect blocks, symbolic links, and extended attribute overflow blocks. Valid values are yes or no. The default value is yes. lrocInodes Controls whether inodes from open files is populated into the local read-only cache; the cache contains the full inode, including all disk pointers, extended attributes, and data. 
Valid values are yes or no. The default value is yes. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Matt Weil Reply-To: gpfsug main discussion list Date: Tuesday, December 20, 2016 at 1:13 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Dec 20 19:36:08 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 20 Dec 2016 20:36:08 +0100 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: I'm sorry for your trouble, but those 4 steps you got from IBM support does not seem correct. IBM support might not always realize that it's an ESS, and not plain GPFS... If you take down an ESS IO-node without moving its RG to the other node using "--servers othernode,thisnode", or by using --active (which I've never used), you'll take down the whole recoverygroup and need to suffer an uncontrolled failover. Such an uncontrolled failover takes a few minutes of filesystem hang, while a controlled failover should not hang the system. I don't see why it's a problem that you now have an IO server that is owning both recoverygroups. Once your maintenance of the first IO servers is done, I would just revert the --servers order of that recovergroup, and it should move back. The procedure to move RGs around during IO node maintenance is documented on page 10 the quick deployment guide (step 1-3): http://www.ibm.com/support/knowledgecenter/en/SSYSP8_4.5.0/c2785801.pdf?view=kc -jf On Tue, Dec 20, 2016 at 6:19 PM, Damir Krstic wrote: > For sake of everyone else on this listserv, I'll highlight the appropriate > procedure here. It turns out, changing recovery group on an active system > is not recommended by IBM. We tried following Jan's recommendation this > morning, and the system became unresponsive for about 30 minutes. It only > became responsive (and recovery group change finished) after we killed > couple of processes (ssh and scp) going to couple of clients. > > I got a Sev. 1 with IBM opened and they tell me that appropriate steps for > IO maintenance are as follows: > > 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) > 2. unmount gpfs on io node that is going down > 3. shutdown gpfs on io node that is going down > 4. shutdown os > > That's it - recovery groups should not be changed. If there is a need to > change recovery group, use --active option (not permanent change). 
> > We are now stuck in situation that io2 server is owner of both recovery > groups. The way IBM tells us to fix this is to unmount the filesystem on > all clients and change recovery groups then. We can't do it now and will > have to schedule maintenance sometime in 2017. For now, we have switched > recovery groups using --active flag and things (filesystem performance) > seems to be OK. Load average on both io servers is quite high (250avg) and > does not seem to be going down. > > I really wish that maintenance procedures were documented somewhere on IBM > website. This experience this morning has really shaken my confidence in > ESS. > > Damir > > On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust > wrote: > >> >> Move its recoverygrops to the other node by putting the other node as >> primary server for it: >> >> mmchrecoverygroup rgname --servers otherServer,thisServer >> >> And verify that it's now active on the other node by "mmlsrecoverygroup >> rgname -L". >> >> Move away any filesystem managers or cluster manager role if that's >> active on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. >> >> Then you can run mmshutdown on it (assuming you also have enough quorum >> nodes in the remaining cluster). >> >> >> -jf >> >> man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : >> >> We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of >> the IO servers phoned home with memory error. IBM is coming out today to >> replace the faulty DIMM. >> >> What is the correct way of taking this system out for maintenance? >> >> Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When >> we needed to do maintenance on the old system, we would migrate manager >> role and also move primary and secondary server roles if one of those >> systems had to be taken down. >> >> With ESS and resource pool manager roles etc. is there a correct way of >> shutting down one of the IO serves for maintenance? >> >> Thanks, >> Damir >> >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Dec 20 20:30:04 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 20 Dec 2016 21:30:04 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> References: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Dec 20 20:44:44 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 14:44:44 -0600 Subject: [gpfsug-discuss] CES ifs-ganashe Message-ID: Does ganashe have a default read and write max size? if so what is it? Thanks Matt From olaf.weiser at de.ibm.com Tue Dec 20 21:06:44 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 20 Dec 2016 22:06:44 +0100 Subject: [gpfsug-discuss] CES ifs-ganashe In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From MKEIGO at jp.ibm.com Tue Dec 20 23:25:41 2016 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Wed, 21 Dec 2016 08:25:41 +0900 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> Message-ID: I still see the following statement* regarding with the use of LROC in FAQ (URL #1). Are there any issues anticipated to use LROC on protocol nodes? Q8.3: What are some configuration considerations when deploying the protocol functionality? A8.3: Configuration considerations include: (... many lines are snipped ...) Several GPFS configuration aspects have not been explicitly tested with the protocol function: (... many lines are snipped ...) Local Read Only Cache* (... many lines are snipped ...) Q2.25: What are the current requirements when using local read-only cache? A2.25: The current requirements/limitations for using local read-only cache include: - A minimum of IBM Spectrum Scale V4.1.0.1. - Local read-only cache is only supported on Linux x86 and Power. - The minimum size of a local read-only cache device is 4 GB. - The local read-only cache requires memory equal to 1% of the local read-only device's capacity. Note: Use of local read-only cache does not require a server license [1] IBM Spectrum Scale? Frequently Asked Questions and Answers (November 2016) https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html --- Keigo Matsubara, Industry Architect, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 From: "Olaf Weiser" To: gpfsug main discussion list Date: 2016/12/21 05:31 Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org it's all true and right, but please have in mind.. with MFTC and the number of nodes in the ( remote and local ) cluster, you 'll need token mem since R42 token Mem is allocated automatically .. so the old tokenMEMLimit is more or less obsolete.. but you should have your overall configuration in mind, when raising MFTC clusterwide... just a hint.. have fun... Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 12/20/2016 08:19 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re currently deploying LROC in many of our compute nodes ? results so far have been excellent. We?re putting in 240gb SSDs, because we have mostly small files. As far as I know, the amount of inodes and directories in LROC are not limited, except by the size of the cache disk. Look at these config options for LROC: lrocData Controls whether user data is populated into the local read-only cache. Other configuration options can be used to select the data that is eligible for the local read-only cache. 
When using more than one such configuration option, data that matches any of the specified criteria is eligible to be saved. Valid values are yes or no. The default value is yes. If lrocData is set to yes, by default the data that was not already in the cache when accessed by a user is subsequently saved to the local read-only cache. The default behavior can be overridden using thelrocDataMaxFileSize and lrocDataStubFileSizeconfiguration options to save all data from small files or all data from the initial portion of large files. lrocDataMaxFileSize Limits the data that may be saved in the local read-only cache to only the data from small files. A value of -1 indicates that all data is eligible to be saved. A value of 0 indicates that small files are not to be saved. A positive value indicates the maximum size of a file to be considered for the local read-only cache. For example, a value of 32768 indicates that files with 32 KB of data or less are eligible to be saved in the local read-only cache. The default value is 0. lrocDataStubFileSize Limits the data that may be saved in the local read-only cache to only the data from the first portion of all files. A value of -1 indicates that all file data is eligible to be saved. A value of 0 indicates that stub data is not eligible to be saved. A positive value indicates that the initial portion of each file that is eligible is to be saved. For example, a value of 32768 indicates that the first 32 KB of data from each file is eligible to be saved in the local read-only cache. The default value is 0. lrocDirectories Controls whether directory blocks is populated into the local read-only cache. The option also controls other file system metadata such as indirect blocks, symbolic links, and extended attribute overflow blocks. Valid values are yes or no. The default value is yes. lrocInodes Controls whether inodes from open files is populated into the local read-only cache; the cache contains the full inode, including all disk pointers, extended attributes, and data. Valid values are yes or no. The default value is yes. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Matt Weil Reply-To: gpfsug main discussion list Date: Tuesday, December 20, 2016 at 1:13 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? 
Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Dec 21 09:23:16 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 09:23:16 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil wrote: > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: > > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Wed Dec 21 09:42:36 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 21 Dec 2016 09:42:36 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Ooh, LROC sensors for Zimon? must look into that. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sven Oehme Sent: 21 December 2016 09:23 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil > wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Wed Dec 21 11:29:04 2016 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 21 Dec 2016 11:29:04 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: , Message-ID: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. 
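One way to keep the two cases separate is a node class for the LROC-equipped nodes, so maxStatCache is only raised where LROC can make use of it; class name, node names and the value below are illustrative, and the change takes effect when GPFS is restarted on those nodes:

    mmcrnodeclass lrocNodes -N client01,client02
    mmchconfig maxStatCache=1000000 -N lrocNodes
    mmlsconfig maxStatCache        # confirm the override applies only to lrocNodes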
Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil > wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Dec 21 11:37:46 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 11:37:46 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs wrote: > My understanding was the maxStatCache was only used on AIX and should be > set low on Linux, as raising it did't help and wasted resources. 
Are we > saying that LROC now uses it and setting it low if you raise > maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File > object (maxFilesToCache) to a StatCache Object when it moves the content to > the LROC device. > therefore the only thing you really need to increase is maxStatCache on > the LROC node, but you still need maxFiles Objects, so leave that untouched > and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have > enough memory to hold tokens for all the objects you want to cache, but if > the memory is there and you have enough its well worth spend a lot of > memory on it and bump maxStatCache to a high number. i have tested > maxStatCache up to 16 million at some point per node, but if nodes with > this large amount of inodes crash or you try to shut them down you have > some delays , therefore i suggest you stay within a 1 or 2 million per > node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get > comparable stats, i suggest you setup Zimon and enable the Lroc sensors to > have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil mweil at wustl.edu>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil mweil at wustl.edu>> wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > < > https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.childs at qmul.ac.uk Wed Dec 21 11:48:24 2016 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 21 Dec 2016 11:48:24 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: , Message-ID: So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. Fine just good to know, nice and easy now with nodeclasses.... Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Wednesday, December 21, 2016 11:37:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs > wrote: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Sven Oehme > Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >> wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? 
sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Dec 21 11:57:39 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 11:57:39 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . Sven On Wed, Dec 21, 2016 at 12:48 PM Peter Childs wrote: > So your saying maxStatCache should be raised on LROC enabled nodes only as > its the only place under Linux its used and should be set low on non-LROC > enabled nodes. > > Fine just good to know, nice and easy now with nodeclasses.... > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 11:37:46 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > StatCache is not useful on Linux, that hasn't changed if you don't use > LROC on the same node. LROC uses the compact object (StatCache) to store > its pointer to the full file Object which is stored on the LROC device. so > on a call for attributes that are not in the StatCache the object gets > recalled from LROC and converted back into a full File Object, which is why > you still need to have a reasonable maxFiles setting even you use LROC as > you otherwise constantly move file infos in and out of LROC and put the > device under heavy load. 
> > sven > > > > On Wed, Dec 21, 2016 at 12:29 PM Peter Childs p.childs at qmul.ac.uk>> wrote: > My understanding was the maxStatCache was only used on AIX and should be > set low on Linux, as raising it did't help and wasted resources. Are we > saying that LROC now uses it and setting it low if you raise > maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at spectrumscale.org> < > gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at spectrumscale.org>> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File > object (maxFilesToCache) to a StatCache Object when it moves the content to > the LROC device. > therefore the only thing you really need to increase is maxStatCache on > the LROC node, but you still need maxFiles Objects, so leave that untouched > and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have > enough memory to hold tokens for all the objects you want to cache, but if > the memory is there and you have enough its well worth spend a lot of > memory on it and bump maxStatCache to a high number. i have tested > maxStatCache up to 16 million at some point per node, but if nodes with > this large amount of inodes crash or you try to shut them down you have > some delays , therefore i suggest you stay within a 1 or 2 million per > node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get > comparable stats, i suggest you setup Zimon and enable the Lroc sensors to > have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil mweil at wustl.edu>>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil mweil at wustl.edu>>> wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > < > https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? 
> > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Dec 21 12:12:22 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 12:12:22 +0000 Subject: [gpfsug-discuss] Presentations from last UG Message-ID: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From jez.tucker at gpfsug.org Wed Dec 21 12:16:03 2016 From: jez.tucker at gpfsug.org (Jez Tucker) Date: Wed, 21 Dec 2016 12:16:03 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> Hi Are you referring to the UG at Salt Lake? If so I should be uploading these today/tomorrow. I'll send a ping out when done. We do not have the presentations from the mini-UG at Computing Insights as yet. (peeps, please send them in) Best, Jez On 21/12/16 12:12, Mark.Bush at siriuscom.com wrote: > > Does anyone know when the presentations from the last users group > meeting will be posted. I checked last night but there doesn?t seem > to be any new ones out there (summaries of talks yet). 
> > Thanks > > Mark > > This message (including any attachments) is intended only for the use > of the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, > and exempt from disclosure under applicable law. If you are not the > intended recipient, you are hereby notified that any use, > dissemination, distribution, or copying of this communication is > strictly prohibited. This message may be viewed by parties at Sirius > Computer Solutions other than those named in the message header. This > message does not contain an official representation of Sirius Computer > Solutions. If you have received this communication in error, notify > Sirius Computer Solutions immediately and (i) destroy this message if > a facsimile or (ii) delete this message immediately if this is an > electronic communication. Thank you. > > Sirius Computer Solutions > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Dec 21 12:24:34 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 12:24:34 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> Message-ID: Yes From: Jez Tucker Reply-To: "jez.tucker at gpfsug.org" , gpfsug main discussion list Date: Wednesday, December 21, 2016 at 6:16 AM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Presentations from last UG Hi Are you referring to the UG at Salt Lake? If so I should be uploading these today/tomorrow. I'll send a ping out when done. We do not have the presentations from the mini-UG at Computing Insights as yet. (peeps, please send them in) Best, Jez On 21/12/16 12:12, Mark.Bush at siriuscom.com wrote: Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kallbac at iu.edu Wed Dec 21 12:46:42 2016 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Wed, 21 Dec 2016 12:46:42 +0000 Subject: [gpfsug-discuss] Presentations from last UG Message-ID: Checking... Kristy On Dec 21, 2016 7:12 AM, Mark.Bush at siriuscom.com wrote: Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Dec 21 13:42:02 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 21 Dec 2016 13:42:02 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: Sorry, my bad, it was on my todo list. The ones we have are now up online. http://www.spectrumscale.org/presentations/ Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 21 December 2016 12:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Presentations from last UG Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Mark.Bush at siriuscom.com Wed Dec 21 14:37:58 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 14:37:58 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: Thanks much, Simon. From: on behalf of "Simon Thompson (Research Computing - IT Services)" Reply-To: gpfsug main discussion list Date: Wednesday, December 21, 2016 at 7:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Presentations from last UG Sorry, my bad, it was on my todo list. The ones we have are now up online. http://www.spectrumscale.org/presentations/ Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 21 December 2016 12:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Presentations from last UG Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Wed Dec 21 15:17:27 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 21 Dec 2016 10:17:27 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Sven, I?ve read this several times, and it will help me to re-state it. Please tell me if this is not what you meant: You often see even common operations (like ls) blow out the StatCache, and things are inefficient when the StatCache is in use but constantly overrun. Because of this, you normally recommend disabling the StatCache with maxStatCache=0, and instead spend the memory normally used for StatCache on the FileCache. In the case of LROC, there *must* be a StatCache entry for every file that is held in the LROC. In this case, we want to set maxStatCache at least as large as the number of files whose data or metadata we?d like to be in the LROC. Close? 
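One quick way to sanity-check that relationship on a running LROC node is to compare what mmdiag reports against the configured limits. A minimal sketch (the exact output wording varies by release):

# Objects LROC currently holds and how full the cache device is
mmdiag --lroc

# The limits this node's daemon is running with
mmdiag --config | grep -iE 'maxFilesToCache|maxStatCache|lroc'

If "Total objects stored" stops growing while the device is still far from full, the StatCache limit on that node is the first thing to suspect, per Sven's explanation quoted below.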
-- Stephen > On Dec 21, 2016, at 6:57 AM, Sven Oehme > wrote: > > its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). > on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . > > Sven > > On Wed, Dec 21, 2016 at 12:48 PM Peter Childs > wrote: > So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. > > Fine just good to know, nice and easy now with nodeclasses.... > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Sven Oehme > > Sent: Wednesday, December 21, 2016 11:37:46 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. > > sven > > > > On Wed, Dec 21, 2016 at 12:29 PM Peter Childs >> wrote: > My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > >> on behalf of Sven Oehme >> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. > therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. 
i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >>>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >>>> wrote: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Dec 21 15:39:16 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 21 Dec 2016 16:39:16 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: close, but not 100% :-) LROC only needs a StatCache Object for files that don't have a Full OpenFile (maxFilestoCache) Object and you still want to be able to hold Metadata and/or Data in LROC. e.g. you can have a OpenFile instance that has Data blocks in LROC, but no Metadata (as everything is in the OpenFile Object itself), then you don't need a maxStatCache Object for this one. but you would need a StatCache object if we have to throw this file metadata or data out of the FileCache and/or Pagepool as we would otherwise loose all references to that file in LROC. the MaxStat Object is the most compact form to hold only references to the real data. 
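For reference, an LROC device itself is normally defined as an NSD with usage=localCache, served only by the node that owns the flash device. The stanza below is a sketch with assumed device, NSD and node names; check the stanza format documented for your release (and the shipped samples) before using it.

# lroc.stanza (illustrative values only)
%nsd:
  device=/dev/nvme0n1
  nsd=ces1_lroc1
  servers=ces1
  usage=localCache

mmcrnsd -F lroc.stanza

# Once the daemon on that node has picked the device up, it should appear in:
mmdiag --lroc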
if its still unclear we might have to do a small writeup in form of a paper with a diagram to better explain it, but that would take a while due to a lot of other work ahead of that :-) sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Stephen Ulmer To: gpfsug main discussion list Date: 12/21/2016 04:17 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, I?ve read this several times, and it will help me to re-state it. Please tell me if this is not what you meant: You often see even common operations (like ls) blow out the StatCache, and things are inefficient when the StatCache is in use but constantly overrun. Because of this, you normally recommend disabling the StatCache with maxStatCache=0, and instead spend the memory normally used for StatCache on the FileCache. In the case of LROC, there *must* be a StatCache entry for every file that is held in the LROC. In this case, we want to set maxStatCache at least as large as the number of files whose data or metadata we?d like to be in the LROC. Close? -- Stephen On Dec 21, 2016, at 6:57 AM, Sven Oehme wrote: its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . Sven On Wed, Dec 21, 2016 at 12:48 PM Peter Childs wrote: So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. Fine just good to know, nice and easy now with nodeclasses.... Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < oehmes at gmail.com> Sent: Wednesday, December 21, 2016 11:37:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. 
sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs > wrote: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme > Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >> wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20 (GPFS)/page/Flash%20Storage < https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > Hello all, Are there any tuning recommendations to get these to cache more metadata? 
Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From damir.krstic at gmail.com Wed Dec 21 16:03:44 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 21 Dec 2016 16:03:44 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: Hi Jan, I am sorry if my post sounded accusatory - I did not mean it that way. We had a very frustrating experience trying to change recoverygroup yesterday morning. I've read the manual you have linked and indeed, you have outlined the correct procedure. I am left wondering why the level 2 gpfs support instructed us not to do that in the future. Their support instructions are contradicting what's in the manual. We are running now with the --active recovery group in place and will change it permanently back to the default setting early in the new year. Anyway, thanks for your help. Damir On Tue, Dec 20, 2016 at 1:36 PM Jan-Frode Myklebust wrote: > I'm sorry for your trouble, but those 4 steps you got from IBM support > does not seem correct. IBM support might not always realize that it's an > ESS, and not plain GPFS... If you take down an ESS IO-node without moving > its RG to the other node using "--servers othernode,thisnode", or by using > --active (which I've never used), you'll take down the whole recoverygroup > and need to suffer an uncontrolled failover. Such an uncontrolled failover > takes a few minutes of filesystem hang, while a controlled failover should > not hang the system. > > I don't see why it's a problem that you now have an IO server that is > owning both recoverygroups. Once your maintenance of the first IO servers > is done, I would just revert the --servers order of that recovergroup, and > it should move back. 
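A condensed sketch of the controlled hand-off described here, assuming io1 is the server going down for maintenance and io2 stays up (node, file system and recovery group names are placeholders; the quick deployment guide linked below is the authoritative version):

# 1. Move manager roles off the node going down
mmlsmgr
mmchmgr <fsname> io2      # file system manager
mmchmgr -c io2            # cluster manager

# 2. Make io2 the primary server for io1's recovery group, then verify
mmchrecoverygroup <rgname> --servers io2,io1
mmlsrecoverygroup <rgname> -L
#    (for a temporary change only, --active io2 can be used instead)

# 3. With quorum still satisfied, stop GPFS on io1 and do the maintenance
mmshutdown -N io1

# 4. Afterwards, bring io1 back and revert the server order
mmstartup -N io1
mmchrecoverygroup <rgname> --servers io1,io2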
> > The procedure to move RGs around during IO node maintenance is documented > on page 10 the quick deployment guide (step 1-3): > > > http://www.ibm.com/support/knowledgecenter/en/SSYSP8_4.5.0/c2785801.pdf?view=kc > > > -jf > > > On Tue, Dec 20, 2016 at 6:19 PM, Damir Krstic > wrote: > > For sake of everyone else on this listserv, I'll highlight the appropriate > procedure here. It turns out, changing recovery group on an active system > is not recommended by IBM. We tried following Jan's recommendation this > morning, and the system became unresponsive for about 30 minutes. It only > became responsive (and recovery group change finished) after we killed > couple of processes (ssh and scp) going to couple of clients. > > I got a Sev. 1 with IBM opened and they tell me that appropriate steps for > IO maintenance are as follows: > > 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) > 2. unmount gpfs on io node that is going down > 3. shutdown gpfs on io node that is going down > 4. shutdown os > > That's it - recovery groups should not be changed. If there is a need to > change recovery group, use --active option (not permanent change). > > We are now stuck in situation that io2 server is owner of both recovery > groups. The way IBM tells us to fix this is to unmount the filesystem on > all clients and change recovery groups then. We can't do it now and will > have to schedule maintenance sometime in 2017. For now, we have switched > recovery groups using --active flag and things (filesystem performance) > seems to be OK. Load average on both io servers is quite high (250avg) and > does not seem to be going down. > > I really wish that maintenance procedures were documented somewhere on IBM > website. This experience this morning has really shaken my confidence in > ESS. > > Damir > > On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust > wrote: > > > Move its recoverygrops to the other node by putting the other node as > primary server for it: > > mmchrecoverygroup rgname --servers otherServer,thisServer > > And verify that it's now active on the other node by "mmlsrecoverygroup > rgname -L". > > Move away any filesystem managers or cluster manager role if that's active > on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. > > Then you can run mmshutdown on it (assuming you also have enough quorum > nodes in the remaining cluster). > > > -jf > > man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? 
> > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Dec 21 21:55:51 2016 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 21 Dec 2016 21:55:51 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down formaintenance In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 16:44:26 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 10:44:26 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: This is enabled on this node but mmdiag it does not seem to show it caching. Did I miss something? I do have one file system in the cluster that is running 3.5.0.7 wondering if that is causing this. > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): 'NULL' status Idle > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 0 MB, currently in use: 0 MB > Statistics from: Tue Dec 27 11:21:14 2016 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) From aaron.s.knister at nasa.gov Wed Dec 28 17:50:35 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 28 Dec 2016 12:50:35 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> Hey Matt, We ran into a similar thing and if I recall correctly a mmchconfig --release=LATEST was required to get LROC working which, of course, would boot your 3.5.0.7 client from the cluster. -Aaron On 12/28/16 11:44 AM, Matt Weil wrote: > This is enabled on this node but mmdiag it does not seem to show it > caching. Did I miss something? I do have one file system in the > cluster that is running 3.5.0.7 wondering if that is causing this. 
>> [root at ces1 ~]# mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): 'NULL' status Idle >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 0 MB, currently in use: 0 MB >> Statistics from: Tue Dec 27 11:21:14 2016 >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Wed Dec 28 18:02:27 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 12:02:27 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> Message-ID: <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> So I have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From oehmes at us.ibm.com Wed Dec 28 19:06:19 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 28 Dec 2016 20:06:19 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> Message-ID: you have no device configured that's why it doesn't show any stats : >>> LROC Device(s): 'NULL' status Idle run mmsnsd -X to see if gpfs can see the path to the device. most likely it doesn't show up there and you need to adjust your nsddevices list to include it , especially if it is a NVME device. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Matt Weil To: Date: 12/28/2016 07:02 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org So I have minReleaseLevel 4.1.1.0 Is that to old? 
On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at wustl.edu Wed Dec 28 19:52:24 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 13:52:24 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> Message-ID: <8653c4fc-d882-d13f-040c-042118830de3@wustl.edu> k got that fixed now shows as status shutdown > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): > '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' > status Shutdown > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 0 MB, currently in use: 0 MB > Statistics from: Wed Dec 28 13:49:27 2016 On 12/28/16 1:06 PM, Sven Oehme wrote: > > you have no device configured that's why it doesn't show any stats : > > >>> LROC Device(s): 'NULL' status Idle > > run mmsnsd -X to see if gpfs can see the path to the device. most > likely it doesn't show up there and you need to adjust your nsddevices > list to include it , especially if it is a NVME device. > > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Matt Weil ---12/28/2016 07:02:57 PM---So I > have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 1Matt Weil > ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that > to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > > From: Matt Weil > To: > Date: 12/28/2016 07:02 PM > Subject: Re: [gpfsug-discuss] LROC > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > So I have minReleaseLevel 4.1.1.0 Is that to old? 
> > > On 12/28/16 11:50 AM, Aaron Knister wrote: > > Hey Matt, > > > > We ran into a similar thing and if I recall correctly a mmchconfig > > --release=LATEST was required to get LROC working which, of course, > > would boot your 3.5.0.7 client from the cluster. > > > > -Aaron > > > > On 12/28/16 11:44 AM, Matt Weil wrote: > >> This is enabled on this node but mmdiag it does not seem to show it > >> caching. Did I miss something? I do have one file system in the > >> cluster that is running 3.5.0.7 wondering if that is causing this. > >>> [root at ces1 ~]# mmdiag --lroc > >>> > >>> === mmdiag: lroc === > >>> LROC Device(s): 'NULL' status Idle > >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > >>> 1073741824 > >>> Max capacity: 0 MB, currently in use: 0 MB > >>> Statistics from: Tue Dec 27 11:21:14 2016 > >>> > >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) > >>> objects failed to store 0 failed to recall 0 failed to inval 0 > >>> objects queried 0 (0 MB) not found 0 = 0.00 % > >>> objects invalidated 0 (0 MB) > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at us.ibm.com Wed Dec 28 19:55:18 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 28 Dec 2016 19:55:18 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <8653c4fc-d882-d13f-040c-042118830de3@wustl.edu> Message-ID: Did you restart the daemon on that node after you fixed it ? Sent from IBM Verse Matt Weil --- Re: [gpfsug-discuss] LROC --- From:"Matt Weil" To:gpfsug-discuss at spectrumscale.orgDate:Wed, Dec 28, 2016 8:52 PMSubject:Re: [gpfsug-discuss] LROC k got that fixed now shows as status shutdown [root at ces1 ~]# mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' status Shutdown Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile 1073741824 Max capacity: 0 MB, currently in use: 0 MB Statistics from: Wed Dec 28 13:49:27 2016 On 12/28/16 1:06 PM, Sven Oehme wrote: you have no device configured that's why it doesn't show any stats : >>> LROC Device(s): 'NULL' status Idle run mmsnsd -X to see if gpfs can see the path to the device. most likely it doesn't show up there and you need to adjust your nsddevices list to include it , especially if it is a NVME device. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Matt Weil ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that to old? 
On 12/28/16 11:50 AM, Aaron Knister wrote: From: Matt Weil To: Date: 12/28/2016 07:02 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org So I have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 19:57:18 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 13:57:18 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> no I will do that next. On 12/28/16 1:55 PM, Sven Oehme wrote: > Did you restart the daemon on that node after you fixed it ? Sent from > IBM Verse > > Matt Weil --- Re: [gpfsug-discuss] LROC --- > > From: "Matt Weil" > To: gpfsug-discuss at spectrumscale.org > Date: Wed, Dec 28, 2016 8:52 PM > Subject: Re: [gpfsug-discuss] LROC > > ------------------------------------------------------------------------ > > k got that fixed now shows as status shutdown > >> [root at ces1 ~]# mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): >> '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' >> status Shutdown >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 0 MB, currently in use: 0 MB >> Statistics from: Wed Dec 28 13:49:27 2016 > > > > On 12/28/16 1:06 PM, Sven Oehme wrote: > > you have no device configured that's why it doesn't show any stats : > > >>> LROC Device(s): 'NULL' status Idle > > run mmsnsd -X to see if gpfs can see the path to the device. most > likely it doesn't show up there and you need to adjust your nsddevices > list to include it , especially if it is a NVME device. 
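The nsddevices list mentioned above is a user exit script that GPFS calls during device discovery. A minimal sketch of putting it in place and verifying the result (standard packaging paths, adjust as needed):

    # start from the shipped sample and make it executable
    cp /usr/lpp/mmfs/samples/nsddevices.sample /var/mmfs/etc/nsddevices
    chmod +x /var/mmfs/etc/nsddevices

    # anything the script echoes as "<name relative to /dev> <disk type>"
    # is added to the devices GPFS probes, for example:  echo nvme0n1 dmm

    # then confirm GPFS can see the device path for the NSD
    mmlsnsd -X | grep -i nvme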
> > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Matt Weil ---12/28/2016 07:02:57 PM---So I > have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 1Matt Weil > ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that > to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > > From: Matt Weil > To: > Date: 12/28/2016 07:02 PM > Subject: Re: [gpfsug-discuss] LROC > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > So I have minReleaseLevel 4.1.1.0 Is that to old? > > > On 12/28/16 11:50 AM, Aaron Knister wrote: > > Hey Matt, > > > > We ran into a similar thing and if I recall correctly a mmchconfig > > --release=LATEST was required to get LROC working which, of course, > > would boot your 3.5.0.7 client from the cluster. > > > > -Aaron > > > > On 12/28/16 11:44 AM, Matt Weil wrote: > >> This is enabled on this node but mmdiag it does not seem to show it > >> caching. Did I miss something? I do have one file system in the > >> cluster that is running 3.5.0.7 wondering if that is causing this. > >>> [root at ces1 ~]# mmdiag --lroc > >>> > >>> === mmdiag: lroc === > >>> LROC Device(s): 'NULL' status Idle > >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > >>> 1073741824 > >>> Max capacity: 0 MB, currently in use: 0 MB > >>> Statistics from: Tue Dec 27 11:21:14 2016 > >>> > >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) > >>> objects failed to store 0 failed to recall 0 failed to inval 0 > >>> objects queried 0 (0 MB) not found 0 = 0.00 % > >>> objects invalidated 0 (0 MB) > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 20:15:14 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 14:15:14 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> References: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> Message-ID: <5127934a-b6b6-c542-f50a-67c47fe6d6db@wustl.edu> still in a 'status Shutdown' even after gpfs was stopped and started. From aaron.s.knister at nasa.gov Wed Dec 28 22:16:00 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 28 Dec 2016 22:16:00 +0000 Subject: [gpfsug-discuss] LROC References: [gpfsug-discuss] LROC Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> Anything interesting in the mmfs log? On a related note I'm curious how a 3.5 client is able to join a cluster with a minreleaselevel of 4.1.1.0. 
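For reference, the daemon log being asked about normally lives under /var/adm/ras on the node in question; a quick sketch of pulling the relevant lines:

    # LROC related messages in the most recent daemon log
    grep -i lroc /var/adm/ras/mmfs.log.latest

    # any asserts or signals logged around the time LROC was enabled
    grep -E 'Assert|Signal' /var/adm/ras/mmfs.log.latest | tail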
From: Matt Weil Sent: 12/28/16, 3:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC still in a 'status Shutdown' even after gpfs was stopped and started. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 22:21:21 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 16:21:21 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> Message-ID: <59fa3ab8-a666-d29c-117d-9db515f566e8@wustl.edu> yes > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > ssdActive) in line 427 of file > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > logAssertFailed + 0x2D5 at ??:0 > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > fs_config_ssds(fs_config*) + 0x867 at ??:0 > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > SFSConfigLROC() + 0x189 at ??:0 > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > runTSControl(int, int, char**) + 0x80E at ??:0 > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > HandleCmdMsg(void*) + 0x1216 at ??:0 > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > Thread::callBody(Thread*) + 0x1E2 at ??:0 > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > start_thread + 0xC5 at ??:0 > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > 0x6D at ??:0 > mmfsd: > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > failed. > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > 0x00007FF15FD71000 > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > 0x0000000000000006 > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > 0x00007FF15E8D03A8 > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > 0x000000000001E9A1 > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > 0xFF092D63646B6860 > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > 0x0000000000000202 > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > 0x00007FF161032EC0 > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > 0x0000000000000000 > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > 0x0000000000000202 > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > 0x0000000000000000 > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > 0x0000000010017807 > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > at ??:0 > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > __assert_fail_base + 126 at ??:0 > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > __GI___assert_fail + 42 at ??:0 > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > 2F9 at ??:0 > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > fs_config_ssds(fs_config*) + 867 at ??:0 > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > 189 at ??:0 > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > NsdDiskConfig::reReadConfig() + 771 at ??:0 > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > int, char**) + 80E at ??:0 > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > HandleCmdMsg(void*) + 1216 at ??:0 > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > Thread::callBody(Thread*) + 1E2 at ??:0 > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > C5 at ??:0 > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D at ??:0 On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > related note I'm curious how a 3.5 client is able to join a cluster > with a minreleaselevel of 4.1.1.0. I was referring to the fs version not the gpfs client version sorry for that confusion -V 13.23 (3.5.0.7) File system version From aaron.s.knister at nasa.gov Wed Dec 28 22:26:46 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 28 Dec 2016 22:26:46 +0000 Subject: [gpfsug-discuss] LROC References: [gpfsug-discuss] LROC Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> Ouch...to quote Adam Savage "well there's yer problem". Are you perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like there was an LROC related assert fixed in 4.1.1.9 but I can't find details on it. 
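A quick way to line up the daemon build and the on-disk file system version being discussed here; <fsname> is a placeholder for the real device name:

    mmdiag --version          # daemon build on this node
    mmlsfs <fsname> -V        # file system version, e.g. 13.23 (3.5.0.7)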
From: Matt Weil Sent: 12/28/16, 5:21 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC yes > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > ssdActive) in line 427 of file > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > logAssertFailed + 0x2D5 at ??:0 > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > fs_config_ssds(fs_config*) + 0x867 at ??:0 > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > SFSConfigLROC() + 0x189 at ??:0 > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > runTSControl(int, int, char**) + 0x80E at ??:0 > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > HandleCmdMsg(void*) + 0x1216 at ??:0 > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > Thread::callBody(Thread*) + 0x1E2 at ??:0 > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > start_thread + 0xC5 at ??:0 > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > 0x6D at ??:0 > mmfsd: > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > failed. > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > 0x00007FF15FD71000 > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > 0x0000000000000006 > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > 0x00007FF15E8D03A8 > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > 0x000000000001E9A1 > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > 0xFF092D63646B6860 > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > 0x0000000000000202 > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > 0x00007FF161032EC0 > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > 0x0000000000000000 > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > 0x0000000000000202 > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > 0x0000000000000000 > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > 0x0000000010017807 > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > at ??:0 > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > __assert_fail_base + 126 at ??:0 > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > __GI___assert_fail + 42 at ??:0 > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > 2F9 at ??:0 > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > fs_config_ssds(fs_config*) + 867 at ??:0 > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > 189 at ??:0 > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > NsdDiskConfig::reReadConfig() + 771 at ??:0 > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > int, char**) + 80E at ??:0 > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > HandleCmdMsg(void*) + 1216 at ??:0 > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > Thread::callBody(Thread*) + 1E2 at ??:0 > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > C5 at ??:0 > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D at ??:0 On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > related note I'm curious how a 3.5 client is able to join a cluster > with a minreleaselevel of 4.1.1.0. I was referring to the fs version not the gpfs client version sorry for that confusion -V 13.23 (3.5.0.7) File system version _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Wed Dec 28 22:39:19 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 16:39:19 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> Message-ID: <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> > mmdiag --version > > === mmdiag: version === > Current GPFS build: "4.2.1.2 ". > Built on Oct 27 2016 at 10:52:12 > Running 13 minutes 54 secs, pid 13229 On 12/28/16 4:26 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Ouch...to quote Adam Savage "well there's yer problem". Are you > perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like > there was an LROC related assert fixed in 4.1.1.9 but I can't find > details on it. > > > > *From:*Matt Weil > *Sent:* 12/28/16, 5:21 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] LROC > > yes > > > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > > ssdActive) in line 427 of file > > > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > > logAssertFailed + 0x2D5 at ??:0 > > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > > fs_config_ssds(fs_config*) + 0x867 at ??:0 > > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > > SFSConfigLROC() + 0x189 at ??:0 > > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > > runTSControl(int, int, char**) + 0x80E at ??:0 > > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > > HandleCmdMsg(void*) + 0x1216 at ??:0 > > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > > Thread::callBody(Thread*) + 0x1E2 at ??:0 > > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > > start_thread + 0xC5 at ??:0 > > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > > 0x6D at ??:0 > > mmfsd: > > > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > > failed. > > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> > Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > > 0x00007FF15FD71000 > > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > > 0x0000000000000006 > > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > > 0x00007FF15E8D03A8 > > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > > 0x000000000001E9A1 > > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > > 0xFF092D63646B6860 > > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > > 0x0000000000000202 > > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > > 0x00007FF161032EC0 > > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > > 0x0000000000000000 > > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > > 0x0000000000000202 > > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > > 0x0000000000000000 > > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > > 0x0000000010017807 > > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > > at ??:0 > > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > > __assert_fail_base + 126 at ??:0 > > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > > __GI___assert_fail + 42 at ??:0 > > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > > 2F9 at ??:0 > > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > > fs_config_ssds(fs_config*) + 867 at ??:0 > > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > > 189 at ??:0 > > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > > NsdDiskConfig::reReadConfig() + 771 at ??:0 > > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > > int, char**) + 80E at ??:0 > > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > > HandleCmdMsg(void*) + 1216 at ??:0 > > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > > Thread::callBody(Thread*) + 1E2 at ??:0 > > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > > C5 at ??:0 > > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D > at ??:0 > > > On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > > related note I'm curious how a 3.5 client is able to join a cluster > > with a minreleaselevel of 4.1.1.0. > I was referring to the fs version not the gpfs client version sorry for > that confusion > -V 13.23 (3.5.0.7) File system version > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Wed Dec 28 23:19:52 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 28 Dec 2016 18:19:52 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> Message-ID: <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Interesting. Would you be willing to post the output of "mmlssnsd -X | grep 0A6403AA58641546" from the troublesome node as suggested by Sven? On 12/28/16 5:39 PM, Matt Weil wrote: > >> mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "4.2.1.2 ". >> Built on Oct 27 2016 at 10:52:12 >> Running 13 minutes 54 secs, pid 13229 > > On 12/28/16 4:26 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: >> Ouch...to quote Adam Savage "well there's yer problem". Are you >> perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like >> there was an LROC related assert fixed in 4.1.1.9 but I can't find >> details on it. >> >> >> >> *From:*Matt Weil >> *Sent:* 12/28/16, 5:21 PM >> *To:* gpfsug main discussion list >> *Subject:* Re: [gpfsug-discuss] LROC >> >> yes >> >> > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != >> > ssdActive) in line 427 of file >> > >> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C >> > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: >> > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 >> > logAssertFailed + 0x2D5 at ??:0 >> > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 >> > fs_config_ssds(fs_config*) + 0x867 at ??:0 >> > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 >> > SFSConfigLROC() + 0x189 at ??:0 >> > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB >> > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 >> > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 >> > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 >> > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E >> > runTSControl(int, int, char**) + 0x80E at ??:0 >> > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 >> > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, >> > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 >> > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 >> > HandleCmdMsg(void*) + 0x1216 at ??:0 >> > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 >> > Thread::callBody(Thread*) + 0x1E2 at ??:0 >> > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 >> > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >> > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 >> > start_thread + 0xC5 at ??:0 >> > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + >> > 0x6D at ??:0 >> > mmfsd: >> > >> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: >> > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >> > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' >> > failed. >> > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 >> > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
>> > Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx >> > 0x00007FF15FD71000 >> > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx >> > 0x0000000000000006 >> > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp >> > 0x00007FF15E8D03A8 >> > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi >> > 0x000000000001E9A1 >> > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 >> > 0xFF092D63646B6860 >> > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 >> > 0x0000000000000202 >> > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 >> > 0x00007FF161032EC0 >> > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 >> > 0x0000000000000000 >> > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags >> > 0x0000000000000202 >> > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err >> > 0x0000000000000000 >> > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk >> > 0x0000000010017807 >> > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 >> > Wed Dec 28 16:17:09.022 2016: [D] Traceback: >> > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 >> > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 >> > at ??:0 >> > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 >> > __assert_fail_base + 126 at ??:0 >> > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 >> > __GI___assert_fail + 42 at ??:0 >> > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + >> > 2F9 at ??:0 >> > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 >> > fs_config_ssds(fs_config*) + 867 at ??:0 >> > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + >> > 189 at ??:0 >> > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB >> > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 >> > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 >> > NsdDiskConfig::reReadConfig() + 771 at ??:0 >> > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, >> > int, char**) + 80E at ??:0 >> > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 >> > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, >> > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 >> > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 >> > HandleCmdMsg(void*) + 1216 at ??:0 >> > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 >> > Thread::callBody(Thread*) + 1E2 at ??:0 >> > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 >> > Thread::callBodyWrapper(Thread*) + A2 at ??:0 >> > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + >> > C5 at ??:0 >> > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D >> at ??:0 >> >> >> On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE >> CORP] wrote: >> > related note I'm curious how a 3.5 client is able to join a cluster >> > with a minreleaselevel of 4.1.1.0. 
>> I was referring to the fs version not the gpfs client version sorry for >> that confusion >> -V 13.23 (3.5.0.7) File system version >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Thu Dec 29 15:57:40 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 09:57:40 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Message-ID: > ro_cache_S29GNYAH200016 0A6403AA586531E1 > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > dmm ces1.gsc.wustl.edu server node On 12/28/16 5:19 PM, Aaron Knister wrote: > mmlssnsd -X | grep 0A6403AA58641546 From aaron.s.knister at nasa.gov Thu Dec 29 16:02:44 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 29 Dec 2016 11:02:44 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Message-ID: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. That's a *really* long device path (and nested too), I wonder if that's causing issues. What does a "tspreparedisk -S" show on that node? Also, what does your nsddevices script look like? I'm wondering if you could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" paths if that would help things here. -Aaron On 12/29/16 10:57 AM, Matt Weil wrote: > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >> dmm ces1.gsc.wustl.edu server node > > > On 12/28/16 5:19 PM, Aaron Knister wrote: >> mmlssnsd -X | grep 0A6403AA58641546 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Thu Dec 29 16:09:58 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Dec 2016 16:09:58 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: i agree that is a very long name , given this is a nvme device it should show up as /dev/nvmeXYZ i suggest to report exactly that in nsddevices and retry. 
i vaguely remember we have some fixed length device name limitation , but i don't remember what the length is, so this would be my first guess too that the long name is causing trouble. On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister wrote: > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 29 16:10:24 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:10:24 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: On 12/29/16 10:02 AM, Aaron Knister wrote: > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if > that's causing issues. was thinking of trying just /dev/sdxx > > What does a "tspreparedisk -S" show on that node? tspreparedisk:0::::0:0:: > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of > "/dev/disk/by-id" paths if that would help things here. > if [[ $osName = Linux ]] > then > : # Add function to discover disks in the Linux environment. > for luns in `ls /dev/disk/by-id | grep nvme` > do > all_luns=disk/by-id/$luns > echo $all_luns dmm > done > > fi > I will try that. 
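Following Sven's suggestion, one way the loop above could be reworked so that it still discovers the drives through the persistent by-id links but hands GPFS the short kernel name (for example nvme0n1), staying under the device name length limit. This is an untested sketch; the sort -u collapses any duplicate by-id aliases that resolve to the same device:

    if [[ $osName = Linux ]]
    then
      for lun in $(ls /dev/disk/by-id | grep nvme)
      do
        # resolve the persistent link to its short kernel name, e.g. /dev/nvme0n1
        dev=$(readlink -f /dev/disk/by-id/$lun)
        # nsddevices reports names relative to /dev
        echo "${dev#/dev/} dmm"
      done | sort -u
    fi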
> > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: >> >> >>> ro_cache_S29GNYAH200016 0A6403AA586531E1 >>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> >>> dmm ces1.gsc.wustl.edu server node >> >> >> On 12/28/16 5:19 PM, Aaron Knister wrote: >>> mmlssnsd -X | grep 0A6403AA58641546 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From mweil at wustl.edu Thu Dec 29 16:18:30 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:18:30 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: On 12/29/16 10:09 AM, Sven Oehme wrote: > i agree that is a very long name , given this is a nvme device it > should show up as /dev/nvmeXYZ > i suggest to report exactly that in nsddevices and retry. > i vaguely remember we have some fixed length device name limitation , > but i don't remember what the length is, so this would be my first > guess too that the long name is causing trouble. I will try that. I was attempting to not need to write a custom udev rule for those. Also to keep the names persistent. Rhel 7 has a default rule that makes a sym link in /dev/disk/by-id. 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> ../../nvme0n1 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> ../../nvme1n1 > > > On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister > > wrote: > > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws > here. > > That's a *really* long device path (and nested too), I wonder if > that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of > "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu > server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Thu Dec 29 16:28:32 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:28:32 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> wow that was it. > mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:08:58 2016 It is not caching however. I will restart gpfs to see if that makes it start working. On 12/29/16 10:18 AM, Matt Weil wrote: > > > > On 12/29/16 10:09 AM, Sven Oehme wrote: >> i agree that is a very long name , given this is a nvme device it >> should show up as /dev/nvmeXYZ >> i suggest to report exactly that in nsddevices and retry. >> i vaguely remember we have some fixed length device name limitation , >> but i don't remember what the length is, so this would be my first >> guess too that the long name is causing trouble. > I will try that. I was attempting to not need to write a custom udev > rule for those. Also to keep the names persistent. Rhel 7 has a > default rule that makes a sym link in /dev/disk/by-id. > 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> > ../../nvme0n1 > 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> > ../../nvme1n1 >> >> >> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >> > wrote: >> >> Interesting. Thanks Matt. I admit I'm somewhat grasping at straws >> here. >> >> That's a *really* long device path (and nested too), I wonder if >> that's >> causing issues. >> >> What does a "tspreparedisk -S" show on that node? >> >> Also, what does your nsddevices script look like? I'm wondering >> if you >> could have it give back "/dev/dm-XXX" paths instead of >> "/dev/disk/by-id" >> paths if that would help things here. >> >> -Aaron >> >> On 12/29/16 10:57 AM, Matt Weil wrote: >> > >> > >> >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >> >> >> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >> >> dmm ces1.gsc.wustl.edu >> server node >> > >> > >> > On 12/28/16 5:19 PM, Aaron Knister wrote: >> >> mmlssnsd -X | grep 0A6403AA58641546 >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Thu Dec 29 16:41:38 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:41:38 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> Message-ID: <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> after restart. still doesn't seem to be in use. > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:35:32 2016 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) On 12/29/16 10:28 AM, Matt Weil wrote: > > wow that was it. > >> mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 1526184 MB, currently in use: 0 MB >> Statistics from: Thu Dec 29 10:08:58 2016 > It is not caching however. I will restart gpfs to see if that makes > it start working. > > On 12/29/16 10:18 AM, Matt Weil wrote: >> >> >> >> On 12/29/16 10:09 AM, Sven Oehme wrote: >>> i agree that is a very long name , given this is a nvme device it >>> should show up as /dev/nvmeXYZ >>> i suggest to report exactly that in nsddevices and retry. >>> i vaguely remember we have some fixed length device name limitation >>> , but i don't remember what the length is, so this would be my first >>> guess too that the long name is causing trouble. >> I will try that. I was attempting to not need to write a custom udev >> rule for those. Also to keep the names persistent. Rhel 7 has a >> default rule that makes a sym link in /dev/disk/by-id. >> 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 >> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> >> ../../nvme0n1 >> 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 >> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> >> ../../nvme1n1 >>> >>> >>> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >>> > wrote: >>> >>> Interesting. Thanks Matt. I admit I'm somewhat grasping at >>> straws here. >>> >>> That's a *really* long device path (and nested too), I wonder if >>> that's >>> causing issues. >>> >>> What does a "tspreparedisk -S" show on that node? >>> >>> Also, what does your nsddevices script look like? I'm wondering >>> if you >>> could have it give back "/dev/dm-XXX" paths instead of >>> "/dev/disk/by-id" >>> paths if that would help things here. 
>>> >>> -Aaron >>> >>> On 12/29/16 10:57 AM, Matt Weil wrote: >>> > >>> > >>> >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >>> >> >>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> >> dmm ces1.gsc.wustl.edu >>> server node >>> > >>> > >>> > On 12/28/16 5:19 PM, Aaron Knister wrote: >>> >> mmlssnsd -X | grep 0A6403AA58641546 >>> > >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > >>> >>> -- >>> Aaron Knister >>> NASA Center for Climate Simulation (Code 606.2) >>> Goddard Space Flight Center >>> (301) 286-2776 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Dec 29 17:06:40 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Dec 2016 17:06:40 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> Message-ID: first good that the problem at least is solved, it would be great if you could open a PMR so this gets properly fixed, the daemon shouldn't segfault, but rather print a message that the device is too big. on the caching , it only gets used when you run out of pagepool or when you run out of full file objects . so what benchmark, test did you run to push data into LROC ? sven On Thu, Dec 29, 2016 at 5:41 PM Matt Weil wrote: > after restart. still doesn't seem to be in use. > > [root at ces1 ~]# mmdiag --lroc > > > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > > Statistics from: Thu Dec 29 10:35:32 2016 > > > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > > On 12/29/16 10:28 AM, Matt Weil wrote: > > wow that was it. > > mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:08:58 2016 > > It is not caching however. I will restart gpfs to see if that makes it > start working. 
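To Sven's point about pagepool spill: a rough, hedged way to push data into LROC is to read a working set larger than the configured pagepool and then read it again, checking the counters afterwards. /gpfs/somefs/testdir below is only a stand-in for a real directory of large files:

    # pass 1 fills the pagepool and spills evicted blocks to LROC,
    # pass 2 should recall at least some of them from the LROC device
    for pass in 1 2
    do
      for f in /gpfs/somefs/testdir/*
      do
        dd if="$f" of=/dev/null bs=1M 2>/dev/null
      done
    done

    mmdiag --lroc    # "Total objects stored ... recalled" should move off zero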
> > On 12/29/16 10:18 AM, Matt Weil wrote: > > > > On 12/29/16 10:09 AM, Sven Oehme wrote: > > i agree that is a very long name , given this is a nvme device it should > show up as /dev/nvmeXYZ > i suggest to report exactly that in nsddevices and retry. > i vaguely remember we have some fixed length device name limitation , but > i don't remember what the length is, so this would be my first guess too > that the long name is causing trouble. > > I will try that. I was attempting to not need to write a custom udev rule > for those. Also to keep the names persistent. Rhel 7 has a default rule > that makes a sym link in /dev/disk/by-id. > 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> > ../../nvme0n1 > 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> > ../../nvme1n1 > > > > On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister > wrote: > > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 <%28301%29%20286-2776> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 29 17:23:11 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 11:23:11 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> Message-ID: <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> -k thanks all I see it using the lroc now. 
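For keeping an eye on it from here on, the same counters can simply be polled; for example (field names taken from the mmdiag output earlier in the thread):

    watch -n 30 "mmdiag --lroc | grep -E 'in use|stored|recalled|queried'"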
On 12/29/16 11:06 AM, Sven Oehme wrote: > first good that the problem at least is solved, it would be great if > you could open a PMR so this gets properly fixed, the daemon shouldn't > segfault, but rather print a message that the device is too big. > > on the caching , it only gets used when you run out of pagepool or > when you run out of full file objects . so what benchmark, test did > you run to push data into LROC ? > > sven > > > On Thu, Dec 29, 2016 at 5:41 PM Matt Weil > wrote: > > after restart. still doesn't seem to be in use. > >> [root at ces1 ~]# mmdiag --lroc >> >> >> === mmdiag: lroc === >> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 1526184 MB, currently in use: 0 MB >> Statistics from: Thu Dec 29 10:35:32 2016 >> >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) > > On 12/29/16 10:28 AM, Matt Weil wrote: >> >> wow that was it. >> >>> mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 >>> stubFile 1073741824 >>> Max capacity: 1526184 MB, currently in use: 0 MB >>> Statistics from: Thu Dec 29 10:08:58 2016 >> It is not caching however. I will restart gpfs to see if that >> makes it start working. >> >> On 12/29/16 10:18 AM, Matt Weil wrote: >>> >>> >>> >>> On 12/29/16 10:09 AM, Sven Oehme wrote: >>>> i agree that is a very long name , given this is a nvme device >>>> it should show up as /dev/nvmeXYZ >>>> i suggest to report exactly that in nsddevices and retry. >>>> i vaguely remember we have some fixed length device name >>>> limitation , but i don't remember what the length is, so this >>>> would be my first guess too that the long name is causing trouble. >>> I will try that. I was attempting to not need to write a custom >>> udev rule for those. Also to keep the names persistent. Rhel 7 >>> has a default rule that makes a sym link in /dev/disk/by-id. >>> 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 >>> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> -> ../../nvme0n1 >>> 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 >>> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 >>> -> ../../nvme1n1 >>>> >>>> >>>> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >>>> > wrote: >>>> >>>> Interesting. Thanks Matt. I admit I'm somewhat grasping at >>>> straws here. >>>> >>>> That's a *really* long device path (and nested too), I >>>> wonder if that's >>>> causing issues. >>>> >>>> What does a "tspreparedisk -S" show on that node? >>>> >>>> Also, what does your nsddevices script look like? I'm >>>> wondering if you >>>> could have it give back "/dev/dm-XXX" paths instead of >>>> "/dev/disk/by-id" >>>> paths if that would help things here. 
>>>>
>>>> -Aaron
>>>>
>>>> On 12/29/16 10:57 AM, Matt Weil wrote:
>>>> >
>>>> >
>>>> >> ro_cache_S29GNYAH200016 0A6403AA586531E1
>>>> >>
>>>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016
>>>> >> dmm ces1.gsc.wustl.edu server node
>>>> >
>>>> >
>>>> > On 12/28/16 5:19 PM, Aaron Knister wrote:
>>>> >> mmlssnsd -X | grep 0A6403AA58641546
>>>> >
>>>> > _______________________________________________
>>>> > gpfsug-discuss mailing list
>>>> > gpfsug-discuss at spectrumscale.org
>>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>> >
>>>>
>>>> --
>>>> Aaron Knister
>>>> NASA Center for Climate Simulation (Code 606.2)
>>>> Goddard Space Flight Center
>>>> (301) 286-2776
>>>> _______________________________________________
>>>> gpfsug-discuss mailing list
>>>> gpfsug-discuss at spectrumscale.org
>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> gpfsug-discuss mailing list
>>>> gpfsug-discuss at spectrumscale.org
>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>
>>>
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From usa-principal at gpfsug.org Sat Dec 31 20:05:35 2016
From: usa-principal at gpfsug.org (usa-principal-gpfsug.org)
Date: Sat, 31 Dec 2016 15:05:35 -0500
Subject: [gpfsug-discuss] Save the Date April 4-5 2017 Users Group Meeting at NERSC
Message-ID: 

Hello all and happy new year (depending upon where you are right now :-) ).

We'll have more details in 2017, but for now please save the date for a
two-day users group meeting at NERSC in Berkeley, California.

April 4-5, 2017
National Energy Research Scientific Computing Center (nersc.gov)
Berkeley, California

We look forward to offering our first two-day event in the US.

Best,
Kristy & Bob
> yes it has its own overheads, but with that many drives to manage, a JOBD architecture and manual restriping doesn't sound like fun > > If you are going down the path of integrated raid controllers then any form of distributed raid is probably the best scenario, Raid 6 obviously. > > How many Nodes are you planning on building? The more nodes the more value FPO is likely to bring as you can be more specific in how the data is written to the nodes. > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Oesterlin, Robert" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] Strategies - servers with local SAS disks > Date: Thu, Dec 1, 2016 12:34 AM > > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. > > > > Options I?m considering: > > > > - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) > > - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe > > - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically > > > > Comments or other ideas welcome. > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenh at us.ibm.com Thu Dec 1 03:55:38 2016 From: kenh at us.ibm.com (Ken Hill) Date: Wed, 30 Nov 2016 22:55:38 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org> References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org> Message-ID: Hello Stephen, There are three licensing models for Spectrum Scale | GPFS: Server FPO Client I think the thing you might be missing is the associated cost per function. Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone: 1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: Stephen Ulmer To: gpfsug main discussion list Date: 11/30/2016 09:46 PM Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org I don?t understand what FPO provides here that mirroring doesn?t: You can still use failure domains ? one for each node. Both still have redundancy for the data; you can lose a disk or a node. The data has to be re-striped in the event of a disk failure ? no matter what. Also, the FPO license doesn?t allow for regular clients to access the data -- only server and FPO nodes. What am I missing? 
Liberty, -- Stephen On Nov 30, 2016, at 3:51 PM, Andrew Beattie wrote: Bob, If your not going to use integrated Raid controllers in the servers, then FPO would seem to be the most resilient scenario. yes it has its own overheads, but with that many drives to manage, a JOBD architecture and manual restriping doesn't sound like fun If you are going down the path of integrated raid controllers then any form of distributed raid is probably the best scenario, Raid 6 obviously. How many Nodes are you planning on building? The more nodes the more value FPO is likely to bring as you can be more specific in how the data is written to the nodes. Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Date: Thu, Dec 1, 2016 12:34 AM Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: From zgiles at gmail.com Thu Dec 1 03:59:40 2016 From: zgiles at gmail.com (Zachary Giles) Date: Wed, 30 Nov 2016 22:59:40 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> Message-ID: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke wrote: > I have once set up a small system with just a few SSDs in two NSD servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). > > Other ideas? what else can you do with GPFS and local disks than what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland > Rathausstr. 
7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Dec 1 04:03:52 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 30 Nov 2016 23:03:52 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3EAF5887-90C7-4C45-83DE-7D96F4EAC71E@ulmer.org> Message-ID: <7F68B673-EA06-4E99-BE51-B76C06FE416E@ulmer.org> The licensing model was my last point ? if the OP uses FPO just to create data resiliency they increase their cost (or curtail their access). I was really asking if there was a real, technical positive for using FPO in this example, as I could only come up with equivalences and negatives. -- Stephen > On Nov 30, 2016, at 10:55 PM, Ken Hill wrote: > > Hello Stephen, > > There are three licensing models for Spectrum Scale | GPFS: > > Server > FPO > Client > > I think the thing you might be missing is the associated cost per function. > > Regards, > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > From: Stephen Ulmer > To: gpfsug main discussion list > Date: 11/30/2016 09:46 PM > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > I don?t understand what FPO provides here that mirroring doesn?t: > You can still use failure domains ? one for each node. > Both still have redundancy for the data; you can lose a disk or a node. 
> The data has to be re-striped in the event of a disk failure ? no matter what. > > Also, the FPO license doesn?t allow for regular clients to access the data -- only server and FPO nodes. > > What am I missing? > > Liberty, > > -- > Stephen > > > > On Nov 30, 2016, at 3:51 PM, Andrew Beattie > wrote: > > Bob, > > If your not going to use integrated Raid controllers in the servers, then FPO would seem to be the most resilient scenario. > yes it has its own overheads, but with that many drives to manage, a JOBD architecture and manual restriping doesn't sound like fun > > If you are going down the path of integrated raid controllers then any form of distributed raid is probably the best scenario, Raid 6 obviously. > > How many Nodes are you planning on building? The more nodes the more value FPO is likely to bring as you can be more specific in how the data is written to the nodes. > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: "Oesterlin, Robert" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > > Cc: > Subject: [gpfsug-discuss] Strategies - servers with local SAS disks > Date: Thu, Dec 1, 2016 12:34 AM > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. > > > Options I?m considering: > > > - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) > > - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe > > - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically > > > Comments or other ideas welcome. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Dec 1 04:15:17 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 30 Nov 2016 23:15:17 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> Message-ID: <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. 
There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). > > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 
7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > > To: gpfsug main discussion list > > > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Zach Giles > zgiles at gmail.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From zgiles at gmail.com Thu Dec 1 04:27:27 2016 From: zgiles at gmail.com (Zachary Giles) Date: Wed, 30 Nov 2016 23:27:27 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton > of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet > available, but if one scours the interwebs they can find mention of > something called Mestor. 
> > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/at > tachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x- > gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog ( > https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a > similar situation. It's perhaps at a very high conceptually level similar > to Mestor. You erasure code your data across the nodes w/ the SAS disks and > then present those block devices to your NSD servers. I proved it could > work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then > replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > >> Just remember that replication protects against data availability, not >> integrity. GPFS still requires the underlying block device to return >> good data. >> >> If you're using it on plain disks (SAS or SSD), and the drive returns >> corrupt data, GPFS won't know any better and just deliver it to the >> client. Further, if you do a partial read followed by a write, both >> replicas could be destroyed. There's also no efficient way to force use >> of a second replica if you realize the first is bad, short of taking the >> first entirely offline. In that case while migrating data, there's no >> good way to prevent read-rewrite of other corrupt data on your drive >> that has the "good copy" while restriping off a faulty drive. >> >> Ideally RAID would have a goal of only returning data that passed the >> RAID algorithm, so shouldn't be corrupt, or made good by recreating from >> parity. However, as we all know RAID controllers are definitely prone to >> failures as well for many reasons, but at least a drive can go bad in >> various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) >> without (hopefully) silent corruption.. >> >> Just something to think about while considering replication .. >> >> >> >> On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > > wrote: >> >> I have once set up a small system with just a few SSDs in two NSD >> servers, >> providin a scratch file system in a computing cluster. >> No RAID, two replica. >> works, as long the admins do not do silly things (like rebooting >> servers >> in sequence without checking for disks being up in between). >> Going for RAIDs without GPFS replication protects you against single >> disk >> failures, but you're lost if just one of your NSD servers goes off. >> >> FPO makes sense only sense IMHO if your NSD servers are also >> processing >> the data (and then you need to control that somehow). >> >> Other ideas? what else can you do with GPFS and local disks than >> what you >> considered? I suppose nothing reasonable ... >> >> >> Mit freundlichen Gr??en / Kind regards >> >> >> Dr. Uwe Falke >> >> IT Specialist >> High Performance Computing Services / Integrated Technology Services / >> Data Center Services >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> ------------------- >> IBM Deutschland >> Rathausstr. 
7 >> 09111 Chemnitz >> Phone: +49 371 6978 2165 >> Mobile: +49 175 575 2877 >> E-Mail: uwefalke at de.ibm.com >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> ------------------- >> IBM Deutschland Business & Technology Services GmbH / >> Gesch?ftsf?hrung: >> Frank Hammer, Thorsten Moehring >> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht >> Stuttgart, >> HRB 17122 >> >> >> >> >> From: "Oesterlin, Robert" > > >> To: gpfsug main discussion list >> > > >> Date: 11/30/2016 03:34 PM >> Subject: [gpfsug-discuss] Strategies - servers with local SAS >> disks >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> >> Looking for feedback/strategies in setting up several GPFS servers >> with >> local SAS. They would all be part of the same file system. The >> systems are >> all similar in configuration - 70 4TB drives. >> >> Options I?m considering: >> >> - Create RAID arrays of the disks on each server (worried about the >> RAID >> rebuild time when a drive fails with 4, 6, 8TB drives) >> - No RAID with 2 replicas, single drive per NSD. When a drive fails, >> recreate the NSD ? but then I need to fix up the data replication via >> restripe >> - FPO ? with multiple failure groups - letting the system manage >> replica >> placement and then have GPFS due the restripe on disk failure >> automatically >> >> Comments or other ideas welcome. >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> 507-269-0413 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> -- >> Zach Giles >> zgiles at gmail.com >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 1 12:47:43 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Dec 2016 12:47:43 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? 
but I lose a LOT of space (20% for RAID + 50% of what's left for replication)
4) No raid groups, single NSD per disk, single failure group per server, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down
5) FPO doesn't seem to buy me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe.

Option (4) seems the best of the "no great options" I have in front of me.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of Zachary Giles
Reply-To: gpfsug main discussion list
Date: Wednesday, November 30, 2016 at 10:27 PM
To: gpfsug main discussion list
Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks

Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic.

It should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking into compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over.

On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote:
Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :)

Bob, I know this doesn't help you today since I'm pretty sure it's not yet available, but if one scours the interwebs they can find mention of something called Mestor. There's very very limited information here:

- https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf
- https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20)

Sounds like if it were available it would fit this use case very well.

I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptual level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to do too much with it because the requirements changed.

My money would be on your first option -- creating local RAIDs and then replicating to give you availability in the event a node goes offline.

-Aaron

On 11/30/16 10:59 PM, Zachary Giles wrote:
Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data.

If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive.

Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity.
However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke >> wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" >> To: gpfsug main discussion list >> Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. 
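To make the "no RAID, two replicas, one failure group per server" variant concrete, a sketch of the NSD stanza file and file system creation could look like the lines below. Server names, NSD names and device paths are placeholders; the point is that every disk in a given server carries that server's failure group, so GPFS places the two copies of each block on different nodes:

# nsd_stanzas.txt -- one %nsd clause per local drive
%nsd: device=/dev/sdb
  nsd=nsd01_sdb
  servers=nsdserver01
  usage=dataAndMetadata
  failureGroup=1

%nsd: device=/dev/sdb
  nsd=nsd02_sdb
  servers=nsdserver02
  usage=dataAndMetadata
  failureGroup=2

mmcrnsd -F nsd_stanzas.txt
# -m 2 -r 2: two copies of metadata and data from day one
# -M 3 -R 3: allow raising the replication factor later without recreating the file system
mmcrfs fs1 -F nsd_stanzas.txt -m 2 -r 2 -M 3 -R 3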
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Dec 1 13:13:31 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 1 Dec 2016 08:13:31 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> References: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Message-ID: Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. Liberty, -- Stephen > On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert wrote: > > Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: > > I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. > > 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) > 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks > 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) > 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down > 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. > > Option (4) seems the best of the ?no great options? I have in front of me. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > From: > on behalf of Zachary Giles > > Reply-To: gpfsug main discussion list > > Date: Wednesday, November 30, 2016 at 10:27 PM > To: gpfsug main discussion list > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. 
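On the "use QoS to get the IO impact down" idea quoted above: if I remember right, the QoS feature that arrived in the 4.2.x stream lets you cap the maintenance class so that mmrestripefs competes less with user I/O. A sketch, with the file system name and the IOPS figure as placeholders to tune:

# throttle maintenance commands (restripe, rebalance, etc.); leave normal I/O unlimited
mmchqos fs1 --enable pool=*,maintenance=300IOPS,other=unlimited

# after recreating the NSD that replaced the failed drive, restore replication
mmrestripefs fs1 -r

# check what the throttling is actually doing
mmlsqos fs1

The trade-off is simply that the harder you throttle, the longer the window during which some blocks have only one good replica.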
> > It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. > > On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. > > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/ ) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > >> wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). 
> > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > > Mobile: +49 175 575 2877 > > E-Mail: uwefalke at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > >> > To: gpfsug main discussion list > > >> > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > -- > Zach Giles > zgiles at gmail.com > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Zach Giles > zgiles at gmail.com _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 1 13:20:46 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Dec 2016 13:20:46 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> Yep, I should have added those requirements :-) 1) Yes I care about the data. 
It?s not scratch but a permanent repository of older, less frequently accessed data. 2) Yes, it will be backed up 3) I expect it to grow over time 4) Data integrity requirement: high Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Stephen Ulmer Reply-To: gpfsug main discussion list Date: Thursday, December 1, 2016 at 7:13 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. Liberty, -- Stephen On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert > wrote: Some interesting discussion here. Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Zachary Giles > Reply-To: gpfsug main discussion list > Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister > wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. 
There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke >> wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 
7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" >> To: gpfsug main discussion list >> Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org > Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Thu Dec 1 18:22:36 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Thu, 1 Dec 2016 10:22:36 -0800 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> References: <7F7B446E-C440-4D28-AE17-455CD73204E2@nuance.com> Message-ID: Hi Bob, If you mean #4 with 2x data replication...then I would be very wary as the chance of data loss would be very high given local disk failure rates. So I think its really #4 with 3x replication vs #3 with 2x replication (and raid5/6 in node) (with maybe 3x for metadata). The space overhead is somewhat similar, but the rebuild times should be much faster for #3 given that a failed disk will not place any load on the storage network (as well there will be less data placed on network). Dean From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 12/01/2016 04:48 AM Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Some interesting discussion here. 
Perhaps I should have been a bit clearer on what I?m looking at here: I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. Option (4) seems the best of the ?no great options? I have in front of me. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Zachary Giles Reply-To: gpfsug main discussion list Date: Wednesday, November 30, 2016 at 10:27 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. There's very very limited information here: - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) Sounds like if it were available it would fit this use case very well. I also had preliminary success with using sheepdog ( https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. -Aaron On 11/30/16 10:59 PM, Zachary Giles wrote: Just remember that replication protects against data availability, not integrity. GPFS still requires the underlying block device to return good data. If you're using it on plain disks (SAS or SSD), and the drive returns corrupt data, GPFS won't know any better and just deliver it to the client. Further, if you do a partial read followed by a write, both replicas could be destroyed. 
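If anyone wants to experiment with the ZVOL idea, the rough shape of it would be something like the following. This is only a sketch, untested with GPFS on top, and the pool, device and volume names are made up:

  zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
  zfs create -V 10T -b 128k tank/nsd01     # the zvol shows up as /dev/zvol/tank/nsd01
  # point an NSD stanza at the zvol and create NSDs as usual, e.g.
  #   %nsd: device=/dev/zvol/tank/nsd01 nsd=zvol_nsd01 servers=thisnode usage=dataAndMetadata failureGroup=1
  # (a /var/mmfs/etc/nsddevices user exit is likely needed so mmcrnsd can discover the zvol device)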
There's also no efficient way to force use of a second replica if you realize the first is bad, short of taking the first entirely offline. In that case while migrating data, there's no good way to prevent read-rewrite of other corrupt data on your drive that has the "good copy" while restriping off a faulty drive. Ideally RAID would have a goal of only returning data that passed the RAID algorithm, so shouldn't be corrupt, or made good by recreating from parity. However, as we all know RAID controllers are definitely prone to failures as well for many reasons, but at least a drive can go bad in various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) without (hopefully) silent corruption.. Just something to think about while considering replication .. On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: I have once set up a small system with just a few SSDs in two NSD servers, providin a scratch file system in a computing cluster. No RAID, two replica. works, as long the admins do not do silly things (like rebooting servers in sequence without checking for disks being up in between). Going for RAIDs without GPFS replication protects you against single disk failures, but you're lost if just one of your NSD servers goes off. FPO makes sense only sense IMHO if your NSD servers are also processing the data (and then you need to control that somehow). Other ideas? what else can you do with GPFS and local disks than what you considered? I suppose nothing reasonable ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 11/30/2016 03:34 PM Subject: [gpfsug-discuss] Strategies - servers with local SAS disks Sent by: gpfsug-discuss-bounces at spectrumscale.org Looking for feedback/strategies in setting up several GPFS servers with local SAS. They would all be part of the same file system. The systems are all similar in configuration - 70 4TB drives. Options I?m considering: - Create RAID arrays of the disks on each server (worried about the RAID rebuild time when a drive fails with 4, 6, 8TB drives) - No RAID with 2 replicas, single drive per NSD. When a drive fails, recreate the NSD ? but then I need to fix up the data replication via restripe - FPO ? with multiple failure groups - letting the system manage replica placement and then have GPFS due the restripe on disk failure automatically Comments or other ideas welcome. 
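For what it's worth, the second option above sketched out with placeholder server, device and file system names: one NSD per drive, one failure group per server, two copies of data and metadata. Only a sketch, not a tested recipe:

  %nsd: device=/dev/sdb nsd=nsd01_d001 servers=nsd01 usage=dataAndMetadata failureGroup=1
  %nsd: device=/dev/sdc nsd=nsd01_d002 servers=nsd01 usage=dataAndMetadata failureGroup=1
  %nsd: device=/dev/sdb nsd=nsd02_d001 servers=nsd02 usage=dataAndMetadata failureGroup=2
  ...

  mmcrnsd -F disks.stanza
  mmcrfs fs1 -F disks.stanza -m 2 -M 2 -r 2 -R 2 -T /gpfs/fs1

  # when a drive fails: drop the dead NSD, add its replacement, restore two copies
  mmdeldisk fs1 nsd01_d001
  mmadddisk fs1 -F replacement.stanza
  mmrestripefs fs1 -r

The restripe at the end is the part that hurts with 4 TB and larger drives; QoS can at least keep it from flattening client I/O.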
Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From eric.wonderley at vt.edu Thu Dec 1 19:10:08 2016 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 1 Dec 2016 14:10:08 -0500 Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk Message-ID: I have a few misconfigured disk groups and I have a few same size correctly configured disk groups. Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk? Everytime I have ever run mmdeldisk...it been somewhat painful(even with qos) process. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 1 20:28:36 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 1 Dec 2016 14:28:36 -0600 Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk In-Reply-To: References: Message-ID: I always suspend the disk then use mmrestripefs -m to remove the data. Then delete the disk with mmdeldisk. ?m Migrates all critical data off of any suspended disk in this file system. Critical data is all data that would be lost if currently suspended disks were removed. Can do multiple that why and us the entire cluster to move data if you want. On 12/1/16 1:10 PM, J. Eric Wonderley wrote: I have a few misconfigured disk groups and I have a few same size correctly configured disk groups. Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk? Everytime I have ever run mmdeldisk...it been somewhat painful(even with qos) process. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark.bergman at uphs.upenn.edu Thu Dec 1 23:50:16 2016 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 01 Dec 2016 18:50:16 -0500 Subject: [gpfsug-discuss] Upgrading kernel on RHEL In-Reply-To: Your message of "Tue, 29 Nov 2016 20:56:25 +0000." References: <904EEBB5-E1DD-4606-993F-7E91ADA1FC37@cfms.org.uk>, Message-ID: <2253-1480636216.904015@Srjh.LZ4V.h1Mt> In the message dated: Tue, 29 Nov 2016 20:56:25 +0000, The pithy ruminations from Luis Bolinches on were: => Its been around in certain cases, some kernel <-> storage combination get => hit some not => => Scott referenced it here https://www.ibm.com/developerworks/community/wikis => /home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/ => Storage+with+GPFS+on+Linux => => https://access.redhat.com/solutions/2437991 => => It happens also on 7.2 and 7.3 ppc64 (not yet on the list of "supported") => it does not on 7.1. I can confirm this at least for XIV storage, that it => can go up to 1024 only. => => I know the FAQ will get updated about this, at least there is a CMVC that => states so. => => Long short, you create a FS, and you see all your paths die and recover and => die and receover and ..., one after another. And it never really gets done. => Also if you boot from SAN ... well you can figure it out ;) Wow, that sounds extremely similar to a kernel bug/incompatibility with GPFS that I reported in May: https://patchwork.kernel.org/patch/9140337/ https://bugs.centos.org/view.php?id=10997 My conclusion is not to apply kernel updates, unless strictly necessary (Dirty COW, anyone) or tested & validated with GPFS. Mark => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland => Phone: +358 503112585 => => "If you continually give you will continually have." Anonymous => => => => ----- Original message ----- => From: Nathan Harper > Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug main discussion list => Cc: => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 10:44 PM => => This is the first I've heard of this max_sectors_kb issue, has it => already been discussed on the list? Can you point me to any more info? => => => => On 29 Nov 2016, at 19:08, Luis Bolinches => wrote: => => => Seen that one on 6.8 too => => teh 4096 does NOT work if storage is XIV then is 1024 => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / => Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland => Phone: +358 503112585 => => "If you continually give you will continually have." Anonymous => => => => ----- Original message ----- => From: "Kevin D Johnson" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug-discuss at spectrumscale.org => Cc: gpfsug-discuss at spectrumscale.org => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 8:48 PM => => I have run into the max_sectors_kb issue and creating a file => system when moving beyond 3.10.0-327 on RH 7.2 as well. You => either have to reinstall the OS or walk the kernel back to 327 => via: => => https://access.redhat.com/solutions/186763 => => Kevin D. 
Johnson, MBA, MAFM => Spectrum Computing, Senior Managing Consultant => => IBM Certified Deployment Professional - Spectrum Scale V4.1.1 => IBM Certified Deployment Professional - Cloud Object Storage => V3.8 => IBM Certified Solution Advisor - Spectrum Computing V1 => => 720.349.6199 - kevindjo at us.ibm.com => => => => => ----- Original message ----- => From: "Luis Bolinches" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: gpfsug-discuss at spectrumscale.org => Cc: gpfsug-discuss at spectrumscale.org => Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 5:20 AM => => My 2 cents => => And I am sure different people have different opinions. => => New kernels might be problematic. => => Now got my fun with RHEL 7.3 kernel and max_sectors_kb for => new FS. Is something will come to the FAQ soon. It is => already on draft not public. => => I guess whatever you do .... get a TEST cluster and do it => there first, that is better the best advice I could give. => => => -- => Yst?v?llisin terveisin / Kind regards / Saludos cordiales / => Salutations => => Luis Bolinches => Lab Services => http://www-03.ibm.com/systems/services/labservices/ => => IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 => Finland => Phone: +358 503112585 => => "If you continually give you will continually have." => Anonymous => => => => ----- Original message ----- => From: "Sobey, Richard A" => Sent by: gpfsug-discuss-bounces at spectrumscale.org => To: "'gpfsug-discuss at spectrumscale.org'" < => gpfsug-discuss at spectrumscale.org> => Cc: => Subject: [gpfsug-discuss] Upgrading kernel on RHEL => Date: Tue, Nov 29, 2016 11:59 AM => => => All, => => => => As a general rule, when updating GPFS to a newer => release, would you perform a full OS update at the same => time, and/or update the kernel too? => => => => Just trying to gauge what other people do in this => respect. Personally I?ve always upgraded everything at => once ? including kernel. Am I looking for trouble? => => => => Cheers => => Richard => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? ole toisin mainittu: / Unless stated otherwise => above: => Oy IBM Finland Ab => PL 265, 00101 Helsinki, Finland => Business ID, Y-tunnus: 0195876-3 => Registered in Finland => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? ole toisin mainittu: / Unless stated otherwise above: => Oy IBM Finland Ab => PL 265, 00101 Helsinki, Finland => Business ID, Y-tunnus: 0195876-3 => Registered in Finland => => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => => => => Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: => Oy IBM Finland Ab => PL 265, 00101 Helsinki, Finland => Business ID, Y-tunnus: 0195876-3 => Registered in Finland => => _______________________________________________ => gpfsug-discuss mailing list => gpfsug-discuss at spectrumscale.org => http://gpfsug.org/mailman/listinfo/gpfsug-discuss => From Robert.Oesterlin at nuance.com Fri Dec 2 13:31:26 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 2 Dec 2016 13:31:26 +0000 Subject: [gpfsug-discuss] Follow-up: Storage Rich Server options Message-ID: <13B8F551-BCA2-4690-B45A-736BA549D2FC@nuance.com> Some follow-up to the discussion I kicked off a few days ago. Using simple GPFS replication on two sites looked like a good option, until you consider it?s really RAID5, and if the replica copy of the data fails during the restripe, you lose data. It?s not as bad as RAID5 because the data blocks for a file are spread across multiple servers versus reconstruction of a single array. Raid 6 + Metadata replication isn?t a bad option but you are vulnerable to server failure. It?s relatively low expansion factor makes it attractive. My personal recommendation is going to be use Raid 6 + Metadata Replication (use ?unmountOnDiskFail=meta? option), keep a spare server around to minimize downtime if one fails. Array rebuild times will impact performance, but it?s the price of having integrity. Comments? Data Distribution Expansion Factor Data Availability (Disk Failure) Data Availability (Server Failure) Data Integrity Comments Raid 6 (6+2) + Metadata replication 1.25+ High Low High Single server or single LUN failure results in some data being unavailable. Single Drive failure - lower performance during array rebuild. 2 site replication (GPFS) 2 High High Low Similar to RAID 5 - vulnerable to multiple disk failures. Rebuild done via GPFS restripe. URE vulnerable during restripe, but data distribution may mitigate this. Raid 6 (6+2) + Full 2 site replication (GPFS) 2.5 High High High Protected against single server and double drive failures. Single Drive failure - lower performance during array rebuild. Full 3 site replication (GPFS) 3 High High High Similar to RAID 6. Protected against single server and double drive failures. Rebuild done via GPFS restripe. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Dec 2 15:03:59 2016 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 2 Dec 2016 10:03:59 -0500 Subject: [gpfsug-discuss] rpldisk vs deldisk & adddisk In-Reply-To: References: Message-ID: Ah...rpldisk is used to fix a single problem and typically you don't want to take a long trip thru md for just one small problem. Likely why it is seldom if ever used. On Thu, Dec 1, 2016 at 3:28 PM, Matt Weil wrote: > I always suspend the disk then use mmrestripefs -m to remove the data. > Then delete the disk with mmdeldisk. > > ?m > Migrates all critical data off of any suspended > disk in this file system. Critical data is all > data that would be lost if currently suspended > disks were removed. > Can do multiple that why and us the entire cluster to move data if you > want. > > On 12/1/16 1:10 PM, J. Eric Wonderley wrote: > > I have a few misconfigured disk groups and I have a few same size > correctly configured disk groups. > > Is there any (dis)advantage to running mmrpldisk over mmdeldisk and > mmadddisk? 
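For the archives, the suspend-first sequence described above looks roughly like this, with placeholder file system and NSD names:

  mmchdisk fs1 suspend -d "nsd_to_remove"   # no new blocks get allocated on it
  mmrestripefs fs1 -m                       # migrate critical data off suspended disks
  mmdeldisk fs1 nsd_to_remove               # the delete now has much less data to move
  mmrestripefs fs1 -b                       # optional rebalance afterwards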
Everytime I have ever run mmdeldisk...it been somewhat > painful(even with qos) process. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Dec 2 20:51:14 2016 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 2 Dec 2016 15:51:14 -0500 Subject: [gpfsug-discuss] Quotas on Multiple Filesets In-Reply-To: References: Message-ID: Hi Michael: I was about to ask a similar question about nested filesets. I have this setup: [root at cl001 ~]# mmlsfileset home Filesets in file system 'home': Name Status Path root Linked /gpfs/home group Linked /gpfs/home/group predictHPC Linked /gpfs/home/group/predictHPC and I see this: [root at cl001 ~]# mmlsfileset home -L -d Collecting fileset usage information ... Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Data (in KB) Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 134217728 123805696 63306355456 root fileset group 1 67409030 0 Tue Nov 1 13:22:24 2016 0 0 0 0 predictHPC 2 111318203 1 Fri Dec 2 14:05:56 2016 0 0 0 212206080 I would have though that usage in fileset predictHPC would also go against the group fileset On Tue, Nov 15, 2016 at 4:47 AM, Michael Holliday < michael.holliday at crick.ac.uk> wrote: > Hey Everyone, > > > > I have a GPFS system which contain several groups of filesets. > > > > Each group has a root fileset, along with a number of other files sets. > All of the filesets share the inode space with the root fileset. > > > > The file sets are linked to create a tree structure as shown: > > > > Fileset Root -> /root > > Fileset a -> /root/a > > Fileset B -> /root/b > > Fileset C -> /root/c > > > > > > I have applied a quota of 5TB to the root fileset. > > > > Could someone tell me if the quota will only take into account the files > in the root fileset, or if it would include the sub filesets aswell. eg > If have 3TB in A and 2TB in B - would that hit the 5TB quota on root? > > > > Thanks > > Michael > > > > > > The Francis Crick Institute Limited is a registered charity in England and > Wales no. 1140062 and a company registered in England and Wales no. > 06885462, with its registered office at 215 Euston Road, London NW1 2BE. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Heiner.Billich at psi.ch Mon Dec 5 10:26:47 2016 From: Heiner.Billich at psi.ch (Heiner Billich) Date: Mon, 5 Dec 2016 11:26:47 +0100 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: Hello, I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by asking google. Can please somebody point me to the source? I wonder whether it allows incremental copies as rsync does. We need to copy a few 100TB of data and simple rsync provides just about 100MB/s. I know about the possible workarounds - write a wrapper script, run several rsyncs in parallel, distribute the rsync jobs on several nodes, use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, which requires me to write a custom wrapper for cp .... I really would prefer some ready-to-use script or program. Thank you and kind regards, Heiner Billich From peserocka at gmail.com Mon Dec 5 11:25:38 2016 From: peserocka at gmail.com (P Serocka) Date: Mon, 5 Dec 2016 19:25:38 +0800 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> References: <944ED1A3-048B-41B9-BEF0-78FD88859E2E@nuance.com> Message-ID: <6911BC0E-89DE-4C42-A46C-5DADB31E415A@gmail.com> It would be helpful to make a strict priority list of points like these: - use existing hw at no additional cost (kind of the starting point of this project) - data integrity requirement: high as you wrote - Performance (r/w/random): assumed low? - Flexibility of file tree layout: low? because: static content, "just" growing In case I got the priorities in the right order by pure chance, having ZFS as part of the solution would come to my mind (first two points). Then, with performance and flexibility on the lower ranks, I might consider... not to... deploy.... GPFS at all, but stick with with 12 separate archive servers. You actual priority list might be different. I was trying to illustrate how a strict ranking, and not cheating on yourself, simplifies drawing conclusions in a top-down approach. hth -- Peter On 2016 Dec 1. md, at 21:20 st, Oesterlin, Robert wrote: > Yep, I should have added those requirements :-) > > 1) Yes I care about the data. It?s not scratch but a permanent repository of older, less frequently accessed data. > 2) Yes, it will be backed up > 3) I expect it to grow over time > 4) Data integrity requirement: high > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > From: on behalf of Stephen Ulmer > Reply-To: gpfsug main discussion list > Date: Thursday, December 1, 2016 at 7:13 AM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Just because I don?t think I?ve seen you state it: (How much) Do you care about the data? > > Is it scratch? Is it test data that exists elsewhere? Does it ever flow from this storage to any other storage? Will it be dubbed business critical two years after they swear to you that it?s not important at all? Is it just your movie collection? Are you going to back it up? Is it going to grow? Is this temporary? > > That would inform us about the level of integrity required, which is one of the main differentiators for the options you?re considering. > > Liberty, > > -- > Stephen > > > > On Dec 1, 2016, at 7:47 AM, Oesterlin, Robert wrote: > > Some interesting discussion here. 
Perhaps I should have been a bit clearer on what I?m looking at here: > > I have 12 servers with 70*4TB drives each ? so the hardware is free. What?s the best strategy for using these as GPFS NSD servers, given that I don?t want to relay on any ?bleeding edge? technologies. > > 1) My first choice would be GNR on commodity hardware ? if IBM would give that to us. :-) > 2) Use standard RAID groups with no replication ? downside is data availability of you lose an NSD and RAID group rebuild time with large disks > 3) RAID groups with replication ? but I lose a LOT of space (20% for RAID + 50% of what?s left for replication) > 4) No raid groups, single NSD per disk, single failure group per servers, replication. Downside here is I need to restripe every time a disk fails to get the filesystem back to a good state. Might be OK using QoS to get the IO impact down > 5) FPO doesn?t seem to by me anything, as these are straight NSD servers and no computation is going on these servers, and I still must live with the re-stripe. > > Option (4) seems the best of the ?no great options? I have in front of me. > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > From: on behalf of Zachary Giles > Reply-To: gpfsug main discussion list > Date: Wednesday, November 30, 2016 at 10:27 PM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS disks > > Aaron, Thanks for jumping onboard. It's nice to see others confirming this. Sometimes I feel alone on this topic. > > It's should also be possible to use ZFS with ZVOLs presented as block devices for a backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant.. but should be possible. :) There are various reports about it. Might be at least worth looking in to compared to Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test over. > > On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister wrote: > Thanks Zach, I was about to echo similar sentiments and you saved me a ton of typing :) > > Bob, I know this doesn't help you today since I'm pretty sure its not yet available, but if one scours the interwebs they can find mention of something called Mestor. > > There's very very limited information here: > > - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf > - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20) > > Sounds like if it were available it would fit this use case very well. > > I also had preliminary success with using sheepdog (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a similar situation. It's perhaps at a very high conceptually level similar to Mestor. You erasure code your data across the nodes w/ the SAS disks and then present those block devices to your NSD servers. I proved it could work but never tried to to much with it because the requirements changed. > > My money would be on your first option-- creating local RAIDs and then replicating to give you availability in the event a node goes offline. > > -Aaron > > > On 11/30/16 10:59 PM, Zachary Giles wrote: > Just remember that replication protects against data availability, not > integrity. GPFS still requires the underlying block device to return > good data. > > If you're using it on plain disks (SAS or SSD), and the drive returns > corrupt data, GPFS won't know any better and just deliver it to the > client. 
Further, if you do a partial read followed by a write, both > replicas could be destroyed. There's also no efficient way to force use > of a second replica if you realize the first is bad, short of taking the > first entirely offline. In that case while migrating data, there's no > good way to prevent read-rewrite of other corrupt data on your drive > that has the "good copy" while restriping off a faulty drive. > > Ideally RAID would have a goal of only returning data that passed the > RAID algorithm, so shouldn't be corrupt, or made good by recreating from > parity. However, as we all know RAID controllers are definitely prone to > failures as well for many reasons, but at least a drive can go bad in > various ways (bad sectors, slow, just dead, poor SSD cell wear, etc) > without (hopefully) silent corruption.. > > Just something to think about while considering replication .. > > > > On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke > wrote: > > I have once set up a small system with just a few SSDs in two NSD > servers, > providin a scratch file system in a computing cluster. > No RAID, two replica. > works, as long the admins do not do silly things (like rebooting servers > in sequence without checking for disks being up in between). > Going for RAIDs without GPFS replication protects you against single > disk > failures, but you're lost if just one of your NSD servers goes off. > > FPO makes sense only sense IMHO if your NSD servers are also processing > the data (and then you need to control that somehow). > > Other ideas? what else can you do with GPFS and local disks than > what you > considered? I suppose nothing reasonable ... > > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Frank Hammer, Thorsten Moehring > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > > To: gpfsug main discussion list > > > Date: 11/30/2016 03:34 PM > Subject: [gpfsug-discuss] Strategies - servers with local SAS > disks > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Looking for feedback/strategies in setting up several GPFS servers with > local SAS. They would all be part of the same file system. The > systems are > all similar in configuration - 70 4TB drives. > > Options I?m considering: > > - Create RAID arrays of the disks on each server (worried about the RAID > rebuild time when a drive fails with 4, 6, 8TB drives) > - No RAID with 2 replicas, single drive per NSD. When a drive fails, > recreate the NSD ? but then I need to fix up the data replication via > restripe > - FPO ? with multiple failure groups - letting the system manage > replica > placement and then have GPFS due the restripe on disk failure > automatically > > Comments or other ideas welcome. 
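To put rough numbers on the space trade-offs for the 12 x 70 x 4 TB (roughly 3360 TB raw) setup described above, assuming 8+2 RAID6 and ignoring formatting overhead:

  RAID6 only, no GPFS replication:   3360 x 0.8     = ~2688 TB usable (expansion factor ~1.25)
  2 GPFS replicas, no RAID:          3360 / 2       = ~1680 TB usable (factor 2)
  RAID6 plus 2 GPFS replicas:        3360 x 0.8 / 2 = ~1344 TB usable (factor 2.5)

The factors only say something about capacity, not about rebuild times or about what happens when a whole server goes away.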
> > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > Zach Giles > zgiles at gmail.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Zach Giles > zgiles at gmail.com > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mimarsh2 at vt.edu Mon Dec 5 14:09:56 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 5 Dec 2016 09:09:56 -0500 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: All, I am in the same boat. I'd like to copy ~500 TB from one filesystem to another. Both are being served by the same NSD servers. We've done the multiple rsync script method in the past (and yes it's a bit of a pain). Would love to have an easier utility. Best, Brian Marshall On Mon, Dec 5, 2016 at 5:26 AM, Heiner Billich wrote: > Hello, > > I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or > 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by > asking google. Can please somebody point me to the source? I wonder whether > it allows incremental copies as rsync does. > > We need to copy a few 100TB of data and simple rsync provides just about > 100MB/s. I know about the possible workarounds - write a wrapper script, > run several rsyncs in parallel, distribute the rsync jobs on several nodes, > use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, > which requires me to write a custom wrapper for cp .... > > I really would prefer some ready-to-use script or program. > > Thank you and kind regards, > Heiner Billich > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sander.kuusemets at ut.ee Mon Dec 5 14:26:21 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Mon, 5 Dec 2016 16:26:21 +0200 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster Message-ID: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Hello, I have been thinking about setting up a CES cluster on my GPFS custer for easier data distribution. The cluster is quite an old one - since 3.4, but we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, Infiniband interconnected. 
The problem is this little line in Spectrum Scale documentation: > The CES shared root directory cannot be changed when the cluster is up > and running. If you want to modify the shared root configuration, you > must bring the entire cluster down. Does this mean that even the first time I'm setting CES up, I have to pull down the whole cluster? I would understand this level of service disruption when I already had set the directory before and now I was changing it, but on an initial setup it's quite an inconvenience. Maybe there's a less painful way for this? Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 5 14:34:27 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 05 Dec 2016 14:34:27 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Message-ID: No, the first time you define it, I'm pretty sure can be done online. But when changing it later, it will require the stopping the full cluster first. -jf man. 5. des. 2016 kl. 15.26 skrev Sander Kuusemets : > Hello, > > I have been thinking about setting up a CES cluster on my GPFS custer for > easier data distribution. The cluster is quite an old one - since 3.4, but > we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, > Infiniband interconnected. > The problem is this little line in Spectrum Scale documentation: > > The CES shared root directory cannot be changed when the cluster is up and > running. If you want to modify the shared root configuration, you must > bring the entire cluster down. > > > Does this mean that even the first time I'm setting CES up, I have to pull > down the whole cluster? I would understand this level of service disruption > when I already had set the directory before and now I was changing it, but > on an initial setup it's quite an inconvenience. Maybe there's a less > painful way for this? > > Best regards, > > -- > Sander Kuusemets > University of Tartu, High Performance Computing > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Mon Dec 5 15:51:14 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 5 Dec 2016 10:51:14 -0500 Subject: [gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale? In-Reply-To: References: Message-ID: <58AC01C5-3B4B-43C0-9F62-F5B38D90EC50@ulmer.org> This is not the answer to not writing it yourself: However, be aware that GNU xargs has the -P x option, which will try to keep x batches running. It?s a good way to optimize the number of threads for anything you?re multiprocessing in the shell. So you can build a list and have xargs fork x copies of rsync or cp at a time (with -n y items in each batch). Not waiting to start the next batch when one finishes can add up to lots of MB*s very quickly. This is not the answer to anything, and is probably a waste of your time: I started to comment that if GPFS did provide such a ?data path shortcut?, I think I?d want it to work between any two allocation areas ? even two independent filesets in the same file system. 
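A minimal sketch of that xargs approach, assuming the data splits reasonably evenly across the top-level directories (the paths and the parallelism of 8 are placeholders, and plain rsync will not carry GPFS NFSv4 ACLs):

  cd /gpfs/oldfs
  find . -mindepth 1 -maxdepth 1 -type d -print0 | \
    xargs -0 -P 8 -I{} rsync -a --numeric-ids {}/ /gpfs/newfs/{}/
  # files sitting directly under /gpfs/oldfs are not covered by the loop above:
  rsync -a --numeric-ids --exclude='/*/' /gpfs/oldfs/ /gpfs/newfs/

Re-running the same thing later gives you the incremental pass, since rsync only moves what changed.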
Then I started working though the possibilities for just doing that? and it devolved into the realization that we?ve got to copy the data most of the time (unless it?s in the same filesystem *and* the same storage pool ? and maybe even then depending on how the allocator works). Realizing that I decide that sometimes it just sucks to have data in the wrong (old) place. :) Maybe what we want is to be able to split an independent fileset (if it is 1:1 with a storage pool) from a filesystem and graft it onto another one ? that?s probably easier and it almost mirrors vgsplit, et al. I should go do actual work... Liberty, > On Dec 5, 2016, at 9:09 AM, Brian Marshall > wrote: > > All, > > I am in the same boat. I'd like to copy ~500 TB from one filesystem to another. Both are being served by the same NSD servers. > > We've done the multiple rsync script method in the past (and yes it's a bit of a pain). Would love to have an easier utility. > > Best, > Brian Marshall > > On Mon, Dec 5, 2016 at 5:26 AM, Heiner Billich > wrote: > Hello, > > I heard about some gpfs optimized bulk(?) copy command named 'mmcp' or 'mmcopy' but couldn't find it in either /user/lpp/mmfs/samples/ or by asking google. Can please somebody point me to the source? I wonder whether it allows incremental copies as rsync does. > > We need to copy a few 100TB of data and simple rsync provides just about 100MB/s. I know about the possible workarounds - write a wrapper script, run several rsyncs in parallel, distribute the rsync jobs on several nodes, use a special rsync versions that knows about gpfs ACLs, ... or try mmfind, which requires me to write a custom wrapper for cp .... > > I really would prefer some ready-to-use script or program. > > Thank you and kind regards, > Heiner Billich > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Dec 5 16:01:33 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 5 Dec 2016 11:01:33 -0500 Subject: [gpfsug-discuss] waiting for exclusive use of connection for sending msg Message-ID: Bob (and all), I see in this post that you were tracking down a problem I am currently seeing. Lots of waiters in deadlock with "waiting for exclusive use of connection for sending msg". Did you ever determine a fix / cause for that? I see your previous comments below. We are still on 4.2.0 https://www.ibm.com/developerworks/community/forums/html/topic?id=c25e31ad-a2ae-408e-84e5-90f412806463 Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 5 16:14:06 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 5 Dec 2016 16:14:06 +0000 Subject: [gpfsug-discuss] waiting for exclusive use of connection for sending msg Message-ID: Hi Brian This boils down to a network contention issue ? that you are maxing out the network resources and GPFS is waiting. Now- digging deeper into why, that?s a larger issue. I?m still struggling with this myself. It takes a lot of digging into network stats, utilization, dropped packets, etc. It could be at the server/client or elsewhere in the network. 
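For anyone else staring at the same waiters, the usual first places to look are along these lines (the interface name is a placeholder):

  mmdiag --waiters                            # longest current waiters on a node
  mmdiag --network                            # per-connection state, pending sends/receives
  netstat -s | grep -i -e retrans -e drop     # TCP-level retransmits and drops
  ethtool -S eth0 | grep -i -e drop -e err    # NIC counters on the suspect interface

None of these point at a root cause by themselves, but they usually show which side of which link is the bottleneck.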
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Monday, December 5, 2016 at 10:01 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] waiting for exclusive use of connection for sending msg Bob (and all), I see in this post that you were tracking down a problem I am currently seeing. Lots of waiters in deadlock with "waiting for exclusive use of connection for sending msg". Did you ever determine a fix / cause for that? I see your previous comments below. We are still on 4.2.0 https://www.ibm.com/developerworks/community/forums/html/topic?id=c25e31ad-a2ae-408e-84e5-90f412806463 Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Mon Dec 5 16:33:24 2016 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 5 Dec 2016 17:33:24 +0100 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe Message-ID: FYI ... in case not seen .... benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Dec 5 20:49:44 2016 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 5 Dec 2016 20:49:44 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee> Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Dec 5 21:31:55 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 5 Dec 2016 16:31:55 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Hi Everyone, In the GPFS documentation (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) it has this to say about the duration of an upgrade from 3.5 to 4.1: > Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS > on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists >because some GPFS 4.1 features become available on each node as soon as the node is upgraded, while >other features will not become available until you upgrade all participating nodes. Does anyone have a feel for what "a short time" means? I'm looking to upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the size of our system it might take several weeks to complete. Seeing this language concerns me that after some period of time something bad is going to happen, but I don't know what that period of time is. Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any anecdotes they'd like to share, I would like to hear them. Thanks! 
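Not an answer to how short "short" is, but for reference the per-node piece of a rolling upgrade is roughly the following (the package list is abbreviated and hypothetical, check the documentation for the exact set on your distro), with the finalization steps run only once every node is done:

  mmshutdown -N node1
  rpm -Uvh gpfs.base-4.1.*.rpm gpfs.gpl-4.1.*.rpm gpfs.msg*.rpm gpfs.docs*.rpm   # hypothetical package set
  /usr/lpp/mmfs/bin/mmbuildgpl      # rebuild the portability layer for the running kernel
  mmstartup -N node1
  mmgetstate -N node1               # wait for 'active' before moving to the next node

  # only once ALL nodes (clients and servers) are on 4.1:
  mmchconfig release=LATEST
  mmchfs fs1 -V full                # enables new on-disk features; not reversible
  # (hold off on -V full while older remote clusters still need to mount fs1)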
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From kevindjo at us.ibm.com Mon Dec 5 21:35:54 2016 From: kevindjo at us.ibm.com (Kevin D Johnson) Date: Mon, 5 Dec 2016 21:35:54 +0000 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 5 22:52:58 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 05 Dec 2016 22:52:58 +0000 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: I read it as "do your best". I doubt there can be problems that shows up after 3 weeks, that wouldn't also be triggerable after 1 day. -jf man. 5. des. 2016 kl. 22.32 skrev Aaron Knister : > Hi Everyone, > > In the GPFS documentation > ( > http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm > ) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a time > without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short time. > The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Dec 5 23:00:43 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 5 Dec 2016 18:00:43 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Thanks Jan-Frode! If you don't mind sharing, over what period of time did you upgrade from 3.5 to 4.1 and roughly how many clients/servers do you have in your cluster? -Aaron On 12/5/16 5:52 PM, Jan-Frode Myklebust wrote: > I read it as "do your best". I doubt there can be problems that shows up > after 3 weeks, that wouldn't also be triggerable after 1 day. > > > -jf > > man. 5. des. 2016 kl. 
22.32 skrev Aaron Knister > >: > > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a time without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short time. The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From sander.kuusemets at ut.ee Tue Dec 6 07:25:13 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Tue, 6 Dec 2016 09:25:13 +0200 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> Hello Aaron, I thought I'd share my two cents, as I just went through the process. I thought I'd do the same, start upgrading from where I can and wait until machines come available. It took me around 5 weeks to complete the process, but the last two were because I was super careful. At first nothing happened, but at one point, a week into the upgrade cycle, when I tried to mess around (create, delete, test) a fileset, suddenly I got the weirdest of error messages while trying to delete a fileset for the third time from a client node - I sadly cannot exactly remember what it said, but I can describe what happened. After the error message, the current manager of our cluster fell into arbitrating state, it's metadata disks were put to down state, manager status was given to our other server node and it's log was spammed with a lot of error messages, something like this: > mmfsd: > /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) > + 0)' failed. > Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 > in process 15113, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= > (sizeof(Pad32) + 0)) in line 1411 of file > /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h > Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: > Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 > logAssertFailed + 0x2D6 at ??:0 > Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 > PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 > Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 > tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 > Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 > RcvWorker::RcvMain() + 0x107 at ??:0 > Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B > RcvWorker::thread(void*) + 0x5B at ??:0 > Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 > Thread::callBody(Thread*) + 0x46 at ??:0 > Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 > start_thread + 0xD1 at ??:0 > Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + > 0x6D at ??:0 After this I tried to put disks up again, which failed half-way through and did the same with the other server node (current master). So after this my cluster had effectively failed, because all the metadata disks were down and there was no path to the data disks. When I tried to put all the metadata disks up with one start command, then it worked on third try and the cluster got into working state again. Downtime about an hour. I created a PMR with this information and they said that it's a bug, but it's a tricky one so it's going to take a while, but during that it's not recommended to use any commands from this list: > Our apologies for the delayed response. Based on the debug data we > have and looking at the source code, we believe the assert is due to > incompatibility is arising from the feature level version for the > RPCs. In this case the culprit is the PIT "interesting inode" code. > > Several user commands employ PIT (Parallel Inode Traversal) code to > traverse each data block of every file: > >> >> mmdelfileset >> mmdelsnapshot >> mmdefragfs >> mmfileid >> mmrestripefs >> mmdeldisk >> mmrpldisk >> mmchdisk >> mmadddisk > The problematic one is the 'PitInodeListPacket' subrpc which is a part > of an "interesting inode" code change. Looking at the dumps its > evident that node 'node3' which sent the RPC is not capable of > supporting interesting inode (max feature level is 1340) and node > server11 which is receiving it is trying to interpret the RPC beyond > the valid region (as its feature level 1502 supports PIT interesting > inodes). And apparently any of the fileset commands either, as I failed with those. After I finished the upgrade, everything has been working wonderfully. But during this upgrade time I'd recommend to tread really carefully. Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing, IT Specialist On 12/05/2016 11:31 PM, Aaron Knister wrote: > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > >> Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> on other nodes. However, you must upgrade all nodes within a short >> time. 
The time dependency exists >> because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while >> other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing > this language concerns me that after some period of time something bad > is going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > From janfrode at tanso.net Tue Dec 6 08:04:04 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 6 Dec 2016 09:04:04 +0100 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Currently I'm with IBM Lab Services, and only have small test clusters myself. I'm not sure I've done v3.5->4.1 upgrades, but this warning about upgrading all nodes within a "short time" is something that's always been in the upgrade instructions, and I've been through many of these (I've been a gpfs sysadmin since 2002 :-) http://www.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs300.doc/bl1ins_migratl.htm https://www.scribd.com/document/51036833/GPFS-V3-4-Concepts-Planning-and-Installation-Guide BTW: One relevant issue I saw recently was a rolling upgrade from 4.1.0 to 4.1.1.7 where we had some nodes in the cluster running 4.1.0.0. Apparently there had been some CCR message format changes in a later release that made 4.1.0.0-nodes not being able to properly communicate with 4.1.1.4 -- even though they should be able to co-exist in the same cluster according to the upgrade instructions. So I guess the more versions you mix in a cluster, the more likely you're to hit a version mismatch bug. Best to feel a tiny bit uneasy about not running same version on all nodes, and hurry to get them all upgraded to the same level. And also, should you hit a bug during this process, the likely answer will be to upgrade everything to same level. -jf On Tue, Dec 6, 2016 at 12:00 AM, Aaron Knister wrote: > Thanks Jan-Frode! If you don't mind sharing, over what period of time did > you upgrade from 3.5 to 4.1 and roughly how many clients/servers do you > have in your cluster? > > -Aaron > > On 12/5/16 5:52 PM, Jan-Frode Myklebust wrote: > >> I read it as "do your best". I doubt there can be problems that shows up >> after 3 weeks, that wouldn't also be triggerable after 1 day. >> >> >> -jf >> >> man. 5. des. 2016 kl. 22.32 skrev Aaron Knister >> >: >> >> >> Hi Everyone, >> >> In the GPFS documentation >> (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com >> .ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) >> it has this to say about the duration of an upgrade from 3.5 to 4.1: >> >> > Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> > on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> >because some GPFS 4.1 features become available on each node as soon >> as >> the node is upgraded, while >> >other features will not become available until you upgrade all >> participating nodes. >> >> Does anyone have a feel for what "a short time" means? 
I'm looking to >> upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the >> size of our system it might take several weeks to complete. Seeing >> this >> language concerns me that after some period of time something bad is >> going to happen, but I don't know what that period of time is. >> >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any >> anecdotes they'd like to share, I would like to hear them. >> >> Thanks! >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Dec 6 08:17:37 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 6 Dec 2016 08:17:37 +0000 Subject: [gpfsug-discuss] CES services on an existing GPFS cluster In-Reply-To: References: <800ebde9-f912-5c41-2d04-092556a9e8d5@ut.ee>, Message-ID: I'm sure we changed this recently, I think all the CES nodes nerd to be down, but I don't think the whole cluster. We certainly set it for the first tine "live". Maybe I depends on the code version. Simi ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode Myklebust [janfrode at tanso.net] Sent: 05 December 2016 14:34 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CES services on an existing GPFS cluster No, the first time you define it, I'm pretty sure can be done online. But when changing it later, it will require the stopping the full cluster first. -jf man. 5. des. 2016 kl. 15.26 skrev Sander Kuusemets >: Hello, I have been thinking about setting up a CES cluster on my GPFS custer for easier data distribution. The cluster is quite an old one - since 3.4, but we have been doing rolling upgrades on it. 4.2.0 now, ~200 nodes Centos 7, Infiniband interconnected. The problem is this little line in Spectrum Scale documentation: The CES shared root directory cannot be changed when the cluster is up and running. If you want to modify the shared root configuration, you must bring the entire cluster down. Does this mean that even the first time I'm setting CES up, I have to pull down the whole cluster? I would understand this level of service disruption when I already had set the directory before and now I was changing it, but on an initial setup it's quite an inconvenience. Maybe there's a less painful way for this? 
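For reference, the first-time CES definition itself is only a handful of commands. A sketch, with example paths, node names and IP addresses (none of these values are from the thread):

  # define the CES shared root on a filesystem all protocol nodes can mount
  mmchconfig cesSharedRoot=/gpfs/fs0/ces

  # turn the chosen nodes into CES protocol nodes
  mmchnode --ces-enable -N protocol1,protocol2

  # add floating protocol addresses and enable the services you want
  mmces address add --ces-ip 10.10.10.100
  mmces service enable SMB

Whether that first mmchconfig can be run with the cluster up is exactly the point being discussed here, so treat the sequence as a sketch rather than a recipe.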
Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From duersch at us.ibm.com Tue Dec 6 13:20:20 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Tue, 6 Dec 2016 08:20:20 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: Message-ID: You fit within the "short time". The purpose of this remark is to make it clear that this should not be a permanent stopping place. Getting all nodes up to the same version is safer and allows for the use of new features. Steve Duersch Spectrum Scale 845-433-7902 IBM Poughkeepsie, New York gpfsug-discuss-bounces at spectrumscale.org wrote on 12/06/2016 02:25:18 AM: > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 5 Dec 2016 16:31:55 -0500 > From: Aaron Knister > To: gpfsug main discussion list > Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question > Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269 at nasa.gov> > Content-Type: text/plain; charset="utf-8"; format=flowed > > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/ > com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > > > Rolling upgrades allow you to install new GPFS code one node at a > time without shutting down GPFS > > on other nodes. However, you must upgrade all nodes within a short > time. The time dependency exists > >because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while > >other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Dec 6 16:40:25 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 6 Dec 2016 10:40:25 -0600 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: Hello all, Thanks for sharing that. I am setting this up on our CES nodes. In this example the nvme devices are not persistent. RHEL's default udev rules put them in /dev/disk/by-id/ persistently by serial number so I modified mmdevdiscover to look for them there. What are others doing? custom udev rules for the nvme devices? Also I have used LVM in the past to stitch multiple nvme together for better performance. I am wondering in the use case with GPFS that it may hurt performance by hindering the ability of GPFS to do direct IO or directly accessing memory. Any opinions there? Thanks Matt On 12/5/16 10:33 AM, Ulf Troppens wrote: FYI ... in case not seen .... 
benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue Dec 6 17:36:11 2016 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 06 Dec 2016 17:36:11 +0000 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: i am not sure i understand your comment with 'persistent' do you mean when you create a nsddevice on a nvme device it won't get recognized after a restart ? if thats what you mean there are 2 answers , short term you need to add a /var/mmfs/etc/nsddevices script to your node that simply adds an echo for the nvme device like : echo nvme0n1 generic this will tell the daemon to include that device on top of all other discovered devices that we include by default (like dm-* , sd*, etc) the longer term answer is that we have a tracking item to ad nvme* to the automatically discovered devices. on your second question, given that GPFS does workload balancing across devices you don't want to add extra complexity and path length to anything , so stick with raw devices . sven On Tue, Dec 6, 2016 at 8:40 AM Matt Weil wrote: > Hello all, > > Thanks for sharing that. I am setting this up on our CES nodes. In this > example the nvme devices are not persistent. RHEL's default udev rules put > them in /dev/disk/by-id/ persistently by serial number so I modified > mmdevdiscover to look for them there. What are others doing? custom udev > rules for the nvme devices? > > Also I have used LVM in the past to stitch multiple nvme together for > better performance. I am wondering in the use case with GPFS that it may > hurt performance by hindering the ability of GPFS to do direct IO or > directly accessing memory. Any opinions there? > > Thanks > > Matt > On 12/5/16 10:33 AM, Ulf Troppens wrote: > > FYI ... in case not seen .... 
benchmark for LROC with NVMe > > http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf > > > -- > IBM Spectrum Scale Development - Client Engagements & Solutions Delivery > Consulting IT Specialist > Author "Storage Networks Explained" > > IBM Deutschland Research & Development GmbH > Vorsitzende des Aufsichtsrats: Martina Koederitz > Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > The materials in this message are private and may contain Protected > Healthcare Information or other information of a sensitive nature. If you > are not the intended recipient, be advised that any unauthorized use, > disclosure, copying or the taking of any action in reliance on the contents > of this information is strictly prohibited. If you have received this email > in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Wed Dec 7 03:47:00 2016 From: Valdis.Kletnieks at vt.edu (Valdis Kletnieks) Date: Tue, 06 Dec 2016 22:47:00 -0500 Subject: [gpfsug-discuss] ltfsee fsopt question... Message-ID: <114349.1481082420@turing-police.cc.vt.edu> Is it possible to use 'ltfsee fsopt' to set stub and preview sizes on a per-fileset basis, or is it fixed across an entire filesystem? From r.sobey at imperial.ac.uk Wed Dec 7 06:29:27 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 7 Dec 2016 06:29:27 +0000 Subject: [gpfsug-discuss] CES ON RHEL7.3 Message-ID: A word of wisdom: do not try and run CES on RHEL 7.3 :) Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn't intend to run 7.3 of course as I knew it wasn't supported. Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkomandu at in.ibm.com Wed Dec 7 06:45:50 2016 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Wed, 7 Dec 2016 12:15:50 +0530 Subject: [gpfsug-discuss] CES ON RHEL7.3 In-Reply-To: References: Message-ID: Sobey, Could you mention the problems that you have faced on CES env for RH 7.3. Is it related to the Kernel or in Ganesha environment ? Your thoughts/inputs would help us in fixing the same. Currently working on the CES environment on RH 7.3 support side. With Regards, Ravi K Komanduri GPFS team IBM From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 12/07/2016 11:59 AM Subject: [gpfsug-discuss] CES ON RHEL7.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org A word of wisdom: do not try and run CES on RHEL 7.3 J Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn?t intend to run 7.3 of course as I knew it wasn?t supported. Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Wed Dec 7 09:13:23 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 7 Dec 2016 09:13:23 +0000 Subject: [gpfsug-discuss] CES ON RHEL7.3 In-Reply-To: References: Message-ID: I admit I didn?t do a whole lot of troubleshooting. We don?t run NFS so I can?t speak about that. Initially the server looked like it came back ok, albeit ?Node starting up..? was observed in the output of mmlscluster ?ces. At that time I was not sure if that was a) expected behaviour and/or b) related to GPFS 4.2.1-2. Once the node went back into service I had no complaints from customers that they faced any connectivity issues. The next morning I shut down a second CES node in order to upgrade it, but I observed that the first once went into a failed state (might have been a nasty coincidence!): [root at icgpfs-ces1 yum.repos.d]# mmces state show -a NODE AUTH AUTH_OBJ NETWORK NFS OBJ SMB CES icgpfs-ces1 FAILED DISABLED HEALTHY DISABLED DISABLED DEPEND STARTING icgpfs-ces2 DEPEND DISABLED SUSPENDED DEPEND DEPEND DEPEND DEPEND icgpfs-ces3 HEALTHY DISABLED HEALTHY DISABLED DISABLED HEALTHY HEALTHY icgpfs-ces4 HEALTHY DISABLED HEALTHY DISABLED DISABLED HEALTHY HEALTHY (Where ICGPFS-CES1 was the node running 7.3). Also in mmces event show ?N icgpfs-ces1 ?time day the following error was logged about twice per minute: icgpfs-ces1 2016-12-06 06:32:04.968269 GMT wnbd_restart INFO WINBINDD process was not running. Trying to start it I moved the CES IP from icgpfs-ces2 to icgpfs-ces3 prior to suspending ?ces2. It was about that point I decided to abandon the planned upgrade of ?ces2, resume the node and then suspend ?ces1. Attempts to downgrade the Kernel/OS/redhat-release RPM back to 7.2 worked well, except when I tried to start CES again and the node reported ?Node failed?. I then rebuilt it completely, restored it to the cluster and it appears to be fine. Sorry I can?t be any more specific than that but I hope it helps. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ravi K Komanduri Sent: 07 December 2016 06:46 To: r.sobey at inperial.ac.uk Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] CES ON RHEL7.3 Sobey, Could you mention the problems that you have faced on CES env for RH 7.3. Is it related to the Kernel or in Ganesha environment ? Your thoughts/inputs would help us in fixing the same. Currently working on the CES environment on RH 7.3 support side. With Regards, Ravi K Komanduri GPFS team IBM From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 12/07/2016 11:59 AM Subject: [gpfsug-discuss] CES ON RHEL7.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ A word of wisdom: do not try and run CES on RHEL 7.3 ?Although it appears to work, a few things break and it becomes a bit unpredictable as I found out the hard way. I didn?t intend to run 7.3 of course as I knew it wasn?t supported. Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
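Going back to the LROC/NVMe thread above, the short-term nsddevices approach Sven describes boils down to a tiny user exit. A sketch, with example device names; the shipped template /usr/lpp/mmfs/samples/nsddevices.sample documents the exact conventions, in particular that a non-zero return keeps the normal GPFS device discovery running as well:

  #!/bin/ksh
  # /var/mmfs/etc/nsddevices  (sketch, device names are examples)
  # print each local NVMe namespace as "<name> <type>" so mmfsd will consider it
  echo "nvme0n1 generic"
  echo "nvme1n1 generic"
  # non-zero: also run the built-in discovery (dm-*, sd*, etc.)
  return 1

Copy it to /var/mmfs/etc/nsddevices on each node with local NVMe and make it executable; a glob over /dev/nvme*n1 instead of hard-coded names saves editing it per node.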
URL: From peserocka at gmail.com Wed Dec 7 09:54:05 2016 From: peserocka at gmail.com (P Serocka) Date: Wed, 7 Dec 2016 17:54:05 +0800 Subject: [gpfsug-discuss] Quotas on Multiple Filesets In-Reply-To: References: Message-ID: <1FBA5DC2-DD14-4606-9B5A-A4373191B461@gmail.com> > > I would have though that usage in fileset predictHPC would also go against the group fileset quota-wise these filesets are "siblings", don't be fooled by the hierarchy formed by namespace linking. hth -- Peter On 2016 Dec 3. md, at 04:51 st, J. Eric Wonderley wrote: > Hi Michael: > > I was about to ask a similar question about nested filesets. > > I have this setup: > [root at cl001 ~]# mmlsfileset home > Filesets in file system 'home': > Name Status Path > root Linked /gpfs/home > group Linked /gpfs/home/group > predictHPC Linked /gpfs/home/group/predictHPC > > > and I see this: > [root at cl001 ~]# mmlsfileset home -L -d > Collecting fileset usage information ... > Filesets in file system 'home': > Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Data (in KB) Comment > root 0 3 -- Tue Jun 30 07:54:09 2015 0 134217728 123805696 63306355456 root fileset > group 1 67409030 0 Tue Nov 1 13:22:24 2016 0 0 0 0 > predictHPC 2 111318203 1 Fri Dec 2 14:05:56 2016 0 0 0 212206080 > > I would have though that usage in fileset predictHPC would also go against the group fileset > > On Tue, Nov 15, 2016 at 4:47 AM, Michael Holliday wrote: > Hey Everyone, > > > > I have a GPFS system which contain several groups of filesets. > > > > Each group has a root fileset, along with a number of other files sets. All of the filesets share the inode space with the root fileset. > > > > The file sets are linked to create a tree structure as shown: > > > > Fileset Root -> /root > > Fileset a -> /root/a > > Fileset B -> /root/b > > Fileset C -> /root/c > > > > > > I have applied a quota of 5TB to the root fileset. > > > > Could someone tell me if the quota will only take into account the files in the root fileset, or if it would include the sub filesets aswell. eg If have 3TB in A and 2TB in B - would that hit the 5TB quota on root? > > > > Thanks > > Michael > > > > > > The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From david_johnson at brown.edu Wed Dec 7 10:34:27 2016 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Wed, 7 Dec 2016 05:34:27 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Message-ID: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? 
-- ddj Dave Johnson From daniel.kidger at uk.ibm.com Wed Dec 7 12:36:56 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 7 Dec 2016 12:36:56 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: , <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com><3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Wed Dec 7 14:24:38 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Wed, 07 Dec 2016 14:24:38 +0000 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: > IBM says it should work ok, we are not so sure. We had node expels that > stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Dec 7 14:37:15 2016 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 7 Dec 2016 09:37:15 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: All, The SMAP issue has been addressed in GPFS in 4.2.1.1. See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Q2.4. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Aaron Knister To: gpfsug main discussion list Date: 12/07/2016 09:25 AM Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Sent by: gpfsug-discuss-bounces at spectrumscale.org I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? -- ddj Dave Johnson _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
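For anyone who wants to try the nosmap workaround Aaron mentions, on RHEL/CentOS 7 one way to set it is via grubby (a sketch, double-check against your distribution's documentation before rolling it out):

  # append nosmap to the boot command line of every installed kernel
  grubby --update-kernel=ALL --args="nosmap"
  # reboot, then confirm it is active
  grep -o nosmap /proc/cmdline

As noted above, 4.2.1.1 and later address the SMAP issue in GPFS itself, so the boot parameter is only a stopgap for older levels.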
URL: From david_johnson at brown.edu Wed Dec 7 14:47:46 2016 From: david_johnson at brown.edu (David D. Johnson) Date: Wed, 7 Dec 2016 09:47:46 -0500 Subject: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? In-Reply-To: References: <8A463084-1273-42BE-A5C1-1CE524DB9EC3@brown.edu> Message-ID: <5FBAC3AE-39F2-453D-8A9D-5FDE90BADD38@brown.edu> Yes, we saw the SMAP issue on earlier releases, added the kernel command line option to disable it. That is not the issue for this node. The Phi processors do not support that cpu feature. ? ddj > On Dec 7, 2016, at 9:37 AM, Felipe Knop wrote: > > All, > > The SMAP issue has been addressed in GPFS in 4.2.1.1. > > See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html > > Q2.4. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 12/07/2016 09:25 AM > Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. > > -Aaron > > On Wed, Dec 7, 2016 at 5:34 AM > wrote: > IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Dec 7 14:58:39 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 7 Dec 2016 14:58:39 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com> <3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: I was going to ask about this, I recall it being mentioned about "grandfathering" and also having mixed deployments. Would that mean you could per TB license one set of NSD servers (hosting only 1 FS) that co-existed in a cluster with other traditionally licensed systems? I would see having NSDs with different license models hosting the same FS being problematic, but if it were a different file-system? 
Simon From: > on behalf of Daniel Kidger > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 7 December 2016 at 12:36 To: "gpfsug-discuss at spectrumscale.org" > Cc: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks The new volume based licensing option is I agree quite pricey per TB at first sight, but it could make some configuration choice, a lot cheaper than they used to be under the Client:FPO:Server model. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 7 15:59:50 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 7 Dec 2016 09:59:50 -0600 Subject: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe In-Reply-To: References: Message-ID: <05e77cc6-e3f6-7b06-521c-1d30606e02e0@wustl.edu> On 12/6/16 11:36 AM, Sven Oehme wrote: i am not sure i understand your comment with 'persistent' do you mean when you create a nsddevice on a nvme device it won't get recognized after a restart ? yes /dev/sdX may change after a reboot especially if you add devices. using udev rules makes sure the device is always the same. if thats what you mean there are 2 answers , short term you need to add a /var/mmfs/etc/nsddevices script to your node that simply adds an echo for the nvme device like : echo nvme0n1 generic this will tell the daemon to include that device on top of all other discovered devices that we include by default (like dm-* , sd*, etc) the longer term answer is that we have a tracking item to ad nvme* to the automatically discovered devices. yes that is what I meant by modifying mmdevdiscover on your second question, given that GPFS does workload balancing across devices you don't want to add extra complexity and path length to anything , so stick with raw devices . K that is what I was thinking. sven On Tue, Dec 6, 2016 at 8:40 AM Matt Weil > wrote: Hello all, Thanks for sharing that. I am setting this up on our CES nodes. In this example the nvme devices are not persistent. RHEL's default udev rules put them in /dev/disk/by-id/ persistently by serial number so I modified mmdevdiscover to look for them there. What are others doing? custom udev rules for the nvme devices? Also I have used LVM in the past to stitch multiple nvme together for better performance. I am wondering in the use case with GPFS that it may hurt performance by hindering the ability of GPFS to do direct IO or directly accessing memory. Any opinions there? Thanks Matt On 12/5/16 10:33 AM, Ulf Troppens wrote: FYI ... in case not seen .... benchmark for LROC with NVMe http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-gains-ibm-spectrum-scale.pdf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Dec 7 16:00:46 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 7 Dec 2016 16:00:46 +0000 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: , <528C481B-632B-4ED9-BA4A-8595FC069DAB@nuance.com><3328afec-79b5-e044-617f-28e1ded5ca2c@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Dec 7 16:31:23 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Dec 2016 11:31:23 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> Message-ID: <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Thanks Sander. That's disconcerting...yikes! Sorry for your trouble but thank you for sharing. I'm surprised this didn't shake out during testing of gpfs 3.5 and 4.1. I wonder if in light of this it's wise to do the clients first? My logic being that there's clearly an example here of 4.1 servers expecting behavior that only 4.1 clients provide. I suppose, though, that there's just as likely a chance that there could be a yet to be discovered bug in a situation where a 4.1 client expects something not provided by a 3.5 server. Our current plan is still to take servers first but I suspect we'll do a fair bit of testing with the PIT commands in our test environment just out of curiosity. Also out of curiosity, how long ago did you open that PMR? I'm wondering if there's a chance they've fixed this issue. I'm also perplexed and cocnerned that there's no documentation of the PIT commands to avoid during upgrades that I can find in any of the GPFS upgrade documentation. -Aaron On 12/6/16 2:25 AM, Sander Kuusemets wrote: > Hello Aaron, > > I thought I'd share my two cents, as I just went through the process. I > thought I'd do the same, start upgrading from where I can and wait until > machines come available. It took me around 5 weeks to complete the > process, but the last two were because I was super careful. 
> > At first nothing happened, but at one point, a week into the upgrade > cycle, when I tried to mess around (create, delete, test) a fileset, > suddenly I got the weirdest of error messages while trying to delete a > fileset for the third time from a client node - I sadly cannot exactly > remember what it said, but I can describe what happened. > > After the error message, the current manager of our cluster fell into > arbitrating state, it's metadata disks were put to down state, manager > status was given to our other server node and it's log was spammed with > a lot of error messages, something like this: > >> mmfsd: >> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: >> void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >> UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) >> + 0)' failed. >> Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 >> in process 15113, link reg 0xFFFFFFFFFFFFFFFF. >> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= >> (sizeof(Pad32) + 0)) in line 1411 of file >> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h >> Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: >> Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 >> logAssertFailed + 0x2D6 at ??:0 >> Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 >> PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 >> Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 >> tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 >> Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 >> RcvWorker::RcvMain() + 0x107 at ??:0 >> Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B >> RcvWorker::thread(void*) + 0x5B at ??:0 >> Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 >> Thread::callBody(Thread*) + 0x46 at ??:0 >> Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 >> Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >> Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 >> start_thread + 0xD1 at ??:0 >> Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + >> 0x6D at ??:0 > After this I tried to put disks up again, which failed half-way through > and did the same with the other server node (current master). So after > this my cluster had effectively failed, because all the metadata disks > were down and there was no path to the data disks. When I tried to put > all the metadata disks up with one start command, then it worked on > third try and the cluster got into working state again. Downtime about > an hour. > > I created a PMR with this information and they said that it's a bug, but > it's a tricky one so it's going to take a while, but during that it's > not recommended to use any commands from this list: > >> Our apologies for the delayed response. Based on the debug data we >> have and looking at the source code, we believe the assert is due to >> incompatibility is arising from the feature level version for the >> RPCs. In this case the culprit is the PIT "interesting inode" code. >> >> Several user commands employ PIT (Parallel Inode Traversal) code to >> traverse each data block of every file: >> >>> >>> mmdelfileset >>> mmdelsnapshot >>> mmdefragfs >>> mmfileid >>> mmrestripefs >>> mmdeldisk >>> mmrpldisk >>> mmchdisk >>> mmadddisk >> The problematic one is the 'PitInodeListPacket' subrpc which is a part >> of an "interesting inode" code change. 
Looking at the dumps its >> evident that node 'node3' which sent the RPC is not capable of >> supporting interesting inode (max feature level is 1340) and node >> server11 which is receiving it is trying to interpret the RPC beyond >> the valid region (as its feature level 1502 supports PIT interesting >> inodes). > > And apparently any of the fileset commands either, as I failed with those. > > After I finished the upgrade, everything has been working wonderfully. > But during this upgrade time I'd recommend to tread really carefully. > > Best regards, > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From sander.kuusemets at ut.ee Wed Dec 7 16:56:52 2016 From: sander.kuusemets at ut.ee (Sander Kuusemets) Date: Wed, 7 Dec 2016 18:56:52 +0200 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Message-ID: It might have been some kind of a bug only we got, but I thought I'd share, just in case. The email when they said they opened a ticket for this bug's fix was quite exactly a month ago, so I doubt they've fixed it, as they said it might take a while. I don't know if this is of any help, but a paragraph from the explanation: > The assert "msgLen >= (sizeof(Pad32) + 0)" is from routine > PIT_HelperGetWorkMH(). There are two RPC structures used in this routine > - PitHelperWorkReport > - PitInodeListPacket > > The problematic one is the 'PitInodeListPacket' subrpc which is a part > of an "interesting inode" code change. Looking at the dumps its > evident that node 'stage3' which sent the RPC is not capable of > supporting interesting inode (max feature level is 1340) and node > tank1 which is receiving it is trying to interpret the RPC beyond the > valid region (as its feature level 1502 supports PIT interesting > inodes). This is resulting in the assert you see. As a short term > measure bringing all the nodes to the same feature level should make > the problem go away. But since we support backward compatibility, we > are opening an APAR to create a code fix. It's unfortunately going to > be a tricky fix, which is going to take a significant amount of time. > Therefore I don't expect the team will be able to provide an efix > anytime soon. We recommend you bring all nodes in all clusters up the > latest level 4.2.0.4 and run the "mmchconfig release=latest" and > "mmchfs -V full" commands that will ensure all daemon levels and fs > levels are at the necessary level that supports the 1502 RPC feature > level. Best regards, -- Sander Kuusemets University of Tartu, High Performance Computing, IT Specialist On 12/07/2016 06:31 PM, Aaron Knister wrote: > Thanks Sander. That's disconcerting...yikes! Sorry for your trouble > but thank you for sharing. > > I'm surprised this didn't shake out during testing of gpfs 3.5 and > 4.1. I wonder if in light of this it's wise to do the clients first? > My logic being that there's clearly an example here of 4.1 servers > expecting behavior that only 4.1 clients provide. I suppose, though, > that there's just as likely a chance that there could be a yet to be > discovered bug in a situation where a 4.1 client expects something not > provided by a 3.5 server. 
Our current plan is still to take servers > first but I suspect we'll do a fair bit of testing with the PIT > commands in our test environment just out of curiosity. > > Also out of curiosity, how long ago did you open that PMR? I'm > wondering if there's a chance they've fixed this issue. I'm also > perplexed and cocnerned that there's no documentation of the PIT > commands to avoid during upgrades that I can find in any of the GPFS > upgrade documentation. > > -Aaron > > On 12/6/16 2:25 AM, Sander Kuusemets wrote: >> Hello Aaron, >> >> I thought I'd share my two cents, as I just went through the process. I >> thought I'd do the same, start upgrading from where I can and wait until >> machines come available. It took me around 5 weeks to complete the >> process, but the last two were because I was super careful. >> >> At first nothing happened, but at one point, a week into the upgrade >> cycle, when I tried to mess around (create, delete, test) a fileset, >> suddenly I got the weirdest of error messages while trying to delete a >> fileset for the third time from a client node - I sadly cannot exactly >> remember what it said, but I can describe what happened. >> >> After the error message, the current manager of our cluster fell into >> arbitrating state, it's metadata disks were put to down state, manager >> status was given to our other server node and it's log was spammed with >> a lot of error messages, something like this: >> >>> mmfsd: >>> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h:1411: >>> >>> void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >>> UInt32, const char*, const char*): Assertion `msgLen >= (sizeof(Pad32) >>> + 0)' failed. >>> Wed Nov 2 19:24:01.967 2016: [N] Signal 6 at location 0x7F9426EFF625 >>> in process 15113, link reg 0xFFFFFFFFFFFFFFFF. >>> Wed Nov 2 19:24:05.058 2016: [X] *** Assert exp(msgLen >= >>> (sizeof(Pad32) + 0)) in line 1411 of file >>> /project/sprelbmd0/build/rbmd0s004a/src/avs/fs/mmfs/ts/cfgmgr/pitrpc.h >>> Wed Nov 2 19:24:05.059 2016: [E] *** Traceback: >>> Wed Nov 2 19:24:05.060 2016: [E] 2:0x7F9428BAFBB6 >>> logAssertFailed + 0x2D6 at ??:0 >>> Wed Nov 2 19:24:05.061 2016: [E] 3:0x7F9428CBEF62 >>> PIT_GetWorkMH(RpcContext*, char*) + 0x6E2 at ??:0 >>> Wed Nov 2 19:24:05.062 2016: [E] 4:0x7F9428BCBF62 >>> tscHandleMsg(RpcContext*, MsgDataBuf*) + 0x512 at ??:0 >>> Wed Nov 2 19:24:05.063 2016: [E] 5:0x7F9428BE62A7 >>> RcvWorker::RcvMain() + 0x107 at ??:0 >>> Wed Nov 2 19:24:05.064 2016: [E] 6:0x7F9428BE644B >>> RcvWorker::thread(void*) + 0x5B at ??:0 >>> Wed Nov 2 19:24:05.065 2016: [E] 7:0x7F94286F6F36 >>> Thread::callBody(Thread*) + 0x46 at ??:0 >>> Wed Nov 2 19:24:05.066 2016: [E] 8:0x7F94286E5402 >>> Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >>> Wed Nov 2 19:24:05.067 2016: [E] 9:0x7F9427E0E9D1 >>> start_thread + 0xD1 at ??:0 >>> Wed Nov 2 19:24:05.068 2016: [E] 10:0x7F9426FB58FD clone + >>> 0x6D at ??:0 >> After this I tried to put disks up again, which failed half-way through >> and did the same with the other server node (current master). So after >> this my cluster had effectively failed, because all the metadata disks >> were down and there was no path to the data disks. When I tried to put >> all the metadata disks up with one start command, then it worked on >> third try and the cluster got into working state again. Downtime about >> an hour. 
>> >> I created a PMR with this information and they said that it's a bug, but >> it's a tricky one so it's going to take a while, but during that it's >> not recommended to use any commands from this list: >> >>> Our apologies for the delayed response. Based on the debug data we >>> have and looking at the source code, we believe the assert is due to >>> incompatibility is arising from the feature level version for the >>> RPCs. In this case the culprit is the PIT "interesting inode" code. >>> >>> Several user commands employ PIT (Parallel Inode Traversal) code to >>> traverse each data block of every file: >>> >>>> >>>> mmdelfileset >>>> mmdelsnapshot >>>> mmdefragfs >>>> mmfileid >>>> mmrestripefs >>>> mmdeldisk >>>> mmrpldisk >>>> mmchdisk >>>> mmadddisk >>> The problematic one is the 'PitInodeListPacket' subrpc which is a part >>> of an "interesting inode" code change. Looking at the dumps its >>> evident that node 'node3' which sent the RPC is not capable of >>> supporting interesting inode (max feature level is 1340) and node >>> server11 which is receiving it is trying to interpret the RPC beyond >>> the valid region (as its feature level 1502 supports PIT interesting >>> inodes). >> >> And apparently any of the fileset commands either, as I failed with >> those. >> >> After I finished the upgrade, everything has been working wonderfully. >> But during this upgrade time I'd recommend to tread really carefully. >> >> Best regards, >> > From aaron.s.knister at nasa.gov Wed Dec 7 17:31:28 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 7 Dec 2016 12:31:28 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> <135ece3b-dfe5-162a-9043-cc10924c3d91@ut.ee> <8f1a100f-b67a-f752-f6cd-2c9b6047db32@nasa.gov> Message-ID: Thanks! I do have a question, though. Feature level 1340 I believe is equivalent to GPFS version 3.5.0.11. Feature level 1502 is GPFS 4.2 if I understand correctly. That suggests to me there are 3.5 and 4.2 nodes in the same cluster? Or at least 4.2 nodes in a cluster where the max feature level is 1340. I didn't think either of those are supported configurations? Am I missing something? -Aaron On 12/7/16 11:56 AM, Sander Kuusemets wrote: > It might have been some kind of a bug only we got, but I thought I'd > share, just in case. > > The email when they said they opened a ticket for this bug's fix was > quite exactly a month ago, so I doubt they've fixed it, as they said it > might take a while. > > I don't know if this is of any help, but a paragraph from the explanation: > >> The assert "msgLen >= (sizeof(Pad32) + 0)" is from routine >> PIT_HelperGetWorkMH(). There are two RPC structures used in this routine >> - PitHelperWorkReport >> - PitInodeListPacket >> >> The problematic one is the 'PitInodeListPacket' subrpc which is a part >> of an "interesting inode" code change. Looking at the dumps its >> evident that node 'stage3' which sent the RPC is not capable of >> supporting interesting inode (max feature level is 1340) and node >> tank1 which is receiving it is trying to interpret the RPC beyond the >> valid region (as its feature level 1502 supports PIT interesting >> inodes). This is resulting in the assert you see. As a short term >> measure bringing all the nodes to the same feature level should make >> the problem go away. But since we support backward compatibility, we >> are opening an APAR to create a code fix. 
It's unfortunately going to >> be a tricky fix, which is going to take a significant amount of time. >> Therefore I don't expect the team will be able to provide an efix >> anytime soon. We recommend you bring all nodes in all clusters up the >> latest level 4.2.0.4 and run the "mmchconfig release=latest" and >> "mmchfs -V full" commands that will ensure all daemon levels and fs >> levels are at the necessary level that supports the 1502 RPC feature >> level. > Best regards, > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From carlz at us.ibm.com Wed Dec 7 17:47:52 2016 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 7 Dec 2016 12:47:52 -0500 Subject: [gpfsug-discuss] Strategies - servers with local SAS disks In-Reply-To: References: Message-ID: We don't allow mixing of different licensing models (i.e. socket and capacity) within a single cluster*. As we worked through the implications, we realized it would be just too complicated to determine how to license any non-NSD nodes (management, CES, clients, etc.). In the socket model they are chargeable, in the capacity model they are not, and while we could have made up some rules, they would have added even more complexity to Scale licensing. This in turn is why we "grandfathered in" those customers already on Advanced Edition, so that they don't have to convert existing clusters to the new metric unless or until they want to. They can continue to buy Advanced Edition. The other thing we wanted to do with the capacity metric was to make the licensing more friendly to architectural best practices or design choices. So now you can have whatever management, gateway, etc. servers you need without paying for additional server licenses. In particular, client-only clusters cost nothing, and you don't have to keep track of clients if you have a virtual environment where clients come and go rapidly. I'm always happy to answer other questions about licensing. regards, Carl Zetie *OK, there is one exception involving future ESS models and existing clusters. If this is you, please have a conversation with your account team. Carl Zetie Program Director, OM for Spectrum Scale, IBM (540) 882 9353 ][ 15750 Brookhill Ct, Waterford VA 20197 carlz at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 12/07/2016 09:59 AM Subject: gpfsug-discuss Digest, Vol 59, Issue 20 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? (Felipe Knop) 2. Re: Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? (David D. Johnson) 3. 
Re: Strategies - servers with local SAS disks (Simon Thompson (Research Computing - IT Services)) ---------------------------------------------------------------------- Message: 1 Date: Wed, 7 Dec 2016 09:37:15 -0500 From: "Felipe Knop" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Message-ID: Content-Type: text/plain; charset="us-ascii" All, The SMAP issue has been addressed in GPFS in 4.2.1.1. See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Q2.4. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Aaron Knister To: gpfsug main discussion list Date: 12/07/2016 09:25 AM Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Sent by: gpfsug-discuss-bounces at spectrumscale.org I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. -Aaron On Wed, Dec 7, 2016 at 5:34 AM wrote: IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? -- ddj Dave Johnson _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161207/48aa0319/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 7 Dec 2016 09:47:46 -0500 From: "David D. Johnson" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? Message-ID: <5FBAC3AE-39F2-453D-8A9D-5FDE90BADD38 at brown.edu> Content-Type: text/plain; charset="utf-8" Yes, we saw the SMAP issue on earlier releases, added the kernel command line option to disable it. That is not the issue for this node. The Phi processors do not support that cpu feature. ? ddj > On Dec 7, 2016, at 9:37 AM, Felipe Knop wrote: > > All, > > The SMAP issue has been addressed in GPFS in 4.2.1.1. > > See http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html < http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html> > > Q2.4. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > > From: Aaron Knister > To: gpfsug main discussion list > Date: 12/07/2016 09:25 AM > Subject: Re: [gpfsug-discuss] Any experience running native GPFS 4.2.1 on Xeon Phi node booted with Centos 7.3? 
> Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I don't know if this applies her but I seem to recall an issue with CentOS 7 (newer 3.X and on kernels), Broadwell processors and GPFS where GPFS upset SMAP and would eventually get the node expelled. I think this may be fixed in newer GPFS releases but the fix is to boot the kernel with the nosmap parameter. Might be worth a try. I'm not clear on whether SMAP is supported by the Xeon Phi's. > > -Aaron > > On Wed, Dec 7, 2016 at 5:34 AM > wrote: > IBM says it should work ok, we are not so sure. We had node expels that stopped when we turned off gpfs on that node. Has anyone had better luck? > > -- ddj > Dave Johnson > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss < http://gpfsug.org/mailman/listinfo/gpfsug-discuss> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161207/92819f21/attachment-0001.html > ------------------------------ Message: 3 Date: Wed, 7 Dec 2016 14:58:39 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks Message-ID: Content-Type: text/plain; charset="us-ascii" I was going to ask about this, I recall it being mentioned about "grandfathering" and also having mixed deployments. Would that mean you could per TB license one set of NSD servers (hosting only 1 FS) that co-existed in a cluster with other traditionally licensed systems? I would see having NSDs with different license models hosting the same FS being problematic, but if it were a different file-system? Simon From: > on behalf of Daniel Kidger > Reply-To: "gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>" > Date: Wednesday, 7 December 2016 at 12:36 To: "gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>" > Cc: "gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>" > Subject: Re: [gpfsug-discuss] Strategies - servers with local SAS disks The new volume based licensing option is I agree quite pricey per TB at first sight, but it could make some configuration choice, a lot cheaper than they used to be under the Client:FPO:Server model. -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161207/51c1a2ea/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 20 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Thu Dec 8 13:33:40 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 8 Dec 2016 13:33:40 +0000 Subject: [gpfsug-discuss] Flash Storage wiki entry incorrect Message-ID: To whom it may concern, I've just set up an LROC disk in one of my CES nodes and going from the example in: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage I used the following as a guide: cat lroc-stanza.txt %nsd: nsd=lroc-nsd1 device=/dev/faio server=gpfs-client1 <-- is not a NSD server, but client with Fusion i/o or SSD install as target for LROC usage=localCache The only problems are that 1) hyphens aren't allowed in NSD names and 2) the server parameter should be servers (plural). Once I worked that out I was good to go but perhaps someone could update the page with a (working) example? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Dec 8 19:27:08 2016 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 8 Dec 2016 14:27:08 -0500 Subject: [gpfsug-discuss] GPFS fails to use VERBS RDMA because link is not up yet Message-ID: Under RHEL/CentOS 6, I had hacked an ?ibready? script for the SysV style init system that waits for link to come up on the infiniband port before allowing GPFS to start. Now that we?re moving to CentOS/RHEL 7.2, I need to reimplement this workaround for the fact that GPFS only tries once to start VERBS RDMA, and gives up if there is no link. I think it can be done by making a systemd unit that asks to run Before gpfs. Wondering if anyone has already done this to avoid reinventing the wheel?. Thanks, ? ddj Dave Johnson Brown University From r.sobey at imperial.ac.uk Fri Dec 9 11:52:12 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 11:52:12 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access Message-ID: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Fri Dec 9 13:21:12 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 9 Dec 2016 08:21:12 -0500 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: Message-ID: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone > On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: > > Hi all, > > Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). > > Cheers > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
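For reference, a minimal sketch of the vfs_full_audit settings that the blog post Aaron links describes, applied to a hypothetical export called gpfsshare. The share name, the list of audited operations and the syslog facility are illustrative assumptions, and because CES keeps its Samba configuration in the clustered registry rather than a plain smb.conf, any such change would have to be appended to the export's existing vfs objects chain through the CES tooling and may not be formally supported:

   [gpfsshare]
      # load the stock Samba audit module for this export
      vfs objects = full_audit
      # record who did it, from which client, on which share
      full_audit:prefix = %u|%I|%S
      # operations to log when they succeed; log nothing on failure
      full_audit:success = mkdir rmdir rename unlink pwrite
      full_audit:failure = none
      # hand the records to syslog on facility local5
      full_audit:facility = local5
      full_audit:priority = NOTICE

With rsyslog sending local5.* to its own file, each create, rename or delete then shows up as a single timestamped audit line per operation.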
URL: From jtolson at us.ibm.com Fri Dec 9 14:32:45 2016 From: jtolson at us.ibm.com (John T Olson) Date: Fri, 9 Dec 2016 07:32:45 -0700 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From billowen at us.ibm.com Fri Dec 9 15:44:28 2016 From: billowen at us.ibm.com (Bill Owen) Date: Fri, 9 Dec 2016 08:44:28 -0700 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Hi John, Nice paper! Regarding object auditing: - Does Varonis have an API that could be used to tell it when object operations complete from normal object interface? If so, a middleware module could be used to send interesting events to Varonis (this is already done in openstack auditing using CADF) - With Varonis, can you monitor operations just on ".data" files? (these are the real objects) Can you also include file metadata values in the logging of these operations? 
If so, the object url could be pulled whenever a .data file is created, renamed (delete), or read Thanks, Bill Owen billowen at us.ibm.com Spectrum Scale Object Storage 520-799-4829 From: John T Olson/Tucson/IBM at IBMUS To: gpfsug main discussion list Date: 12/09/2016 07:33 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Dec 9 20:14:14 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 20:14:14 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> References: , <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Thanks Aaron. I will take a look on Moday. Now I think about it, I did something like this on the old Samba/CTDB cluster before we deployed CES, so it must be possible, just to what level IBM will support it. 
Have a great weekend, Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister Sent: 09 December 2016 13:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Dec 9 20:15:03 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 9 Dec 2016 20:15:03 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com>, Message-ID: Thanks John, As I said to Aaron I will also take a look at this on Monday. Regards Richard ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of John T Olson Sent: 09 December 2016 14:32 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. [Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?]Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister To: gpfsug main discussion list Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp [https://s0.wp.com/i/blank.jpg] Samba: Logging User Activity moiristo.wordpress.com Ever wondered why Samba seems to log so many things, except what you're interested in? So did I, and it took me a while to find out that 1) there actually is a solution and 2) how to configur... I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. 
Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From aaron.s.knister at nasa.gov Sat Dec 10 03:53:06 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 9 Dec 2016 22:53:06 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: Message-ID: <38d056ad-833f-1582-58fd-0e65a52ded6c@nasa.gov> Thanks Steve, that was exactly the answer I was looking for. On 12/6/16 8:20 AM, Steve Duersch wrote: > You fit within the "short time". The purpose of this remark is to make > it clear that this should not be a permanent stopping place. > Getting all nodes up to the same version is safer and allows for the use > of new features. > > > Steve Duersch > Spectrum Scale > 845-433-7902 > IBM Poughkeepsie, New York > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 12/06/2016 02:25:18 AM: > > >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Mon, 5 Dec 2016 16:31:55 -0500 >> From: Aaron Knister >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question >> Message-ID: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269 at nasa.gov> >> Content-Type: text/plain; charset="utf-8"; format=flowed >> >> Hi Everyone, >> >> In the GPFS documentation >> (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/ >> com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) >> it has this to say about the duration of an upgrade from 3.5 to 4.1: >> >> > Rolling upgrades allow you to install new GPFS code one node at a >> time without shutting down GPFS >> > on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> >because some GPFS 4.1 features become available on each node as soon as >> the node is upgraded, while >> >other features will not become available until you upgrade all >> participating nodes. >> >> Does anyone have a feel for what "a short time" means? I'm looking to >> upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the >> size of our system it might take several weeks to complete. Seeing this >> language concerns me that after some period of time something bad is >> going to happen, but I don't know what that period of time is. >> >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any >> anecdotes they'd like to share, I would like to hear them. >> >> Thanks! 
>> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> >> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sat Dec 10 05:31:39 2016 From: erich at uw.edu (Eric Horst) Date: Fri, 9 Dec 2016 21:31:39 -0800 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: On Mon, Dec 5, 2016 at 1:31 PM, Aaron Knister wrote: > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different clusters. Two things: Upgrading from 3.5 to 4.1 I did node at a time and then at the end mmchconfig release=LATEST. Minutes after flipping to latest the cluster became non-responsive, with node mmfs panics and everything had to be restarted. Logs indicated it was a quota problem. In 4.1 the quota files move from externally visible files to internal hidden files. I suspect the quota file transition can't be done without a cluster restart. When I did the second cluster I upgraded all nodes and then very quickly stopped and started the entire cluster, issuing the mmchconfig in the middle. No quota panic problems on that one. Upgrading from 4.1 to 4.2 I did node at a time and then at the end mmchconfig release=LATEST. No cluster restart. Everything seemed to work okay. Later, restarting a node I got weird fstab errors on gpfs startup and using certain commands, notably mmfind, the command would fail with something like "can't find /dev/uwfs" (our filesystem.) I restarted the whole cluster and everything began working normally. In this case 4.2 got rid of /dev/fsname. Just like in the quota case it seems that this transition can't be seamless. Doing the second cluster I upgraded all nodes and then again quickly restarted gpfs to avoid the same problem. Other than these two quirks, I heartily thank IBM for making a very complex product with a very easy upgrade procedure. I could imagine many ways that an upgrade hop of two major versions in two weeks could go very wrong but the quality of the product and team makes my job very easy. -Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.knister at gmail.com Sat Dec 10 12:35:15 2016 From: aaron.knister at gmail.com (Aaron Knister) Date: Sat, 10 Dec 2016 07:35:15 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: Thanks Eric! I have a few follow up questions for you-- Do you recall the exact versions of 3.5 and 4.1 your cluster went from/to? I'm curious to know what version of 4.1 you were at when you ran the mmchconfig. Would you mind sharing any log messages related to the errors you saw when you ran the mmchconfig? Thanks! Sent from my iPhone > On Dec 10, 2016, at 12:31 AM, Eric Horst wrote: > > >> On Mon, Dec 5, 2016 at 1:31 PM, Aaron Knister wrote: >> Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any anecdotes they'd like to share, I would like to hear them. 
> > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different clusters. Two things: > > Upgrading from 3.5 to 4.1 I did node at a time and then at the end mmchconfig release=LATEST. Minutes after flipping to latest the cluster became non-responsive, with node mmfs panics and everything had to be restarted. Logs indicated it was a quota problem. In 4.1 the quota files move from externally visible files to internal hidden files. I suspect the quota file transition can't be done without a cluster restart. When I did the second cluster I upgraded all nodes and then very quickly stopped and started the entire cluster, issuing the mmchconfig in the middle. No quota panic problems on that one. > > Upgrading from 4.1 to 4.2 I did node at a time and then at the end mmchconfig release=LATEST. No cluster restart. Everything seemed to work okay. Later, restarting a node I got weird fstab errors on gpfs startup and using certain commands, notably mmfind, the command would fail with something like "can't find /dev/uwfs" (our filesystem.) I restarted the whole cluster and everything began working normally. In this case 4.2 got rid of /dev/fsname. Just like in the quota case it seems that this transition can't be seamless. Doing the second cluster I upgraded all nodes and then again quickly restarted gpfs to avoid the same problem. > > Other than these two quirks, I heartily thank IBM for making a very complex product with a very easy upgrade procedure. I could imagine many ways that an upgrade hop of two major versions in two weeks could go very wrong but the quality of the product and team makes my job very easy. > > -Eric > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Dec 11 15:07:09 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 11 Dec 2016 10:07:09 -0500 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: I thought I'd share this with folks. I saw some log asserts in our test environment (~1050 client nodes and 12 manager/server nodes). I'm going from 3.5.0.31 (well, 2 clients are still at 3.5.0.19) -> 4.1.1.10. I've been running filebench in a loop for the past several days. It's sustaining about 60k write iops and about 15k read iops to the metadata disks for the filesystem I'm testing with, so I'd say it's getting pushed reasonably hard. 
The test cluster had 4.1 clients before it had 4.1 servers but after flipping 420 clients from 3.5.0.31 to 4.1.1.10 and starting up filebench I'm now seeing periodic logasserts from the manager/server nodes: Dec 11 08:57:39 loremds12 mmfs: Generic error in /project/sprelfks2/build/rfks2s010a/src/avs/fs/mmfs/ts/tm/HandleReq.C line 304 retCode 0, reasonCode 0 Dec 11 08:57:39 loremds12 mmfs: mmfsd: Error=MMFS_GENERIC, ID=0x30D9195E, Tag=4908715 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 (!"downgrade to mode which is not StrictlyWeaker") Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 node 584 old mode ro new mode (A: D: A) Dec 11 08:57:39 loremds12 mmfs: [X] logAssertFailed: (!"downgrade to mode which is not StrictlyWeaker") Dec 11 08:57:39 loremds12 mmfs: [X] return code 0, reason code 0, log record tag 0 Dec 11 08:57:42 loremds12 mmfs: [E] 10:0xA1BD5B RcvWorker::thread(void*).A1BD00 + 0x5B at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 11:0x622126 Thread::callBody(Thread*).6220E0 + 0x46 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 12:0x61220F Thread::callBodyWrapper(Thread*).612180 + 0x8F at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 13:0x7FF4E6BE66B6 start_thread + 0xE6 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 14:0x7FF4E5FEE06D clone + 0x6D at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 2:0x9F95E9 logAssertFailed.9F9440 + 0x1A9 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 3:0x1232836 TokenClass::fixClientMode(Token*, int, int, int, CopysetRevoke*).1232350 + 0x4E6 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 4:0x1235593 TokenClass::HandleTellRequest(RpcContext*, Request*, char**, int).1232AD0 + 0x2AC3 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 5:0x123A23C HandleTellRequestInterface(RpcContext*, Request*, char**, int).123A0D0 + 0x16C at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 6:0x125C6B0 queuedTellServer(RpcContext*, Request*, int, unsigned int).125C670 + 0x40 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 7:0x125EF72 tmHandleTellServer(RpcContext*, char*).125EEC0 + 0xB2 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 8:0xA12668 tscHandleMsg(RpcContext*, MsgDataBuf*).A120D0 + 0x598 at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] 9:0xA1BC4E RcvWorker::RcvMain().A1BB50 + 0xFE at ??:0 Dec 11 08:57:42 loremds12 mmfs: [E] *** Traceback: Dec 11 08:57:42 loremds12 mmfs: [N] Signal 6 at location 0x7FF4E5F456D5 in process 12188, link reg 0xFFFFFFFFFFFFFFFF. Dec 11 08:57:42 loremds12 mmfs: [X] *** Assert exp((!"downgrade to mode which is not StrictlyWeaker") node 584 old mode ro new mode (A: D: A) ) in line 304 of file /project/sprelfks2/bui ld/rfks2s010a/src/avs/fs/mmfs/ts/tm/HandleReq.C I've seen different messages on that third line of the "Tag=" message: Dec 11 00:16:40 loremds11 mmfs: Tag=5012168 node 825 old mode ro new mode 0x31 Dec 11 01:52:53 loremds10 mmfs: Tag=5016618 node 655 old mode ro new mode (A: MA D: ) Dec 11 02:15:57 loremds10 mmfs: Tag=5045549 node 994 old mode ro new mode (A: A D: A) Dec 11 08:14:22 loremds10 mmfs: Tag=5067054 node 237 old mode ro new mode 0x08 Dec 11 08:57:39 loremds12 mmfs: Tag=4908715 node 584 old mode ro new mode (A: D: A) Dec 11 00:47:39 loremds09 mmfs: Tag=4998635 node 461 old mode ro new mode (A:R D: ) It's interesting to note that all of these node indexes are still running 3.5. I'm going to open up a PMR but thought I'd share the gory details here and see if folks had any insight. I'm starting to wonder if 4.1 clients are more tolerant of 3.5 servers than 4.1 servers are of 3.5 clients. 
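One way to check that suspicion, as a rough sketch: it assumes the node number printed in the assert corresponds to the node number in the first column of mmlscluster output, and uses index 584 from the trace above as the example.

   # map the daemon node index from the assert to a hostname
   mmlscluster | awk '$1 == 584 {print $2}'

   # then confirm what daemon level that host is actually running
   ssh <that-hostname> /usr/lpp/mmfs/bin/mmdiag --version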
-Aaron On 12/5/16 4:31 PM, Aaron Knister wrote: > Hi Everyone, > > In the GPFS documentation > (http://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs300.doc/bl1ins_migratl.htm) > it has this to say about the duration of an upgrade from 3.5 to 4.1: > >> Rolling upgrades allow you to install new GPFS code one node at a time >> without shutting down GPFS >> on other nodes. However, you must upgrade all nodes within a short >> time. The time dependency exists >> because some GPFS 4.1 features become available on each node as soon as > the node is upgraded, while >> other features will not become available until you upgrade all > participating nodes. > > Does anyone have a feel for what "a short time" means? I'm looking to > upgrade from 3.5.0.31 to 4.1.1.10 in a rolling fashion but given the > size of our system it might take several weeks to complete. Seeing this > language concerns me that after some period of time something bad is > going to happen, but I don't know what that period of time is. > > Also, if anyone has done a rolling 3.5 to 4.1 upgrade and has any > anecdotes they'd like to share, I would like to hear them. > > Thanks! > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sun Dec 11 21:28:39 2016 From: erich at uw.edu (Eric Horst) Date: Sun, 11 Dec 2016 13:28:39 -0800 Subject: [gpfsug-discuss] GPFS 3.5 to 4.1 Upgrade Question In-Reply-To: References: <339e3bdb-b4cc-87af-a4a0-5b5b5a9f7269@nasa.gov> Message-ID: On Sat, Dec 10, 2016 at 4:35 AM, Aaron Knister wrote: > Thanks Eric! > > I have a few follow up questions for you-- > > Do you recall the exact versions of 3.5 and 4.1 your cluster went from/to? > I'm curious to know what version of 4.1 you were at when you ran the > mmchconfig. > I went from 3.5.0-28 to 4.1.0-8 to 4.2.1-1. > > Would you mind sharing any log messages related to the errors you saw when > you ran the mmchconfig? > > Unfortunately I didn't save any actual logs from the update. I did the first cluster in early July so nothing remains. The only note I have is: "On update, after finalizing gpfs 4.1 the quota file format apparently changed and caused a mmrepquota hang/deadlock. Had to shutdown and restart the whole cluster." Sorry to not be very helpful on that front. -Eric > I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different > clusters. Two things: > > Upgrading from 3.5 to 4.1 I did node at a time and then at the end > mmchconfig release=LATEST. Minutes after flipping to latest the cluster > became non-responsive, with node mmfs panics and everything had to be > restarted. Logs indicated it was a quota problem. In 4.1 the quota files > move from externally visible files to internal hidden files. I suspect the > quota file transition can't be done without a cluster restart. When I did > the second cluster I upgraded all nodes and then very quickly stopped and > started the entire cluster, issuing the mmchconfig in the middle. No quota > panic problems on that one. > > Upgrading from 4.1 to 4.2 I did node at a time and then at the end > mmchconfig release=LATEST. No cluster restart. Everything seemed to work > okay. Later, restarting a node I got weird fstab errors on gpfs startup and > using certain commands, notably mmfind, the command would fail with > something like "can't find /dev/uwfs" (our filesystem.) I restarted the > whole cluster and everything began working normally. 
In this case 4.2 got > rid of /dev/fsname. Just like in the quota case it seems that this > transition can't be seamless. Doing the second cluster I upgraded all nodes > and then again quickly restarted gpfs to avoid the same problem. > > Other than these two quirks, I heartily thank IBM for making a very > complex product with a very easy upgrade procedure. I could imagine many > ways that an upgrade hop of two major versions in two weeks could go very > wrong but the quality of the product and team makes my job very easy. > > -Eric > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 12 13:55:52 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 12 Dec 2016 13:55:52 +0000 Subject: [gpfsug-discuss] Ceph RBD Volumes and GPFS? Message-ID: Has anyone tried using Ceph RBD volumes with GPFS? I?m guessing that it will work, but I?m not sure if IBM would support it. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Dec 13 04:05:08 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 12 Dec 2016 23:05:08 -0500 Subject: [gpfsug-discuss] Ceph RBD Volumes and GPFS? In-Reply-To: References: Message-ID: Hi Bob, I have not, although I started to go down that path. I had wanted erasure coded pools but in order to front an erasure coded pool with an RBD volume you apparently need a cache tier? Seems that doesn't give one the performance they might want for this type of workload (http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#a-word-of-caution). If you're OK replicating the data I suspect it might work well. I did try sheepdog (https://sheepdog.github.io/sheepdog/) and that did work the way I wanted it to with erasure coding and gave me pretty good performance to boot. -Aaron On 12/12/16 8:55 AM, Oesterlin, Robert wrote: > Has anyone tried using Ceph RBD volumes with GPFS? I?m guessing that it > will work, but I?m not sure if IBM would support it. > > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From r.sobey at imperial.ac.uk Thu Dec 15 13:13:43 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 15 Dec 2016 13:13:43 +0000 Subject: [gpfsug-discuss] Auditing of SMB file access In-Reply-To: References: <689AA574-500A-4C8F-B086-114648BDE1DB@gmail.com> Message-ID: Ah. I stopped reading when I read that the service account needs Domain Admin rights. I doubt that will fly unfortunately. Thanks though John. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John T Olson Sent: 09 December 2016 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Auditing of SMB file access Richard, I recently published a white paper in the Spectrum Scale wiki in developerworks about using Varonis with Spectrum Scale for auditing. This paper includes what type of file events are recognizable with the proposed setup. Here is link to the paper: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/fa32927c-e904-49cc-a4cc-870bcc8e307c/page/f0cc9b82-a133-41b4-83fe-3f560e95b35a/attachment/0ab62645-e0ab-4377-81e7-abd11879bb75/media/Spectrum_Scale_Varonis_Audit_Logging.pdf Note that you have to register with developerworks, but it is a free registration. Thanks, John John T. Olson, Ph.D., MI.C., K.EY. Master Inventor, Software Defined Storage 957/9032-1 Tucson, AZ, 85744 (520) 799-5185, tie 321-5185 (FAX: 520-799-4237) Email: jtolson at us.ibm.com "Do or do not. There is no try." - Yoda Olson's Razor: Any situation that we, as humans, can encounter in life can be modeled by either an episode of The Simpsons or Seinfeld. [Inactive hide details for Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help?]Aaron Knister ---12/09/2016 06:21:40 AM---Hi Richard, Does this help? From: Aaron Knister > To: gpfsug main discussion list > Date: 12/09/2016 06:21 AM Subject: Re: [gpfsug-discuss] Auditing of SMB file access Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Richard, Does this help? https://moiristo.wordpress.com/2009/08/10/samba-logging-user-activity/amp I've not used CES so I don't know at what level it manages the samba configuration file or how easily these changes could be integrated in light of that. Sent from my iPhone On Dec 9, 2016, at 6:52 AM, Sobey, Richard A > wrote: Hi all, Is there any auditing we can enable to track changes and accesses to files/folders on GPFS (via SMB/CES if that matters). Cheers Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From Mark.Bush at siriuscom.com Thu Dec 15 20:32:11 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 15 Dec 2016 20:32:11 +0000 Subject: [gpfsug-discuss] Tiers Message-ID: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). 
It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 15 20:47:12 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Dec 2016 20:47:12 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Message-ID: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. 
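A rough sketch of what an atime-driven migration rule like the one described above might look like. The pool names, the 90-day cutoff and the threshold percentages are illustrative assumptions, not the actual policy in use:

   /* When the 'data' pool passes 85% full, drain it back to 70% full, */
   /* moving the least recently accessed files first */
   RULE 'old_to_capacity'
     MIGRATE FROM POOL 'data'
     THRESHOLD(85,70)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'capacity'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90

   /* new files land in the 'data' pool by default */
   RULE 'default' SET POOL 'data'

Installed with mmchpolicy, the placement rule applies at file creation; the migration rule can be driven periodically, or from a lowDiskSpace callback, with something like mmapplypolicy gpfs0 -P policy.rules -I yes.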
If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Dec 15 20:52:17 2016 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 15 Dec 2016 20:52:17 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> References: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu>, <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Dec 15 21:19:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 15 Dec 2016 21:19:20 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> Message-ID: <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. 
In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 15 21:25:21 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 15 Dec 2016 21:25:21 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> Message-ID: <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? 
Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sat Dec 17 04:24:34 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 16 Dec 2016 23:24:34 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name Message-ID: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Hi Everyone, I'm curious about the most straightforward and fastest way to identify what NSD a given /dev device is. The best I can come up with is "tspreparedisk -D device_name" which gives me something like: tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: that I can then parse and map the nsd id to the nsd name. I hesitate calling ts* commands directly and I admit it's perhaps an irrational fear, but I associate the -D flag with "delete" in my head and am afraid that some day -D may be just that and *poof* there go my NSD descriptors. Is there a cleaner way? 
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From erich at uw.edu Sat Dec 17 04:55:00 2016 From: erich at uw.edu (Eric Horst) Date: Fri, 16 Dec 2016 20:55:00 -0800 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: Perhaps this: mmlsnsd -m -Eric On Fri, Dec 16, 2016 at 8:24 PM, Aaron Knister wrote: > Hi Everyone, > > I'm curious about the most straightforward and fastest way to identify > what NSD a given /dev device is. The best I can come up with is > "tspreparedisk -D device_name" which gives me something like: > > tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: > > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational fear, > but I associate the -D flag with "delete" in my head and am afraid that > some day -D may be just that and *poof* there go my NSD descriptors. > > Is there a cleaner way? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Dec 17 07:04:08 2016 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 17 Dec 2016 07:04:08 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Sat Dec 17 08:35:05 2016 From: jtucker at pixitmedia.com (Jez Tucker) Date: Sat, 17 Dec 2016 08:35:05 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <303ba835-5844-765f-c34d-a62c226498c5@arcastream.com> References: <303ba835-5844-765f-c34d-a62c226498c5@arcastream.com> Message-ID: <6ebdd77b-c576-fbee-903c-c365e101cbb4@pixitmedia.com> Hi Aaron An alternative method for you is: from arcapix.fs.gpfs import Nsds >>> from arcapix.fs.gpfs import Nsds >>> nsd = Nsds() >>> for n in nsd.values(): ... print n.device, n.id ... /gpfsblock/mmfs1-md1 md3200_001_L000 /gpfsblock/mmfs1-md2 md3200_001_L001 /gpfsblock/mmfs1-data1 md3200_001_L002 /gpfsblock/mmfs1-data2 md3200_001_L003 /gpfsblock/mmfs1-data3 md3200_001_L004 /gpfsblock/mmfs2-md1 md3200_002_L000 Ref: http://arcapix.com/gpfsapi/nsds.html Obviously you can filter a specific device by the usual Pythonic string comparators. Jez On 17/12/16 07:04, Luis Bolinches wrote: > Hi > THe ts* is a good fear, they are internal commands bla bla bla you > know that > Have you tried mmlsnsd -X > > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > > Luis Bolinches > Lab Services > http://www-03.ibm.com/systems/services/labservices/ > > IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland > Phone: +358 503112585 > > "If you continually give you will continually have." 
Anonymous > > ----- Original message ----- > From: Aaron Knister > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [gpfsug-discuss] translating /dev device into nsd name > Date: Sat, Dec 17, 2016 6:24 AM > Hi Everyone, > > I'm curious about the most straightforward and fastest way to identify > what NSD a given /dev device is. The best I can come up with is > "tspreparedisk -D device_name" which gives me something like: > > tspreparedisk:0:0A6535145840E2A6:/dev/dm-134::::::0: > > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am > afraid > that some day -D may be just that and *poof* there go my NSD > descriptors. > > Is there a cleaner way? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* VP of Research and Development, ArcaStream jtucker at arcastream.com www.arcastream.com | Tw:@arcastream.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valdis.Kletnieks at vt.edu Sat Dec 17 21:42:39 2016 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Sat, 17 Dec 2016 16:42:39 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> References: <6bf32d20-2954-56c8-3c89-1ac8c6df3e34@nasa.gov> Message-ID: <54420.1482010959@turing-police.cc.vt.edu> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... -------------- next part -------------- A non-text attachment was scrubbed... 
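Pulling the suggestions in this thread together, a small sketch for mapping one local block device back to its NSD name with the supported mmlsnsd command instead of tspreparedisk. The awk field numbers assume the usual "Disk name / NSD volume ID / Device / Node name" column layout of mmlsnsd -m, so adjust them if your output differs:

   # full NSD-to-device mapping, one line per NSD per server
   mmlsnsd -m

   # just the NSD name behind /dev/dm-134 as seen from this node
   mmlsnsd -m | awk -v dev=/dev/dm-134 -v me=$(hostname -s) '$3 == dev && $4 ~ me {print $1}'

As Valdis notes, the device column is only meaningful for the node named on the same line, so filter on both the device and the node.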
Name: not available Type: application/pgp-signature Size: 484 bytes Desc: not available URL: From daniel.kidger at uk.ibm.com Mon Dec 19 11:42:03 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Mon, 19 Dec 2016 11:42:03 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discussUnless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Mon Dec 19 14:53:27 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Mon, 19 Dec 2016 14:53:27 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance Message-ID: We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of the IO servers phoned home with memory error. IBM is coming out today to replace the faulty DIMM. What is the correct way of taking this system out for maintenance? Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When we needed to do maintenance on the old system, we would migrate manager role and also move primary and secondary server roles if one of those systems had to be taken down. With ESS and resource pool manager roles etc. is there a correct way of shutting down one of the IO serves for maintenance? 
Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Dec 19 15:15:45 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 19 Dec 2016 10:15:45 -0500 Subject: [gpfsug-discuss] Tiers In-Reply-To: <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> References: <57346592-97B2-4EF0-B4FF-C7CDA5FBB853@siriuscom.com> <21657385-50FB-4345-8E36-C128B24BF981@vanderbilt.edu> <0A049633-6AC2-4A78-B1A3-A69174C23A3D@siriuscom.com> <88171115-BFE2-488E-8F8A-CB29FC353459@vanderbilt.edu> Message-ID: We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Mark, > > We just use an 8 Gb FC SAN. For the data pool we typically have a dual > active-active controller storage array fronting two big RAID 6 LUNs and 1 > RAID 1 (for /home). For the capacity pool, it might be the same exact > model of controller, but the two controllers are now fronting that whole > 60-bay array. > > But our users tend to have more modest performance needs than most? > > Kevin > > On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: > > Kevin, out of curiosity, what type of disk does your data pool use? SAS > or just some SAN attached system? > > *From: * on behalf of > "Buterbaugh, Kevin L" > *Reply-To: *gpfsug main discussion list > *Date: *Thursday, December 15, 2016 at 2:47 PM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] Tiers > > Hi Mark, > > We?re a ?traditional? university HPC center with a very untraditional > policy on our scratch filesystem ? we don?t purge it and we sell quota > there. Ultimately, a lot of that disk space is taken up by stuff that, > let?s just say, isn?t exactly in active use. > > So what we?ve done, for example, is buy a 60-bay storage array and stuff > it with 8 TB drives. It wouldn?t offer good enough performance for > actively used files, but we use GPFS policies to migrate files to the > ?capacity? pool based on file atime. So we have 3 pools: > > 1. the system pool with metadata only (on SSDs) > 2. the data pool, which is where actively used files are stored and which > offers decent performance > 3. the capacity pool, for data which hasn?t been accessed ?recently?, and > which is on slower storage > > I would imagine others do similar things. HTHAL? > > Kevin > > > On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: > > Just curious how many of you out there deploy SS with various tiers? It > seems like a lot are doing the system pool with SSD?s but do you routinely > have clusters that have more than system pool and one more tier? > > I know if you are doing Archive in connection that?s an obvious choice for > another tier but I?m struggling with knowing why someone needs more than > two tiers really. > > I?ve read all the fine manuals as to how to do such a thing and some of > the marketing as to maybe why. I?m still scratching my head on this > though. 
In fact, my understanding is in the ESS there isn?t any different > pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). > > It does make sense to me know with TCT and I could create an ILM policy to > get some of my data into the cloud. > > But in the real world I would like to know what yall do in this regard. > > > Thanks > > Mark > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > *Sirius Computer Solutions * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 19 15:25:52 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 19 Dec 2016 15:25:52 +0000 Subject: [gpfsug-discuss] Tiers Message-ID: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? 
Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
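

To tie the grpdef/repack rules above back to the question about throttling the repack with QOS, a rough sketch of how the pieces could fit together is below. It is only an illustration: the file system name (gpfs0), the IOPS cap and the heat-tracking period are invented values, and the exact mmchqos and --qos option syntax is worth checking against your Scale release. Note that FILE_HEAT stays at zero unless fileHeatPeriodMinutes has been set.

    # file heat is only tracked once fileHeatPeriodMinutes is non-zero
    mmchconfig fileHeatPeriodMinutes=1440 -i

    # cap 'maintenance' class I/O so the repack cannot swamp the disks
    mmchqos gpfs0 --enable pool=ssd,maintenance=300IOPS,other=unlimited

    # the grpdef/repack rules quoted above, in a policy file
    cat > /tmp/repack.pol <<'EOF'
    RULE 'grpdef' GROUP POOL 'gpool' IS 'ssd' LIMIT(75) THEN 'disk'
    RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT)
    EOF

    # run the migration in the throttled maintenance class
    mmapplypolicy gpfs0 -P /tmp/repack.pol -I yes --qos maintenance
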
URL: From kenh at us.ibm.com Mon Dec 19 15:30:58 2016 From: kenh at us.ibm.com (Ken Hill) Date: Mon, 19 Dec 2016 10:30:58 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" To: "gpfsug main discussion list" Cc: "gpfsug main discussion list" Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Dec 19 15:36:50 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 19 Dec 2016 15:36:50 +0000 Subject: [gpfsug-discuss] SMB issues Message-ID: Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? 
Thanks Simon From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 15:40:50 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 15:40:50 +0000 Subject: [gpfsug-discuss] Tiers In-Reply-To: References: Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E@vanderbilt.edu> Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. 
So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Mon Dec 19 15:53:12 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 19 Dec 2016 15:53:12 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: Move its recoverygrops to the other node by putting the other node as primary server for it: mmchrecoverygroup rgname --servers otherServer,thisServer And verify that it's now active on the other node by "mmlsrecoverygroup rgname -L". Move away any filesystem managers or cluster manager role if that's active on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. 
Then you can run mmshutdown on it (assuming you also have enough quorum nodes in the remaining cluster). -jf man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? > > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 15:58:16 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 15:58:16 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: Hi Ken, Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. Or am I completely misunderstanding what you?re saying? Thanks... Kevin On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" > To: "gpfsug main discussion list" > Cc: "gpfsug main discussion list" > Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. 
Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse ________________________________ On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpappas at dstonline.com Mon Dec 19 15:59:12 2016 From: bpappas at dstonline.com (Bill Pappas) Date: Mon, 19 Dec 2016 15:59:12 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: What I would do is when you identify this issue again, determine which IP address (which samba server) is serving up the CIFS share. Then as root, log on to that samna node and typr "id " for the user which has this issue. Are they in all the security groups you'd expect, in particular, the group required to access the folder in question? Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] [http://www.prweb.com/releases/2016/06/prweb13504050.htm] http://www.prweb.com/releases/2016/06/prweb13504050.htm ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Monday, December 19, 2016 9:41 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 59, Issue 40 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
SMB issues (Simon Thompson (Research Computing - IT Services)) 2. Re: Tiers (Buterbaugh, Kevin L) ---------------------------------------------------------------------- Message: 1 Date: Mon, 19 Dec 2016 15:36:50 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] SMB issues Message-ID: Content-Type: text/plain; charset="us-ascii" Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon ------------------------------ Message: 2 Date: Mon, 19 Dec 2016 15:40:50 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Tiers Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. 
try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 40 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-1466780990050_DSTlogo.png.png Type: image/png Size: 6282 bytes Desc: OutlookEmoji-1466780990050_DSTlogo.png.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-http://www.prweb.com/releases/2016/06/prweb13504050.htm.jpg Type: image/jpeg Size: 14887 bytes Desc: OutlookEmoji-http://www.prweb.com/releases/2016/06/prweb13504050.htm.jpg URL: From S.J.Thompson at bham.ac.uk Mon Dec 19 16:06:08 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 19 Dec 2016 16:06:08 +0000 Subject: [gpfsug-discuss] SMB issues Message-ID: We see it on all four of the nodes, and yet we did some getent passwd/getent group stuff on them to verify that identity is working OK. Simon From: > on behalf of Bill Pappas > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 19 December 2016 at 15:59 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] SMB issues What I would do is when you identify this issue again, determine which IP address (which samba server) is serving up the CIFS share. Then as root, log on to that samna node and typr "id " for the user which has this issue. Are they in all the security groups you'd expect, in particular, the group required to access the folder in question? Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] [http://www.prweb.com/releases/2016/06/prweb13504050.htm] http://www.prweb.com/releases/2016/06/prweb13504050.htm ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Monday, December 19, 2016 9:41 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 59, Issue 40 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
SMB issues (Simon Thompson (Research Computing - IT Services)) 2. Re: Tiers (Buterbaugh, Kevin L) ---------------------------------------------------------------------- Message: 1 Date: Mon, 19 Dec 2016 15:36:50 +0000 From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SMB issues Message-ID: > Content-Type: text/plain; charset="us-ascii" Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon ------------------------------ Message: 2 Date: Mon, 19 Dec 2016 15:40:50 +0000 From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Message-ID: <25B04F2E-21FD-44EF-B15B-8317DE9EF68E at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi Brian, We?re probably an outlier on this (Bob?s case is probably much more typical) but we can get away with doing weekly migrations based on file atime. Some thoughts: 1. absolutely use QOS! It?s one of the best things IBM has ever added to GPFS. 2. personally, I limit even my capacity pool to no more than 98% capacity. I just don?t think it?s a good idea to 100% fill anything. 3. if you do use anything like atime or mtime as your criteria, don?t forget to have a rule to move stuff back from the capacity pool if it?s now being used. 4. we also help manage a DDN device and there they do also implement a rule to move stuff if the ?fast? pool exceeds a certain threshold ? but they use file size as the weight. Not saying that?s right or wrong, it?s just another approach. HTHAL? Kevin On Dec 19, 2016, at 9:25 AM, Oesterlin, Robert > wrote: I tend to do migration based on ?file heat?, moving the least active files to HDD and more active to SSD. Something simple like this: rule grpdef GROUP POOL gpool IS ssd LIMIT(75) THEN disk rule repack MIGRATE FROM POOL gpool TO POOL gpool WEIGHT(FILE_HEAT) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Monday, December 19, 2016 at 9:15 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Tiers We are in very similar situation. VT - ARC has a layer of SSD for metadata only, another layer of SSD for "hot" data, and a layer of 8TB HDDs for capacity. We just now in the process of getting it all into production. On this topic: What is everyone's favorite migration policy to move data from SSD to HDD (and vice versa)? Do you nightly move large/old files to HDD or wait until the fast tier hit some capacity limit? Do you use QOS to limit the migration from SSD to HDD i.e. 
try not to kill the file system with migration work? Thanks, Brian Marshall On Thu, Dec 15, 2016 at 4:25 PM, Buterbaugh, Kevin L > wrote: Hi Mark, We just use an 8 Gb FC SAN. For the data pool we typically have a dual active-active controller storage array fronting two big RAID 6 LUNs and 1 RAID 1 (for /home). For the capacity pool, it might be the same exact model of controller, but the two controllers are now fronting that whole 60-bay array. But our users tend to have more modest performance needs than most? Kevin On Dec 15, 2016, at 3:19 PM, Mark.Bush at siriuscom.com wrote: Kevin, out of curiosity, what type of disk does your data pool use? SAS or just some SAN attached system? From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Thursday, December 15, 2016 at 2:47 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Tiers Hi Mark, We?re a ?traditional? university HPC center with a very untraditional policy on our scratch filesystem ? we don?t purge it and we sell quota there. Ultimately, a lot of that disk space is taken up by stuff that, let?s just say, isn?t exactly in active use. So what we?ve done, for example, is buy a 60-bay storage array and stuff it with 8 TB drives. It wouldn?t offer good enough performance for actively used files, but we use GPFS policies to migrate files to the ?capacity? pool based on file atime. So we have 3 pools: 1. the system pool with metadata only (on SSDs) 2. the data pool, which is where actively used files are stored and which offers decent performance 3. the capacity pool, for data which hasn?t been accessed ?recently?, and which is on slower storage I would imagine others do similar things. HTHAL? Kevin On Dec 15, 2016, at 2:32 PM, Mark.Bush at siriuscom.com wrote: Just curious how many of you out there deploy SS with various tiers? It seems like a lot are doing the system pool with SSD?s but do you routinely have clusters that have more than system pool and one more tier? I know if you are doing Archive in connection that?s an obvious choice for another tier but I?m struggling with knowing why someone needs more than two tiers really. I?ve read all the fine manuals as to how to do such a thing and some of the marketing as to maybe why. I?m still scratching my head on this though. In fact, my understanding is in the ESS there isn?t any different pools (tiers) as it?s all NL-SAS or SSD (DF150, etc). It does make sense to me know with TCT and I could create an ILM policy to get some of my data into the cloud. But in the real world I would like to know what yall do in this regard. Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 59, Issue 40 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-1466780990050_DSTlogo.png.png Type: image/png Size: 6282 bytes Desc: OutlookEmoji-1466780990050_DSTlogo.png.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-httpwww.prweb.comreleases201606prweb13504050.htm.jpg Type: image/jpeg Size: 14887 bytes Desc: OutlookEmoji-httpwww.prweb.comreleases201606prweb13504050.htm.jpg URL: From ulmer at ulmer.org Mon Dec 19 16:16:56 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 19 Dec 2016 11:16:56 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> Your observation is correct! There?s usually another step, though: mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. -- Stephen > On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: > > Hi Ken, > > Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. > > Or am I completely misunderstanding what you?re saying? Thanks... > > Kevin > >> On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: >> >> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >> >> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. 
>> >> >> Ken Hill >> Technical Sales Specialist | Software Defined Solution Sales >> IBM Systems >> Phone:1-540-207-7270 >> E-mail: kenh at us.ibm.com >> >> >> 2300 Dulles Station Blvd >> Herndon, VA 20171-6133 >> United States >> >> >> >> >> >> >> >> >> >> >> From: "Daniel Kidger" > >> To: "gpfsug main discussion list" > >> Cc: "gpfsug main discussion list" > >> Date: 12/19/2016 06:42 AM >> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Valdis wrote: >> Keep in mind that if you have multiple NSD servers in the cluster, there >> is *no* guarantee that the names for a device will be consistent across >> the servers, or across reboots. And when multipath is involved, you may >> have 4 or 8 or even more names for the same device.... >> >> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >> >> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >> >> Daniel >> >> IBM Spectrum Storage Software >> +44 (0)7818 522266 >> Sent from my iPad using IBM Verse >> >> >> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >> >> From: Valdis.Kletnieks at vt.edu >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Date: 17 Dec 2016 21:43:00 >> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >> >> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >> > that I can then parse and map the nsd id to the nsd name. I hesitate >> > calling ts* commands directly and I admit it's perhaps an irrational >> > fear, but I associate the -D flag with "delete" in my head and am afraid >> > that some day -D may be just that and *poof* there go my NSD descriptors. >> Others have mentioned mmlsdnsd -m and -X >> Keep in mind that if you have multiple NSD servers in the cluster, there >> is *no* guarantee that the names for a device will be consistent across >> the servers, or across reboots. And when multipath is involved, you may >> have 4 or 8 or even more names for the same device.... >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> Unless stated otherwise above: >> IBM United Kingdom Limited - Registered in England and Wales with number 741598. >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
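

A concrete sketch of the create-then-reorder workflow described above might look like the following; the NSD names, device paths and server names are invented for illustration, and it assumes the NSDs have not yet been added to a file system:

    # create the NSDs from one node, using that node's /dev names;
    # the device= path only has to be valid on the first listed server
    cat > /tmp/nsd.stanza <<'EOF'
    %nsd: nsd=nsd001 device=/dev/mapper/lun001 servers=nsdA,nsdB usage=dataAndMetadata failureGroup=1
    %nsd: nsd=nsd002 device=/dev/mapper/lun002 servers=nsdA,nsdB usage=dataAndMetadata failureGroup=2
    EOF
    mmcrnsd -F /tmp/nsd.stanza

    # then flip the server order where a different preferred primary is wanted
    cat > /tmp/nsd.reorder <<'EOF'
    %nsd: nsd=nsd002 servers=nsdB,nsdA
    EOF
    mmchnsd -F /tmp/nsd.reorder
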
URL: From janfrode at tanso.net Mon Dec 19 16:25:50 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 19 Dec 2016 16:25:50 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: References: <54420.1482010959@turing-police.cc.vt.edu> Message-ID: I normally do mmcrnsd without specifying any servers=, and point at the local /dev entry. Afterwards I add the servers= line and do mmchnsd. -jf man. 19. des. 2016 kl. 16.58 skrev Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu>: > Hi Ken, > > Umm, wouldn?t that make that server the primary NSD server for all those > NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen > server, but as long as you have the proper device name for the NSD from the > NSD server you want to be primary for it, I?ve never had a problem > specifying many different servers first in the list. > > Or am I completely misunderstanding what you?re saying? Thanks... > > Kevin > > On Dec 19, 2016, at 9:30 AM, Ken Hill wrote: > > Indeed. It only matters when deploying NSDs. Post-deployment, all luns > (NSDs) are labeled - and they are assembled by GPFS. > > Keep in mind: If you are deploying multiple NSDs (with multiple servers) - > you'll need to pick one server to work with... Use that server to label the > luns (mmcrnsd)... In the nsd stanza file - the server you choose will need > to be the first server in the "servers" list. > > > *Ken Hill* > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > > ------------------------------ > *Phone:*1-540-207-7270 > * E-mail:* *kenh at us.ibm.com* > > > > > > > > > > > > > > > > > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > > > > > > From: "Daniel Kidger" > To: "gpfsug main discussion list" > > Cc: "gpfsug main discussion list" > > Date: 12/19/2016 06:42 AM > Subject: Re: [gpfsug-discuss] translating /dev device into nsd name > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > *Valdis wrote:* > > > > *Keep in mind that if you have multiple NSD servers in the cluster, there > is *no* guarantee that the names for a device will be consistent across the > servers, or across reboots. And when multipath is involved, you may have 4 > or 8 or even more names for the same device....* > > Indeed the is whole greatness about NSDs (and in passing why Lustre can be > much more tricky to safely manage.) > Once a lun is "labelled" as an NSD then that NSD name is all you need to > care about as the /dev entries can now freely change on reboot or differ > across nodes. Indeed if you connect an arbitrary node to an NSD disk via a > SAN cable, gpfs will recognise it and use it as a shortcut to that lun. > > Finally recall that in the NSD stanza file the /dev entry is only matched > for on the first of the listed NSD servers; the other NSD servers will > discover and learn which NSD this is, ignoring the /dev value in this > stanza. > > Daniel > > IBM Spectrum Storage Software > *+44 (0)7818 522266* <+44%207818%20522266> > Sent from my iPad using IBM Verse > > > ------------------------------ > On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: > > From: Valdis.Kletnieks at vt.edu > To: gpfsug-discuss at spectrumscale.org > Cc: > Date: 17 Dec 2016 21:43:00 > Subject: Re: [gpfsug-discuss] translating /dev device into nsd name > > On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > > that I can then parse and map the nsd id to the nsd name. 
I hesitate > > calling ts* commands directly and I admit it's perhaps an irrational > > fear, but I associate the -D flag with "delete" in my head and am afraid > > that some day -D may be just that and *poof* there go my NSD descriptors. > Others have mentioned mmlsdnsd -m and -X > Keep in mind that if you have multiple NSD servers in the cluster, there > is *no* guarantee that the names for a device will be consistent across > the servers, or across reboots. And when multipath is involved, you may > have 4 or 8 or even more names for the same device.... > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 19 16:43:50 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 19 Dec 2016 16:43:50 +0000 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> Message-ID: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Hi Stephen, Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: 1. go down to the data center and sit in front of the storage arrays. 2. log on to the NSD server I want to be primary for a given NSD. 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. 3. for the remaining disks, run ?dd if=/dev/> wrote: Your observation is correct! There?s usually another step, though: mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. -- Stephen On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: Hi Ken, Umm, wouldn?t that make that server the primary NSD server for all those NSDs? 
Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. Or am I completely misunderstanding what you?re saying? Thanks... Kevin On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Daniel Kidger" > To: "gpfsug main discussion list" > Cc: "gpfsug main discussion list" > Date: 12/19/2016 06:42 AM Subject: Re: [gpfsug-discuss] translating /dev device into nsd name Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Valdis wrote: Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. Daniel IBM Spectrum Storage Software +44 (0)7818 522266 Sent from my iPad using IBM Verse ________________________________ On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: From: Valdis.Kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Cc: Date: 17 Dec 2016 21:43:00 Subject: Re: [gpfsug-discuss] translating /dev device into nsd name On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: > that I can then parse and map the nsd id to the nsd name. I hesitate > calling ts* commands directly and I admit it's perhaps an irrational > fear, but I associate the -D flag with "delete" in my head and am afraid > that some day -D may be just that and *poof* there go my NSD descriptors. Others have mentioned mmlsdnsd -m and -X Keep in mind that if you have multiple NSD servers in the cluster, there is *no* guarantee that the names for a device will be consistent across the servers, or across reboots. And when multipath is involved, you may have 4 or 8 or even more names for the same device.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Dec 19 16:45:38 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 19 Dec 2016 16:45:38 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: Can you create an export with "admin user" and see if the issue is reproducible that way: Mmsmb export add exportname /path/to/folder Mmsmb export change exportname -option "admin users=username at domain" And for good measure remove the SID of Domain Users from the ACL: mmsmb exportacl remove exportname --SID S-1-1-0 I can't quite think in my head how this will help but I'd be interested to know if you see similar behaviour. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: 19 December 2016 15:37 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] SMB issues Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Mon Dec 19 17:08:27 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 19 Dec 2016 12:08:27 -0500 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: <14903A9D-B051-4B1A-AF83-31140FC7666D@ulmer.org> Depending on the hardware?. ;) Sometimes you can use the drivers to tell you the ?volume name? of a LUN on the storage server. 
You could do that the DS{3,4,5}xx systems. I think you can also do it for Storwize-type systems, but I?m blocking on how and I don?t have one in front of me at the moment. Either that or use the volume UUID or some such. I?m basically never where I can see the blinky lights. :( -- Stephen > On Dec 19, 2016, at 11:43 AM, Buterbaugh, Kevin L > wrote: > > Hi Stephen, > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > > Kevin > >> On Dec 19, 2016, at 10:16 AM, Stephen Ulmer > wrote: >> >> Your observation is correct! There?s usually another step, though: >> >> mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. >> >> -- >> Stephen >> >> >> >>> On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L > wrote: >>> >>> Hi Ken, >>> >>> Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. >>> >>> Or am I completely misunderstanding what you?re saying? Thanks... >>> >>> Kevin >>> >>>> On Dec 19, 2016, at 9:30 AM, Ken Hill > wrote: >>>> >>>> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >>>> >>>> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. >>>> >>>> >>>> Ken Hill >>>> Technical Sales Specialist | Software Defined Solution Sales >>>> IBM Systems >>>> Phone:1-540-207-7270 >>>> E-mail: kenh at us.ibm.com >>>> >>>> >>>> 2300 Dulles Station Blvd >>>> Herndon, VA 20171-6133 >>>> United States >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From: "Daniel Kidger" > >>>> To: "gpfsug main discussion list" > >>>> Cc: "gpfsug main discussion list" > >>>> Date: 12/19/2016 06:42 AM >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Valdis wrote: >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. 
And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> >>>> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >>>> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >>>> >>>> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >>>> >>>> Daniel >>>> >>>> IBM Spectrum Storage Software >>>> +44 (0)7818 522266 >>>> Sent from my iPad using IBM Verse >>>> >>>> >>>> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >>>> >>>> From: Valdis.Kletnieks at vt.edu >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: >>>> Date: 17 Dec 2016 21:43:00 >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> >>>> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >>>> > that I can then parse and map the nsd id to the nsd name. I hesitate >>>> > calling ts* commands directly and I admit it's perhaps an irrational >>>> > fear, but I associate the -D flag with "delete" in my head and am afraid >>>> > that some day -D may be just that and *poof* there go my NSD descriptors. >>>> Others have mentioned mmlsdnsd -m and -X >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> Unless stated otherwise above: >>>> IBM United Kingdom Limited - Registered in England and Wales with number 741598. >>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Mon Dec 19 17:16:07 2016 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 19 Dec 2016 09:16:07 -0800 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: We have each of our NSDs on boxes shared between two servers, with one server primary for each raid unit. When I create Logical drives and map them, I make sure there is no overlap in the logical unit numbers between the two boxes. Then I use /proc/partitions and lsscsi to see if they all show up. When it is time to write the stanza files, I use multipath -ll to get a list with the device name and LUN info, and sort it to round robin over all the NSD servers. It's still tedious, but it doesn't require a trip to the machine room. Note that the multipath -ll command needs to be run separately on each NSD server to get the device name specific to that host -- the first server name in the list. Also realize that leaving the host name off when creating NSDs only works if all the drives are visible from the node where you run the command. Regards, -- ddj Dave Johnson > On Dec 19, 2016, at 8:43 AM, Buterbaugh, Kevin L wrote: > > Hi Stephen, > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > > Kevin > >> On Dec 19, 2016, at 10:16 AM, Stephen Ulmer wrote: >> >> Your observation is correct! There?s usually another step, though: >> >> mmcrnsd creates each NSD on the first server in the list, so if you ?stripe? the servers you have to know the device name for that NSD on the node that is first in the server list for that NSD. It is usually less work to pick one node, create the NSDs and then change them to have a different server order. >> >> -- >> Stephen >> >> >> >>> On Dec 19, 2016, at 10:58 AM, Buterbaugh, Kevin L wrote: >>> >>> Hi Ken, >>> >>> Umm, wouldn?t that make that server the primary NSD server for all those NSDs? Granted, you run the mmcrnsd command from one arbitrarily chosen server, but as long as you have the proper device name for the NSD from the NSD server you want to be primary for it, I?ve never had a problem specifying many different servers first in the list. >>> >>> Or am I completely misunderstanding what you?re saying? Thanks... >>> >>> Kevin >>> >>>> On Dec 19, 2016, at 9:30 AM, Ken Hill wrote: >>>> >>>> Indeed. It only matters when deploying NSDs. Post-deployment, all luns (NSDs) are labeled - and they are assembled by GPFS. >>>> >>>> Keep in mind: If you are deploying multiple NSDs (with multiple servers) - you'll need to pick one server to work with... 
Use that server to label the luns (mmcrnsd)... In the nsd stanza file - the server you choose will need to be the first server in the "servers" list. >>>> >>>> >>>> Ken Hill >>>> Technical Sales Specialist | Software Defined Solution Sales >>>> IBM Systems >>>> Phone:1-540-207-7270 >>>> E-mail: kenh at us.ibm.com >>>> >>>> >>>> 2300 Dulles Station Blvd >>>> Herndon, VA 20171-6133 >>>> United States >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From: "Daniel Kidger" >>>> To: "gpfsug main discussion list" >>>> Cc: "gpfsug main discussion list" >>>> Date: 12/19/2016 06:42 AM >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Valdis wrote: >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> >>>> Indeed the is whole greatness about NSDs (and in passing why Lustre can be much more tricky to safely manage.) >>>> Once a lun is "labelled" as an NSD then that NSD name is all you need to care about as the /dev entries can now freely change on reboot or differ across nodes. Indeed if you connect an arbitrary node to an NSD disk via a SAN cable, gpfs will recognise it and use it as a shortcut to that lun. >>>> >>>> Finally recall that in the NSD stanza file the /dev entry is only matched for on the first of the listed NSD servers; the other NSD servers will discover and learn which NSD this is, ignoring the /dev value in this stanza. >>>> >>>> Daniel >>>> >>>> IBM Spectrum Storage Software >>>> +44 (0)7818 522266 >>>> Sent from my iPad using IBM Verse >>>> >>>> >>>> On 17 Dec 2016, 21:43:00, Valdis.Kletnieks at vt.edu wrote: >>>> >>>> From: Valdis.Kletnieks at vt.edu >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: >>>> Date: 17 Dec 2016 21:43:00 >>>> Subject: Re: [gpfsug-discuss] translating /dev device into nsd name >>>> >>>> On Fri, 16 Dec 2016 23:24:34 -0500, Aaron Knister said: >>>> > that I can then parse and map the nsd id to the nsd name. I hesitate >>>> > calling ts* commands directly and I admit it's perhaps an irrational >>>> > fear, but I associate the -D flag with "delete" in my head and am afraid >>>> > that some day -D may be just that and *poof* there go my NSD descriptors. >>>> Others have mentioned mmlsdnsd -m and -X >>>> Keep in mind that if you have multiple NSD servers in the cluster, there >>>> is *no* guarantee that the names for a device will be consistent across >>>> the servers, or across reboots. And when multipath is involved, you may >>>> have 4 or 8 or even more names for the same device.... >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> Unless stated otherwise above: >>>> IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
>>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Mon Dec 19 17:31:38 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 19 Dec 2016 10:31:38 -0700 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: Message-ID: >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From tortay at cc.in2p3.fr Mon Dec 19 17:49:05 2016 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Mon, 19 Dec 2016 18:49:05 +0100 Subject: [gpfsug-discuss] translating /dev device into nsd name In-Reply-To: <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> References: <54420.1482010959@turing-police.cc.vt.edu> <97E42964-CAB6-4856-9FF0-FDC95416EE3C@ulmer.org> <29F290EB-8DC2-4A1B-AE9A-7498512C5075@vanderbilt.edu> Message-ID: <5cca2ea8-b098-c1e4-ab03-9542837287ab@cc.in2p3.fr> On 12/19/2016 05:43 PM, Buterbaugh, Kevin L wrote: > > Right - that?s what I meant by having the proper device name for the NSD from the NSD server you want to be primary for it. Thanks for confirming that for me. > > This discussion prompts me to throw out a related question that will in all likelihood be impossible to answer since it is hardware dependent, AFAIK. But in case I?m wrong about that, I?ll ask. ;-) > > My method for identifying the correct ?/dev? device to pass to mmcrnsd has been to: > > 1. go down to the data center and sit in front of the storage arrays. > 2. log on to the NSD server I want to be primary for a given NSD. > 2. use ?fdisk -l? to get a list of the disks the NSD server sees and eliminate any that don?t match with the size of the NSD(s) being added. > 3. for the remaining disks, run ?dd if=/dev/ > Is there a better way? Thanks... > Hello, We use device mapper/multipath to assign meaningful names to devices based on the WWN (or the storage system "volume" name) of the LUNs. We use a simple naming scheme ("nsdDDNN", where DD is the primary server number and NN the NSD number for that node, of course all NSDs are served by at least 2 nodes). When possible, these names are also used by the storage systems (nowadays mostly LSI/Netapp units). We have scripts to automate the configuration of the LUNs on the storage systems with the proper names as well as for creating the relevant section of "multipath.conf". There is no ambiguity during "mmcrnsd" (or no need to use "mmchnsd" later on) and it's also easy to know which filesystem or pool is at risk when some hardware fails (CMDB, etc.) Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From mimarsh2 at vt.edu Tue Dec 20 13:57:31 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 08:57:31 -0500 Subject: [gpfsug-discuss] mmlsdisk performance impact Message-ID: All, Does the mmlsdisk command generate a lot of admin traffic or take up a lot of GPFS resources? In our case, we have it in some of our monitoring routines that run on all nodes. It is kind of nice info to have, but I am wondering if hitting the filesystem with a bunch of mmlsdisk commands is bad for performance. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 14:03:07 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 14:03:07 +0000 Subject: [gpfsug-discuss] mmlsdisk performance impact In-Reply-To: References: Message-ID: Hi Brian, If I?m not mistaken, once you run the mmlsdisk command on one client any other client running it will produce the exact same output. Therefore, what we do is run it once, output that to a file, and propagate that file to any node that needs it. HTHAL? 
Kevin On Dec 20, 2016, at 7:57 AM, Brian Marshall > wrote: All, Does the mmlsdisk command generate a lot of admin traffic or take up a lot of GPFS resources? In our case, we have it in some of our monitoring routines that run on all nodes. It is kind of nice info to have, but I am wondering if hitting the filesystem with a bunch of mmlsdisk commands is bad for performance. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Dec 20 16:25:04 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 11:25:04 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process Message-ID: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at u.washington.edu Tue Dec 20 16:27:32 2016 From: skylar2 at u.washington.edu (Skylar Thompson) Date: Tue, 20 Dec 2016 08:27:32 -0800 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: Message-ID: <20161220162732.GB20276@illiuin> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > All, > > What is your favorite method for stopping a user process from eating up all > the system memory and saving 1 GB (or more) for the GPFS / system > processes? We have always kicked around the idea of cgroups but never > moved on it. > > The problem: A user launches a job which uses all the memory on a node, > which causes the node to be expelled, which causes brief filesystem > slowness everywhere. > > I bet this problem has already been solved and I am just googling the wrong > search terms. > > > Thanks, > Brian > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From mweil at wustl.edu Tue Dec 20 16:35:44 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 10:35:44 -0600 Subject: [gpfsug-discuss] LROC Message-ID: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? 
Thanks Matt From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 16:37:54 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 16:37:54 +0000 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: <20161220162732.GB20276@illiuin> References: <20161220162732.GB20276@illiuin> Message-ID: <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Hi Brian, It would be helpful to know what scheduling software, if any, you use. We were a PBS / Moab shop for a number of years but switched to SLURM two years ago. With both you can configure the maximum amount of memory available to all jobs on a node. So we just simply ?reserve? however much we need for GPFS and other ?system? processes. I can tell you that SLURM is *much* more efficient at killing processes as soon as they exceed the amount of memory they?ve requested than PBS / Moab ever dreamed of being. Kevin On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue Dec 20 17:03:28 2016 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 20 Dec 2016 17:03:28 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mimarsh2 at vt.edu Tue Dec 20 17:07:17 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 12:07:17 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: We use adaptive - Moab torque right now but are thinking about going to Skyrim Brian On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Brian, > > It would be helpful to know what scheduling software, if any, you use. > > We were a PBS / Moab shop for a number of years but switched to SLURM two > years ago. With both you can configure the maximum amount of memory > available to all jobs on a node. So we just simply ?reserve? however much > we need for GPFS and other ?system? processes. > > I can tell you that SLURM is *much* more efficient at killing processes as > soon as they exceed the amount of memory they?ve requested than PBS / Moab > ever dreamed of being. > > Kevin > > On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: > > We're a Grid Engine shop, and use cgroups (m_mem_free) to control user > process memory > usage. In the GE exec host configuration, we reserve 4GB for the OS > (including GPFS) so jobs are not able to consume all the physical memory on > the system. > > On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > > All, > > What is your favorite method for stopping a user process from eating up all > the system memory and saving 1 GB (or more) for the GPFS / system > processes? We have always kicked around the idea of cgroups but never > moved on it. > > The problem: A user launches a job which uses all the memory on a node, > which causes the node to be expelled, which causes brief filesystem > slowness everywhere. > > I bet this problem has already been solved and I am just googling the wrong > search terms. > > > Thanks, > Brian > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Dec 20 17:13:48 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 20 Dec 2016 17:13:48 +0000 Subject: [gpfsug-discuss] SMB issues In-Reply-To: References: , Message-ID: Nope, just lots of messages with the same error, but different folders. I've opened a pmr with IBM and supplied the usual logs. 
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Christof Schmitt [christof.schmitt at us.ibm.com] Sent: 19 December 2016 17:31 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SMB issues >From this message, it does not look like a known problem. Are there other messages leading up to the one you mentioned? I would suggest reporting this through a PMR. Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/19/2016 08:37 AM Subject: [gpfsug-discuss] SMB issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We upgraded to 4.2.2.0 last week as well as to gpfs.smb-4.4.6_gpfs_8-1.el7.x86_64.rpm from the 4.2.2.0 protocols bundle. We've since been getting random users reporting that they get access denied errors when trying to access folders. Some seem to work fine and others not, but it seems to vary and change by user (for example this morning, I could see all my folders fine, but later I could only see some). From my Mac connecting to the SMB shares, I could connect fine to the share, but couldn't list files in the folder (I guess this is what users were seeing from Windows as access denied). In the log.smbd, we are seeing errors such as this: [2016/12/19 15:20:40.649580, 0] ../source3/lib/sysquotas.c:457(sys_get_quota) sys_path_to_bdev() failed for path [FOLDERNAME_HERE]! Reverting to the previous version of SMB we were running (gpfs.smb-4.3.9_gpfs_21-1.el7.x86_64), the problems go away. Before I log a PMR, has anyone else seen this behaviour or have any suggestions? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 20 17:15:02 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 20 Dec 2016 17:15:02 +0000 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: <818353BF-18AC-4931-8890-35D6ECC4DF04@vanderbilt.edu> Hi Brian, I don?t *think* you can entirely solve this problem with Moab ? as I mentioned, it?s not nearly as efficient as SLURM is at killing jobs when they exceed requested memory. We had situations where a user would be able to run a node out of memory before Moab would kill it. Hasn?t happened once with SLURM, AFAIK. But with either Moab or SLURM what we?ve done is taken the amount of physical RAM in the box and subtracted from that the amount of memory we want to ?reserve? for the system (OS, GPFS, etc.) and then told Moab / SLURM that this is how much RAM the box has. That way they at least won?t schedule jobs on the node that would exceed available memory. HTH? Kevin On Dec 20, 2016, at 11:07 AM, Brian Marshall > wrote: We use adaptive - Moab torque right now but are thinking about going to Skyrim Brian On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" > wrote: Hi Brian, It would be helpful to know what scheduling software, if any, you use. 
We were a PBS / Moab shop for a number of years but switched to SLURM two years ago. With both you can configure the maximum amount of memory available to all jobs on a node. So we just simply ?reserve? however much we need for GPFS and other ?system? processes. I can tell you that SLURM is *much* more efficient at killing processes as soon as they exceed the amount of memory they?ve requested than PBS / Moab ever dreamed of being. Kevin On Dec 20, 2016, at 10:27 AM, Skylar Thompson > wrote: We're a Grid Engine shop, and use cgroups (m_mem_free) to control user process memory usage. In the GE exec host configuration, we reserve 4GB for the OS (including GPFS) so jobs are not able to consume all the physical memory on the system. On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: All, What is your favorite method for stopping a user process from eating up all the system memory and saving 1 GB (or more) for the GPFS / system processes? We have always kicked around the idea of cgroups but never moved on it. The problem: A user launches a job which uses all the memory on a node, which causes the node to be expelled, which causes brief filesystem slowness everywhere. I bet this problem has already been solved and I am just googling the wrong search terms. Thanks, Brian _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Tue Dec 20 17:15:23 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 20 Dec 2016 12:15:23 -0500 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: Skyrim equals Slurm. Mobile shenanigans. Brian On Dec 20, 2016 12:07 PM, "Brian Marshall" wrote: > We use adaptive - Moab torque right now but are thinking about going to > Skyrim > > Brian > > On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < > Kevin.Buterbaugh at vanderbilt.edu> wrote: > >> Hi Brian, >> >> It would be helpful to know what scheduling software, if any, you use. >> >> We were a PBS / Moab shop for a number of years but switched to SLURM two >> years ago. With both you can configure the maximum amount of memory >> available to all jobs on a node. So we just simply ?reserve? however much >> we need for GPFS and other ?system? processes. 
>> >> I can tell you that SLURM is *much* more efficient at killing processes >> as soon as they exceed the amount of memory they?ve requested than PBS / >> Moab ever dreamed of being. >> >> Kevin >> >> On Dec 20, 2016, at 10:27 AM, Skylar Thompson >> wrote: >> >> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user >> process memory >> usage. In the GE exec host configuration, we reserve 4GB for the OS >> (including GPFS) so jobs are not able to consume all the physical memory >> on >> the system. >> >> On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: >> >> All, >> >> What is your favorite method for stopping a user process from eating up >> all >> the system memory and saving 1 GB (or more) for the GPFS / system >> processes? We have always kicked around the idea of cgroups but never >> moved on it. >> >> The problem: A user launches a job which uses all the memory on a node, >> which causes the node to be expelled, which causes brief filesystem >> slowness everywhere. >> >> I bet this problem has already been solved and I am just googling the >> wrong >> search terms. >> >> >> Thanks, >> Brian >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> -- >> -- Skylar Thompson (skylar2 at u.washington.edu) >> -- Genome Sciences Department, System Administrator >> -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> >> -- University of Washington School of Medicine >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and >> Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL:
From damir.krstic at gmail.com Tue Dec 20 17:19:48 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 20 Dec 2016 17:19:48 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: For the sake of everyone else on this listserv, I'll highlight the appropriate procedure here. It turns out that changing recovery groups on an active system is not recommended by IBM. We tried following Jan's recommendation this morning, and the system became unresponsive for about 30 minutes. It only became responsive (and the recovery group change finished) after we killed a couple of processes (ssh and scp) going to a couple of clients. I have a Sev. 1 open with IBM, and they tell me that the appropriate steps for IO server maintenance are as follows:

1. change cluster managers to the system that will stay up (mmlsmgr - mmchmgr)
2. unmount gpfs on the io node that is going down
3. shutdown gpfs on the io node that is going down
4. shutdown the os

That's it - recovery groups should not be changed. If there is a need to change a recovery group, use the --active option (not a permanent change). We are now stuck in a situation where the io2 server is the owner of both recovery groups. The way IBM tells us to fix this is to unmount the filesystem on all clients and change recovery groups then.
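Put as commands, the steps Damir lists above might look roughly like the following sketch; ess-io1 is the IO server staying up, ess-io2 the one going down, gpfs0 the filesystem and rg_io2 the recovery group, all invented names, and the exact flags (in particular mmchrecoverygroup --active) should be verified against the ESS documentation for your release:

    mmlsmgr                        # see where the cluster and filesystem managers are
    mmchmgr gpfs0 ess-io1          # move the filesystem manager to the node staying up
    mmchmgr -c ess-io1             # move the cluster manager as well
    mmumount gpfs0 -N ess-io2      # unmount on the node going down
    mmshutdown -N ess-io2          # stop GPFS there, then shut down the OS

    # only if a recovery group has to move, use the temporary form:
    mmchrecoverygroup rg_io2 --active ess-io1
    mmlsrecoverygroup rg_io2 -L    # confirm which server is currently active

The --active form does not permanently change recovery group ownership, which matches the "not permanent change" caveat above.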
We can't do it now and will have to schedule maintenance sometime in 2017. For now, we have switched recovery groups using --active flag and things (filesystem performance) seems to be OK. Load average on both io servers is quite high (250avg) and does not seem to be going down. I really wish that maintenance procedures were documented somewhere on IBM website. This experience this morning has really shaken my confidence in ESS. Damir On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust wrote: > > Move its recoverygrops to the other node by putting the other node as > primary server for it: > > mmchrecoverygroup rgname --servers otherServer,thisServer > > And verify that it's now active on the other node by "mmlsrecoverygroup > rgname -L". > > Move away any filesystem managers or cluster manager role if that's active > on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. > > Then you can run mmshutdown on it (assuming you also have enough quorum > nodes in the remaining cluster). > > > -jf > > man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? > > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at u.washington.edu Tue Dec 20 17:18:35 2016 From: skylar2 at u.washington.edu (Skylar Thompson) Date: Tue, 20 Dec 2016 09:18:35 -0800 Subject: [gpfsug-discuss] reserving memory for GPFS process In-Reply-To: References: <20161220162732.GB20276@illiuin> <35704A59-85DB-40CB-BEAE-1106C5DA7E13@vanderbilt.edu> Message-ID: <20161220171834.GE20276@illiuin> When using m_mem_free on GE with cgroup=true, GE just depends on the kernel OOM killer. There's one killer per cgroup so when a job goes off the rails, only its processes are eligible for OOM killing. I'm not sure how Slurm does it but anything that uses cgroups should have the above behavior. On Tue, Dec 20, 2016 at 12:15:23PM -0500, Brian Marshall wrote: > Skyrim equals Slurm. Mobile shenanigans. > > Brian > > On Dec 20, 2016 12:07 PM, "Brian Marshall" wrote: > > > We use adaptive - Moab torque right now but are thinking about going to > > Skyrim > > > > Brian > > > > On Dec 20, 2016 11:38 AM, "Buterbaugh, Kevin L" < > > Kevin.Buterbaugh at vanderbilt.edu> wrote: > > > >> Hi Brian, > >> > >> It would be helpful to know what scheduling software, if any, you use. > >> > >> We were a PBS / Moab shop for a number of years but switched to SLURM two > >> years ago. With both you can configure the maximum amount of memory > >> available to all jobs on a node. So we just simply ???reserve??? 
however much > >> we need for GPFS and other ???system??? processes. > >> > >> I can tell you that SLURM is *much* more efficient at killing processes > >> as soon as they exceed the amount of memory they???ve requested than PBS / > >> Moab ever dreamed of being. > >> > >> Kevin > >> > >> On Dec 20, 2016, at 10:27 AM, Skylar Thompson > >> wrote: > >> > >> We're a Grid Engine shop, and use cgroups (m_mem_free) to control user > >> process memory > >> usage. In the GE exec host configuration, we reserve 4GB for the OS > >> (including GPFS) so jobs are not able to consume all the physical memory > >> on > >> the system. > >> > >> On Tue, Dec 20, 2016 at 11:25:04AM -0500, Brian Marshall wrote: > >> > >> All, > >> > >> What is your favorite method for stopping a user process from eating up > >> all > >> the system memory and saving 1 GB (or more) for the GPFS / system > >> processes? We have always kicked around the idea of cgroups but never > >> moved on it. > >> > >> The problem: A user launches a job which uses all the memory on a node, > >> which causes the node to be expelled, which causes brief filesystem > >> slowness everywhere. > >> > >> I bet this problem has already been solved and I am just googling the > >> wrong > >> search terms. > >> > >> > >> Thanks, > >> Brian > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> -- > >> -- Skylar Thompson (skylar2 at u.washington.edu) > >> -- Genome Sciences Department, System Administrator > >> -- Foege Building S046, (206)-685-7354 <(206)%20685-7354> > >> -- University of Washington School of Medicine > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > >> > >> > >> ??? > >> Kevin Buterbaugh - Senior System Administrator > >> Vanderbilt University - Advanced Computing Center for Research and > >> Education > >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > >> > >> > >> > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From mweil at wustl.edu Tue Dec 20 19:13:46 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 13:13:46 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? 
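(Before touching either value it can help to see what a node is actually running with, and how its LROC device is behaving - a sketch only:)

    mmlsconfig maxFilesToCache
    mmlsconfig maxStatCache
    mmdiag --lroc     # on an LROC node: device size, occupancy and hit statistics, if your code level provides it
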
> > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more > metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Dec 20 19:18:47 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 20 Dec 2016 19:18:47 +0000 Subject: [gpfsug-discuss] LROC Message-ID: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> We?re currently deploying LROC in many of our compute nodes ? results so far have been excellent. We?re putting in 240gb SSDs, because we have mostly small files. As far as I know, the amount of inodes and directories in LROC are not limited, except by the size of the cache disk. Look at these config options for LROC: lrocData Controls whether user data is populated into the local read-only cache. Other configuration options can be used to select the data that is eligible for the local read-only cache. When using more than one such configuration option, data that matches any of the specified criteria is eligible to be saved. Valid values are yes or no. The default value is yes. If lrocData is set to yes, by default the data that was not already in the cache when accessed by a user is subsequently saved to the local read-only cache. The default behavior can be overridden using thelrocDataMaxFileSize and lrocDataStubFileSize configuration options to save all data from small files or all data from the initial portion of large files. lrocDataMaxFileSize Limits the data that may be saved in the local read-only cache to only the data from small files. A value of -1 indicates that all data is eligible to be saved. A value of 0 indicates that small files are not to be saved. A positive value indicates the maximum size of a file to be considered for the local read-only cache. For example, a value of 32768 indicates that files with 32 KB of data or less are eligible to be saved in the local read-only cache. The default value is 0. lrocDataStubFileSize Limits the data that may be saved in the local read-only cache to only the data from the first portion of all files. A value of -1 indicates that all file data is eligible to be saved. A value of 0 indicates that stub data is not eligible to be saved. A positive value indicates that the initial portion of each file that is eligible is to be saved. For example, a value of 32768 indicates that the first 32 KB of data from each file is eligible to be saved in the local read-only cache. The default value is 0. lrocDirectories Controls whether directory blocks is populated into the local read-only cache. The option also controls other file system metadata such as indirect blocks, symbolic links, and extended attribute overflow blocks. Valid values are yes or no. The default value is yes. lrocInodes Controls whether inodes from open files is populated into the local read-only cache; the cache contains the full inode, including all disk pointers, extended attributes, and data. 
Valid values are yes or no. The default value is yes. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Matt Weil Reply-To: gpfsug main discussion list Date: Tuesday, December 20, 2016 at 1:13 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Dec 20 19:36:08 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 20 Dec 2016 20:36:08 +0100 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: I'm sorry for your trouble, but those 4 steps you got from IBM support does not seem correct. IBM support might not always realize that it's an ESS, and not plain GPFS... If you take down an ESS IO-node without moving its RG to the other node using "--servers othernode,thisnode", or by using --active (which I've never used), you'll take down the whole recoverygroup and need to suffer an uncontrolled failover. Such an uncontrolled failover takes a few minutes of filesystem hang, while a controlled failover should not hang the system. I don't see why it's a problem that you now have an IO server that is owning both recoverygroups. Once your maintenance of the first IO servers is done, I would just revert the --servers order of that recovergroup, and it should move back. The procedure to move RGs around during IO node maintenance is documented on page 10 the quick deployment guide (step 1-3): http://www.ibm.com/support/knowledgecenter/en/SSYSP8_4.5.0/c2785801.pdf?view=kc -jf On Tue, Dec 20, 2016 at 6:19 PM, Damir Krstic wrote: > For sake of everyone else on this listserv, I'll highlight the appropriate > procedure here. It turns out, changing recovery group on an active system > is not recommended by IBM. We tried following Jan's recommendation this > morning, and the system became unresponsive for about 30 minutes. It only > became responsive (and recovery group change finished) after we killed > couple of processes (ssh and scp) going to couple of clients. > > I got a Sev. 1 with IBM opened and they tell me that appropriate steps for > IO maintenance are as follows: > > 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) > 2. unmount gpfs on io node that is going down > 3. shutdown gpfs on io node that is going down > 4. shutdown os > > That's it - recovery groups should not be changed. If there is a need to > change recovery group, use --active option (not permanent change). 
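(For completeness, Jan-Frode's controlled move of a recovery group - and giving it back afterwards - looks roughly like this; "rgL", "io1" and "io2" are placeholder names and this is a sketch, not a verified procedure:)

    # before maintenance on io1: make io2 the primary server of io1's recovery group
    mmchrecoverygroup rgL --servers io2,io1
    mmlsrecoverygroup rgL -L         # confirm the active server has moved
    # ... do the maintenance, bring io1 back into the cluster ...
    # then revert the server order so the recovery group moves home again
    mmchrecoverygroup rgL --servers io1,io2
    mmlsrecoverygroup rgL -L
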
> > We are now stuck in situation that io2 server is owner of both recovery > groups. The way IBM tells us to fix this is to unmount the filesystem on > all clients and change recovery groups then. We can't do it now and will > have to schedule maintenance sometime in 2017. For now, we have switched > recovery groups using --active flag and things (filesystem performance) > seems to be OK. Load average on both io servers is quite high (250avg) and > does not seem to be going down. > > I really wish that maintenance procedures were documented somewhere on IBM > website. This experience this morning has really shaken my confidence in > ESS. > > Damir > > On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust > wrote: > >> >> Move its recoverygrops to the other node by putting the other node as >> primary server for it: >> >> mmchrecoverygroup rgname --servers otherServer,thisServer >> >> And verify that it's now active on the other node by "mmlsrecoverygroup >> rgname -L". >> >> Move away any filesystem managers or cluster manager role if that's >> active on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. >> >> Then you can run mmshutdown on it (assuming you also have enough quorum >> nodes in the remaining cluster). >> >> >> -jf >> >> man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : >> >> We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of >> the IO servers phoned home with memory error. IBM is coming out today to >> replace the faulty DIMM. >> >> What is the correct way of taking this system out for maintenance? >> >> Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When >> we needed to do maintenance on the old system, we would migrate manager >> role and also move primary and secondary server roles if one of those >> systems had to be taken down. >> >> With ESS and resource pool manager roles etc. is there a correct way of >> shutting down one of the IO serves for maintenance? >> >> Thanks, >> Damir >> >> >> _______________________________________________ >> >> gpfsug-discuss mailing list >> >> gpfsug-discuss at spectrumscale.org >> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Dec 20 20:30:04 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 20 Dec 2016 21:30:04 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> References: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue Dec 20 20:44:44 2016 From: mweil at wustl.edu (Matt Weil) Date: Tue, 20 Dec 2016 14:44:44 -0600 Subject: [gpfsug-discuss] CES ifs-ganashe Message-ID: Does ganashe have a default read and write max size? if so what is it? Thanks Matt From olaf.weiser at de.ibm.com Tue Dec 20 21:06:44 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 20 Dec 2016 22:06:44 +0100 Subject: [gpfsug-discuss] CES ifs-ganashe In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From MKEIGO at jp.ibm.com Tue Dec 20 23:25:41 2016 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Wed, 21 Dec 2016 08:25:41 +0900 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <330C796A-EA04-45B0-B06C-CB8FB9E8E347@nuance.com> Message-ID: I still see the following statement* regarding with the use of LROC in FAQ (URL #1). Are there any issues anticipated to use LROC on protocol nodes? Q8.3: What are some configuration considerations when deploying the protocol functionality? A8.3: Configuration considerations include: (... many lines are snipped ...) Several GPFS configuration aspects have not been explicitly tested with the protocol function: (... many lines are snipped ...) Local Read Only Cache* (... many lines are snipped ...) Q2.25: What are the current requirements when using local read-only cache? A2.25: The current requirements/limitations for using local read-only cache include: - A minimum of IBM Spectrum Scale V4.1.0.1. - Local read-only cache is only supported on Linux x86 and Power. - The minimum size of a local read-only cache device is 4 GB. - The local read-only cache requires memory equal to 1% of the local read-only device's capacity. Note: Use of local read-only cache does not require a server license [1] IBM Spectrum Scale? Frequently Asked Questions and Answers (November 2016) https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html --- Keigo Matsubara, Industry Architect, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 From: "Olaf Weiser" To: gpfsug main discussion list Date: 2016/12/21 05:31 Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org it's all true and right, but please have in mind.. with MFTC and the number of nodes in the ( remote and local ) cluster, you 'll need token mem since R42 token Mem is allocated automatically .. so the old tokenMEMLimit is more or less obsolete.. but you should have your overall configuration in mind, when raising MFTC clusterwide... just a hint.. have fun... Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 12/20/2016 08:19 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org We?re currently deploying LROC in many of our compute nodes ? results so far have been excellent. We?re putting in 240gb SSDs, because we have mostly small files. As far as I know, the amount of inodes and directories in LROC are not limited, except by the size of the cache disk. Look at these config options for LROC: lrocData Controls whether user data is populated into the local read-only cache. Other configuration options can be used to select the data that is eligible for the local read-only cache. 
When using more than one such configuration option, data that matches any of the specified criteria is eligible to be saved. Valid values are yes or no. The default value is yes. If lrocData is set to yes, by default the data that was not already in the cache when accessed by a user is subsequently saved to the local read-only cache. The default behavior can be overridden using thelrocDataMaxFileSize and lrocDataStubFileSizeconfiguration options to save all data from small files or all data from the initial portion of large files. lrocDataMaxFileSize Limits the data that may be saved in the local read-only cache to only the data from small files. A value of -1 indicates that all data is eligible to be saved. A value of 0 indicates that small files are not to be saved. A positive value indicates the maximum size of a file to be considered for the local read-only cache. For example, a value of 32768 indicates that files with 32 KB of data or less are eligible to be saved in the local read-only cache. The default value is 0. lrocDataStubFileSize Limits the data that may be saved in the local read-only cache to only the data from the first portion of all files. A value of -1 indicates that all file data is eligible to be saved. A value of 0 indicates that stub data is not eligible to be saved. A positive value indicates that the initial portion of each file that is eligible is to be saved. For example, a value of 32768 indicates that the first 32 KB of data from each file is eligible to be saved in the local read-only cache. The default value is 0. lrocDirectories Controls whether directory blocks is populated into the local read-only cache. The option also controls other file system metadata such as indirect blocks, symbolic links, and extended attribute overflow blocks. Valid values are yes or no. The default value is yes. lrocInodes Controls whether inodes from open files is populated into the local read-only cache; the cache contains the full inode, including all disk pointers, extended attributes, and data. Valid values are yes or no. The default value is yes. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Matt Weil Reply-To: gpfsug main discussion list Date: Tuesday, December 20, 2016 at 1:13 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] LROC as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? 
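(As a concrete illustration of the options listed above - a sketch only; "lrocNodes" is a made-up node class and 32768 is just the example value from the description:)

    # cache directory blocks, inodes, and the first 32 KB of each file on the LROC clients
    mmchconfig lrocDirectories=yes,lrocInodes=yes,lrocData=yes,lrocDataStubFileSize=32768 -N lrocNodes
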
Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Dec 21 09:23:16 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 09:23:16 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil wrote: > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil wrote: > > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Wed Dec 21 09:42:36 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 21 Dec 2016 09:42:36 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Ooh, LROC sensors for Zimon? must look into that. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sven Oehme Sent: 21 December 2016 09:23 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil > wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Wed Dec 21 11:29:04 2016 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 21 Dec 2016 11:29:04 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: , Message-ID: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. 
Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil > wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil > wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Dec 21 11:37:46 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 11:37:46 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs wrote: > My understanding was the maxStatCache was only used on AIX and should be > set low on Linux, as raising it did't help and wasted resources. 
Are we > saying that LROC now uses it and setting it low if you raise > maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File > object (maxFilesToCache) to a StatCache Object when it moves the content to > the LROC device. > therefore the only thing you really need to increase is maxStatCache on > the LROC node, but you still need maxFiles Objects, so leave that untouched > and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have > enough memory to hold tokens for all the objects you want to cache, but if > the memory is there and you have enough its well worth spend a lot of > memory on it and bump maxStatCache to a high number. i have tested > maxStatCache up to 16 million at some point per node, but if nodes with > this large amount of inodes crash or you try to shut them down you have > some delays , therefore i suggest you stay within a 1 or 2 million per > node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get > comparable stats, i suggest you setup Zimon and enable the Lroc sensors to > have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil mweil at wustl.edu>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil mweil at wustl.edu>> wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > < > https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.childs at qmul.ac.uk Wed Dec 21 11:48:24 2016 From: p.childs at qmul.ac.uk (Peter Childs) Date: Wed, 21 Dec 2016 11:48:24 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: , Message-ID: So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. Fine just good to know, nice and easy now with nodeclasses.... Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Wednesday, December 21, 2016 11:37:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs > wrote: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Sven Oehme > Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >> wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? 
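(On the node class point above - a sketch of how the LROC clients might be grouped so they can be tuned separately; node and class names are made up:)

    mmcrnodeclass lrocNodes -N client01,client02     # the clients that have an LROC SSD
    mmlsnodeclass lrocNodes
    # subsequent mmchconfig calls can then use -N lrocNodes to hit only those nodes
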
sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Wed Dec 21 11:57:39 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 21 Dec 2016 11:57:39 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . Sven On Wed, Dec 21, 2016 at 12:48 PM Peter Childs wrote: > So your saying maxStatCache should be raised on LROC enabled nodes only as > its the only place under Linux its used and should be set low on non-LROC > enabled nodes. > > Fine just good to know, nice and easy now with nodeclasses.... > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 11:37:46 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > StatCache is not useful on Linux, that hasn't changed if you don't use > LROC on the same node. LROC uses the compact object (StatCache) to store > its pointer to the full file Object which is stored on the LROC device. so > on a call for attributes that are not in the StatCache the object gets > recalled from LROC and converted back into a full File Object, which is why > you still need to have a reasonable maxFiles setting even you use LROC as > you otherwise constantly move file infos in and out of LROC and put the > device under heavy load. 
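(Spelled out as settings, the two recommendations look roughly like this - a sketch only; class names are made up, and the numbers have to be sized against available memory and token manager capacity as Olaf pointed out:)

    # ordinary Linux clients without LROC: spend the memory on full file objects instead
    mmchconfig maxStatCache=0 -N linuxClients
    # LROC clients: leave maxFilesToCache as it is, and raise maxStatCache toward the number
    # of files the LROC device should be able to keep a reference to (1-2 million per node)
    mmchconfig maxStatCache=1000000 -N lrocNodes
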
> > sven > > > > On Wed, Dec 21, 2016 at 12:29 PM Peter Childs p.childs at qmul.ac.uk>> wrote: > My understanding was the maxStatCache was only used on AIX and should be > set low on Linux, as raising it did't help and wasted resources. Are we > saying that LROC now uses it and setting it low if you raise > maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at spectrumscale.org> < > gpfsug-discuss-bounces at spectrumscale.org gpfsug-discuss-bounces at spectrumscale.org>> on behalf of Sven Oehme < > oehmes at gmail.com> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File > object (maxFilesToCache) to a StatCache Object when it moves the content to > the LROC device. > therefore the only thing you really need to increase is maxStatCache on > the LROC node, but you still need maxFiles Objects, so leave that untouched > and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have > enough memory to hold tokens for all the objects you want to cache, but if > the memory is there and you have enough its well worth spend a lot of > memory on it and bump maxStatCache to a high number. i have tested > maxStatCache up to 16 million at some point per node, but if nodes with > this large amount of inodes crash or you try to shut them down you have > some delays , therefore i suggest you stay within a 1 or 2 million per > node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get > comparable stats, i suggest you setup Zimon and enable the Lroc sensors to > have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil mweil at wustl.edu>>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the > files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil mweil at wustl.edu>>> wrote: > > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > < > https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? 
> > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org< > http://spectrumscale.org> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Dec 21 12:12:22 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 12:12:22 +0000 Subject: [gpfsug-discuss] Presentations from last UG Message-ID: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From jez.tucker at gpfsug.org Wed Dec 21 12:16:03 2016 From: jez.tucker at gpfsug.org (Jez Tucker) Date: Wed, 21 Dec 2016 12:16:03 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> Hi Are you referring to the UG at Salt Lake? If so I should be uploading these today/tomorrow. I'll send a ping out when done. We do not have the presentations from the mini-UG at Computing Insights as yet. (peeps, please send them in) Best, Jez On 21/12/16 12:12, Mark.Bush at siriuscom.com wrote: > > Does anyone know when the presentations from the last users group > meeting will be posted. I checked last night but there doesn?t seem > to be any new ones out there (summaries of talks yet). 
> > Thanks > > Mark > > This message (including any attachments) is intended only for the use > of the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, > and exempt from disclosure under applicable law. If you are not the > intended recipient, you are hereby notified that any use, > dissemination, distribution, or copying of this communication is > strictly prohibited. This message may be viewed by parties at Sirius > Computer Solutions other than those named in the message header. This > message does not contain an official representation of Sirius Computer > Solutions. If you have received this communication in error, notify > Sirius Computer Solutions immediately and (i) destroy this message if > a facsimile or (ii) delete this message immediately if this is an > electronic communication. Thank you. > > Sirius Computer Solutions > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Wed Dec 21 12:24:34 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 12:24:34 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> <46086ce7-236d-0e2c-8cf7-1021ce1e47ba@gpfsug.org> Message-ID: Yes From: Jez Tucker Reply-To: "jez.tucker at gpfsug.org" , gpfsug main discussion list Date: Wednesday, December 21, 2016 at 6:16 AM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Presentations from last UG Hi Are you referring to the UG at Salt Lake? If so I should be uploading these today/tomorrow. I'll send a ping out when done. We do not have the presentations from the mini-UG at Computing Insights as yet. (peeps, please send them in) Best, Jez On 21/12/16 12:12, Mark.Bush at siriuscom.com wrote: Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kallbac at iu.edu Wed Dec 21 12:46:42 2016 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Wed, 21 Dec 2016 12:46:42 +0000 Subject: [gpfsug-discuss] Presentations from last UG Message-ID: Checking... Kristy On Dec 21, 2016 7:12 AM, Mark.Bush at siriuscom.com wrote: Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Dec 21 13:42:02 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 21 Dec 2016 13:42:02 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: Sorry, my bad, it was on my todo list. The ones we have are now up online. http://www.spectrumscale.org/presentations/ Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 21 December 2016 12:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Presentations from last UG Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Mark.Bush at siriuscom.com Wed Dec 21 14:37:58 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 21 Dec 2016 14:37:58 +0000 Subject: [gpfsug-discuss] Presentations from last UG In-Reply-To: References: <1A4B6353-49B4-4347-94B3-343B00960A9A@siriuscom.com> Message-ID: Thanks much, Simon. From: on behalf of "Simon Thompson (Research Computing - IT Services)" Reply-To: gpfsug main discussion list Date: Wednesday, December 21, 2016 at 7:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Presentations from last UG Sorry, my bad, it was on my todo list. The ones we have are now up online. http://www.spectrumscale.org/presentations/ Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 21 December 2016 12:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Presentations from last UG Does anyone know when the presentations from the last users group meeting will be posted. I checked last night but there doesn?t seem to be any new ones out there (summaries of talks yet). Thanks Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Wed Dec 21 15:17:27 2016 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 21 Dec 2016 10:17:27 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: Sven, I?ve read this several times, and it will help me to re-state it. Please tell me if this is not what you meant: You often see even common operations (like ls) blow out the StatCache, and things are inefficient when the StatCache is in use but constantly overrun. Because of this, you normally recommend disabling the StatCache with maxStatCache=0, and instead spend the memory normally used for StatCache on the FileCache. In the case of LROC, there *must* be a StatCache entry for every file that is held in the LROC. In this case, we want to set maxStatCache at least as large as the number of files whose data or metadata we?d like to be in the LROC. Close? 
-- Stephen > On Dec 21, 2016, at 6:57 AM, Sven Oehme > wrote: > > its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). > on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . > > Sven > > On Wed, Dec 21, 2016 at 12:48 PM Peter Childs > wrote: > So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. > > Fine just good to know, nice and easy now with nodeclasses.... > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Sven Oehme > > Sent: Wednesday, December 21, 2016 11:37:46 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. > > sven > > > > On Wed, Dec 21, 2016 at 12:29 PM Peter Childs >> wrote: > My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. > > > Peter Childs > > > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > >> on behalf of Sven Oehme >> > Sent: Wednesday, December 21, 2016 9:23:16 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] LROC > > Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. > therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat > > Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. 
i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. > i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. > > Sven > > On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >>>> wrote: > > as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 40000 > > do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > how much files do you want to cache ? > and do you only want to cache metadata or also data associated to the files ? > > sven > > > > On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >>>> wrote: > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage > > > Hello all, > > Are there any tuning recommendations to get these to cache more metadata? > > Thanks > > Matt > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Dec 21 15:39:16 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 21 Dec 2016 16:39:16 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: close, but not 100% :-) LROC only needs a StatCache Object for files that don't have a Full OpenFile (maxFilestoCache) Object and you still want to be able to hold Metadata and/or Data in LROC. e.g. you can have a OpenFile instance that has Data blocks in LROC, but no Metadata (as everything is in the OpenFile Object itself), then you don't need a maxStatCache Object for this one. but you would need a StatCache object if we have to throw this file metadata or data out of the FileCache and/or Pagepool as we would otherwise loose all references to that file in LROC. the MaxStat Object is the most compact form to hold only references to the real data. 
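To make that concrete: following the guidance above, a node that should keep references to roughly a million files in LROC needs maxStatCache in that range on that node. The change below is a sketch only -- the node class name and the numbers are placeholders, not a recommendation.

    # illustrative values; apply only to the LROC-equipped nodes
    mmcrnodeclass lrocNodes -N ces1,ces2
    mmchconfig maxStatCache=1000000,maxFilesToCache=128000 -N lrocNodes
    # both parameters take effect after mmfsd is restarted on those nodes

The 128000 maxFilesToCache figure is simply the value already in use earlier in this thread; size it to your own working set.
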
if its still unclear we might have to do a small writeup in form of a paper with a diagram to better explain it, but that would take a while due to a lot of other work ahead of that :-) sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Stephen Ulmer To: gpfsug main discussion list Date: 12/21/2016 04:17 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, I?ve read this several times, and it will help me to re-state it. Please tell me if this is not what you meant: You often see even common operations (like ls) blow out the StatCache, and things are inefficient when the StatCache is in use but constantly overrun. Because of this, you normally recommend disabling the StatCache with maxStatCache=0, and instead spend the memory normally used for StatCache on the FileCache. In the case of LROC, there *must* be a StatCache entry for every file that is held in the LROC. In this case, we want to set maxStatCache at least as large as the number of files whose data or metadata we?d like to be in the LROC. Close? -- Stephen On Dec 21, 2016, at 6:57 AM, Sven Oehme wrote: its not the only place used, but we see that most calls for attributes even from simplest ls requests are beyond what the StatCache provides, therefore my advice is always to disable maxStatCache by setting it to 0 and raise the maxFilestoCache limit to a higher than default as the memory is better spent there than wasted on StatCache, there is also waste by moving back and forth between StatCache and FileCache if you constantly need more that what the FileCache provides, so raising it and reduce StatCache to zero eliminates this overhead (even its just a few cpu cycles). on LROC its essential as a LROC device can only keep data or Metadata for files it wants to hold any references if it has a StatCache object available, this means if your StatCache is set to 10000 and lets say you have 100000 files you want to cache in LROC this would never work as we throw the oldest out of LROC as soon as we try to cache nr 10001 as we have to reuse a StatCache Object to keep the reference to the data or metadata block stored in LROC . Sven On Wed, Dec 21, 2016 at 12:48 PM Peter Childs wrote: So your saying maxStatCache should be raised on LROC enabled nodes only as its the only place under Linux its used and should be set low on non-LROC enabled nodes. Fine just good to know, nice and easy now with nodeclasses.... Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme < oehmes at gmail.com> Sent: Wednesday, December 21, 2016 11:37:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC StatCache is not useful on Linux, that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file Object which is stored on the LROC device. so on a call for attributes that are not in the StatCache the object gets recalled from LROC and converted back into a full File Object, which is why you still need to have a reasonable maxFiles setting even you use LROC as you otherwise constantly move file infos in and out of LROC and put the device under heavy load. 
sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs > wrote: My understanding was the maxStatCache was only used on AIX and should be set low on Linux, as raising it did't help and wasted resources. Are we saying that LROC now uses it and setting it low if you raise maxFilesToCache under linux is no longer the advice. Peter Childs ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Sven Oehme > Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC Lroc only needs a StatCache object as it 'compacts' a full open File object (maxFilesToCache) to a StatCache Object when it moves the content to the LROC device. therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFiles Objects, so leave that untouched and just increas maxStat Olaf's comment is important you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough its well worth spend a lot of memory on it and bump maxStatCache to a high number. i have tested maxStatCache up to 16 million at some point per node, but if nodes with this large amount of inodes crash or you try to shut them down you have some delays , therefore i suggest you stay within a 1 or 2 million per node and see how well it does and also if you get a significant gain. i did help Bob to setup some monitoring for it so he can actually get comparable stats, i suggest you setup Zimon and enable the Lroc sensors to have real stats too , so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil >> wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 40000 do these effect what sits on the LROC as well? Are those to small? 1million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how much files do you want to cache ? and do you only want to cache metadata or also data associated to the files ? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil >> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20 (GPFS)/page/Flash%20Storage < https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Flash%20Storage > Hello all, Are there any tuning recommendations to get these to cache more metadata? 
Thanks Matt _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org< http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From damir.krstic at gmail.com Wed Dec 21 16:03:44 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 21 Dec 2016 16:03:44 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down for maintenance In-Reply-To: References: Message-ID: Hi Jan, I am sorry if my post sounded accusatory - I did not mean it that way. We had a very frustrating experience trying to change recoverygroup yesterday morning. I've read the manual you have linked and indeed, you have outlined the correct procedure. I am left wondering why the level 2 gpfs support instructed us not to do that in the future. Their support instructions are contradicting what's in the manual. We are running now with the --active recovery group in place and will change it permanently back to the default setting early in the new year. Anyway, thanks for your help. Damir On Tue, Dec 20, 2016 at 1:36 PM Jan-Frode Myklebust wrote: > I'm sorry for your trouble, but those 4 steps you got from IBM support > does not seem correct. IBM support might not always realize that it's an > ESS, and not plain GPFS... If you take down an ESS IO-node without moving > its RG to the other node using "--servers othernode,thisnode", or by using > --active (which I've never used), you'll take down the whole recoverygroup > and need to suffer an uncontrolled failover. Such an uncontrolled failover > takes a few minutes of filesystem hang, while a controlled failover should > not hang the system. > > I don't see why it's a problem that you now have an IO server that is > owning both recoverygroups. Once your maintenance of the first IO servers > is done, I would just revert the --servers order of that recovergroup, and > it should move back. 
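Put together as a sequence, the controlled hand-off being described looks roughly like the sketch below. Recovery group, file system, and node names are placeholders; the quick deployment guide referenced next is the authoritative procedure.

    # make the surviving IO node primary for the recovery group of the node going down
    mmchrecoverygroup rgA --servers gssio2,gssio1
    mmlsrecoverygroup rgA -L        # confirm the active server has moved
    # move any manager roles off the node being serviced
    mmlsmgr
    mmchmgr fs1 gssio2              # file system manager, if it was on gssio1
    mmchmgr -c gssio2               # cluster manager, if it was on gssio1
    # then stop GPFS on the node and do the hardware work
    mmshutdown -N gssio1
    # afterwards: mmstartup -N gssio1, then revert the --servers order
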
> > The procedure to move RGs around during IO node maintenance is documented > on page 10 the quick deployment guide (step 1-3): > > > http://www.ibm.com/support/knowledgecenter/en/SSYSP8_4.5.0/c2785801.pdf?view=kc > > > -jf > > > On Tue, Dec 20, 2016 at 6:19 PM, Damir Krstic > wrote: > > For sake of everyone else on this listserv, I'll highlight the appropriate > procedure here. It turns out, changing recovery group on an active system > is not recommended by IBM. We tried following Jan's recommendation this > morning, and the system became unresponsive for about 30 minutes. It only > became responsive (and recovery group change finished) after we killed > couple of processes (ssh and scp) going to couple of clients. > > I got a Sev. 1 with IBM opened and they tell me that appropriate steps for > IO maintenance are as follows: > > 1. change cluster managers to system that will stay up (mmlsmgr - mmchmgr) > 2. unmount gpfs on io node that is going down > 3. shutdown gpfs on io node that is going down > 4. shutdown os > > That's it - recovery groups should not be changed. If there is a need to > change recovery group, use --active option (not permanent change). > > We are now stuck in situation that io2 server is owner of both recovery > groups. The way IBM tells us to fix this is to unmount the filesystem on > all clients and change recovery groups then. We can't do it now and will > have to schedule maintenance sometime in 2017. For now, we have switched > recovery groups using --active flag and things (filesystem performance) > seems to be OK. Load average on both io servers is quite high (250avg) and > does not seem to be going down. > > I really wish that maintenance procedures were documented somewhere on IBM > website. This experience this morning has really shaken my confidence in > ESS. > > Damir > > On Mon, Dec 19, 2016 at 9:53 AM Jan-Frode Myklebust > wrote: > > > Move its recoverygrops to the other node by putting the other node as > primary server for it: > > mmchrecoverygroup rgname --servers otherServer,thisServer > > And verify that it's now active on the other node by "mmlsrecoverygroup > rgname -L". > > Move away any filesystem managers or cluster manager role if that's active > on it. Check with mmlsmgr, move with mmchmgr/mmchmgr -c. > > Then you can run mmshutdown on it (assuming you also have enough quorum > nodes in the remaining cluster). > > > -jf > > man. 19. des. 2016 kl. 15.53 skrev Damir Krstic : > > We have a single ESS GL6 system running GPFS 4.2.0-1. Last night one of > the IO servers phoned home with memory error. IBM is coming out today to > replace the faulty DIMM. > > What is the correct way of taking this system out for maintenance? > > Before ESS we had a large GPFS 3.5 installation with 14 IO servers. When > we needed to do maintenance on the old system, we would migrate manager > role and also move primary and secondary server roles if one of those > systems had to be taken down. > > With ESS and resource pool manager roles etc. is there a correct way of > shutting down one of the IO serves for maintenance? 
> > Thanks, > Damir > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Dec 21 21:55:51 2016 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 21 Dec 2016 21:55:51 +0000 Subject: [gpfsug-discuss] correct way of taking IO server down formaintenance In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 16:44:26 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 10:44:26 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: This is enabled on this node but mmdiag it does not seem to show it caching. Did I miss something? I do have one file system in the cluster that is running 3.5.0.7 wondering if that is causing this. > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): 'NULL' status Idle > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 0 MB, currently in use: 0 MB > Statistics from: Tue Dec 27 11:21:14 2016 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) From aaron.s.knister at nasa.gov Wed Dec 28 17:50:35 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 28 Dec 2016 12:50:35 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> Hey Matt, We ran into a similar thing and if I recall correctly a mmchconfig --release=LATEST was required to get LROC working which, of course, would boot your 3.5.0.7 client from the cluster. -Aaron On 12/28/16 11:44 AM, Matt Weil wrote: > This is enabled on this node but mmdiag it does not seem to show it > caching. Did I miss something? I do have one file system in the > cluster that is running 3.5.0.7 wondering if that is causing this. 
>> [root at ces1 ~]# mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): 'NULL' status Idle >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 0 MB, currently in use: 0 MB >> Statistics from: Tue Dec 27 11:21:14 2016 >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Wed Dec 28 18:02:27 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 12:02:27 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> Message-ID: <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> So I have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From oehmes at us.ibm.com Wed Dec 28 19:06:19 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 28 Dec 2016 20:06:19 +0100 Subject: [gpfsug-discuss] LROC In-Reply-To: <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> Message-ID: you have no device configured that's why it doesn't show any stats : >>> LROC Device(s): 'NULL' status Idle run mmsnsd -X to see if gpfs can see the path to the device. most likely it doesn't show up there and you need to adjust your nsddevices list to include it , especially if it is a NVME device. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Matt Weil To: Date: 12/28/2016 07:02 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org So I have minReleaseLevel 4.1.1.0 Is that to old? 
On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at wustl.edu Wed Dec 28 19:52:24 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 13:52:24 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <02de50ce-2856-061e-3208-1cc496ee80b8@nasa.gov> <42fcd009-040e-e489-3f9d-3a20ff21dd94@wustl.edu> Message-ID: <8653c4fc-d882-d13f-040c-042118830de3@wustl.edu> k got that fixed now shows as status shutdown > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): > '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' > status Shutdown > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 0 MB, currently in use: 0 MB > Statistics from: Wed Dec 28 13:49:27 2016 On 12/28/16 1:06 PM, Sven Oehme wrote: > > you have no device configured that's why it doesn't show any stats : > > >>> LROC Device(s): 'NULL' status Idle > > run mmsnsd -X to see if gpfs can see the path to the device. most > likely it doesn't show up there and you need to adjust your nsddevices > list to include it , especially if it is a NVME device. > > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Matt Weil ---12/28/2016 07:02:57 PM---So I > have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 1Matt Weil > ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that > to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > > From: Matt Weil > To: > Date: 12/28/2016 07:02 PM > Subject: Re: [gpfsug-discuss] LROC > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > So I have minReleaseLevel 4.1.1.0 Is that to old? 
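For reference, that check amounts to something like the following; the documented spelling of the command is mmlsnsd, and the device and node names here are placeholders. The LROC NSD itself would have been defined with usage=localCache in its stanza.

    # confirm GPFS can resolve a local path for the LROC NSD
    mmlsnsd -X | grep -i nvme
    # the stanza behind an LROC NSD looks roughly like this (names are examples):
    #   %nsd: nsd=ro_cache_ces1  device=/dev/nvme0n1  servers=ces1  usage=localCache
    # after fixing nsddevices, bounce the daemon on that node and re-check on the node itself
    mmshutdown -N ces1 && mmstartup -N ces1
    mmdiag --lroc
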
> > > On 12/28/16 11:50 AM, Aaron Knister wrote: > > Hey Matt, > > > > We ran into a similar thing and if I recall correctly a mmchconfig > > --release=LATEST was required to get LROC working which, of course, > > would boot your 3.5.0.7 client from the cluster. > > > > -Aaron > > > > On 12/28/16 11:44 AM, Matt Weil wrote: > >> This is enabled on this node but mmdiag it does not seem to show it > >> caching. Did I miss something? I do have one file system in the > >> cluster that is running 3.5.0.7 wondering if that is causing this. > >>> [root at ces1 ~]# mmdiag --lroc > >>> > >>> === mmdiag: lroc === > >>> LROC Device(s): 'NULL' status Idle > >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > >>> 1073741824 > >>> Max capacity: 0 MB, currently in use: 0 MB > >>> Statistics from: Tue Dec 27 11:21:14 2016 > >>> > >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) > >>> objects failed to store 0 failed to recall 0 failed to inval 0 > >>> objects queried 0 (0 MB) not found 0 = 0.00 % > >>> objects invalidated 0 (0 MB) > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at us.ibm.com Wed Dec 28 19:55:18 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 28 Dec 2016 19:55:18 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <8653c4fc-d882-d13f-040c-042118830de3@wustl.edu> Message-ID: Did you restart the daemon on that node after you fixed it ? Sent from IBM Verse Matt Weil --- Re: [gpfsug-discuss] LROC --- From:"Matt Weil" To:gpfsug-discuss at spectrumscale.orgDate:Wed, Dec 28, 2016 8:52 PMSubject:Re: [gpfsug-discuss] LROC k got that fixed now shows as status shutdown [root at ces1 ~]# mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' status Shutdown Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile 1073741824 Max capacity: 0 MB, currently in use: 0 MB Statistics from: Wed Dec 28 13:49:27 2016 On 12/28/16 1:06 PM, Sven Oehme wrote: you have no device configured that's why it doesn't show any stats : >>> LROC Device(s): 'NULL' status Idle run mmsnsd -X to see if gpfs can see the path to the device. most likely it doesn't show up there and you need to adjust your nsddevices list to include it , especially if it is a NVME device. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Matt Weil ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that to old? 
On 12/28/16 11:50 AM, Aaron Knister wrote: From: Matt Weil To: Date: 12/28/2016 07:02 PM Subject: Re: [gpfsug-discuss] LROC Sent by: gpfsug-discuss-bounces at spectrumscale.org So I have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > Hey Matt, > > We ran into a similar thing and if I recall correctly a mmchconfig > --release=LATEST was required to get LROC working which, of course, > would boot your 3.5.0.7 client from the cluster. > > -Aaron > > On 12/28/16 11:44 AM, Matt Weil wrote: >> This is enabled on this node but mmdiag it does not seem to show it >> caching. Did I miss something? I do have one file system in the >> cluster that is running 3.5.0.7 wondering if that is causing this. >>> [root at ces1 ~]# mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): 'NULL' status Idle >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >>> 1073741824 >>> Max capacity: 0 MB, currently in use: 0 MB >>> Statistics from: Tue Dec 27 11:21:14 2016 >>> >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) >>> objects failed to store 0 failed to recall 0 failed to inval 0 >>> objects queried 0 (0 MB) not found 0 = 0.00 % >>> objects invalidated 0 (0 MB) >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 19:57:18 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 13:57:18 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: Message-ID: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> no I will do that next. On 12/28/16 1:55 PM, Sven Oehme wrote: > Did you restart the daemon on that node after you fixed it ? Sent from > IBM Verse > > Matt Weil --- Re: [gpfsug-discuss] LROC --- > > From: "Matt Weil" > To: gpfsug-discuss at spectrumscale.org > Date: Wed, Dec 28, 2016 8:52 PM > Subject: Re: [gpfsug-discuss] LROC > > ------------------------------------------------------------------------ > > k got that fixed now shows as status shutdown > >> [root at ces1 ~]# mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): >> '0A6403AA58641546#/dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016;' >> status Shutdown >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 0 MB, currently in use: 0 MB >> Statistics from: Wed Dec 28 13:49:27 2016 > > > > On 12/28/16 1:06 PM, Sven Oehme wrote: > > you have no device configured that's why it doesn't show any stats : > > >>> LROC Device(s): 'NULL' status Idle > > run mmsnsd -X to see if gpfs can see the path to the device. most > likely it doesn't show up there and you need to adjust your nsddevices > list to include it , especially if it is a NVME device. 
> > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > Inactive hide details for Matt Weil ---12/28/2016 07:02:57 PM---So I > have minReleaseLevel 4.1.1.0 Is that to old? On 12/28/16 1Matt Weil > ---12/28/2016 07:02:57 PM---So I have minReleaseLevel 4.1.1.0 Is that > to old? On 12/28/16 11:50 AM, Aaron Knister wrote: > > From: Matt Weil > To: > Date: 12/28/2016 07:02 PM > Subject: Re: [gpfsug-discuss] LROC > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > So I have minReleaseLevel 4.1.1.0 Is that to old? > > > On 12/28/16 11:50 AM, Aaron Knister wrote: > > Hey Matt, > > > > We ran into a similar thing and if I recall correctly a mmchconfig > > --release=LATEST was required to get LROC working which, of course, > > would boot your 3.5.0.7 client from the cluster. > > > > -Aaron > > > > On 12/28/16 11:44 AM, Matt Weil wrote: > >> This is enabled on this node but mmdiag it does not seem to show it > >> caching. Did I miss something? I do have one file system in the > >> cluster that is running 3.5.0.7 wondering if that is causing this. > >>> [root at ces1 ~]# mmdiag --lroc > >>> > >>> === mmdiag: lroc === > >>> LROC Device(s): 'NULL' status Idle > >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > >>> 1073741824 > >>> Max capacity: 0 MB, currently in use: 0 MB > >>> Statistics from: Tue Dec 27 11:21:14 2016 > >>> > >>> Total objects stored 0 (0 MB) recalled 0 (0 MB) > >>> objects failed to store 0 failed to recall 0 failed to inval 0 > >>> objects queried 0 (0 MB) not found 0 = 0.00 % > >>> objects invalidated 0 (0 MB) > >> > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 20:15:14 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 14:15:14 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> References: <04c9ca0a-ba7d-a26e-7565-0ab9770df381@wustl.edu> Message-ID: <5127934a-b6b6-c542-f50a-67c47fe6d6db@wustl.edu> still in a 'status Shutdown' even after gpfs was stopped and started. From aaron.s.knister at nasa.gov Wed Dec 28 22:16:00 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 28 Dec 2016 22:16:00 +0000 Subject: [gpfsug-discuss] LROC References: [gpfsug-discuss] LROC Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> Anything interesting in the mmfs log? On a related note I'm curious how a 3.5 client is able to join a cluster with a minreleaselevel of 4.1.1.0. 
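The daemon log being referred to is normally /var/adm/ras/mmfs.log.latest on the node in question; a grep along these lines (the pattern is illustrative) pulls out the LROC-related entries, including messages from the flea component that shows up in the assert later in this thread.

    grep -iE 'lroc|flea' /var/adm/ras/mmfs.log.latest | tail -50
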
From: Matt Weil Sent: 12/28/16, 3:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC still in a 'status Shutdown' even after gpfs was stopped and started. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Wed Dec 28 22:21:21 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 16:21:21 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E630E0@NDMSMBX404.ndc.nasa.gov> Message-ID: <59fa3ab8-a666-d29c-117d-9db515f566e8@wustl.edu> yes > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > ssdActive) in line 427 of file > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > logAssertFailed + 0x2D5 at ??:0 > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > fs_config_ssds(fs_config*) + 0x867 at ??:0 > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > SFSConfigLROC() + 0x189 at ??:0 > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > runTSControl(int, int, char**) + 0x80E at ??:0 > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > HandleCmdMsg(void*) + 0x1216 at ??:0 > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > Thread::callBody(Thread*) + 0x1E2 at ??:0 > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > start_thread + 0xC5 at ??:0 > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > 0x6D at ??:0 > mmfsd: > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > failed. > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > 0x00007FF15FD71000 > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > 0x0000000000000006 > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > 0x00007FF15E8D03A8 > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > 0x000000000001E9A1 > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > 0xFF092D63646B6860 > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > 0x0000000000000202 > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > 0x00007FF161032EC0 > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > 0x0000000000000000 > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > 0x0000000000000202 > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > 0x0000000000000000 > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > 0x0000000010017807 > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > at ??:0 > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > __assert_fail_base + 126 at ??:0 > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > __GI___assert_fail + 42 at ??:0 > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > 2F9 at ??:0 > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > fs_config_ssds(fs_config*) + 867 at ??:0 > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > 189 at ??:0 > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > NsdDiskConfig::reReadConfig() + 771 at ??:0 > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > int, char**) + 80E at ??:0 > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > HandleCmdMsg(void*) + 1216 at ??:0 > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > Thread::callBody(Thread*) + 1E2 at ??:0 > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > C5 at ??:0 > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D at ??:0 On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > related note I'm curious how a 3.5 client is able to join a cluster > with a minreleaselevel of 4.1.1.0. I was referring to the fs version not the gpfs client version sorry for that confusion -V 13.23 (3.5.0.7) File system version From aaron.s.knister at nasa.gov Wed Dec 28 22:26:46 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Wed, 28 Dec 2016 22:26:46 +0000 Subject: [gpfsug-discuss] LROC References: [gpfsug-discuss] LROC Message-ID: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> Ouch...to quote Adam Savage "well there's yer problem". Are you perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like there was an LROC related assert fixed in 4.1.1.9 but I can't find details on it. 
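Checking the exact level a node is running is quick; the next message in the thread does the same thing with mmdiag. The rpm query assumes a RHEL-family install.

    mmdiag --version
    rpm -qa | grep -i gpfs
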
From: Matt Weil Sent: 12/28/16, 5:21 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC yes > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > ssdActive) in line 427 of file > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > logAssertFailed + 0x2D5 at ??:0 > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > fs_config_ssds(fs_config*) + 0x867 at ??:0 > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > SFSConfigLROC() + 0x189 at ??:0 > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > runTSControl(int, int, char**) + 0x80E at ??:0 > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > HandleCmdMsg(void*) + 0x1216 at ??:0 > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > Thread::callBody(Thread*) + 0x1E2 at ??:0 > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > start_thread + 0xC5 at ??:0 > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > 0x6D at ??:0 > mmfsd: > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > failed. > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > 0x00007FF15FD71000 > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > 0x0000000000000006 > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > 0x00007FF15E8D03A8 > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > 0x000000000001E9A1 > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > 0xFF092D63646B6860 > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > 0x0000000000000202 > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > 0x00007FF161032EC0 > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > 0x0000000000000000 > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > 0x0000000000000202 > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > 0x0000000000000000 > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > 0x0000000010017807 > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > at ??:0 > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > __assert_fail_base + 126 at ??:0 > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > __GI___assert_fail + 42 at ??:0 > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > 2F9 at ??:0 > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > fs_config_ssds(fs_config*) + 867 at ??:0 > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > 189 at ??:0 > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > NsdDiskConfig::reReadConfig() + 771 at ??:0 > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > int, char**) + 80E at ??:0 > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > HandleCmdMsg(void*) + 1216 at ??:0 > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > Thread::callBody(Thread*) + 1E2 at ??:0 > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > C5 at ??:0 > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D at ??:0 On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > related note I'm curious how a 3.5 client is able to join a cluster > with a minreleaselevel of 4.1.1.0. I was referring to the fs version not the gpfs client version sorry for that confusion -V 13.23 (3.5.0.7) File system version _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Wed Dec 28 22:39:19 2016 From: mweil at wustl.edu (Matt Weil) Date: Wed, 28 Dec 2016 16:39:19 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> Message-ID: <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> > mmdiag --version > > === mmdiag: version === > Current GPFS build: "4.2.1.2 ". > Built on Oct 27 2016 at 10:52:12 > Running 13 minutes 54 secs, pid 13229 On 12/28/16 4:26 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: > Ouch...to quote Adam Savage "well there's yer problem". Are you > perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like > there was an LROC related assert fixed in 4.1.1.9 but I can't find > details on it. > > > > *From:*Matt Weil > *Sent:* 12/28/16, 5:21 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] LROC > > yes > > > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != > > ssdActive) in line 427 of file > > > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C > > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: > > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 > > logAssertFailed + 0x2D5 at ??:0 > > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 > > fs_config_ssds(fs_config*) + 0x867 at ??:0 > > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 > > SFSConfigLROC() + 0x189 at ??:0 > > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB > > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 > > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 > > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 > > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E > > runTSControl(int, int, char**) + 0x80E at ??:0 > > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 > > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 > > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 > > HandleCmdMsg(void*) + 0x1216 at ??:0 > > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 > > Thread::callBody(Thread*) + 0x1E2 at ??:0 > > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 > > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 > > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 > > start_thread + 0xC5 at ??:0 > > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + > > 0x6D at ??:0 > > mmfsd: > > > /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: > > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, > > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' > > failed. > > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 > > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
> > Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx > > 0x00007FF15FD71000 > > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx > > 0x0000000000000006 > > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp > > 0x00007FF15E8D03A8 > > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi > > 0x000000000001E9A1 > > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 > > 0xFF092D63646B6860 > > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 > > 0x0000000000000202 > > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 > > 0x00007FF161032EC0 > > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 > > 0x0000000000000000 > > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags > > 0x0000000000000202 > > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err > > 0x0000000000000000 > > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk > > 0x0000000010017807 > > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 > > Wed Dec 28 16:17:09.022 2016: [D] Traceback: > > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 > > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 > > at ??:0 > > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 > > __assert_fail_base + 126 at ??:0 > > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 > > __GI___assert_fail + 42 at ??:0 > > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + > > 2F9 at ??:0 > > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 > > fs_config_ssds(fs_config*) + 867 at ??:0 > > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + > > 189 at ??:0 > > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB > > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 > > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 > > NsdDiskConfig::reReadConfig() + 771 at ??:0 > > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, > > int, char**) + 80E at ??:0 > > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 > > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, > > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 > > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 > > HandleCmdMsg(void*) + 1216 at ??:0 > > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 > > Thread::callBody(Thread*) + 1E2 at ??:0 > > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 > > Thread::callBodyWrapper(Thread*) + A2 at ??:0 > > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + > > C5 at ??:0 > > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D > at ??:0 > > > On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > > related note I'm curious how a 3.5 client is able to join a cluster > > with a minreleaselevel of 4.1.1.0. > I was referring to the fs version not the gpfs client version sorry for > that confusion > -V 13.23 (3.5.0.7) File system version > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Wed Dec 28 23:19:52 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 28 Dec 2016 18:19:52 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> Message-ID: <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Interesting. Would you be willing to post the output of "mmlssnsd -X | grep 0A6403AA58641546" from the troublesome node as suggested by Sven? On 12/28/16 5:39 PM, Matt Weil wrote: > >> mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "4.2.1.2 ". >> Built on Oct 27 2016 at 10:52:12 >> Running 13 minutes 54 secs, pid 13229 > > On 12/28/16 4:26 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: >> Ouch...to quote Adam Savage "well there's yer problem". Are you >> perhaps running a version of GPFS 4.1 older than 4.1.1.9? Looks like >> there was an LROC related assert fixed in 4.1.1.9 but I can't find >> details on it. >> >> >> >> *From:*Matt Weil >> *Sent:* 12/28/16, 5:21 PM >> *To:* gpfsug main discussion list >> *Subject:* Re: [gpfsug-discuss] LROC >> >> yes >> >> > Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != >> > ssdActive) in line 427 of file >> > >> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C >> > Wed Dec 28 16:17:07.508 2016: [E] *** Traceback: >> > Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 >> > logAssertFailed + 0x2D5 at ??:0 >> > Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 >> > fs_config_ssds(fs_config*) + 0x867 at ??:0 >> > Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 >> > SFSConfigLROC() + 0x189 at ??:0 >> > Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB >> > NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0 >> > Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 >> > NsdDiskConfig::reReadConfig() + 0x771 at ??:0 >> > Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E >> > runTSControl(int, int, char**) + 0x80E at ??:0 >> > Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 >> > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, >> > StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0 >> > Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 >> > HandleCmdMsg(void*) + 0x1216 at ??:0 >> > Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 >> > Thread::callBody(Thread*) + 0x1E2 at ??:0 >> > Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 >> > Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 >> > Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 >> > start_thread + 0xC5 at ??:0 >> > Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + >> > 0x6D at ??:0 >> > mmfsd: >> > >> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: >> > void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, >> > UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' >> > failed. >> > Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 >> > in process 125345, link reg 0xFFFFFFFFFFFFFFFF. 
>> > Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx >> > 0x00007FF15FD71000 >> > Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx >> > 0x0000000000000006 >> > Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp >> > 0x00007FF15E8D03A8 >> > Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi >> > 0x000000000001E9A1 >> > Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 >> > 0xFF092D63646B6860 >> > Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 >> > 0x0000000000000202 >> > Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 >> > 0x00007FF161032EC0 >> > Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 >> > 0x0000000000000000 >> > Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags >> > 0x0000000000000202 >> > Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err >> > 0x0000000000000000 >> > Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk >> > 0x0000000010017807 >> > Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000 >> > Wed Dec 28 16:17:09.022 2016: [D] Traceback: >> > Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0 >> > Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 >> > at ??:0 >> > Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 >> > __assert_fail_base + 126 at ??:0 >> > Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 >> > __GI___assert_fail + 42 at ??:0 >> > Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + >> > 2F9 at ??:0 >> > Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 >> > fs_config_ssds(fs_config*) + 867 at ??:0 >> > Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + >> > 189 at ??:0 >> > Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB >> > NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0 >> > Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 >> > NsdDiskConfig::reReadConfig() + 771 at ??:0 >> > Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, >> > int, char**) + 80E at ??:0 >> > Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 >> > RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, >> > StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0 >> > Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 >> > HandleCmdMsg(void*) + 1216 at ??:0 >> > Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 >> > Thread::callBody(Thread*) + 1E2 at ??:0 >> > Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 >> > Thread::callBodyWrapper(Thread*) + A2 at ??:0 >> > Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + >> > C5 at ??:0 >> > Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D >> at ??:0 >> >> >> On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE >> CORP] wrote: >> > related note I'm curious how a 3.5 client is able to join a cluster >> > with a minreleaselevel of 4.1.1.0. 
>> I was referring to the fs version not the gpfs client version sorry for >> that confusion >> -V 13.23 (3.5.0.7) File system version >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Thu Dec 29 15:57:40 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 09:57:40 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Message-ID: > ro_cache_S29GNYAH200016 0A6403AA586531E1 > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > dmm ces1.gsc.wustl.edu server node On 12/28/16 5:19 PM, Aaron Knister wrote: > mmlssnsd -X | grep 0A6403AA58641546 From aaron.s.knister at nasa.gov Thu Dec 29 16:02:44 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 29 Dec 2016 11:02:44 -0500 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> Message-ID: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. That's a *really* long device path (and nested too), I wonder if that's causing issues. What does a "tspreparedisk -S" show on that node? Also, what does your nsddevices script look like? I'm wondering if you could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" paths if that would help things here. -Aaron On 12/29/16 10:57 AM, Matt Weil wrote: > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >> dmm ces1.gsc.wustl.edu server node > > > On 12/28/16 5:19 PM, Aaron Knister wrote: >> mmlssnsd -X | grep 0A6403AA58641546 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Thu Dec 29 16:09:58 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Dec 2016 16:09:58 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: i agree that is a very long name , given this is a nvme device it should show up as /dev/nvmeXYZ i suggest to report exactly that in nsddevices and retry. 
i vaguely remember we have some fixed length device name limitation , but i don't remember what the length is, so this would be my first guess too that the long name is causing trouble. On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister wrote: > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 29 16:10:24 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:10:24 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: On 12/29/16 10:02 AM, Aaron Knister wrote: > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if > that's causing issues. was thinking of trying just /dev/sdxx > > What does a "tspreparedisk -S" show on that node? tspreparedisk:0::::0:0:: > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of > "/dev/disk/by-id" paths if that would help things here. > if [[ $osName = Linux ]] > then > : # Add function to discover disks in the Linux environment. > for luns in `ls /dev/disk/by-id | grep nvme` > do > all_luns=disk/by-id/$luns > echo $all_luns dmm > done > > fi > I will try that. 
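If the long by-id paths do turn out to be the culprit, one possible rework of the user exit above (a sketch only, assuming the standard /var/mmfs/etc/nsddevices mechanism and keeping the same "dmm" device type the original script used) is to report the short kernel names directly:

    # Hypothetical variant of the nsddevices user exit shown above: emit the
    # short kernel device names (nvme0n1, nvme1n1, ...) instead of the long
    # /dev/disk/by-id symlinks. Device names are reported relative to /dev.
    osName=$(/bin/uname -s)

    if [[ $osName = Linux ]]
    then
      for dev in /dev/nvme*n1
      do
        [[ -b "$dev" ]] || continue
        echo "${dev#/dev/} dmm"
      done
    fi

    # As in the shipped nsddevices sample, returning 0 bypasses the built-in
    # GPFS device discovery; returning 1 would continue with it.
    return 0

This trades the self-describing by-id names for a much shorter path; if enumeration order across reboots is a concern for the specific hardware, a small udev rule could pin predictable short names instead.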
> > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: >> >> >>> ro_cache_S29GNYAH200016 0A6403AA586531E1 >>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> >>> dmm ces1.gsc.wustl.edu server node >> >> >> On 12/28/16 5:19 PM, Aaron Knister wrote: >>> mmlssnsd -X | grep 0A6403AA58641546 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > From mweil at wustl.edu Thu Dec 29 16:18:30 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:18:30 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: On 12/29/16 10:09 AM, Sven Oehme wrote: > i agree that is a very long name , given this is a nvme device it > should show up as /dev/nvmeXYZ > i suggest to report exactly that in nsddevices and retry. > i vaguely remember we have some fixed length device name limitation , > but i don't remember what the length is, so this would be my first > guess too that the long name is causing trouble. I will try that. I was attempting to not need to write a custom udev rule for those. Also to keep the names persistent. Rhel 7 has a default rule that makes a sym link in /dev/disk/by-id. 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> ../../nvme0n1 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> ../../nvme1n1 > > > On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister > > wrote: > > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws > here. > > That's a *really* long device path (and nested too), I wonder if > that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of > "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu > server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Thu Dec 29 16:28:32 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:28:32 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> Message-ID: <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> wow that was it. > mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:08:58 2016 It is not caching however. I will restart gpfs to see if that makes it start working. On 12/29/16 10:18 AM, Matt Weil wrote: > > > > On 12/29/16 10:09 AM, Sven Oehme wrote: >> i agree that is a very long name , given this is a nvme device it >> should show up as /dev/nvmeXYZ >> i suggest to report exactly that in nsddevices and retry. >> i vaguely remember we have some fixed length device name limitation , >> but i don't remember what the length is, so this would be my first >> guess too that the long name is causing trouble. > I will try that. I was attempting to not need to write a custom udev > rule for those. Also to keep the names persistent. Rhel 7 has a > default rule that makes a sym link in /dev/disk/by-id. > 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> > ../../nvme0n1 > 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> > ../../nvme1n1 >> >> >> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >> > wrote: >> >> Interesting. Thanks Matt. I admit I'm somewhat grasping at straws >> here. >> >> That's a *really* long device path (and nested too), I wonder if >> that's >> causing issues. >> >> What does a "tspreparedisk -S" show on that node? >> >> Also, what does your nsddevices script look like? I'm wondering >> if you >> could have it give back "/dev/dm-XXX" paths instead of >> "/dev/disk/by-id" >> paths if that would help things here. >> >> -Aaron >> >> On 12/29/16 10:57 AM, Matt Weil wrote: >> > >> > >> >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >> >> >> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >> >> dmm ces1.gsc.wustl.edu >> server node >> > >> > >> > On 12/28/16 5:19 PM, Aaron Knister wrote: >> >> mmlssnsd -X | grep 0A6403AA58641546 >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at spectrumscale.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mweil at wustl.edu Thu Dec 29 16:41:38 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 10:41:38 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> Message-ID: <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> after restart. still doesn't seem to be in use. > [root at ces1 ~]# mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:35:32 2016 > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) On 12/29/16 10:28 AM, Matt Weil wrote: > > wow that was it. > >> mmdiag --lroc >> >> === mmdiag: lroc === >> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 1526184 MB, currently in use: 0 MB >> Statistics from: Thu Dec 29 10:08:58 2016 > It is not caching however. I will restart gpfs to see if that makes > it start working. > > On 12/29/16 10:18 AM, Matt Weil wrote: >> >> >> >> On 12/29/16 10:09 AM, Sven Oehme wrote: >>> i agree that is a very long name , given this is a nvme device it >>> should show up as /dev/nvmeXYZ >>> i suggest to report exactly that in nsddevices and retry. >>> i vaguely remember we have some fixed length device name limitation >>> , but i don't remember what the length is, so this would be my first >>> guess too that the long name is causing trouble. >> I will try that. I was attempting to not need to write a custom udev >> rule for those. Also to keep the names persistent. Rhel 7 has a >> default rule that makes a sym link in /dev/disk/by-id. >> 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 >> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> >> ../../nvme0n1 >> 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 >> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> >> ../../nvme1n1 >>> >>> >>> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >>> > wrote: >>> >>> Interesting. Thanks Matt. I admit I'm somewhat grasping at >>> straws here. >>> >>> That's a *really* long device path (and nested too), I wonder if >>> that's >>> causing issues. >>> >>> What does a "tspreparedisk -S" show on that node? >>> >>> Also, what does your nsddevices script look like? I'm wondering >>> if you >>> could have it give back "/dev/dm-XXX" paths instead of >>> "/dev/disk/by-id" >>> paths if that would help things here. 
>>> >>> -Aaron >>> >>> On 12/29/16 10:57 AM, Matt Weil wrote: >>> > >>> > >>> >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >>> >> >>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> >> dmm ces1.gsc.wustl.edu >>> server node >>> > >>> > >>> > On 12/28/16 5:19 PM, Aaron Knister wrote: >>> >> mmlssnsd -X | grep 0A6403AA58641546 >>> > >>> > _______________________________________________ >>> > gpfsug-discuss mailing list >>> > gpfsug-discuss at spectrumscale.org >>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> > >>> >>> -- >>> Aaron Knister >>> NASA Center for Climate Simulation (Code 606.2) >>> Goddard Space Flight Center >>> (301) 286-2776 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Dec 29 17:06:40 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 29 Dec 2016 17:06:40 +0000 Subject: [gpfsug-discuss] LROC In-Reply-To: <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> Message-ID: first good that the problem at least is solved, it would be great if you could open a PMR so this gets properly fixed, the daemon shouldn't segfault, but rather print a message that the device is too big. on the caching , it only gets used when you run out of pagepool or when you run out of full file objects . so what benchmark, test did you run to push data into LROC ? sven On Thu, Dec 29, 2016 at 5:41 PM Matt Weil wrote: > after restart. still doesn't seem to be in use. > > [root at ces1 ~]# mmdiag --lroc > > > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > > Statistics from: Thu Dec 29 10:35:32 2016 > > > > Total objects stored 0 (0 MB) recalled 0 (0 MB) > objects failed to store 0 failed to recall 0 failed to inval 0 > objects queried 0 (0 MB) not found 0 = 0.00 % > objects invalidated 0 (0 MB) > > > On 12/29/16 10:28 AM, Matt Weil wrote: > > wow that was it. > > mmdiag --lroc > > === mmdiag: lroc === > LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running > Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile > 1073741824 > Max capacity: 1526184 MB, currently in use: 0 MB > Statistics from: Thu Dec 29 10:08:58 2016 > > It is not caching however. I will restart gpfs to see if that makes it > start working. 
> > On 12/29/16 10:18 AM, Matt Weil wrote: > > > > On 12/29/16 10:09 AM, Sven Oehme wrote: > > i agree that is a very long name , given this is a nvme device it should > show up as /dev/nvmeXYZ > i suggest to report exactly that in nsddevices and retry. > i vaguely remember we have some fixed length device name limitation , but > i don't remember what the length is, so this would be my first guess too > that the long name is causing trouble. > > I will try that. I was attempting to not need to write a custom udev rule > for those. Also to keep the names persistent. Rhel 7 has a default rule > that makes a sym link in /dev/disk/by-id. > 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 -> > ../../nvme0n1 > 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 > nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 -> > ../../nvme1n1 > > > > On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister > wrote: > > Interesting. Thanks Matt. I admit I'm somewhat grasping at straws here. > > That's a *really* long device path (and nested too), I wonder if that's > causing issues. > > What does a "tspreparedisk -S" show on that node? > > Also, what does your nsddevices script look like? I'm wondering if you > could have it give back "/dev/dm-XXX" paths instead of "/dev/disk/by-id" > paths if that would help things here. > > -Aaron > > On 12/29/16 10:57 AM, Matt Weil wrote: > > > > > >> ro_cache_S29GNYAH200016 0A6403AA586531E1 > >> > /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 > >> dmm ces1.gsc.wustl.edu server node > > > > > > On 12/28/16 5:19 PM, Aaron Knister wrote: > >> mmlssnsd -X | grep 0A6403AA58641546 > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 <%28301%29%20286-2776> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Thu Dec 29 17:23:11 2016 From: mweil at wustl.edu (Matt Weil) Date: Thu, 29 Dec 2016 11:23:11 -0600 Subject: [gpfsug-discuss] LROC In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101E631A4@NDMSMBX404.ndc.nasa.gov> <28805461-9b8c-f470-5afe-672b2c0ed9e3@wustl.edu> <4a818373-5af5-cf37-8dd8-d29d4583bec2@nasa.gov> <0730d0df-9f56-6206-c8cb-2b6342ba3c9f@nasa.gov> <5c036486-63e7-5a01-846f-83f8b30a9b8d@wustl.edu> <5aed7393-3eca-931b-6b4d-87e37394f36e@wustl.edu> Message-ID: <45b19a50-bb70-1025-71ea-80a260623712@wustl.edu> -k thanks all I see it using the lroc now. 
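A rough way to confirm the cache is really being exercised (a hypothetical smoke test: the path below is made up, and it assumes the read working set is larger than the pagepool so that evicted objects spill into LROC, per Sven's note above) is to read a large file tree twice and then re-check the recall counters:

    # First pass warms the pagepool and, as it overflows, the LROC device;
    # a second pass should show recalls climbing if the cache is in use.
    for pass in 1 2
    do
        find /gpfs/fs0/lroctest -type f -print0 | xargs -0 cat > /dev/null
    done

    # Statistics on the node that owns the LROC device
    /usr/lpp/mmfs/bin/mmdiag --lroc

If the "Total objects stored" and "recalled" lines stay at zero after that, something upstream of the device is still off.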
On 12/29/16 11:06 AM, Sven Oehme wrote: > first good that the problem at least is solved, it would be great if > you could open a PMR so this gets properly fixed, the daemon shouldn't > segfault, but rather print a message that the device is too big. > > on the caching , it only gets used when you run out of pagepool or > when you run out of full file objects . so what benchmark, test did > you run to push data into LROC ? > > sven > > > On Thu, Dec 29, 2016 at 5:41 PM Matt Weil > wrote: > > after restart. still doesn't seem to be in use. > >> [root at ces1 ~]# mmdiag --lroc >> >> >> === mmdiag: lroc === >> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 stubFile >> 1073741824 >> Max capacity: 1526184 MB, currently in use: 0 MB >> Statistics from: Thu Dec 29 10:35:32 2016 >> >> >> Total objects stored 0 (0 MB) recalled 0 (0 MB) >> objects failed to store 0 failed to recall 0 failed to inval 0 >> objects queried 0 (0 MB) not found 0 = 0.00 % >> objects invalidated 0 (0 MB) > > On 12/29/16 10:28 AM, Matt Weil wrote: >> >> wow that was it. >> >>> mmdiag --lroc >>> >>> === mmdiag: lroc === >>> LROC Device(s): '0A6403AA5865389E#/dev/nvme0n1;' status Running >>> Cache inodes 1 dirs 1 data 1 Config: maxFile 1073741824 >>> stubFile 1073741824 >>> Max capacity: 1526184 MB, currently in use: 0 MB >>> Statistics from: Thu Dec 29 10:08:58 2016 >> It is not caching however. I will restart gpfs to see if that >> makes it start working. >> >> On 12/29/16 10:18 AM, Matt Weil wrote: >>> >>> >>> >>> On 12/29/16 10:09 AM, Sven Oehme wrote: >>>> i agree that is a very long name , given this is a nvme device >>>> it should show up as /dev/nvmeXYZ >>>> i suggest to report exactly that in nsddevices and retry. >>>> i vaguely remember we have some fixed length device name >>>> limitation , but i don't remember what the length is, so this >>>> would be my first guess too that the long name is causing trouble. >>> I will try that. I was attempting to not need to write a custom >>> udev rule for those. Also to keep the names persistent. Rhel 7 >>> has a default rule that makes a sym link in /dev/disk/by-id. >>> 0 lrwxrwxrwx 1 root root 13 Dec 29 10:08 >>> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>> -> ../../nvme0n1 >>> 0 lrwxrwxrwx 1 root root 13 Dec 27 11:20 >>> nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH300161 >>> -> ../../nvme1n1 >>>> >>>> >>>> On Thu, Dec 29, 2016 at 5:02 PM Aaron Knister >>>> > wrote: >>>> >>>> Interesting. Thanks Matt. I admit I'm somewhat grasping at >>>> straws here. >>>> >>>> That's a *really* long device path (and nested too), I >>>> wonder if that's >>>> causing issues. >>>> >>>> What does a "tspreparedisk -S" show on that node? >>>> >>>> Also, what does your nsddevices script look like? I'm >>>> wondering if you >>>> could have it give back "/dev/dm-XXX" paths instead of >>>> "/dev/disk/by-id" >>>> paths if that would help things here. 
>>>> >>>> -Aaron >>>> >>>> On 12/29/16 10:57 AM, Matt Weil wrote: >>>> > >>>> > >>>> >> ro_cache_S29GNYAH200016 0A6403AA586531E1 >>>> >> >>>> /dev/disk/by-id/nvme-Dell_Express_Flash_NVMe_SM1715_1.6TB_SFF_______S29GNYAH200016 >>>> >> dmm ces1.gsc.wustl.edu >>>> server node >>>> > >>>> > >>>> > On 12/28/16 5:19 PM, Aaron Knister wrote: >>>> >> mmlssnsd -X | grep 0A6403AA58641546 >>>> > >>>> > _______________________________________________ >>>> > gpfsug-discuss mailing list >>>> > gpfsug-discuss at spectrumscale.org >>>> >>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> > >>>> >>>> -- >>>> Aaron Knister >>>> NASA Center for Climate Simulation (Code 606.2) >>>> Goddard Space Flight Center >>>> (301) 286-2776 >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From usa-principal at gpfsug.org Sat Dec 31 20:05:35 2016 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Sat, 31 Dec 2016 15:05:35 -0500 Subject: [gpfsug-discuss] Save the Date April 4-5 2017 Users Group Meeting at NERSC Message-ID: Hello all and happy new year (depending upon where you are right now :-) ). We'll have more details in 2017, but for now please save the date for a two-day users group meeting at NERSC in Berkeley, California. April 4-5, 2017 National Energy Research Scientific Computing Center (nersc.gov) Berkeley, California We look forward to offering our first two-day event in the US. Best, Kristy & Bob