From Luke.Raimbach at crick.ac.uk Tue Mar 1 12:43:54 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 1 Mar 2016 12:43:54 +0000 Subject: [gpfsug-discuss] AFM over NFS vs GPFS Message-ID: Hi All, We have two clusters and are using AFM between them to compartmentalise performance. We have the opportunity to run AFM over GPFS protocol (over IB verbs), which I would imagine gives much greater performance than trying to push it over NFS over Ethernet. We will have a whole raft of instrument ingest filesets in one storage cluster which are single-writer caches of the final destination in the analytics cluster. My slight concern with running this relationship over native GPFS is that if the analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the manual which says: "In the case of caches based on native GPFS protocol, unavailability of the home file system on the cache cluster puts the caches into unmounted state. These caches never enter the disconnected state. For AFM filesets that use GPFS protocol to connect to the home cluster, if the remote mount becomes unresponsive due to issues at the home cluster not related to disconnection (such as a deadlock), operations that require remote mount access such as revalidation or reading un-cached contents also hang until remote mount becomes available again. One way to continue accessing all cached contents without disruption is to temporarily disable all the revalidation intervals until the home mount is accessible again." What I'm unsure of is whether this applies to single-writer caches as they (presumably) never do revalidation. We don't want instrument data capture to be interrupted on our ingest storage cluster if the analytics cluster goes away. Is anyone able to clear this up, please? Cheers, Luke. Luke Raimbach Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.
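For context on the passage quoted above: the revalidation behaviour is controlled per cache fileset by the AFM refresh-interval attributes, which the documentation says can be disabled temporarily during a planned home outage. A rough sketch of what that might look like follows; "ingestfs" and "instrument01" are invented names, and the exact attribute list and the "disable" value should be checked against the AFM documentation for the Spectrum Scale level in use.

# Sketch only: temporarily switch off AFM revalidation on a cache fileset
# while the home cluster is offline. Record the existing interval values
# first so they can be put back afterwards.
FS=ingestfs            # example cache file system
FILESET=instrument01   # example single-writer cache fileset

for attr in afmFileLookupRefreshInterval afmDirLookupRefreshInterval \
            afmFileOpenRefreshInterval afmDirOpenRefreshInterval; do
    mmchfileset "$FS" "$FILESET" -p "${attr}=disable"
done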
From Robert.Oesterlin at nuance.com Wed Mar 2 16:22:35 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 2 Mar 2016 16:22:35 +0000 Subject: [gpfsug-discuss] IBM-Sandisk Announcement Message-ID: Anyone from the IBM side that can comment on this in more detail? (OK if you email me directly) Article is thin on exactly what's being announced. SanDisk Corporation, a global leader in flash storage solutions, and IBM today announced a collaboration to bring out a unique class of next-generation, software-defined, all-flash storage solutions for the data center. At the core of this collaboration are SanDisk's InfiniFlash System, a high-capacity and extreme-performance flash-based software defined storage system featuring IBM Spectrum Scale filesystem from IBM. https://www.sandisk.com/about/media-center/press-releases/2016/sandisk-and-ibm-collaborate-to-deliver-software-defined-all-flash-storage-solutions Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Mar 2 16:27:24 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 2 Mar 2016 16:27:24 +0000 Subject: [gpfsug-discuss] IBM-Sandisk Announcement In-Reply-To: References: Message-ID: There's a bit more at: http://www.theregister.co.uk/2016/03/02/ibm_adds_sandisk_flash_colour_to_its_storage_spectrum/ When I looked at InfiniFlash briefly it appeared to be IP presented, so I guess something like a Linux based system in the "controller". So I guess they have installed gpfs in there as part of the appliance. It doesn't appear to be available as block storage/fc attached from what I could see. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 02 March 2016 16:22 To: gpfsug main discussion list Subject: [gpfsug-discuss] IBM-Sandisk Announcement Anyone from the IBM side that can comment on this in more detail? (OK if you email me directly) Article is thin on exactly what's being announced. SanDisk Corporation, a global leader in flash storage solutions, and IBM today announced a collaboration to bring out a unique class of next-generation, software-defined, all-flash storage solutions for the data center. At the core of this collaboration are SanDisk's InfiniFlash System, a high-capacity and extreme-performance flash-based software defined storage system featuring IBM Spectrum Scale filesystem from IBM. https://www.sandisk.com/about/media-center/press-releases/2016/sandisk-and-ibm-collaborate-to-deliver-software-defined-all-flash-storage-solutions Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From S.J.Thompson at bham.ac.uk Wed Mar 2 16:29:34 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 2 Mar 2016 16:29:34 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale Message-ID: I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS without tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? Thanks Simon From oehmes at us.ibm.com Wed Mar 2 16:31:12 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 2 Mar 2016 08:31:12 -0800 Subject: [gpfsug-discuss] IBM-Sandisk Announcement In-Reply-To: References: Message-ID: <201603021631.u22GVTh9003605@d03av04.boulder.ibm.com> it's direct SAS attached. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 03/02/2016 08:27 AM Subject: Re: [gpfsug-discuss] IBM-Sandisk Announcement Sent by: gpfsug-discuss-bounces at spectrumscale.org There's a bit more at: http://www.theregister.co.uk/2016/03/02/ibm_adds_sandisk_flash_colour_to_its_storage_spectrum/ When I looked at InfiniFlash briefly it appeared to be IP presented, so I guess something like a Linux based system in the "controller". So I guess they have installed gpfs in there as part of the appliance. It doesn't appear to be available as block storage/fc attached from what I could see.
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 02 March 2016 16:22 To: gpfsug main discussion list Subject: [gpfsug-discuss] IBM-Sandisk Announcement Anyone from the IBM side that can comment on this in more detail? (OK if you email me directly) Article is thin on exactly what?s being announced. SanDisk Corporation, a global leader in flash storage solutions, and IBM today announced a collaboration to bring out a unique class of next-generation, software-defined, all-flash storage solutions for the data center. At the core of this collaboration are SanDisk?s InfiniFlash System?a high-capacity and extreme-performance flash-based software defined storage system featuring IBM Spectrum Scale filesystem from IBM. https://www.sandisk.com/about/media-center/press-releases/2016/sandisk-and-ibm-collaborate-to-deliver-software-defined-all-flash-storage-solutions Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Wed Mar 2 16:43:17 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Wed, 2 Mar 2016 16:43:17 +0000 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: Anybody know the answer? > HI All, > > We have two clusters and are using AFM between them to compartmentalise > performance. We have the opportunity to run AFM over GPFS protocol (over IB > verbs), which I would imagine gives much greater performance than trying to > push it over NFS over Ethernet. > > We will have a whole raft of instrument ingest filesets in one storage cluster > which are single-writer caches of the final destination in the analytics cluster. > My slight concern with running this relationship over native GPFS is that if the > analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the > manual which says: > > "In the case of caches based on native GPFS? protocol, unavailability of the > home file system on the cache cluster puts the caches into unmounted state. > These caches never enter the disconnected state. For AFM filesets that use GPFS > protocol to connect to the home cluster, if the remote mount becomes > unresponsive due to issues at the home cluster not related to disconnection > (such as a deadlock), operations that require remote mount access such as > revalidation or reading un-cached contents also hang until remote mount > becomes available again. One way to continue accessing all cached contents > without disruption is to temporarily disable all the revalidation intervals until the > home mount is accessible again." > > What I'm unsure of is whether this applies to single-writer caches as they > (presumably) never do revalidation. We don't want instrument data capture to > be interrupted on our ingest storage cluster if the analytics cluster goes away. > > Is anyone able to clear this up, please? > > Cheers, > Luke. > > Luke Raimbach? 
> Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs > Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales > no. 1140062 and a company registered in England and Wales no. 06885462, with > its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.
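Not an authoritative answer to the question above, but one practical way to see what a single-writer cache actually does while the home cluster is away is to watch the fileset state and the queue of pending operations from the cache cluster. A minimal sketch, with invented file system and fileset names ("ingestfs", "instrument01"); check the mmafmctl options against your Spectrum Scale level:

# Sketch: poll the AFM cache state (e.g. Active/Dirty/Unmounted/Disconnected)
# and the queue of operations waiting to be replayed to the home cluster.
FS=ingestfs
FILESET=instrument01

while true; do
    mmafmctl "$FS" getstate -j "$FILESET"
    sleep 30
done

If new files keep landing in the cache and the queue keeps growing while the state shows Unmounted, ingest is not being interrupted; the queue should drain once the home file system is mountable again.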
From Robert.Oesterlin at nuance.com Wed Mar 2 17:04:57 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 2 Mar 2016 17:04:57 +0000 Subject: [gpfsug-discuss] IBM-Sandisk Announcement Message-ID: <37CDF3CF-53AD-45FC-8E0C-582CED5DD99F@nuance.com> The reason I?m asking is that I?m doing a test with an IF100 box, and wanted to know what the IBM plans were for it :-) Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Wed Mar 2 17:23:30 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Wed, 2 Mar 2016 09:23:30 -0800 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: <201603021731.u22HVqeu026048@d03av04.boulder.ibm.com> Hi Luke, Assuming the network between your clusters is reliable, using GPFS with SW-mode (also assuming you aren't ever modifying the data on the home cluster) should work well for you I think. New files can continue to be created in the cache even in unmounted state.... Dean IBM Almaden Research Center From: Luke Raimbach To: gpfsug main discussion list Date: 03/01/2016 04:44 AM Subject: [gpfsug-discuss] AFM over NFS vs GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org HI All, We have two clusters and are using AFM between them to compartmentalise performance. We have the opportunity to run AFM over GPFS protocol (over IB verbs), which I would imagine gives much greater performance than trying to push it over NFS over Ethernet. We will have a whole raft of instrument ingest filesets in one storage cluster which are single-writer caches of the final destination in the analytics cluster. My slight concern with running this relationship over native GPFS is that if the analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the manual which says: "In the case of caches based on native GPFS? protocol, unavailability of the home file system on the cache cluster puts the caches into unmounted state. These caches never enter the disconnected state. For AFM filesets that use GPFS protocol to connect to the home cluster, if the remote mount becomes unresponsive due to issues at the home cluster not related to disconnection (such as a deadlock), operations that require remote mount access such as revalidation or reading un-cached contents also hang until remote mount becomes available again. One way to continue accessing all cached contents without disruption is to temporarily disable all the revalidation intervals until the home mount is accessible again." What I'm unsure of is whether this applies to single-writer caches as they (presumably) never do revalidation. We don't want instrument data capture to be interrupted on our ingest storage cluster if the analytics cluster goes away. Is anyone able to clear this up, please? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at genome.wustl.edu Wed Mar 2 19:46:48 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 13:46:48 -0600 Subject: [gpfsug-discuss] cpu shielding Message-ID: <56D74328.50507@genome.wustl.edu> All, We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From bbanister at jumptrading.com Wed Mar 2 19:49:50 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 2 Mar 2016 19:49:50 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <56D74328.50507@genome.wustl.edu> References: <56D74328.50507@genome.wustl.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. -B -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil Sent: Wednesday, March 02, 2016 1:47 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] cpu shielding All, We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From mweil at genome.wustl.edu Wed Mar 2 19:54:21 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 13:54:21 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D744ED.30307@genome.wustl.edu> Can you share anything more? We are trying all system related items on cpu0 GPFS is on cpu1 and the rest are used for the lsf scheduler. With that setup we still see evictions. Thanks Matt On 3/2/16 1:49 PM, Bryan Banister wrote: > We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. > -B > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil > Sent: Wednesday, March 02, 2016 1:47 PM > To: gpfsug main discussion list > Subject: [gpfsug-discuss] cpu shielding > > All, > > We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? > > Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. > > Thanks > > Matt > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From viccornell at gmail.com Wed Mar 2 20:15:16 2016 From: viccornell at gmail.com (viccornell at gmail.com) Date: Wed, 2 Mar 2016 21:15:16 +0100 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <56D744ED.30307@genome.wustl.edu> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. > > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. 
Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Wed Mar 2 20:17:38 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 2 Mar 2016 20:17:38 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> I would agree with Vic that in most cases the issues are with the underlying network communication. 
We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com Sent: Wednesday, March 02, 2016 2:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. > > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >> Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From mweil at genome.wustl.edu Wed Mar 2 20:22:05 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 14:22:05 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: <56D74B6D.8050802@genome.wustl.edu> On 3/2/16 2:15 PM, viccornell at gmail.com wrote: > Hi, > > How sure are you that it is cpu scheduling that is your problem? just spotted this maybe it can help spot something. https://software.intel.com/en-us/articles/intel-performance-counter-monitor > > Are you using IB or Ethernet? two 10 gig Intel nics in a LACP bond. links are not saturated. > > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. 
>> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. 
If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From S.J.Thompson at bham.ac.uk Wed Mar 2 20:24:44 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 2 Mar 2016 20:24:44 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> , <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] Sent: 02 March 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com Sent: Wednesday, March 02, 2016 2:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. 
> > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >> Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. 
If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mweil at genome.wustl.edu Wed Mar 2 20:47:24 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 14:47:24 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D7515C.4070102@genome.wustl.edu> GPFS client version 3.5.0-15 any related issues there with timeouts? On 3/2/16 2:24 PM, Simon Thompson (Research Computing - IT Services) wrote: > Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. > > We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] > Sent: 02 March 2016 20:17 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com > Sent: Wednesday, March 02, 2016 2:15 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > Hi, > > How sure are you that it is cpu scheduling that is your problem? > > Are you using IB or Ethernet? 
> > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. >> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >>> Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. 
The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Greg.Lehmann at csiro.au Wed Mar 2 22:48:51 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 2 Mar 2016 22:48:51 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale In-Reply-To: References: Message-ID: <304dd806ce6e4488b163676bb5889da2@exch2-mel.nexus.csiro.au> Sitting next to 2 DDN guys doing some gridscaler training. Their opinion is "pure FUD". They are happy for us to run IBM or their Spectrum Scale packages in the DDN hardware. 
Cheers, Greg

-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Thursday, 3 March 2016 2:30 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] GPFS vs Spectrum Scale

I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS with out tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From daniel.kidger at uk.ibm.com Wed Mar 2 22:52:55 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 2 Mar 2016 22:52:55 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale In-Reply-To: References: Message-ID: <201603022153.u22Lr0nY015961@d06av10.portsmouth.uk.ibm.com>

I work for IBM and in particular support OEMs and other Business Partners. I am not sure if Simon is using true IBM speak here, as any OEM purchase of Spectrum Scale inherently has tin included, be it from DDN, Seagate, Lenovo, etc. Remember there are 4 main ways to buy Spectrum Scale:
1. as pure software, direct from IBM or through a business partner.
2. as part of a hardware offering from an OEM.
3. as part of a hardware offering from IBM. This is what ESS is.
4. as a cloud service in SoftLayer.
Spectrum Scale (GPFS) is exactly the same software no matter which route above is used to purchase it. What OEMs do, as IBM does with the ESS appliance, is extra validation to confirm that the newest release is fully compatible with their hardware solution and has no regressions in performance or otherwise. Hence there is often a gap of perhaps 3 months between, say, the official 4.2 release and when it appears in OEM solutions. ESS is the same here. The two differences that make #2 OEM systems different are:
1. When bought as part of an OEM system through, say, Lenovo, DDN or Seagate, that OEM owns the actual GPFS licenses rather than the end customer. The practical side of this is that if you later replace the hardware with a different vendor's hardware there is no automatic right to transfer over the old licenses, as would be the case if GPFS was bought directly from IBM.
2. When bought as part of an OEM system, that OEM is the sole point of contact for the customer for all support. The customer does not first have to triage whether it is a hw or sw issue. The OEM in return provides 1st and 2nd line support to the customer, and only escalates in-depth level 3 support issues to IBM's development team. The OEMs will have gone through extensive training to be able to do such 1st and 2nd line support. (Of course many traditional IBM Business Partners are also very clued up about helping their customers directly.)
Daniel Dr.Daniel Kidger No.
1 The Square, Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 02/03/2016 16:30 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS with out tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From volobuev at us.ibm.com Thu Mar 3 00:35:18 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 2 Mar 2016 16:35:18 -0800 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: <201603030035.u230ZNwQ032425@d03av04.boulder.ibm.com> Going way off topic... For reasons that are not entirely understood, Spectrum Scale AFM developers who work from India are unable to subscribe to the gpfsug-discuss mailing list. Their mail servers and gpfsug servers don't want to play nice together. So if you want to reach more AFM experts, I recommend going the developerWorks GPFS forum route: https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479&ps=25 yuri From: Luke Raimbach To: gpfsug main discussion list , "gpfsug main discussion list" , Date: 03/02/2016 08:43 AM Subject: Re: [gpfsug-discuss] AFM over NFS vs GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Anybody know the answer? > HI All, > > We have two clusters and are using AFM between them to compartmentalise > performance. We have the opportunity to run AFM over GPFS protocol (over IB > verbs), which I would imagine gives much greater performance than trying to > push it over NFS over Ethernet. > > We will have a whole raft of instrument ingest filesets in one storage cluster > which are single-writer caches of the final destination in the analytics cluster. > My slight concern with running this relationship over native GPFS is that if the > analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the > manual which says: > > "In the case of caches based on native GPFS? protocol, unavailability of the > home file system on the cache cluster puts the caches into unmounted state. > These caches never enter the disconnected state. For AFM filesets that use GPFS > protocol to connect to the home cluster, if the remote mount becomes > unresponsive due to issues at the home cluster not related to disconnection > (such as a deadlock), operations that require remote mount access such as > revalidation or reading un-cached contents also hang until remote mount > becomes available again. 
One way to continue accessing all cached contents > without disruption is to temporarily disable all the revalidation intervals until the > home mount is accessible again." > > What I'm unsure of is whether this applies to single-writer caches as they > (presumably) never do revalidation. We don't want instrument data capture to > be interrupted on our ingest storage cluster if the analytics cluster goes away. > > Is anyone able to clear this up, please? > > Cheers, > Luke. > > Luke Raimbach? > Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs > Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales > no. 1140062 and a company registered in England and Wales no. 06885462, with > its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
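For anyone facing the situation described above, a minimal sketch of what "temporarily disable all the revalidation intervals" could look like on the cache cluster. The file system and fileset names here are invented, and the "disable" keyword for the AFM refresh-interval attributes is an assumption to verify against the AFM documentation for your release (a very large numeric value achieves much the same effect):

  # Check the AFM state of the single-writer cache fileset
  mmafmctl ingestfs getstate -j instrument1

  # Stop lookups/opens revalidating against home while it is unavailable
  # ("disable" is assumed here; confirm the accepted values for your release)
  mmchfileset ingestfs instrument1 -p afmDirLookupRefreshInterval=disable
  mmchfileset ingestfs instrument1 -p afmDirOpenRefreshInterval=disable
  mmchfileset ingestfs instrument1 -p afmFileLookupRefreshInterval=disable
  mmchfileset ingestfs instrument1 -p afmFileOpenRefreshInterval=disable

  # When home is reachable again, put the attributes back to the values
  # recorded beforehand (inspect them with: mmlsfileset ingestfs instrument1 --afm -L)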
From Luke.Raimbach at crick.ac.uk Thu Mar 3 09:07:25 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 3 Mar 2016 09:07:25 +0000 Subject: [gpfsug-discuss] Cloning across fileset boundaries Message-ID:

Hi All, When I use "mmclone copy" to try and create a clone and the destination is inside a fileset (dependent or independent), I get this: mmclone: Invalid cross-device link I can find no information in any manuals as to why this doesn't work (though I can imagine what the reasons might be). Could somebody explain whether this could be permitted in the future, or if it's technically impossible? Thanks, Luke.

Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.

From volobuev at us.ibm.com Thu Mar 3 18:13:45 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Thu, 3 Mar 2016 10:13:45 -0800 Subject: [gpfsug-discuss] Cloning across fileset boundaries In-Reply-To: References: Message-ID: <201603031813.u23IDobP010703@d03av04.boulder.ibm.com>

This is technically impossible. A clone relationship is semantically similar to a hard link. The basic fileset concept precludes hard links between filesets.
A fileset is by definition a self-contained subtree in the namespace. yuri

From: Luke Raimbach To: gpfsug main discussion list , Date: 03/03/2016 01:07 AM Subject: [gpfsug-discuss] Cloning across fileset boundaries Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi All, When I use "mmclone copy" to try and create a clone and the destination is inside a fileset (dependent or independent), I get this: mmclone: Invalid cross-device link I can find no information in any manuals as to why this doesn't work (though I can imagine what the reasons might be). Could somebody explain whether this could be permitted in the future, or if it's technically impossible? Thanks, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
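To make the restriction concrete, a small sketch of the behaviour Yuri describes; the path and file names are invented, and the failing command simply reproduces the error Luke reports:

  # Within a single fileset, cloning works as documented
  cd /gpfs/fs1/projects                  # 'projects' is a fileset junction
  mmclone snap bigfile bigfile.snap      # create the read-only clone parent
  mmclone copy bigfile.snap copy1        # writable clone in the same fileset
  mmclone show bigfile.snap copy1

  # A clone shares blocks with its parent, much like a hard link shares an
  # inode, so the destination cannot live in a different fileset:
  mmclone copy bigfile.snap /gpfs/fs1/scratch/copy2
  # mmclone: Invalid cross-device link

  # Crossing a fileset boundary therefore needs a real copy (cp, rsync,
  # AFM, etc.), which duplicates the blocks instead of sharing them.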
From Mark.Bush at siriuscom.com Thu Mar 3 21:57:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 3 Mar 2016 21:57:20 +0000 Subject: [gpfsug-discuss] Small cluster Message-ID:

I have a client that wants to build small remote sites to sync back to an ESS cluster they purchased. These remote sites are generally <15-20TB. If I build a three node cluster with just internal drives can this work if the drives aren't shared amongst the cluster without FPO or GNR (since it's not ESS)? Is it better to have a SAN sharing disks with the three nodes? Assuming all are NSD servers (or two at least). Seems like most of the implementations I'm seeing use shared disks so local drives only would be an odd architecture right? What do I give up by not having shared disks seen by other NSD servers?

Mark Bush Storage Architect

This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions

From jonathan at buzzard.me.uk Thu Mar 3 22:23:08 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 3 Mar 2016 22:23:08 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: Message-ID: <56D8B94C.2000303@buzzard.me.uk>

On 03/03/16 21:57, Mark.Bush at siriuscom.com wrote: > I have a client that wants to build small remote sites to sync back to > an ESS cluster they purchased. These remote sites are generally > <15-20TB. If I build a three node cluster with just internal drives can > this work if the drives aren't shared amongst the cluster without FPO or > GNR(since it's not ESS)? Is it better to have a SAN sharing disks with > the three nodes? Assuming all are NSD servers (or two at least). Seems > like most of the implementations I'm seeing use shared disks so local > drives only would be an odd architecture right? What do I give up by > not having shared disks seen by other NSD servers? >
However a SAS attached Dell MD3 (it's LSI/Netgear Engenio storage so basically the same as a DS3000/4000/5000) is frankly so cheap that it's just not worth going down that route if you ask me. I would do a two server cluster with a tie breaker disk on the MD3 to avoid any split brain issues, and use the saving on the third server to buy the MD3 and SAS cards. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From makaplan at us.ibm.com Fri Mar 4 16:09:03 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 4 Mar 2016 11:09:03 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <56D8B94C.2000303@buzzard.me.uk> References: <56D8B94C.2000303@buzzard.me.uk> Message-ID: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Fri Mar 4 16:21:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Fri, 4 Mar 2016 16:21:20 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Message-ID: <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Mar 4 16:26:15 2016 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 4 Mar 2016 11:26:15 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> Message-ID: You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: > I guess this is really my question. Budget is less than $50k per site and > they need around 20TB storage. Two nodes with MD3 or something may work. > But could it work (and be successful) with just servers and internal > drives? Should I do FPO for non hadoop like workloads? I didn?t think I > could get native raid except in the ESS (GSS no longer exists if I remember > correctly). Do I just make replicas and call it good? > > > Mark > > From: on behalf of Marc A > Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the GPFS > 2.3 era. > > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution more > difficult. > > To begin with, as with any provisioning problem, one should try to state: > requirements, goals, budgets, constraints, failure/tolerance > models/assumptions, > expected workloads, desired performance, etc, etc. 
> > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Fri Mar 4 16:28:52 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 04 Mar 2016 16:28:52 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Message-ID: <1457108932.4251.183.camel@buzzard.phy.strath.ac.uk> On Fri, 2016-03-04 at 11:09 -0500, Marc A Kaplan wrote: > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the > GPFS 2.3 era. Once bitten twice shy. I was offering my experience of that setup, which is not good. I my defense I did note it was it the 2.x era and it might be better now. > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution > more difficult. The other thing I would point out is that replacing a disk in a MD3 or similar is an operator level procedure. Replacing a similar disk up the front with GPFS replication requires a skilled GPFS administrator. Given these are to be on remote sites, I would suspect simpler lower skilled maintenance is better. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Mark.Bush at siriuscom.com Fri Mar 4 16:30:41 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Fri, 4 Mar 2016 16:30:41 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> Message-ID: <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Yes. Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote. I just keep thinking a set of servers would do the trick and be cheaper. 
From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com > wrote: I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Mar 4 16:36:30 2016 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 4 Mar 2016 11:36:30 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: > Yes. Really the only other option we have (and not a bad one) is getting > a v7000 Unified in there (if we can get the price down far enough). That?s > not a bad option since all they really want is SMB shares in the remote. I > just keep thinking a set of servers would do the trick and be cheaper. > > > > From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM > > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > You can do FPO for non-Hadoop workloads. It just alters the disks below > the GPFS filesystem layer and looks like a normal GPFS system (mostly). I > do think there were some restrictions on non-FPO nodes mounting FPO > filesystems via multi-cluster.. not sure if those are still there.. any > input on that from IBM? > > If small enough data, and with 3-way replication, it might just be wise to > do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common > throwing out numbers), 3 of those per site would fit in your budget. > > Again.. depending on your requirements, stability balance between 'science > experiment' vs production, GPFS knowledge level, etc etc... > > This is actually an interesting and somewhat missing space for small > enterprises. If you just want 10-20TB active-active online everywhere, say, > for VMware, or NFS, or something else, there arent all that many good > solutions today that scale down far enough and are a decent price. It's > easy with many many PB, but small.. idk. I think the above sounds good as > anything without going SAN-crazy. > > > > On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < > Mark.Bush at siriuscom.com> wrote: > >> I guess this is really my question. Budget is less than $50k per site >> and they need around 20TB storage. Two nodes with MD3 or something may >> work. But could it work (and be successful) with just servers and internal >> drives? Should I do FPO for non hadoop like workloads? I didn?t think I >> could get native raid except in the ESS (GSS no longer exists if I remember >> correctly). 
Do I just make replicas and call it good? >> >> >> Mark >> >> From: on behalf of Marc A >> Kaplan >> Reply-To: gpfsug main discussion list >> Date: Friday, March 4, 2016 at 10:09 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Small cluster >> >> Jon, I don't doubt your experience, but it's not quite fair or even >> sensible to make a decision today based on what was available in the GPFS >> 2.3 era. >> >> We are now at GPFS 4.2 with support for 3 way replication and FPO. >> Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS >> solutions and more. >> >> So more choices, more options, making finding an "optimal" solution more >> difficult. >> >> To begin with, as with any provisioning problem, one should try to state: >> requirements, goals, budgets, constraints, failure/tolerance >> models/assumptions, >> expected workloads, desired performance, etc, etc. >> >> >> This message (including any attachments) is intended only for the use of >> the individual or entity to which it is addressed and may contain >> information that is non-public, proprietary, privileged, confidential, and >> exempt from disclosure under applicable law. If you are not the intended >> recipient, you are hereby notified that any use, dissemination, >> distribution, or copying of this communication is strictly prohibited. This >> message may be viewed by parties at Sirius Computer Solutions other than >> those named in the message header. This message does not contain an >> official representation of Sirius Computer Solutions. If you have received >> this communication in error, notify Sirius Computer Solutions immediately >> and (i) destroy this message if a facsimile or (ii) delete this message >> immediately if this is an electronic communication. Thank you. >> Sirius Computer Solutions >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > > -- > Zach Giles > zgiles at gmail.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Fri Mar 4 16:40:54 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 4 Mar 2016 10:40:54 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D9BA96.8010800@genome.wustl.edu> All, This turned out to be processes copying data from GPFS to local /tmp. Once the system memory was full it started blocking while the data was being flushed to disk. This process was taking long enough to have leases expire. Matt On 3/2/16 2:24 PM, Simon Thompson (Research Computing - IT Services) wrote: > Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. > > We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] > Sent: 02 March 2016 20:17 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com > Sent: Wednesday, March 02, 2016 2:15 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > Hi, > > How sure are you that it is cpu scheduling that is your problem? > > Are you using IB or Ethernet? > > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. >> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >>> Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. 
If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Paul.Sanchez at deshaw.com Fri Mar 4 16:54:39 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 4 Mar 2016 16:54:39 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: You wouldn?t be alone in trying to make the ?concurrent CES gateway + NSD server nodes? formula work. That doesn?t mean it will be well-supported initially, but it does mean that others will be finding bugs and interaction issues along with you. On GPFS 4.1.1.2 for example, it?s possible to get a CES protocol node into a state where the mmcesmonitor is dead and requires a mmshutdown/mmstartup to recover from. Since in a shared-nothing disk topology that would require mmchdisk/mmrestripefs to recover and rebalance, it would be operationally intensive to run CES on an NSD server with local disks. With shared SAN disks, this becomes more tractable, in my opinion. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Zachary Giles Sent: Friday, March 04, 2016 11:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com > wrote: Yes. Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote. I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. 
If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com > wrote: I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Fri Mar 4 18:03:16 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 4 Mar 2016 19:03:16 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Hi, a couple of comments to the various infos in this thread. 1. 
the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. 
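Sven's point 2 above is easy to sanity-check with back-of-the-envelope arithmetic. The figures below are illustrative assumptions taken from his description (roughly 100 IOPS per spindle, three data copies plus a log write per client write), not measurements from any real system:

# Rough IOPS budget for a small shared-nothing configuration.
SPINDLE_IOPS = 100        # what a single nearline drive can sustain
DISKS_PER_NODE = 12
NODES = 3

raw_iops = SPINDLE_IOPS * DISKS_PER_NODE * NODES

def effective_write_iops(raw, copies=3, log_ios_per_write=1):
    """Client-visible write IOPS once each write fans out into
    `copies` data I/Os plus some log/metadata I/Os."""
    return raw / (copies + log_ios_per_write)

print(f"raw backend IOPS:             {raw_iops}")                              # 3600
print(f"client write IOPS at 3x rep.: {effective_write_iops(raw_iops):.0f}")    # 900

Under those assumptions, three unremarkable servers full of disks deliver on the order of a thousand small synchronous writes per second for the whole site, which is why a write cache in front of the spindles matters so much here.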
From: Zachary Giles Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From ceason at us.ibm.com Fri Mar 4 18:20:50 2016 From: ceason at us.ibm.com (Jeffrey M Ceason) Date: Fri, 4 Mar 2016 11:20:50 -0700 Subject: [gpfsug-discuss] Small cluster (Jeff Ceason) In-Reply-To: References: Message-ID: <201603041821.u24IL6S6000328@d01av02.pok.ibm.com> The V7000 Unified type system is made for this application. http://www-03.ibm.com/systems/storage/disk/storwize_v7000/ Jeff Ceason Solutions Architect (520) 268-2193 (Mobile) ceason at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 03/04/2016 11:15 AM Subject: gpfsug-discuss Digest, Vol 50, Issue 14 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Small cluster (Sven Oehme) ---------------------------------------------------------------------- Message: 1 Date: Fri, 4 Mar 2016 19:03:16 +0100 From: "Sven Oehme" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Message-ID: <201603041804.u24I4g2R026689 at d03av01.boulder.ibm.com> Content-Type: text/plain; charset="utf-8" Hi, a couple of comments to the various infos in this thread. 1. the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. 
if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. 
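The HAWC and LROC combination Sven mentions in point 3 is what makes an SSD or two per node attractive in these small configurations. The sketch below only prints the commands it would use rather than running them, because both the stanza keyword and the file system option should be verified against the documentation for the release in use; device, NSD, and file system names are invented.

# Hedged sketch: LROC uses a local SSD as a read cache by defining it as an
# NSD with usage=localCache; HAWC hardens small synchronous writes via a
# per-file-system write cache threshold.
lroc_stanza = """%nsd:
  device=/dev/nvme0n1
  nsd=node1_lroc
  servers=node1
  usage=localCache
"""

with open("lroc_stanza.txt", "w") as f:
    f.write(lroc_stanza)

for cmd in (
    "mmcrnsd -F lroc_stanza.txt",                 # define the local read cache device
    "mmchfs gpfs0 --write-cache-threshold 64K",   # enable HAWC for small sync writes
):
    print(cmd)  # review, then run manually once the syntax is confirmed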
If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160304/dd661d27/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160304/dd661d27/attachment.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 50, Issue 14 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From janfrode at tanso.net Sat Mar 5 13:16:54 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sat, 05 Mar 2016 13:16:54 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: Regarding #1, the FAQ has recommendation to not run CES nodes directly attached to storage: """ ? NSD server functionality and storage attached to Protocol node. We recommend that Protocol nodes do not take on these functions """ For small CES clusters we're now configuring 2x P822L with one partition on each server owning FC adapters and acting as NSD server/quorum/manager and the other partition being CES node accessing disk via IP. I would much rather have a plain SAN model cluster were all nodes accessed disk directly (probably still with a dedicated quorum/manager partition), but this FAQ entry is preventing that.. -jf fre. 4. mar. 2016 kl. 19.04 skrev Sven Oehme : > Hi, > > a couple of comments to the various infos in this thread. > > 1. the need to run CES on separate nodes is a recommendation, not a > requirement and the recommendation comes from the fact that if you have > heavy loaded NAS traffic that gets the system to its knees, you can take > your NSD service down with you if its on the same box. so as long as you > have a reasonable performance expectation and size the system correct there > is no issue. > > 2. shared vs FPO vs shared nothing (just replication) . the main issue > people overlook in this scenario is the absence of read/write caches in FPO > or shared nothing configurations. every physical disk drive can only do > ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its > pretty much the same effort. particular on metadata this bites you really > badly as every of this tiny i/os eats one of your 100 iops a disk can do > and quickly you used up all your iops on the drives. if you have any form > of raid controller (sw or hw) it typically implements at minimum a read > cache on most systems a read/write cache which will significant increase > the number of logical i/os one can do against a disk , my best example is > always if you have a workload that does 4k seq DIO writes to a single disk, > if you have no raid controller you can do 400k/sec in this workload if you > have a reasonable ok write cache in front of the cache you can do 50 times > that much. so especilly if you use snapshots, CES services or anything > thats metadata intensive you want some type of raid protection with > caching. btw. 
replication in the FS makes this even worse as now each write > turns into 3 iops for the data + additional iops for the log records so you > eat up your iops very quick . > > 3. instead of shared SAN a shared SAS device is significantly cheaper but > only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 > nodes as you can use the disks as tiebreaker disks. if you also add some > SSD's for the metadata and make use of HAWC and LROC you might get away > from not needing a raid controller with cache as HAWC will solve that issue > for you . > > just a few thoughts :-D > > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > [image: Inactive hide details for Zachary Giles ---03/04/2016 05:36:50 > PM---SMB too, eh? See this is where it starts to get hard to sca]Zachary > Giles ---03/04/2016 05:36:50 PM---SMB too, eh? See this is where it starts > to get hard to scale down. You could do a 3 node GPFS clust > > From: Zachary Giles > > > To: gpfsug main discussion list > > Date: 03/04/2016 05:36 PM > > > Subject: Re: [gpfsug-discuss] Small cluster > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > SMB too, eh? See this is where it starts to get hard to scale down. You > could do a 3 node GPFS cluster with replication at remote sites, pulling in > from AFM over the Net. If you want SMB too, you're probably going to need > another pair of servers to act as the Protocol Servers on top of the 3 GPFS > servers. I think running them all together is not recommended, and probably > I'd agree with that. > Though, you could do it anyway. If it's for read-only and updated daily, > eh, who cares. Again, depends on your GPFS experience and the balance > between production, price, and performance :) > > On Fri, Mar 4, 2016 at 11:30 AM, *Mark.Bush at siriuscom.com* > <*Mark.Bush at siriuscom.com* > > wrote: > > Yes. Really the only other option we have (and not a bad one) is > getting a v7000 Unified in there (if we can get the price down far > enough). That?s not a bad option since all they really want is SMB shares > in the remote. I just keep thinking a set of servers would do the trick > and be cheaper. > > > > *From: *Zachary Giles <*zgiles at gmail.com* > > * Reply-To: *gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > * Date: *Friday, March 4, 2016 at 10:26 AM > > * To: *gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > * Subject: *Re: [gpfsug-discuss] Small cluster > > You can do FPO for non-Hadoop workloads. It just alters the disks > below the GPFS filesystem layer and looks like a normal GPFS system > (mostly). I do think there were some restrictions on non-FPO nodes > mounting FPO filesystems via multi-cluster.. not sure if those are still > there.. any input on that from IBM? > > If small enough data, and with 3-way replication, it might just be > wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just > common throwing out numbers), 3 of those per site would fit in your budget. > > Again.. depending on your requirements, stability balance between > 'science experiment' vs production, GPFS knowledge level, etc etc... > > This is actually an interesting and somewhat missing space for small > enterprises. 
If you just want 10-20TB active-active online everywhere, say, > for VMware, or NFS, or something else, there arent all that many good > solutions today that scale down far enough and are a decent price. It's > easy with many many PB, but small.. idk. I think the above sounds good as > anything without going SAN-crazy. > > > > On Fri, Mar 4, 2016 at 11:21 AM, *Mark.Bush at siriuscom.com* > <*Mark.Bush at siriuscom.com* > > wrote: > I guess this is really my question. Budget is less than $50k per site > and they need around 20TB storage. Two nodes with MD3 or something may > work. But could it work (and be successful) with just servers and internal > drives? Should I do FPO for non hadoop like workloads? I didn?t think I > could get native raid except in the ESS (GSS no longer exists if I remember > correctly). Do I just make replicas and call it good? > > > Mark > > *From: *<*gpfsug-discuss-bounces at spectrumscale.org* > > on behalf of Marc A Kaplan > <*makaplan at us.ibm.com* > > * Reply-To: *gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > * Date: *Friday, March 4, 2016 at 10:09 AM > * To: *gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > * Subject: *Re: [gpfsug-discuss] Small cluster > > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the GPFS > 2.3 era. > > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution > more difficult. > > To begin with, as with any provisioning problem, one should try to > state: requirements, goals, budgets, constraints, failure/tolerance > models/assumptions, > expected workloads, desired performance, etc, etc. > > This message (including any attachments) is intended only for the use > of the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. 
> > *Sirius Computer Solutions* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > -- > Zach Giles > *zgiles at gmail.com* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > -- > Zach Giles > *zgiles at gmail.com* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at us.ibm.com Sat Mar 5 13:31:40 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Sat, 5 Mar 2016 14:31:40 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> as i stated in my previous post , its a recommendation so people don't overload the NSD servers to have them become non responsive or even forced rebooted (e.g. when you configure cNFS auto reboot on same node), it doesn't mean it doesn't work or is not supported. if all you are using this cluster for is NAS services, then this recommendation makes even less sense as the whole purpose on why the recommendation is there to begin with is that if NFS would overload a node that also serves as NSD server for other nodes it would impact the other nodes that use the NSD protocol, but if there are no NSD clients there is nothing to protect because if NFS is down all clients are not able to access data, even if your NSD servers are perfectly healthy... if you have a fairly large system with many NSD Servers, many clients as well as NAS clients this recommendation is correct, but not in the scenario you described below. i will work with the team to come up with a better wording for this in the FAQ. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jan-Frode Myklebust To: gpfsug main discussion list Cc: Sven Oehme/Almaden/IBM at IBMUS Date: 03/05/2016 02:17 PM Subject: Re: [gpfsug-discuss] Small cluster Regarding #1, the FAQ has recommendation to not run CES nodes directly attached to storage: """ ? NSD server functionality and storage attached to Protocol node. We recommend that Protocol nodes do not take on these functions """ For small CES clusters we're now configuring 2x P822L with one partition on each server owning FC adapters and acting as NSD server/quorum/manager and the other partition being CES node accessing disk via IP. 
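For anyone sketching a similar protocol-node layout, the broad shape of turning two existing cluster nodes into CES nodes is roughly as follows. This is an outline only: the node names, addresses, and shared-root path are invented, the Python wrapper is just to keep the example self-contained, and the exact options should be confirmed against the mmchnode/mmces documentation for the release in use.

import subprocess

steps = [
    ["mmchconfig", "cesSharedRoot=/gpfs/fs0/ces"],        # shared state directory for CES
    ["mmchnode", "--ces-enable", "-N", "ces1,ces2"],      # designate the protocol nodes
    ["mmces", "service", "enable", "SMB"],                # the branch office only needs SMB
    ["mmces", "address", "add", "--ces-ip", "10.10.10.100"],
    ["mmces", "address", "add", "--ces-ip", "10.10.10.101"],
]

for cmd in steps:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)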
I would much rather have a plain SAN model cluster were all nodes accessed disk directly (probably still with a dedicated quorum/manager partition), but this FAQ entry is preventing that.. -jf fre. 4. mar. 2016 kl. 19.04 skrev Sven Oehme : Hi, a couple of comments to the various infos in this thread. 1. the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Zachary Giles ---03/04/2016 05:36:50 PM---SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS clust From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. 
Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough).? That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles Reply-To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly).? I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Sven Oehme/Almaden/IBM] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Sat Mar 5 18:40:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sat, 5 Mar 2016 13:40:50 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> Message-ID: <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Indeed it seems to just add overhead and expense to split what can be done by one node over two nodes! -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Sat Mar 5 18:52:16 2016 From: zgiles at gmail.com (Zachary Giles) Date: Sat, 5 Mar 2016 13:52:16 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Message-ID: Sven, What about the stability of the new protocol nodes vs the old cNFS? If you remember, back in the day, cNFS would sometimes have a problem and reboot the whole server itself. Obviously this was problematic if it's one of the few servers running your cluster. I assume this is different now with the Protocol Servers? On Sat, Mar 5, 2016 at 1:40 PM, Marc A Kaplan wrote: > Indeed it seems to just add overhead and expense to split what can be done > by one node over two nodes! 
> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Sun Mar 6 13:55:59 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Sun, 6 Mar 2016 14:55:59 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com><201603051331.u25DVjvV017738@d01av01.pok.ibm.com><201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Message-ID: <201603061356.u26Du4Zj014555@d03av05.boulder.ibm.com> the question is what difference does it make ? as i mentioned if all your 2 or 3 nodes do is serving NFS it doesn't matter if the protocol nodes or the NSD services are down in both cases it means no access to data which it makes no sense to separate them in this case (unless load dependent). i haven't seen nodes reboot specifically because of protocol issues lately, the fact that everything is in userspace makes things easier too. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/06/2016 02:31 AM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, What about the stability of the new protocol nodes vs the old cNFS? If you remember, back in the day, cNFS would sometimes have a problem and reboot the whole server itself. Obviously this was problematic if it's one of the few servers running your cluster. I assume this is different now with the Protocol Servers? On Sat, Mar 5, 2016 at 1:40 PM, Marc A Kaplan wrote: Indeed it seems to just add overhead and expense to split what can be done by one node over two nodes! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Sun Mar 6 20:27:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 6 Mar 2016 15:27:50 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> As Sven wrote, the FAQ does not "prevent" anything. It's just a recommendation someone came up with. Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation. 
But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From janfrode at tanso.net Mon Mar 7 06:11:27 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 07 Mar 2016 06:11:27 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> Message-ID: I agree, but would also normally want to stay within whatever is recommended. What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan : > As Sven wrote, the FAQ does not "prevent" anything. It's just a > recommendation someone came up with. Which may or may not apply to your > situation. > > Partitioning a server into two servers might be a good idea if you really > need the protection/isolation. But I expect you are limiting the potential > performance of the overall system, compared to running a single Unix image > with multiple processes that can share resource and communicate more freely. > > > [image: Marc A Kaplan] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From volobuev at us.ibm.com Mon Mar 7 20:58:37 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Mon, 7 Mar 2016 12:58:37 -0800 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com><201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> Message-ID: <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> This use case is a good example of how it's hard to optimize across multiple criteria. If you want a pre-packaged solution that's proven and easy to manage, StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for your requirements as such things get. Price may be an issue though, as usual. If you're OK with rolling your own complex solution, my recommendation would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via the local block device interface. 
This avoids the pitfalls of data/metadata replication, and offers a decent blend of performance, fault tolerance, and disk management. You can use disk-based quorum if going with 2 nodes, or traditional node majority quorum if using 3 nodes, either way would work. There's no need to do any separation of roles (CES, quorum, managers, etc), provided the nodes are adequately provisioned with memory and aren't routinely overloaded, in which case you just need to add more nodes instead of partitioning what you have. Using internal disks and relying on GPFS data/metadata replication, with or without FPO, would mean taking the hard road. You may be able to spend the least on hardware in such a config (although the 33% disk utilization rate for triplication makes this less clear, if capacity is an issue), but the operational challenges are going to be substantial. This would be a viable config, but there are unavoidable tradeoffs caused by replication: (1) writes are very expensive, which limits the overall cluster capability for non-read-only workloads, (2) node and disk failures require a round of re-replication, or "re-protection", which takes time and bandwidth, limiting the overall capability further, (3) disk management can be a challenge, as there's no software/hardware component to assist with identifying failing/failed disks. As far as not going off the beaten path, this is not it... Exporting protocols from a small triplicated file system is not a typical mode of deployment of Spectrum Scale, you'd be blazing some new trails. As stated already in several responses, there's no hard requirement that CES Protocol nodes must be entirely separate from any other roles in the general Spectrum Scale deployment scenario. IBM expressly disallows co-locating Protocol nodes with ESS servers, due to resource consumption complications, but for non-ESS cases it's merely a recommendation to run Protocols on nodes that are not otherwise encumbered by having to provide other services. Of course, the config that's the best for performance is not the cheapest. CES doesn't reboot nodes to recover from NFS problems, unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a complex software stack is a complex software stack, so there's greater potential for things to go sideways, in particular due to the lack of resources. FPO vs plain replication: this only matters if you have apps that are capable of exploiting data locality. FPO changes the way GPFS stripes data across disks. Without FPO, GPFS does traditional wide striping of blocks across all disks in a given storage pool. When FPO is in use, data in large files is divided in large (e.g. 1G) chunks, and there's a node that holds an entire chunk on its internal disks. An application that knows how to query data block layout of a given file can then schedule the job that needs to read from this chunk on the node that holds a local copy. This makes a lot of sense for integrated data analytics workloads, a la Map Reduce with Hadoop, but doesn't make sense for generic apps like Samba. I'm not sure what language in the FAQ creates the impression that the SAN deployment model is somehow incompatible with running Procotol services. This is perfectly fine. yuri From: Jan-Frode Myklebust To: gpfsug main discussion list , Date: 03/06/2016 10:12 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org I agree, but would also normally want to stay within whatever is recommended. 
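To put a number on the "large chunks" Yuri describes for FPO: the chunk a single node holds is essentially the data block size multiplied by a block group factor, so the roughly 1G figure falls out of common values. The numbers below are assumptions for illustration, and the stanza parameter named in the comment (blockGroupFactor in a %pool stanza) should be checked against the mmcrfs documentation rather than taken from here.

MiB = 1024 * 1024

block_size = 8 * MiB        # data block size of the FPO storage pool
block_group_factor = 128    # consecutive blocks kept on one node (cf. blockGroupFactor)

chunk = block_size * block_group_factor
print(f"chunk held by one node: {chunk / (1024 * MiB):.0f} GiB")   # -> 1 GiB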
What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan : As Sven wrote, the FAQ does not "prevent" anything.? It's just a recommendation someone came up with.? Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation.? But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Mar 7 21:10:48 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Mon, 7 Mar 2016 21:10:48 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> Message-ID: Thanks Yuri, this solidifies some of the conclusions I?ve drawn from this conversation. Thank you all for your responses. This is a great forum filled with very knowledgeable folks. Mark From: > on behalf of Yuri L Volobuev > Reply-To: gpfsug main discussion list > Date: Monday, March 7, 2016 at 2:58 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster This use case is a good example of how it's hard to optimize across multiple criteria. If you want a pre-packaged solution that's proven and easy to manage, StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for your requirements as such things get. Price may be an issue though, as usual. If you're OK with rolling your own complex solution, my recommendation would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via the local block device interface. This avoids the pitfalls of data/metadata replication, and offers a decent blend of performance, fault tolerance, and disk management. You can use disk-based quorum if going with 2 nodes, or traditional node majority quorum if using 3 nodes, either way would work. There's no need to do any separation of roles (CES, quorum, managers, etc), provided the nodes are adequately provisioned with memory and aren't routinely overloaded, in which case you just need to add more nodes instead of partitioning what you have. 
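As an aside on the two-node, twin-tailed variant: disk-based quorum is what lets a pair of nodes ride out the loss of either one. A rough outline of the commands involved is below; node, cluster, and NSD names are invented, the tiebreaker NSDs would have to be created with mmcrnsd (not shown) between the two steps, and the exact syntax should be checked against the mmcrcluster/mmchconfig documentation for the release in use.

# Printed rather than executed, as a reminder that this is a sketch.
commands = [
    # both nodes carry the quorum and manager roles
    "mmcrcluster -C smallsite -N nsd1:quorum-manager,nsd2:quorum-manager "
    "-r /usr/bin/ssh -R /usr/bin/scp",
    # with only two quorum nodes, small shared NSDs break the tie if one node is down
    "mmchconfig tiebreakerDisks='tb_nsd1;tb_nsd2;tb_nsd3'",
]
for c in commands:
    print(c)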
Using internal disks and relying on GPFS data/metadata replication, with or without FPO, would mean taking the hard road. You may be able to spend the least on hardware in such a config (although the 33% disk utilization rate for triplication makes this less clear, if capacity is an issue), but the operational challenges are going to be substantial. This would be a viable config, but there are unavoidable tradeoffs caused by replication: (1) writes are very expensive, which limits the overall cluster capability for non-read-only workloads, (2) node and disk failures require a round of re-replication, or "re-protection", which takes time and bandwidth, limiting the overall capability further, (3) disk management can be a challenge, as there's no software/hardware component to assist with identifying failing/failed disks. As far as not going off the beaten path, this is not it... Exporting protocols from a small triplicated file system is not a typical mode of deployment of Spectrum Scale, you'd be blazing some new trails. As stated already in several responses, there's no hard requirement that CES Protocol nodes must be entirely separate from any other roles in the general Spectrum Scale deployment scenario. IBM expressly disallows co-locating Protocol nodes with ESS servers, due to resource consumption complications, but for non-ESS cases it's merely a recommendation to run Protocols on nodes that are not otherwise encumbered by having to provide other services. Of course, the config that's the best for performance is not the cheapest. CES doesn't reboot nodes to recover from NFS problems, unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a complex software stack is a complex software stack, so there's greater potential for things to go sideways, in particular due to the lack of resources. FPO vs plain replication: this only matters if you have apps that are capable of exploiting data locality. FPO changes the way GPFS stripes data across disks. Without FPO, GPFS does traditional wide striping of blocks across all disks in a given storage pool. When FPO is in use, data in large files is divided in large (e.g. 1G) chunks, and there's a node that holds an entire chunk on its internal disks. An application that knows how to query data block layout of a given file can then schedule the job that needs to read from this chunk on the node that holds a local copy. This makes a lot of sense for integrated data analytics workloads, a la Map Reduce with Hadoop, but doesn't make sense for generic apps like Samba. I'm not sure what language in the FAQ creates the impression that the SAN deployment model is somehow incompatible with running Procotol services. This is perfectly fine. yuri [Inactive hide details for Jan-Frode Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want to stay within]Jan-Frode Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want to stay within whatever is recommended. From: Jan-Frode Myklebust > To: gpfsug main discussion list >, Date: 03/06/2016 10:12 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I agree, but would also normally want to stay within whatever is recommended. What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 
21.28 skrev Marc A Kaplan >: As Sven wrote, the FAQ does not "prevent" anything. It's just a recommendation someone came up with. Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation. But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[cid:2__=07BBF5FCDFFC0B518f9e8a93df938690918c07B@][cid:2__=07BBF5FCDFFC0B518f9e8a93df938690918c07B@]_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: 0B132319.gif URL: From r.sobey at imperial.ac.uk Tue Mar 8 09:48:01 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 8 Mar 2016 09:48:01 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Message-ID: Morning all, I tried to download the VM to evaluate SS yesterday - more of a chance to play around with commands in a non-prod environment and look at what's in store. We're currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who's already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniel.kidger at uk.ibm.com Tue Mar 8 13:09:21 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 8 Mar 2016 13:09:21 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query In-Reply-To: References: Message-ID: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> Richard, Sounds unusual. When you registered your IBM ID for login - did you choose your country from the drop-down list as North Korea ? ;-) Daniel Dr.Daniel Kidger No. 1 The Square, Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 08/03/2016 09:48 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Sent by: gpfsug-discuss-bounces at spectrumscale.org Morning all, I tried to download the VM to evaluate SS yesterday ? more of a chance to play around with commands in a non-prod environment and look at what?s in store. We?re currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who?s already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Mar 8 13:16:37 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 8 Mar 2016 13:16:37 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query In-Reply-To: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> References: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> Message-ID: Hah, well now you?ve got me checking just to make sure ? Ok, definitely says United Kingdom. Now it won?t let me download it at all, says page not found. Will persevere! Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 08 March 2016 13:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Eval VM download query Richard, Sounds unusual. When you registered your IBM ID for login - did you choose your country from the drop-down list as North Korea ? ;-) Daniel ________________________________ Dr.Daniel Kidger No. 
1 The Square, [cid:image001.gif at 01D1793C.BE2DB440] Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com ________________________________ From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 08/03/2016 09:48 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Morning all, I tried to download the VM to evaluate SS yesterday ? more of a chance to play around with commands in a non-prod environment and look at what?s in store. We?re currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who?s already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 360 bytes Desc: image001.gif URL: From Robert.Oesterlin at nuance.com Tue Mar 8 15:53:34 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 8 Mar 2016 15:53:34 +0000 Subject: [gpfsug-discuss] Interpreting "mmlsqos" output Message-ID: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com> So ? I enabled QoS on my file systems using the defaults in 4.2 Running a restripe with a class of ?maintenance? gives me this for mmlsqos output: [root at gpfs-vmd01a ~]# mmlsqos VMdata01 --sum-nodes yes QOS config:: enabled QOS values:: pool=system,other=inf,maintenance=inf QOS status:: throttling active, monitoring active === for pool system 10:36:30 other iops=9754 ioql=12.17 qsdl=0.00022791 et=5 10:36:30 maint iops=55 ioql=0.067331 qsdl=2.7e-05 et=5 10:36:35 other iops=7999.8 ioql=12.613 qsdl=0.00013951 et=5 10:36:35 maint iops=52 ioql=0.10034 qsdl=2.48e-05 et=5 10:36:40 other iops=8890.8 ioql=12.117 qsdl=0.00016095 et=5 10:36:40 maint iops=71.2 ioql=0.13904 qsdl=3.56e-05 et=5 10:36:45 other iops=8303.8 ioql=11.17 qsdl=0.00011438 et=5 10:36:45 maint iops=52.8 ioql=0.08261 qsdl=3.06e-05 et=5 It looks like the ?maintenance? class is getting perhaps 5% of the overall IOP rate? What do ?ioql? and ?qsdl? indicate? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 8 16:36:46 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 11:36:46 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority Message-ID: <20160308113646.54314ikzhtedrjby@support.scinet.utoronto.ca>
I'm wondering whether the new version of the "Spectrum Suite" will allow us to set the priority of the HSM migration to be higher than staging.
I ask this because back in 2011, when we were still using Tivoli HSM with GPFS, during mixed requests for migration and staging operations we had a very annoying behavior in which the staging would always take precedence over migration. The end result was that GPFS would fill up to 100% and induce a deadlock on the cluster, unless we identified all the user-driven stage requests in time and killed them all. We contacted IBM support a few times asking for a way to fix this, and were told it was built into TSM. Back then we gave up IBM's HSM primarily for this reason, although performance was also a consideration (more on this in another post).
We are now reconsidering HSM for a new deployment, however only if this issue has been resolved (among a few others).
What has been some of the experience out there?
Thanks
Jaime
--- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477
---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto.
From pinto at scinet.utoronto.ca Tue Mar 8 16:54:45 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 11:54:45 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts Message-ID: <20160308115445.10061uekt4pp5kgl@support.scinet.utoronto.ca>
For the new Spectrum Suite of products, are there specific references with examples on how to set up GPFS policy rules to integrate TSM, so as to substantially improve the migration performance of HSM?
The reason I ask is because I've been reading manuals with 200+ pages where it's very clear this can be accomplished, by building lists and feeding those to TSM; however, some of the examples and rules are presented out of context, and not integrated into a single self-contained document. GPFS has its own set of manuals, but so do TSM and HSM.
For those of you already doing it, what has been your experience, what are the tricks (where can I read about them), and how is the addition of multiple nodes to the working pool performing?
Thanks
Jaime
--- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477
---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto.
From dominic.mueller at de.ibm.com Tue Mar 8 17:45:42 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Tue, 8 Mar 2016 18:45:42 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts Message-ID: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com>
Hi, please have a look at this document: http://www-01.ibm.com/support/docview.wss?uid=swg27018848 It describes the setup and provides some hints and tips for migration policies.
Greetings, Dominic.
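As a rough illustration of the threshold-driven approach that document describes, a policy along these lines is typical; the file system name (gpfs0), the thresholds, the callback name (HSMMIG) and in particular the EXEC script path are placeholders, so substitute whatever the linked guide prescribes:

  cat > /tmp/hsm-threshold.pol <<'EOF'
  /* push the coldest files out to the Spectrum Protect HSM pool once the
     system pool reaches 90% full, and stop again at 80% */
  RULE EXTERNAL POOL 'hsm'
       EXEC '/var/mmfs/etc/mmpolicyExec-hsm.sample' OPTS '-v'
  RULE 'skipSpaceMan' EXCLUDE WHERE PATH_NAME LIKE '%/.SpaceMan/%'
  RULE 'coldFirst' MIGRATE FROM POOL 'system'
       THRESHOLD(90,80)
       WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
       TO POOL 'hsm'
  EOF
  mmchpolicy gpfs0 /tmp/hsm-threshold.pol -I test   # validate the policy first
  mmchpolicy gpfs0 /tmp/hsm-threshold.pol           # then install it
  mmaddcallback HSMMIG --command /usr/lpp/mmfs/bin/mmstartpolicy \
       --event lowDiskSpace,noDiskSpace --parms "%eventName %fsName"

The mmaddcallback hook is what makes the migration fire automatically when the low-space event is raised, rather than only when mmapplypolicy is run by hand.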
______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 For the new Spectrum Suite of products, are there specific references with examples on how to set up gpfs policy rules to integrate TSM so substantially improve the migration performance of HSM? The reason I ask is because I've been reading manuals with 200+ pages where it's very clear this is possible to be accomplished, by builtin lists and feeding those to TSM, however some of the examples and rules are presented out of context, and not integrated onto a single self-contained document. The GPFS past has it own set of manuals, but so do TSM and HSM. For those of you already doing it, what has been your experience, what are the tricks (where can I read about them), how the addition of multiple nodes to the working pool is performing? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominic.mueller at de.ibm.com Tue Mar 8 17:46:11 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Tue, 8 Mar 2016 18:46:11 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Message-ID: <201603081646.u28GkIXt026930@d06av10.portsmouth.uk.ibm.com> Hi, in all cases a recall request will be handled transparent for the user at the time a migrated files is accessed. This can't be prevented and has two down sides: a) the space used in the file system increases and b) random access to storage media in the Spectrum Protect server happens. With newer versions of Spectrum Protect for Space Management a so called tape optimized recall method is available that can reduce the impact to the system (especially Spectrum Protect server). If the problem was that the file system went out of space at the time the recalls came in I would recommend to reduce the threshold settings for the file system and increase the number of premigrated files. This will allow to free space very quickly if needed. If you didn't use the policy based threshold migration so far I recommend to use it. This method is significant faster compared to the classical HSM based threshold migration approach. Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 ----- From: Jaime Pinto To: gpfsug main discussion list Date: 08.03.2016 17:36 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority Sent by: gpfsug-discuss-bounces at spectrumscale.org I'm wondering whether the new version of the "Spectrum Suite" will allow us set the priority of the HSM migration to be higher than staging. I ask this because back in 2011 when we were still using Tivoli HSM with GPFS, during mixed requests for migration and staging operations, we had a very annoying behavior in which the staging would always take precedence over migration. The end-result was that the GPFS would fill up to 100% and induce a deadlock on the cluster, unless we identified all the user driven stage requests in time, and killed them all. We contacted IBM support a few times asking for a way fix this, and were told it was built into TSM. Back then we gave up IBM's HSM primarily for this reason, although performance was also a consideration (more to this on another post). We are now reconsidering HSM for a new deployment, however only if this issue has been resolved (among a few others). What has been some of the experience out there? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Tue Mar 8 18:58:29 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 8 Mar 2016 18:58:29 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> Message-ID: My fantasy solution is 2 servers and a SAS disk shelf from my adopted, cheap x86 vendor running IBM Spectrum Scale with GNR as software only, doing concurrent, supported GNR and CES with maybe an advisory on the performance requirements of such and suggestions on scale out approaches :) Cheers Chris On 7 March 2016 at 21:10, Mark.Bush at siriuscom.com wrote: > Thanks Yuri, this solidifies some of the conclusions I?ve drawn from this > conversation. Thank you all for your responses. This is a great forum > filled with very knowledgeable folks. > > Mark > > From: on behalf of Yuri L > Volobuev > Reply-To: gpfsug main discussion list > Date: Monday, March 7, 2016 at 2:58 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > This use case is a good example of how it's hard to optimize across > multiple criteria. > > If you want a pre-packaged solution that's proven and easy to manage, > StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for > your requirements as such things get. Price may be an issue though, as > usual. > > If you're OK with rolling your own complex solution, my recommendation > would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external > disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. 
via > the local block device interface. This avoids the pitfalls of data/metadata > replication, and offers a decent blend of performance, fault tolerance, and > disk management. You can use disk-based quorum if going with 2 nodes, or > traditional node majority quorum if using 3 nodes, either way would work. > There's no need to do any separation of roles (CES, quorum, managers, etc), > provided the nodes are adequately provisioned with memory and aren't > routinely overloaded, in which case you just need to add more nodes instead > of partitioning what you have. > > Using internal disks and relying on GPFS data/metadata replication, with > or without FPO, would mean taking the hard road. You may be able to spend > the least on hardware in such a config (although the 33% disk utilization > rate for triplication makes this less clear, if capacity is an issue), but > the operational challenges are going to be substantial. This would be a > viable config, but there are unavoidable tradeoffs caused by replication: > (1) writes are very expensive, which limits the overall cluster capability > for non-read-only workloads, (2) node and disk failures require a round of > re-replication, or "re-protection", which takes time and bandwidth, > limiting the overall capability further, (3) disk management can be a > challenge, as there's no software/hardware component to assist with > identifying failing/failed disks. As far as not going off the beaten path, > this is not it... Exporting protocols from a small triplicated file system > is not a typical mode of deployment of Spectrum Scale, you'd be blazing > some new trails. > > As stated already in several responses, there's no hard requirement that > CES Protocol nodes must be entirely separate from any other roles in the > general Spectrum Scale deployment scenario. IBM expressly disallows > co-locating Protocol nodes with ESS servers, due to resource consumption > complications, but for non-ESS cases it's merely a recommendation to run > Protocols on nodes that are not otherwise encumbered by having to provide > other services. Of course, the config that's the best for performance is > not the cheapest. CES doesn't reboot nodes to recover from NFS problems, > unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a > complex software stack is a complex software stack, so there's greater > potential for things to go sideways, in particular due to the lack of > resources. > > FPO vs plain replication: this only matters if you have apps that are > capable of exploiting data locality. FPO changes the way GPFS stripes data > across disks. Without FPO, GPFS does traditional wide striping of blocks > across all disks in a given storage pool. When FPO is in use, data in large > files is divided in large (e.g. 1G) chunks, and there's a node that holds > an entire chunk on its internal disks. An application that knows how to > query data block layout of a given file can then schedule the job that > needs to read from this chunk on the node that holds a local copy. This > makes a lot of sense for integrated data analytics workloads, a la Map > Reduce with Hadoop, but doesn't make sense for generic apps like Samba. > > I'm not sure what language in the FAQ creates the impression that the SAN > deployment model is somehow incompatible with running Procotol services. > This is perfectly fine. 
> > yuri > > [image: Inactive hide details for Jan-Frode Myklebust ---03/06/2016 > 10:12:07 PM---I agree, but would also normally want to stay within]Jan-Frode > Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want > to stay within whatever is recommended. > > From: Jan-Frode Myklebust > To: gpfsug main discussion list , > Date: 03/06/2016 10:12 PM > Subject: Re: [gpfsug-discuss] Small cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I agree, but would also normally want to stay within whatever is > recommended. > > What about quorum/manager functions? Also OK to run these on the CES nodes > in a 2-node cluster, or any reason to partition these out so that we then > have a 4-node cluster running on 2 physical machines? > > > -jf > s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan <*makaplan at us.ibm.com* > >: > > As Sven wrote, the FAQ does not "prevent" anything. It's just a > recommendation someone came up with. Which may or may not apply to your > situation. > > Partitioning a server into two servers might be a good idea if you > really need the protection/isolation. But I expect you are limiting the > potential performance of the overall system, compared to running a single > Unix image with multiple processes that can share resource and communicate > more freely. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From chrisjscott at gmail.com Tue Mar 8 19:10:25 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 8 Mar 2016 19:10:25 +0000 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts In-Reply-To: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> References: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Message-ID: To add a customer data point, I followed that guide using GPFS 3.4 and TSM 6.4 with HSM and it's been working perfectly since then. I was even able to remove dsmscoutd online, node-at-a-time back when I made the transition. The performance change was revolutionary and so is the file selection. We have large filesystems with millions of files, changing often, that TSM incremental scan wouldn't cope with and Spectrum Scale 4.1.1 and Spectrum Protect 7.1.3 using mmbackup as described in the SS 4.1.1 manual, creating a snapshot for mmbackup also works perfectly for backup. Cheers Chris On 8 March 2016 at 17:45, Dominic Mueller-Wicke01 < dominic.mueller at de.ibm.com> wrote: > Hi, > > please have a look at this document: > http://www-01.ibm.com/support/docview.wss?uid=swg27018848 > It describe the how-to setup and provides some hints and tips for > migration policies. > > Greetings, Dominic. > > > ______________________________________________________________________________________________________________ > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead > | +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > > > For the new Spectrum Suite of products, are there specific references > with examples on how to set up gpfs policy rules to integrate TSM so > substantially improve the migration performance of HSM? > > The reason I ask is because I've been reading manuals with 200+ pages > where it's very clear this is possible to be accomplished, by builtin > lists and feeding those to TSM, however some of the examples and rules > are presented out of context, and not integrated onto a single > self-contained document. The GPFS past has it own set of manuals, but > so do TSM and HSM. > > For those of you already doing it, what has been your experience, what > are the tricks (where can I read about them), how the addition of > multiple nodes to the working pool is performing? > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Tue Mar 8 19:37:22 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 8 Mar 2016 14:37:22 -0500 Subject: [gpfsug-discuss] Interpreting "mmlsqos" output In-Reply-To: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com> References: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com> Message-ID: <201603081937.u28JbUoj017559@d01av03.pok.ibm.com> Bob, You can read ioql as "IO queue length" (outside of GPFS) and "qsdl" as QOS queue length at the QOS throttle within GPFS, computed from average delay introduced by the QOS subsystem. These "queue lengths" are virtual or fictional -- They are computed by observing average service times and applying Little's Law. That is there is no single actual queue but each IO request spends some time in the OS + network + disk controller + .... For IO bound workloads one can verify that ioql+qsdl is the average number of application threads waiting for IO. Our documentation puts it this way (See 4.2 Admin Guide, mmlsqos command) iops= The performance of the class in I/O operations per second. ioql= The average number of I/O requests in the class that are pending for reasons other than being queued by QoS. This number includes, for example, I/O requests that are waiting for network or storage device servicing. qsdl= The average number of I/O requests in the class that are queued by QoS. When the QoS system receives an I/O request from the file system, QoS first finds the class to which the I/O request belongs. It then finds whether the class has any I/O operations available for consumption. If not, then QoS queues the request until more I/O operations become available for the class. The Qsdl value is the average number of I/O requests that are held in this queue. et= The interval in seconds during which the measurement was made. You can calculate the average service time for an I/O operation as (Ioql + Qsdl)/Iops. For a system that is running IO-intensive applications, you can interpret the value (Ioql + Qsdl) as the number of threads in the I/O-intensive applications. This interpretation assumes that each thread spends most of its time in waiting for an I/O operation to complete. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 03/08/2016 10:53 AM Subject: [gpfsug-discuss] Interpreting "mmlsqos" output Sent by: gpfsug-discuss-bounces at spectrumscale.org So ? I enabled QoS on my file systems using the defaults in 4.2 Running a restripe with a class of ?maintenance? gives me this for mmlsqos output: [root at gpfs-vmd01a ~]# mmlsqos VMdata01 --sum-nodes yes QOS config:: enabled QOS values:: pool=system,other=inf,maintenance=inf QOS status:: throttling active, monitoring active === for pool system 10:36:30 other iops=9754 ioql=12.17 qsdl=0.00022791 et=5 10:36:30 maint iops=55 ioql=0.067331 qsdl=2.7e-05 et=5 10:36:35 other iops=7999.8 ioql=12.613 qsdl=0.00013951 et=5 10:36:35 maint iops=52 ioql=0.10034 qsdl=2.48e-05 et=5 10:36:40 other iops=8890.8 ioql=12.117 qsdl=0.00016095 et=5 10:36:40 maint iops=71.2 ioql=0.13904 qsdl=3.56e-05 et=5 10:36:45 other iops=8303.8 ioql=11.17 qsdl=0.00011438 et=5 10:36:45 maint iops=52.8 ioql=0.08261 qsdl=3.06e-05 et=5 It looks like the ?maintenance? class is getting perhaps 5% of the overall IOP rate? What do ?ioql? and ?qsdl? indicate? 
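Working that formula against the sample output above: in the 10:36:30 interval the "other" class averaged roughly (12.17 + 0.0002) / 9754, or about 1.2 ms per I/O, and the "maintenance" class roughly (0.067 + 0.00003) / 55, also about 1.2 ms; in other words QoS is not queueing the maintenance I/Os at all, since both classes are still set to inf, and the restripe is simply only issuing around 55 IOPS. To actually cap the maintenance class, something along the following lines should work (the 1000 IOPS figure is only an example value):

  mmchqos VMdata01 --enable pool=system,maintenance=1000IOPS,other=unlimited
  mmlsqos VMdata01 --sum-nodes yes    # watch the effect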
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Mar 8 19:45:13 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 8 Mar 2016 14:45:13 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts - Success story! In-Reply-To: References: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Message-ID: <201603081945.u28JjKrL008155@d01av01.pok.ibm.com> "I followed that guide using GPFS 3.4 and TSM 6.4 with HSM and it's been working perfectly since then. I was even able to remove dsmscoutd online, node-at-a-time back when I made the transition. The performance change was revolutionary and so is the file selection. We have large filesystems with millions of files, changing often, that TSM incremental scan wouldn't cope with and Spectrum Scale 4.1.1 and Spectrum Protect 7.1.3 using mmbackup as described in the SS 4.1.1 manual, creating a snapshot for mmbackup also works perfectly for backup. Cheers Chris THANKS, SCOTT -- we love to hear/see customer comments and feedback, especially when they are positive ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 8 20:38:52 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 15:38:52 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> Message-ID: <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> Thanks for the suggestions Dominic I remember playing around with premigrated files at the time, and that was not satisfactory. What we are looking for is a configuration based parameter what will basically break out of the "transparency for the user" mode, and not perform any further recalling, period, if|when the file system occupancy is above a certain threshold (98%). We would not mind if instead gpfs would issue a preemptive "disk full" error message to any user/app/job relying on those files to be recalled, so migration on demand will have a chance to be performance. What we prefer is to swap precedence, ie, any migration requests would be executed ahead of any recalls, at least until a certain amount of free space on the file system has been cleared. It's really important that this type of feature is present, for us to reconsider the TSM version of HSM as a solution. It's not clear from the manual that this can be accomplish in some fashion. Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > > Hi, > > in all cases a recall request will be handled transparent for the user at > the time a migrated files is accessed. This can't be prevented and has two > down sides: a) the space used in the file system increases and b) random > access to storage media in the Spectrum Protect server happens. With newer > versions of Spectrum Protect for Space Management a so called tape > optimized recall method is available that can reduce the impact to the > system (especially Spectrum Protect server). 
> If the problem was that the file system went out of space at the time the > recalls came in I would recommend to reduce the threshold settings for the > file system and increase the number of premigrated files. This will allow > to free space very quickly if needed. If you didn't use the policy based > threshold migration so far I recommend to use it. This method is > significant faster compared to the classical HSM based threshold migration > approach. > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 > ----- > > From: Jaime Pinto > To: gpfsug main discussion list > Date: 08.03.2016 17:36 > Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I'm wondering whether the new version of the "Spectrum Suite" will > allow us set the priority of the HSM migration to be higher than > staging. > > > I ask this because back in 2011 when we were still using Tivoli HSM > with GPFS, during mixed requests for migration and staging operations, > we had a very annoying behavior in which the staging would always take > precedence over migration. The end-result was that the GPFS would fill > up to 100% and induce a deadlock on the cluster, unless we identified > all the user driven stage requests in time, and killed them all. We > contacted IBM support a few times asking for a way fix this, and were > told it was built into TSM. Back then we gave up IBM's HSM primarily > for this reason, although performance was also a consideration (more > to this on another post). > > We are now reconsidering HSM for a new deployment, however only if > this issue has been resolved (among a few others). > > What has been some of the experience out there? > > Thanks > Jaime > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Wed Mar 9 09:35:56 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Wed, 9 Mar 2016 10:35:56 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority In-Reply-To: <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> Message-ID: <201603090836.u298a1D1017873@d06av10.portsmouth.uk.ibm.com> Hi Jamie, I see. So, the recall-shutdown would be something for a short time period. right? Just for the time it takes to migrate files out and free space. If HSM would allow the recall-shutdown the impact for the users would be that each access to migrated files would lead to an access denied error. Would that be acceptable for the users? Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Cc: gpfsug-discuss at spectrumscale.org Date: 08.03.2016 21:38 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Thanks for the suggestions Dominic I remember playing around with premigrated files at the time, and that was not satisfactory. What we are looking for is a configuration based parameter what will basically break out of the "transparency for the user" mode, and not perform any further recalling, period, if|when the file system occupancy is above a certain threshold (98%). We would not mind if instead gpfs would issue a preemptive "disk full" error message to any user/app/job relying on those files to be recalled, so migration on demand will have a chance to be performance. What we prefer is to swap precedence, ie, any migration requests would be executed ahead of any recalls, at least until a certain amount of free space on the file system has been cleared. It's really important that this type of feature is present, for us to reconsider the TSM version of HSM as a solution. It's not clear from the manual that this can be accomplish in some fashion. Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > > Hi, > > in all cases a recall request will be handled transparent for the user at > the time a migrated files is accessed. This can't be prevented and has two > down sides: a) the space used in the file system increases and b) random > access to storage media in the Spectrum Protect server happens. With newer > versions of Spectrum Protect for Space Management a so called tape > optimized recall method is available that can reduce the impact to the > system (especially Spectrum Protect server). > If the problem was that the file system went out of space at the time the > recalls came in I would recommend to reduce the threshold settings for the > file system and increase the number of premigrated files. This will allow > to free space very quickly if needed. If you didn't use the policy based > threshold migration so far I recommend to use it. This method is > significant faster compared to the classical HSM based threshold migration > approach. > > Greetings, Dominic. 
> > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 > ----- > > From: Jaime Pinto > To: gpfsug main discussion list > Date: 08.03.2016 17:36 > Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I'm wondering whether the new version of the "Spectrum Suite" will > allow us set the priority of the HSM migration to be higher than > staging. > > > I ask this because back in 2011 when we were still using Tivoli HSM > with GPFS, during mixed requests for migration and staging operations, > we had a very annoying behavior in which the staging would always take > precedence over migration. The end-result was that the GPFS would fill > up to 100% and induce a deadlock on the cluster, unless we identified > all the user driven stage requests in time, and killed them all. We > contacted IBM support a few times asking for a way fix this, and were > told it was built into TSM. Back then we gave up IBM's HSM primarily > for this reason, although performance was also a consideration (more > to this on another post). > > We are now reconsidering HSM for a new deployment, however only if > this issue has been resolved (among a few others). > > What has been some of the experience out there? > > Thanks > Jaime > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 12:12:08 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 07:12:08 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority In-Reply-To: <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> Message-ID: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Yes! A behavior along those lines would be desirable. Users understand very well what it means for a file system to be near full. Are there any customers already doing something similar? Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jamie, > > I see. So, the recall-shutdown would be something for a short time period. > right? Just for the time it takes to migrate files out and free space. If > HSM would allow the recall-shutdown, the impact for the users would be that > each access to migrated files would lead to an access denied error. Would > that be acceptable for the users? > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Cc: gpfsug-discuss at spectrumscale.org > Date: 08.03.2016 21:38 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Thanks for the suggestions Dominic > > I remember playing around with premigrated files at the time, and that > was not satisfactory. > > What we are looking for is a configuration based parameter what will > basically break out of the "transparency for the user" mode, and not > perform any further recalling, period, if|when the file system > occupancy is above a certain threshold (98%). We would not mind if > instead gpfs would issue a preemptive "disk full" error message to any > user/app/job relying on those files to be recalled, so migration on > demand will have a chance to be performance. What we prefer is to swap > precedence, ie, any migration requests would be executed ahead of any > recalls, at least until a certain amount of free space on the file > system has been cleared. > > It's really important that this type of feature is present, for us to > reconsider the TSM version of HSM as a solution. It's not clear from > the manual that this can be accomplish in some fashion. > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > >> >> >> Hi, >> >> in all cases a recall request will be handled transparent for the user at >> the time a migrated files is accessed. This can't be prevented and has > two >> down sides: a) the space used in the file system increases and b) random >> access to storage media in the Spectrum Protect server happens. With > newer >> versions of Spectrum Protect for Space Management a so called tape >> optimized recall method is available that can reduce the impact to the >> system (especially Spectrum Protect server). >> If the problem was that the file system went out of space at the time the >> recalls came in I would recommend to reduce the threshold settings for > the >> file system and increase the number of premigrated files. This will allow >> to free space very quickly if needed. 
If you didn't use the policy based >> threshold migration so far I recommend to use it. This method is >> significant faster compared to the classical HSM based threshold > migration >> approach. >> >> Greetings, Dominic. >> >> > ______________________________________________________________________________________________________________ > >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead > | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 > 18:21 >> ----- >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 08.03.2016 17:36 >> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I'm wondering whether the new version of the "Spectrum Suite" will >> allow us set the priority of the HSM migration to be higher than >> staging. >> >> >> I ask this because back in 2011 when we were still using Tivoli HSM >> with GPFS, during mixed requests for migration and staging operations, >> we had a very annoying behavior in which the staging would always take >> precedence over migration. The end-result was that the GPFS would fill >> up to 100% and induce a deadlock on the cluster, unless we identified >> all the user driven stage requests in time, and killed them all. We >> contacted IBM support a few times asking for a way fix this, and were >> told it was built into TSM. Back then we gave up IBM's HSM primarily >> for this reason, although performance was also a consideration (more >> to this on another post). >> >> We are now reconsidering HSM for a new deployment, however only if >> this issue has been resolved (among a few others). >> >> What has been some of the experience out there? >> >> Thanks >> Jaime >> >> >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From chrisjscott at gmail.com Wed Mar 9 14:44:39 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Wed, 9 Mar 2016 14:44:39 +0000 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: Not meaning to hjack the thread but while we're on the topic of transparent recall: I'd like to be able to disable it such that I can use SS ILM policies agreed with the data owners to "archive" their data and recover disk space by migrating files to tape, marking them as immutable to defend against accidental or malicious deletion and have some user interface that would let them "retrieve" the data back to disk as writable again, subject to sufficient free disk space and within any quota limits as applicable. Cheers Chris On 9 March 2016 at 12:12, Jaime Pinto wrote: > Yes! A behavior along those lines would be desirable. Users understand > very well what it means for a file system to be near full. > > Are there any customers already doing something similar? > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > > >> Hi Jamie, >> >> I see. So, the recall-shutdown would be something for a short time period. >> right? Just for the time it takes to migrate files out and free space. If >> HSM would allow the recall-shutdown, the impact for the users would be >> that >> each access to migrated files would lead to an access denied error. Would >> that be acceptable for the users? >> >> Greetings, Dominic. >> >> >> ______________________________________________________________________________________________________________ >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead >> | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> >> >> >> From: Jaime Pinto >> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 08.03.2016 21:38 >> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> >> priority >> >> >> >> Thanks for the suggestions Dominic >> >> I remember playing around with premigrated files at the time, and that >> was not satisfactory. >> >> What we are looking for is a configuration based parameter what will >> basically break out of the "transparency for the user" mode, and not >> perform any further recalling, period, if|when the file system >> occupancy is above a certain threshold (98%). 
We would not mind if >> instead gpfs would issue a preemptive "disk full" error message to any >> user/app/job relying on those files to be recalled, so migration on >> demand will have a chance to be performance. What we prefer is to swap >> precedence, ie, any migration requests would be executed ahead of any >> recalls, at least until a certain amount of free space on the file >> system has been cleared. >> >> It's really important that this type of feature is present, for us to >> reconsider the TSM version of HSM as a solution. It's not clear from >> the manual that this can be accomplish in some fashion. >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >> >>> >>> Hi, >>> >>> in all cases a recall request will be handled transparent for the user at >>> the time a migrated files is accessed. This can't be prevented and has >>> >> two >> >>> down sides: a) the space used in the file system increases and b) random >>> access to storage media in the Spectrum Protect server happens. With >>> >> newer >> >>> versions of Spectrum Protect for Space Management a so called tape >>> optimized recall method is available that can reduce the impact to the >>> system (especially Spectrum Protect server). >>> If the problem was that the file system went out of space at the time the >>> recalls came in I would recommend to reduce the threshold settings for >>> >> the >> >>> file system and increase the number of premigrated files. This will allow >>> to free space very quickly if needed. If you didn't use the policy based >>> threshold migration so far I recommend to use it. This method is >>> significant faster compared to the classical HSM based threshold >>> >> migration >> >>> approach. >>> >>> Greetings, Dominic. >>> >>> >>> >> ______________________________________________________________________________________________________________ >> >> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead >>> >> | >> >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >>> HRB 243294 >>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >>> >> 18:21 >> >>> ----- >>> >>> From: Jaime Pinto >>> To: gpfsug main discussion list < >>> gpfsug-discuss at spectrumscale.org> >>> Date: 08.03.2016 17:36 >>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. >>> migration >>> >> priority >> >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I'm wondering whether the new version of the "Spectrum Suite" will >>> allow us set the priority of the HSM migration to be higher than >>> staging. >>> >>> >>> I ask this because back in 2011 when we were still using Tivoli HSM >>> with GPFS, during mixed requests for migration and staging operations, >>> we had a very annoying behavior in which the staging would always take >>> precedence over migration. The end-result was that the GPFS would fill >>> up to 100% and induce a deadlock on the cluster, unless we identified >>> all the user driven stage requests in time, and killed them all. We >>> contacted IBM support a few times asking for a way fix this, and were >>> told it was built into TSM. Back then we gave up IBM's HSM primarily >>> for this reason, although performance was also a consideration (more >>> to this on another post). 
>>> >>> We are now reconsidering HSM for a new deployment, however only if >>> this issue has been resolved (among a few others). >>> >>> What has been some of the experience out there? >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Mar 9 15:05:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 9 Mar 2016 10:05:31 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> For a write or create operation ENOSPC would make some sense. But if the file already exists and I'm just opening for read access I would be very confused by ENOSPC. How should the system respond: "Sorry, I know about that file, I have it safely stored away in HSM, but it is not available right now. Try again later!" EAGAIN or EBUSY might be the closest in ordinary language... But EAGAIN is used when a system call is interrupted and can be retried right away... So EBUSY? 
The standard return codes in Linux are: #define EPERM 1 /* Operation not permitted */ #define ENOENT 2 /* No such file or directory */ #define ESRCH 3 /* No such process */ #define EINTR 4 /* Interrupted system call */ #define EIO 5 /* I/O error */ #define ENXIO 6 /* No such device or address */ #define E2BIG 7 /* Argument list too long */ #define ENOEXEC 8 /* Exec format error */ #define EBADF 9 /* Bad file number */ #define ECHILD 10 /* No child processes */ #define EAGAIN 11 /* Try again */ #define ENOMEM 12 /* Out of memory */ #define EACCES 13 /* Permission denied */ #define EFAULT 14 /* Bad address */ #define ENOTBLK 15 /* Block device required */ #define EBUSY 16 /* Device or resource busy */ #define EEXIST 17 /* File exists */ #define EXDEV 18 /* Cross-device link */ #define ENODEV 19 /* No such device */ #define ENOTDIR 20 /* Not a directory */ #define EISDIR 21 /* Is a directory */ #define EINVAL 22 /* Invalid argument */ #define ENFILE 23 /* File table overflow */ #define EMFILE 24 /* Too many open files */ #define ENOTTY 25 /* Not a typewriter */ #define ETXTBSY 26 /* Text file busy */ #define EFBIG 27 /* File too large */ #define ENOSPC 28 /* No space left on device */ #define ESPIPE 29 /* Illegal seek */ #define EROFS 30 /* Read-only file system */ #define EMLINK 31 /* Too many links */ #define EPIPE 32 /* Broken pipe */ #define EDOM 33 /* Math argument out of domain of func */ #define ERANGE 34 /* Math result not representable */ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 15:21:53 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 10:21:53 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> Message-ID: <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> Interesting perspective Mark. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan : > For a write or create operation ENOSPC would make some sense. > But if the file already exists and I'm just opening for read access I > would be very confused by ENOSPC. > How should the system respond: "Sorry, I know about that file, I have it > safely stored away in HSM, but it is not available right now. Try again > later!" > > EAGAIN or EBUSY might be the closest in ordinary language... > But EAGAIN is used when a system call is interrupted and can be retried > right away... > So EBUSY? 
> > The standard return codes in Linux are: > > #define EPERM 1 /* Operation not permitted */ > #define ENOENT 2 /* No such file or directory */ > #define ESRCH 3 /* No such process */ > #define EINTR 4 /* Interrupted system call */ > #define EIO 5 /* I/O error */ > #define ENXIO 6 /* No such device or address */ > #define E2BIG 7 /* Argument list too long */ > #define ENOEXEC 8 /* Exec format error */ > #define EBADF 9 /* Bad file number */ > #define ECHILD 10 /* No child processes */ > #define EAGAIN 11 /* Try again */ > #define ENOMEM 12 /* Out of memory */ > #define EACCES 13 /* Permission denied */ > #define EFAULT 14 /* Bad address */ > #define ENOTBLK 15 /* Block device required */ > #define EBUSY 16 /* Device or resource busy */ > #define EEXIST 17 /* File exists */ > #define EXDEV 18 /* Cross-device link */ > #define ENODEV 19 /* No such device */ > #define ENOTDIR 20 /* Not a directory */ > #define EISDIR 21 /* Is a directory */ > #define EINVAL 22 /* Invalid argument */ > #define ENFILE 23 /* File table overflow */ > #define EMFILE 24 /* Too many open files */ > #define ENOTTY 25 /* Not a typewriter */ > #define ETXTBSY 26 /* Text file busy */ > #define EFBIG 27 /* File too large */ > #define ENOSPC 28 /* No space left on device */ > #define ESPIPE 29 /* Illegal seek */ > #define EROFS 30 /* Read-only file system */ > #define EMLINK 31 /* Too many links */ > #define EPIPE 32 /* Broken pipe */ > #define EDOM 33 /* Math argument out of domain of func */ > #define ERANGE 34 /* Math result not representable */ > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Wed Mar 9 19:56:13 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 14:56:13 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) Message-ID: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> Here is another area where I've been reading material from several sources for years, and in fact trying one solution over the other from time-to-time in a test environment. However, to date I have not been able to find a one-piece-document where all these different IBM alternatives for backup are discussed at length, with the pos and cons well explained, along with the how-to's. I'm currently using TSM(built-in backup client), and over the years I developed a set of tricks to rely on disk based volumes as intermediate cache, and multiple backup client nodes, to split the load and substantially improve the performance of the backup compared to when I first deployed this solution. However I suspect it could still be improved further if I was to apply tools from the GPFS side of the equation. I would appreciate any comments/pointers. 
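(For readers less familiar with the Spectrum Protect side: the "disk based volumes as
intermediate cache" trick described above is presumably the classic disk-pool-in-front-of-tape
hierarchy on the TSM server. A rough sketch only, with made-up pool, path and credential
names, assuming a tape pool called BACKUP_TAPE already exists:

# All names/paths/credentials below are placeholders, not this site's configuration.
DSMADMC='dsmadmc -id=admin -password=XXXX -dataonly=yes'

# Random-access DISK pool that catches the nightly client backups...
$DSMADMC "define stgpool BACKUP_DISK DISK"
$DSMADMC "define volume BACKUP_DISK /tsmdisk/vol01.dsm formatsize=102400"
$DSMADMC "define volume BACKUP_DISK /tsmdisk/vol02.dsm formatsize=102400"

# ...and drains to the existing tape pool once it crosses the high-migration threshold.
$DSMADMC "update stgpool BACKUP_DISK nextstgpool=BACKUP_TAPE highmig=90 lowmig=60"

Splitting the client load is then a matter of pointing several backup client nodes at
different parts of the tree.)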
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From YARD at il.ibm.com Wed Mar 9 20:16:59 2016 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 9 Mar 2016 22:16:59 +0200 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> Message-ID: <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> Hi Did u use mmbackup with TSM ? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm Please also review this : http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: > From: Jaime Pinto > To: gpfsug main discussion list > Date: 03/09/2016 09:56 PM > Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup > scripts) vs. TSM(backup) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Here is another area where I've been reading material from several > sources for years, and in fact trying one solution over the other from > time-to-time in a test environment. However, to date I have not been > able to find a one-piece-document where all these different IBM > alternatives for backup are discussed at length, with the pos and cons > well explained, along with the how-to's. > > I'm currently using TSM(built-in backup client), and over the years I > developed a set of tricks to rely on disk based volumes as > intermediate cache, and multiple backup client nodes, to split the > load and substantially improve the performance of the backup compared > to when I first deployed this solution. However I suspect it could > still be improved further if I was to apply tools from the GPFS side > of the equation. > > I would appreciate any comments/pointers. > > Thanks > Jaime > > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 21:33:49 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 16:33:49 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. 
TSM(backup)
In-Reply-To: <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com>
References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca>
	<201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com>
Message-ID: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca>

Quoting Yaron Daniel :

> Hi
>
> Did u use mmbackup with TSM ?
>
> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm

I have used mmbackup in test mode a few times before, while under gpfs 3.2 and 3.3,
but not under 3.5 yet or the 4.x series (not installed in our facility yet).

Under both 3.2 and 3.3 mmbackup would always lock up our cluster when using snapshots.
I never understood the behavior without a snapshot, and the lock-up was intermittent in
the carved-out small test cluster, so I never felt confident enough to deploy it over
the larger 4000+ client cluster.

Another issue was that the version of mmbackup then would not let me choose the client
environment associated with a particular gpfs file system, fileset or path, and the
equivalent storage pool and/or policy on the TSM side. With the native TSM client we can
do this by configuring the dsmenv file, and even the NODENAME/ASNODE, etc., with which
to access TSM, so we can keep the backups segregated on different pools/tapes if
necessary (by user, by group, by project, etc.).

The problem we all agree on is that TSM client traversing is VERY SLOW, and cannot be
parallelized. I always knew that the mmbackup client was supposed to replace the TSM
client for the traversing, and then pass the "necessary parameters" and file lists to
the native TSM client, so it could then take over for the remainder of the workflow.

Therefore, the remaining problems are as follows:

* I never understood the snapshot-induced lock-up, and how to fix it. Was it due to the
size of our cluster or the version of GPFS? Has it been addressed in the 3.5 or 4.x
series? Without the snapshot, how would mmbackup know what had already been backed up
since the previous incremental backup? Does it check each file against what is already
on TSM to build the list of candidates? What is the experience out there?

* In the v4r2 version of the manual for the mmbackup utility we still don't seem to be
able to specify which TSM BA Client dsmenv to use as a parameter. All we can do is
choose the --tsm-servers TSMServer[,TSMServer...] option. I can only conclude that all
the contents of any backup on the GPFS side will always end up in a default storage
pool and use the standard TSM policy if nothing else is done.
Jaime > > > Regards > > > > > > Yaron Daniel > 94 Em Ha'Moshavot Rd > > Server, Storage and Data Services - Team Leader > Petach Tiqva, 49527 > Global Technology Services > Israel > Phone: > +972-3-916-5672 > > > Fax: > +972-3-916-5672 > > > Mobile: > +972-52-8395593 > > > e-mail: > yard at il.ibm.com > > > IBM Israel > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: > >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 03/09/2016 09:56 PM >> Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup >> scripts) vs. TSM(backup) >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> Here is another area where I've been reading material from several >> sources for years, and in fact trying one solution over the other from >> time-to-time in a test environment. However, to date I have not been >> able to find a one-piece-document where all these different IBM >> alternatives for backup are discussed at length, with the pos and cons >> well explained, along with the how-to's. >> >> I'm currently using TSM(built-in backup client), and over the years I >> developed a set of tricks to rely on disk based volumes as >> intermediate cache, and multiple backup client nodes, to split the >> load and substantially improve the performance of the backup compared >> to when I first deployed this solution. However I suspect it could >> still be improved further if I was to apply tools from the GPFS side >> of the equation. >> >> I would appreciate any comments/pointers. >> >> Thanks >> Jaime >> >> >> >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Thu Mar 10 08:17:18 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Thu, 10 Mar 2016 09:17:18 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> Message-ID: <201603100817.u2A8HLXK012633@d06av02.portsmouth.uk.ibm.com> Hi Jaime, I received the same request from other customers as well. could you please open a RFE for the theme and send me the RFE ID? 
I will discuss it with the product management then. RFE Link: https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: gpfsug main discussion list , Marc A Kaplan Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Date: 09.03.2016 16:22 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Interesting perspective Mark. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan : > For a write or create operation ENOSPC would make some sense. > But if the file already exists and I'm just opening for read access I > would be very confused by ENOSPC. > How should the system respond: "Sorry, I know about that file, I have it > safely stored away in HSM, but it is not available right now. Try again > later!" > > EAGAIN or EBUSY might be the closest in ordinary language... > But EAGAIN is used when a system call is interrupted and can be retried > right away... > So EBUSY? > > The standard return codes in Linux are: > > #define EPERM 1 /* Operation not permitted */ > #define ENOENT 2 /* No such file or directory */ > #define ESRCH 3 /* No such process */ > #define EINTR 4 /* Interrupted system call */ > #define EIO 5 /* I/O error */ > #define ENXIO 6 /* No such device or address */ > #define E2BIG 7 /* Argument list too long */ > #define ENOEXEC 8 /* Exec format error */ > #define EBADF 9 /* Bad file number */ > #define ECHILD 10 /* No child processes */ > #define EAGAIN 11 /* Try again */ > #define ENOMEM 12 /* Out of memory */ > #define EACCES 13 /* Permission denied */ > #define EFAULT 14 /* Bad address */ > #define ENOTBLK 15 /* Block device required */ > #define EBUSY 16 /* Device or resource busy */ > #define EEXIST 17 /* File exists */ > #define EXDEV 18 /* Cross-device link */ > #define ENODEV 19 /* No such device */ > #define ENOTDIR 20 /* Not a directory */ > #define EISDIR 21 /* Is a directory */ > #define EINVAL 22 /* Invalid argument */ > #define ENFILE 23 /* File table overflow */ > #define EMFILE 24 /* Too many open files */ > #define ENOTTY 25 /* Not a typewriter */ > #define ETXTBSY 26 /* Text file busy */ > #define EFBIG 27 /* File too large */ > #define ENOSPC 28 /* No space left on device */ > #define ESPIPE 29 /* Illegal seek */ > #define EROFS 30 /* Read-only file system */ > #define EMLINK 31 /* Too many links */ > #define EPIPE 32 /* Broken pipe */ > #define EDOM 33 /* Math argument out of domain of func */ > #define ERANGE 34 /* Math result not representable */ > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From konstantin.arnold at unibas.ch Thu Mar 10 08:56:01 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Thu, 10 Mar 2016 09:56:01 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: <56E136A1.8020202@unibas.ch> Hi Jaime, ... maybe I can give some comments with experience from the field: I would suggest, after reaching a high-watermark threshold, the recall speed could be throttled to a rate that is lower than migration speed (but still high enough to not run into a timeout). I don't think it's a good idea to send access denied while trying to prioritize migration. If non-IT people would see this message they could think the system is broken. It would be unclear what a batch job would do that has to prepare data, in the worst case processing would start with incomplete data. We are currently recalling all out data on tape to be moved to a different system. There is 15x more data on tape than what would fit on the disk pool (and there are millions of files before we set inode quota to a low number). We are moving user/project after an other by using tape ordered recalls. For that we had to disable a policy that was aggressively pre-migrating files and allowed to quickly free space on the disk pool. I must admit that it took us a while of tuning thresholds and policies. Best Konstantin On 03/09/2016 01:12 PM, Jaime Pinto wrote: > Yes! A behavior along those lines would be desirable. Users understand > very well what it means for a file system to be near full. > > Are there any customers already doing something similar? > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > >> >> Hi Jamie, >> >> I see. So, the recall-shutdown would be something for a short time >> period. >> right? Just for the time it takes to migrate files out and free space. If >> HSM would allow the recall-shutdown, the impact for the users would be >> that >> each access to migrated files would lead to an access denied error. Would >> that be acceptable for the users? >> >> Greetings, Dominic. >> >> ______________________________________________________________________________________________________________ >> >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >> Lead | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> >> >> >> From: Jaime Pinto >> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 08.03.2016 21:38 >> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> priority >> >> >> >> Thanks for the suggestions Dominic >> >> I remember playing around with premigrated files at the time, and that >> was not satisfactory. 
>> >> What we are looking for is a configuration based parameter what will >> basically break out of the "transparency for the user" mode, and not >> perform any further recalling, period, if|when the file system >> occupancy is above a certain threshold (98%). We would not mind if >> instead gpfs would issue a preemptive "disk full" error message to any >> user/app/job relying on those files to be recalled, so migration on >> demand will have a chance to be performance. What we prefer is to swap >> precedence, ie, any migration requests would be executed ahead of any >> recalls, at least until a certain amount of free space on the file >> system has been cleared. >> >> It's really important that this type of feature is present, for us to >> reconsider the TSM version of HSM as a solution. It's not clear from >> the manual that this can be accomplish in some fashion. >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >>> >>> >>> Hi, >>> >>> in all cases a recall request will be handled transparent for the >>> user at >>> the time a migrated files is accessed. This can't be prevented and has >> two >>> down sides: a) the space used in the file system increases and b) random >>> access to storage media in the Spectrum Protect server happens. With >> newer >>> versions of Spectrum Protect for Space Management a so called tape >>> optimized recall method is available that can reduce the impact to the >>> system (especially Spectrum Protect server). >>> If the problem was that the file system went out of space at the time >>> the >>> recalls came in I would recommend to reduce the threshold settings for >> the >>> file system and increase the number of premigrated files. This will >>> allow >>> to free space very quickly if needed. If you didn't use the policy based >>> threshold migration so far I recommend to use it. This method is >>> significant faster compared to the classical HSM based threshold >> migration >>> approach. >>> >>> Greetings, Dominic. >>> >>> >> ______________________________________________________________________________________________________________ >> >> >>> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>> Lead >> | >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht >>> Stuttgart, >>> HRB 243294 >>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >> 18:21 >>> ----- >>> >>> From: Jaime Pinto >>> To: gpfsug main discussion list >>> >>> Date: 08.03.2016 17:36 >>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> priority >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I'm wondering whether the new version of the "Spectrum Suite" will >>> allow us set the priority of the HSM migration to be higher than >>> staging. >>> >>> >>> I ask this because back in 2011 when we were still using Tivoli HSM >>> with GPFS, during mixed requests for migration and staging operations, >>> we had a very annoying behavior in which the staging would always take >>> precedence over migration. The end-result was that the GPFS would fill >>> up to 100% and induce a deadlock on the cluster, unless we identified >>> all the user driven stage requests in time, and killed them all. We >>> contacted IBM support a few times asking for a way fix this, and were >>> told it was built into TSM. 
Back then we gave up IBM's HSM primarily >>> for this reason, although performance was also a consideration (more >>> to this on another post). >>> >>> We are now reconsidering HSM for a new deployment, however only if >>> this issue has been resolved (among a few others). >>> >>> What has been some of the experience out there? >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Thu Mar 10 10:55:21 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 05:55:21 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <56E136A1.8020202@unibas.ch> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <56E136A1.8020202@unibas.ch> Message-ID: <20160310055521.85234y7d2m6c97kp@support.scinet.utoronto.ca> Quoting Konstantin Arnold : > Hi Jaime, > > ... maybe I can give some comments with experience from the field: > I would suggest, after reaching a high-watermark threshold, the recall > speed could be throttled to a rate that is lower than migration speed > (but still high enough to not run into a timeout). I don't think it's a > good idea to send access denied while trying to prioritize migration. If > non-IT people would see this message they could think the system is > broken. It would be unclear what a batch job would do that has to > prepare data, in the worst case processing would start with incomplete data. 
I wouldn't object to any strategy that lets us empty the vase quicker than it's being filled. It may just make the solution more complex for developers, since this feels a lot like a mini-scheduler. On the other hand I don't see much of an issue for non-IT people or batch jobs depending on the data to be recalled: we already enable quotas on our file systems. When quotas are reached the system is supposed to "break" anyway, for that particular user|group or application, and they still have to handle this situation properly. > > We are currently recalling all out data on tape to be moved to a > different system. There is 15x more data on tape than what would fit on > the disk pool (and there are millions of files before we set inode quota > to a low number). We are moving user/project after an other by using > tape ordered recalls. For that we had to disable a policy that was > aggressively pre-migrating files and allowed to quickly free space on > the disk pool. I must admit that it took us a while of tuning thresholds > and policies. That is certainly an approach to consider. We still think the application should be able to properly manage occupancy on the same file system. We run a different system which has a disk based cache layer as well, and the strategy is to keep it as full as possible (85-90%), so to avoid retrieving data from tape whenever possible, while still leaving some cushion for newly saved data. Indeed finding the sweet spot is a balancing act. Thanks for the feedback Jaime > > Best > Konstantin > > > > On 03/09/2016 01:12 PM, Jaime Pinto wrote: >> Yes! A behavior along those lines would be desirable. Users understand >> very well what it means for a file system to be near full. >> >> Are there any customers already doing something similar? >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >>> >>> Hi Jamie, >>> >>> I see. So, the recall-shutdown would be something for a short time >>> period. >>> right? Just for the time it takes to migrate files out and free space. If >>> HSM would allow the recall-shutdown, the impact for the users would be >>> that >>> each access to migrated files would lead to an access denied error. Would >>> that be acceptable for the users? >>> >>> Greetings, Dominic. >>> >>> ______________________________________________________________________________________________________________ >>> >>> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>> Lead | >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >>> HRB 243294 >>> >>> >>> >>> From: Jaime Pinto >>> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 08.03.2016 21:38 >>> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >>> priority >>> >>> >>> >>> Thanks for the suggestions Dominic >>> >>> I remember playing around with premigrated files at the time, and that >>> was not satisfactory. >>> >>> What we are looking for is a configuration based parameter what will >>> basically break out of the "transparency for the user" mode, and not >>> perform any further recalling, period, if|when the file system >>> occupancy is above a certain threshold (98%). 
We would not mind if >>> instead gpfs would issue a preemptive "disk full" error message to any >>> user/app/job relying on those files to be recalled, so migration on >>> demand will have a chance to be performance. What we prefer is to swap >>> precedence, ie, any migration requests would be executed ahead of any >>> recalls, at least until a certain amount of free space on the file >>> system has been cleared. >>> >>> It's really important that this type of feature is present, for us to >>> reconsider the TSM version of HSM as a solution. It's not clear from >>> the manual that this can be accomplish in some fashion. >>> >>> Thanks >>> Jaime >>> >>> Quoting Dominic Mueller-Wicke01 : >>> >>>> >>>> >>>> Hi, >>>> >>>> in all cases a recall request will be handled transparent for the >>>> user at >>>> the time a migrated files is accessed. This can't be prevented and has >>> two >>>> down sides: a) the space used in the file system increases and b) random >>>> access to storage media in the Spectrum Protect server happens. With >>> newer >>>> versions of Spectrum Protect for Space Management a so called tape >>>> optimized recall method is available that can reduce the impact to the >>>> system (especially Spectrum Protect server). >>>> If the problem was that the file system went out of space at the time >>>> the >>>> recalls came in I would recommend to reduce the threshold settings for >>> the >>>> file system and increase the number of premigrated files. This will >>>> allow >>>> to free space very quickly if needed. If you didn't use the policy based >>>> threshold migration so far I recommend to use it. This method is >>>> significant faster compared to the classical HSM based threshold >>> migration >>>> approach. >>>> >>>> Greetings, Dominic. >>>> >>>> >>> ______________________________________________________________________________________________________________ >>> >>> >>>> >>>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>>> Lead >>> | >>>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>>> >>>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>>> Wittkopp >>>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht >>>> Stuttgart, >>>> HRB 243294 >>>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >>> 18:21 >>>> ----- >>>> >>>> From: Jaime Pinto >>>> To: gpfsug main discussion list >>>> >>>> Date: 08.03.2016 17:36 >>>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >>> priority >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I'm wondering whether the new version of the "Spectrum Suite" will >>>> allow us set the priority of the HSM migration to be higher than >>>> staging. >>>> >>>> >>>> I ask this because back in 2011 when we were still using Tivoli HSM >>>> with GPFS, during mixed requests for migration and staging operations, >>>> we had a very annoying behavior in which the staging would always take >>>> precedence over migration. The end-result was that the GPFS would fill >>>> up to 100% and induce a deadlock on the cluster, unless we identified >>>> all the user driven stage requests in time, and killed them all. We >>>> contacted IBM support a few times asking for a way fix this, and were >>>> told it was built into TSM. Back then we gave up IBM's HSM primarily >>>> for this reason, although performance was also a consideration (more >>>> to this on another post). 
>>>> >>>> We are now reconsidering HSM for a new deployment, however only if >>>> this issue has been resolved (among a few others). >>>> >>>> What has been some of the experience out there? >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.org >>>> University of Toronto >>>> 256 McCaul Street, Room 235 >>>> Toronto, ON, M5T1W5 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Thu Mar 10 11:17:41 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 06:17:41 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. 
TSM(backup) In-Reply-To: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> Message-ID: <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> Here is some feedback on the use of mmbackup: Last night I decided to test mmbackup again, in the simplest syntax call possible (see below), and it ran like a charm! We have a 15TB GPFS with some 41 million files, running gpfs v 3.5; it certainty behaved better than what I remember when I last tried this under 3.3 or 3.2, however I still didn't specify a snapshot. I guess it didn't really matter. My idea of sourcing the dsmenv file normally used by the TSM BA client before starting mmbackup was just what I needed to land the backup material in the same pool and using the same policies normally used by the TSM BA client for this file system. For my surprise, mmbackup was smart enough to query the proper TSM database for all files already there and perform the incremental backup just as the TSM client would on its own. The best of all: it took just under 7 hours, while previously the TSM client was taking over 27 hours: that is nearly 1/4 of the time, using the same node! This is really good, since now I can finally do a true *daily* backup of this FS, so I'll refining and adopting this process moving forward, possibly adding a few more nodes as traversing helpers. Cheers Jaime [root at gpc-f114n016 bin]# mmbackup /sysadmin -t incremental -s /tmp -------------------------------------------------------- mmbackup: Backup of /sysadmin begins at Wed Mar 9 19:45:27 EST 2016. -------------------------------------------------------- Wed Mar 9 19:45:48 2016 mmbackup:Could not restore previous shadow file from TSM server TAPENODE Wed Mar 9 19:45:48 2016 mmbackup:Querying files currently backed up in TSM server:TAPENODE. Wed Mar 9 21:55:59 2016 mmbackup:Built query data file from TSM server: TAPENODE rc = 0 Wed Mar 9 21:56:01 2016 mmbackup:Scanning file system sysadmin Wed Mar 9 23:47:53 2016 mmbackup:Reconstructing previous shadow file /sysadmin/.mmbackupShadow.1.TAPENODE from query data for TAPENODE Thu Mar 10 01:05:06 2016 mmbackup:Determining file system changes for sysadmin [TAPENODE]. Thu Mar 10 01:08:40 2016 mmbackup:changed=26211, expired=30875, unsupported=0 for server [TAPENODE] Thu Mar 10 01:08:40 2016 mmbackup:Sending files to the TSM server [26211 changed, 30875 expired]. Thu Mar 10 01:38:41 2016 mmbackup:Expiring files: 0 backed up, 15500 expired, 0 failed. Thu Mar 10 02:42:08 2016 mmbackup:Backing up files: 10428 backed up, 30875 expired, 72 failed. Thu Mar 10 02:58:40 2016 mmbackup:mmapplypolicy for Backup detected errors (rc=9). Thu Mar 10 02:58:40 2016 mmbackup:Completed policy backup run with 0 policy errors, 72 files failed, 0 severe errors, returning rc=9. Thu Mar 10 02:58:40 2016 mmbackup:Policy for backup returned 9 Highest TSM error 4 mmbackup: TSM Summary Information: Total number of objects inspected: 57086 Total number of objects backed up: 26139 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 30875 Total number of objects failed: 72 Thu Mar 10 02:58:40 2016 mmbackup:Analyzing audit log file /sysadmin/mmbackup.audit.sysadmin.TAPENODE Thu Mar 10 02:58:40 2016 mmbackup:72 files not backed up for this server. 
( failed:72 ) Thu Mar 10 02:58:40 2016 mmbackup:Worst TSM exit 4 Thu Mar 10 02:58:41 2016 mmbackup:72 failures were logged. Compensating shadow database... Thu Mar 10 03:06:23 2016 mmbackup:Analysis complete. 72 of 72 failed or excluded paths compensated for in 1 pass(es). Thu Mar 10 03:09:08 2016 mmbackup:TSM server TAPENODE had 72 failures or excluded paths and returned 4. Its shadow database has been updated. Thu Mar 10 03:09:08 2016 mmbackup:Incremental backup completed with some skipped files. TSM had 0 severe errors and returned 4. See the TSM log file for more information. 72 files had errors, TSM audit logs recorded 72 errors from 1 TSM servers, 0 TSM servers skipped. exit 4 ---------------------------------------------------------- mmbackup: Backup of /sysadmin completed with some skipped files at Thu Mar 10 03:09:11 EST 2016. ---------------------------------------------------------- mmbackup: Command failed. Examine previous error messages to determine cause. Quoting Jaime Pinto : > Quoting Yaron Daniel : > >> Hi >> >> Did u use mmbackup with TSM ? >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm > > I have used mmbackup on test mode a few times before, while under gpfs > 3.2 and 3.3, but not under 3.5 yet or 4.x series (not installed in our > facility yet). > > Under both 3.2 and 3.3 mmbackup would always lock up our cluster when > using snapshot. I never understood the behavior without snapshot, and > the lock up was intermittent in the carved-out small test cluster, so I > never felt confident enough to deploy over the larger 4000+ clients > cluster. > > Another issue was that the version of mmbackup then would not let me > choose the client environment associated with a particular gpfs file > system, fileset or path, and the equivalent storage pool and /or policy > on the TSM side. > > With the native TSM client we can do this by configuring the dsmenv > file, and even the NODEMANE/ASNODE, etc, with which to access TSM, so > we can keep the backups segregated on different pools/tapes if > necessary (by user, by group, by project, etc) > > The problem we all agree on is that TSM client traversing is VERY SLOW, > and can not be parallelized. I always knew that the mmbackup client was > supposed to replace the TSM client for the traversing, and then parse > the "necessary parameters" and files to the native TSM client, so it > could then take over for the remainder of the workflow. > > Therefore, the remaining problems are as follows: > * I never understood the snapshot induced lookup, and how to fix it. > Was it due to the size of our cluster or the version of GPFS? Has it > been addressed under 3.5 or 4.x series? Without the snapshot how would > mmbackup know what was already gone to backup since the previous > incremental backup? Does it check each file against what is already on > TSM to build the list of candidates? What is the experience out there? > > * In the v4r2 version of the manual for the mmbackup utility we still > don't seem to be able to determine which TSM BA Client dsmenv to use as > a parameter. All we can do is choose the --tsm-servers > TSMServer[,TSMServer...]] . I can only conclude that all the contents > of any backup on the GPFS side will always end-up on a default storage > pool and use the standard TSM policy if nothing else is done. 
I'm now > wondering if it would be ok to simply 'source dsmenv' from a shell for > each instance of the mmbackup we fire up, in addition to setting up the > other MMBACKUP_DSMC_MISC, MMBACKUP_DSMC_BACKUP, ..., etc as described > on man page. > > * what about the restore side of things? Most mm* commands can only be > executed by root. Should we still have to rely on the TSM BA Client > (dsmc|dsmj) if unprivileged users want to restore their own stuff? > > I guess I'll have to conduct more experiments. > > > >> >> Please also review this : >> >> http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf >> > > This is pretty good, as a high level overview. Much better than a few > others I've seen with the release of the Spectrum Suite, since it focus > entirely on GPFS/TSM/backup|(HSM). It would be nice to have some > typical implementation examples. > > > > Thanks a lot for the references Yaron, and again thanks for any further > comments. > Jaime > > >> >> >> Regards >> >> >> >> >> >> Yaron Daniel >> 94 Em Ha'Moshavot Rd >> >> Server, Storage and Data Services - Team Leader >> Petach Tiqva, 49527 >> Global Technology Services >> Israel >> Phone: >> +972-3-916-5672 >> >> >> Fax: >> +972-3-916-5672 >> >> >> Mobile: >> +972-52-8395593 >> >> >> e-mail: >> yard at il.ibm.com >> >> >> IBM Israel >> >> >> >> >> >> >> >> gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: >> >>> From: Jaime Pinto >>> To: gpfsug main discussion list >>> Date: 03/09/2016 09:56 PM >>> Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup >>> scripts) vs. TSM(backup) >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> Here is another area where I've been reading material from several >>> sources for years, and in fact trying one solution over the other from >>> time-to-time in a test environment. However, to date I have not been >>> able to find a one-piece-document where all these different IBM >>> alternatives for backup are discussed at length, with the pos and cons >>> well explained, along with the how-to's. >>> >>> I'm currently using TSM(built-in backup client), and over the years I >>> developed a set of tricks to rely on disk based volumes as >>> intermediate cache, and multiple backup client nodes, to split the >>> load and substantially improve the performance of the backup compared >>> to when I first deployed this solution. However I suspect it could >>> still be improved further if I was to apply tools from the GPFS side >>> of the equation. >>> >>> I would appreciate any comments/pointers. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. 
>>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From S.J.Thompson at bham.ac.uk Thu Mar 10 12:00:09 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 10 Mar 2016 12:00:09 +0000 Subject: [gpfsug-discuss] systemd Message-ID: So just picking up this from Feb 2015, have been doing some upgrades to 4.2.0.1, and see that there is now systemd support as part of this... Now I just need to unpick the local hacks we put into the init script (like wait for IB to come up) and implement those as proper systemd deps I guess. Thanks for sorting this though IBM! Simon On 10/02/2015, 15:17, "gpfsug-discuss-bounces at gpfsug.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: >Does any one have a systemd manifest for GPFS which they would share? > >As RedHat EL 7 is now using systemd and Ubuntu is now supported with >4.1p5, it seems sensible for GPFS to have systemd support. > >We're testing some services running off gpfs and it would be useful to >have a manifest so we can make the services dependent on gpfs being up >before they start. > >Or any suggestions on making systemd services dependent on a SysV script? > >Thanks > >Simon >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Thu Mar 10 14:46:12 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 10 Mar 2016 09:46:12 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com><20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> Message-ID: <201603101446.u2AEkJPP018456@d01av02.pok.ibm.com> Jaime, Thanks for the positive feedback and success story on mmbackup. 
We need criticism to keep improving the product - but we also need encouragement to know we are heading in the right direction and making progress. BTW - (depending on many factors) you may be able to save some significant backup time by running over multiple nodes with the -N option. --marc. (I am Mr. mmapplypolicy and work with Mr. mmbackup.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri Mar 11 00:15:49 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 19:15:49 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> Message-ID: <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> Hey Dominic Just submitted a new request: Headline: GPFS+TSM+HSM: staging vs. migration priority ID: 85292 Thank you Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jaime, > > I received the same request from other customers as well. > could you please open a RFE for the theme and send me the RFE ID? I will > discuss it with the product management then. RFE Link: > https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: gpfsug main discussion list , > Marc A Kaplan > Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Date: 09.03.2016 16:22 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Interesting perspective Mark. > > I'm inclined to think EBUSY would be more appropriate. > > Jaime > > Quoting Marc A Kaplan : > >> For a write or create operation ENOSPC would make some sense. >> But if the file already exists and I'm just opening for read access I >> would be very confused by ENOSPC. >> How should the system respond: "Sorry, I know about that file, I have it >> safely stored away in HSM, but it is not available right now. Try again >> later!" >> >> EAGAIN or EBUSY might be the closest in ordinary language... >> But EAGAIN is used when a system call is interrupted and can be retried >> right away... >> So EBUSY? 
>> >> The standard return codes in Linux are: >> >> #define EPERM 1 /* Operation not permitted */ >> #define ENOENT 2 /* No such file or directory */ >> #define ESRCH 3 /* No such process */ >> #define EINTR 4 /* Interrupted system call */ >> #define EIO 5 /* I/O error */ >> #define ENXIO 6 /* No such device or address */ >> #define E2BIG 7 /* Argument list too long */ >> #define ENOEXEC 8 /* Exec format error */ >> #define EBADF 9 /* Bad file number */ >> #define ECHILD 10 /* No child processes */ >> #define EAGAIN 11 /* Try again */ >> #define ENOMEM 12 /* Out of memory */ >> #define EACCES 13 /* Permission denied */ >> #define EFAULT 14 /* Bad address */ >> #define ENOTBLK 15 /* Block device required */ >> #define EBUSY 16 /* Device or resource busy */ >> #define EEXIST 17 /* File exists */ >> #define EXDEV 18 /* Cross-device link */ >> #define ENODEV 19 /* No such device */ >> #define ENOTDIR 20 /* Not a directory */ >> #define EISDIR 21 /* Is a directory */ >> #define EINVAL 22 /* Invalid argument */ >> #define ENFILE 23 /* File table overflow */ >> #define EMFILE 24 /* Too many open files */ >> #define ENOTTY 25 /* Not a typewriter */ >> #define ETXTBSY 26 /* Text file busy */ >> #define EFBIG 27 /* File too large */ >> #define ENOSPC 28 /* No space left on device */ >> #define ESPIPE 29 /* Illegal seek */ >> #define EROFS 30 /* Read-only file system */ >> #define EMLINK 31 /* Too many links */ >> #define EPIPE 32 /* Broken pipe */ >> #define EDOM 33 /* Math argument out of domain of func */ >> #define ERANGE 34 /* Math result not representable */ >> >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From s.m.killen at leeds.ac.uk Fri Mar 11 13:19:41 2016 From: s.m.killen at leeds.ac.uk (Sean Killen) Date: Fri, 11 Mar 2016 13:19:41 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install Message-ID: <56E2C5ED.8060500@leeds.ac.uk> Hi all, So I have finally got my SpectrumScale system installed (well half of it). But it wasn't without some niggles. We have purchased DELL MD3860i disk trays with dual controllers (each with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a while getting a nice multipath setup in place with 'friendly' names set /dev/mapper/ssd1_1 /dev/mapper/t1d1_1 /dev/mapper/t2d1_1 etc, to represent the different tiers/disks/luns. We used the install toolkit and added all the NSDs with the friendly names and it all checked out and verified........ UNTIL we tried to install/deploy! 
At which point it said, no valid devices in /proc/partitions (I need to use the unfriendly /dev/dm-X name instead) - did I miss something in the toolkit, or is something that needs to be resolved, surely it should have told me when I added the first of the 36 NSDs rather that at the install stage when I then need to correct 36 errors. Secondly, I have installed the GUI, it is constantly complaining of a 'Critical' event MS0297 - Connection failed to node. Wrong Credentials. But all nodes can connect to each other via SSH without passwords. Anyone know how to clear and fix this error; I cannot find anything in the docs! Thanks -- Sean -- ------------------------------------------------------------------- Dr Sean M Killen UNIX Support Officer, IT Faculty of Biological Sciences University of Leeds LEEDS LS2 9JT United Kingdom Tel: +44 (0)113 3433148 Mob: +44 (0)776 8670907 Fax: +44 (0)113 3438465 GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From stschmid at de.ibm.com Fri Mar 11 13:41:54 2016 From: stschmid at de.ibm.com (Stefan Schmidt) Date: Fri, 11 Mar 2016 14:41:54 +0100 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> The message means following and is a warning without an direct affect to the function but an indicator that something is may wrong with the enclosure. Check the maintenance procedure which is shown for the event in the GUI event panel. /** Ambient temperature of power supply "{0}" undercut the lower warning threshold at {1}. */ MS0297("MS0297W",'W'), "Cause": "If the lower warning threshold is undercut a the device operation should not be affected. However this might indicate a hardware defect.", "User_action": "Follow the maintenance procedure for the enclosure.", "code": "MS0297", "description": "Ambient temperature of power supply \"{0}\" undercut the lower warning threshold at {1}.", Mit freundlichen Gr??en / Kind regards Stefan Schmidt Scrum Master IBM Spectrum Scale GUI / Senior IT Architect /PMP - Dept. M069 / IBM Spectrum Scale Software Development IBM Systems Group IBM Deutschland Phone: +49-6131-84-3465 IBM Deutschland Mobile: +49-170-6346601 Hechtsheimer Str. 2 E-Mail: stschmid at de.ibm.com 55131 Mainz Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.m.killen at leeds.ac.uk Fri Mar 11 13:50:58 2016 From: s.m.killen at leeds.ac.uk (Sean Killen) Date: Fri, 11 Mar 2016 13:50:58 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> References: <56E2C5ED.8060500@leeds.ac.uk> <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> Message-ID: <0D6C2DBC-4B82-4038-83C0-B0255C8DF9E0@leeds.ac.uk> Hi Stefan Thanks for the quick reply, I appear to have mistyped the error.. It's MS0279. See attached png. -- Sean --? ------------------------------------------------------------------- ??? Dr Sean M Killen ??? UNIX Support Officer, IT ??? 
Faculty of Biological Sciences ??? University of Leeds ??? LEEDS ??? LS2 9JT ??? United Kingdom ??? Tel: +44 (0)113 3433148 ??? Mob: +44 (0)776 8670907 ??? Fax: +44 (0)113 3438465 ??? GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- On 11 March 2016 13:41:54 GMT+00:00, Stefan Schmidt wrote: >The message means following and is a warning without an direct affect >to >the function but an indicator that something is may wrong with the >enclosure. Check the maintenance procedure which is shown for the event >in >the GUI event panel. > >/** Ambient temperature of power supply "{0}" undercut the lower >warning >threshold at {1}. */ > MS0297("MS0297W",'W'), > "Cause": "If the lower warning threshold is undercut a the >device operation should not be affected. However this might indicate a >hardware defect.", > "User_action": "Follow the maintenance procedure for the >enclosure.", > "code": "MS0297", > "description": "Ambient temperature of power supply \"{0}\" >undercut the lower warning threshold at {1}.", > > >Mit freundlichen Gr??en / Kind regards > >Stefan Schmidt > >Scrum Master IBM Spectrum Scale GUI / Senior IT Architect /PMP - Dept. >M069 / IBM Spectrum Scale Software Development >IBM Systems Group >IBM Deutschland > > > >Phone: >+49-6131-84-3465 > IBM Deutschland > >Mobile: >+49-170-6346601 > Hechtsheimer Str. 2 >E-Mail: >stschmid at de.ibm.com > 55131 Mainz > > > Germany > > >IBM Deutschland Research & Development GmbH / Vorsitzende des >Aufsichtsrats: Martina Koederitz >Gesch?ftsf?hrung: Dirk Wittkopp >Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht >Stuttgart, >HRB 243294 > > > > > > >------------------------------------------------------------------------ > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot-AstburyBSL.absl.prv - Dashboard - Mozilla Firefox.png Type: image/png Size: 144612 bytes Desc: not available URL: From sophie.carsten at uk.ibm.com Fri Mar 11 13:53:36 2016 From: sophie.carsten at uk.ibm.com (Sophie Carsten) Date: Fri, 11 Mar 2016 13:53:36 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111355.u2BDtMBO007426@d06av12.portsmouth.uk.ibm.com> Hi, In terms of the NSDs, you need to run the nsd devices script if they're not in /dev/dmX-, here's the link to the knowledge center: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_nsdpro.htm?lang=en The installer should work as normal after this script has been run. We were hoping to get this solved in the upcoming version of the installer, so the user doesn't have to manually run the script. But the previous install team has been put on a new project in IBM, and I can't really comment any longer on when this could be expected to be delivered by the new team put in place. Hope the link gets you further off the ground though. 
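For reference, a minimal sketch of the user exit the linked page describes, assuming dm-multipath friendly names along the lines Sean set up (the alias patterns below are assumptions):

# /var/mmfs/etc/nsddevices -- sketch only; this file is sourced by mmdevdiscover.
# Emit "deviceName deviceType" lines so GPFS considers the /dev/mapper aliases.
for dev in /dev/mapper/ssd*_* /dev/mapper/t*d*_*; do
    [ -e "$dev" ] && echo "mapper/$(basename "$dev") dmm"   # dmm = device-mapper multipath
done
# return 0 would suppress the built-in discovery entirely;
# return 1 lets GPFS also run its normal /proc/partitions scan.
return 1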
Sophie Carsten IBM Spectrum Virtualize Development Engineer IBM Systems - Manchester Lab 44-161-9683886 sophie.carsten at uk.ibm.com From: Sean Killen To: gpfsug main discussion list Date: 11/03/2016 13:20 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, So I have finally got my SpectrumScale system installed (well half of it). But it wasn't without some niggles. We have purchased DELL MD3860i disk trays with dual controllers (each with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a while getting a nice multipath setup in place with 'friendly' names set /dev/mapper/ssd1_1 /dev/mapper/t1d1_1 /dev/mapper/t2d1_1 etc, to represent the different tiers/disks/luns. We used the install toolkit and added all the NSDs with the friendly names and it all checked out and verified........ UNTIL we tried to install/deploy! At which point it said, no valid devices in /proc/partitions (I need to use the unfriendly /dev/dm-X name instead) - did I miss something in the toolkit, or is something that needs to be resolved, surely it should have told me when I added the first of the 36 NSDs rather that at the install stage when I then need to correct 36 errors. Secondly, I have installed the GUI, it is constantly complaining of a 'Critical' event MS0297 - Connection failed to node. Wrong Credentials. But all nodes can connect to each other via SSH without passwords. Anyone know how to clear and fix this error; I cannot find anything in the docs! Thanks -- Sean -- ------------------------------------------------------------------- Dr Sean M Killen UNIX Support Officer, IT Faculty of Biological Sciences University of Leeds LEEDS LS2 9JT United Kingdom Tel: +44 (0)113 3433148 Mob: +44 (0)776 8670907 Fax: +44 (0)113 3438465 GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- [attachment "signature.asc" deleted by Sophie Carsten/UK/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 6016 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 11422 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 6016 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Fri Mar 11 14:30:24 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 11 Mar 2016 14:30:24 +0000 Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 Message-ID: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> I see this fix is out and IBM still is not providing the pmsensors package for RH6? can we PLEASE get this package posted as part of the normal distribution? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Fri Mar 11 15:27:20 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 11 Mar 2016 10:27:20 -0500 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111522.u2BFMqvG008617@d01av05.pok.ibm.com> You may need/want to set up an nsddevices script to help GPFS find all your disks. Google it! Or ... http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adm.doc/bl1adm_nsddevices.htm -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From jonathan at buzzard.me.uk Fri Mar 11 15:46:39 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 11 Mar 2016 15:46:39 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <1457711199.4251.245.camel@buzzard.phy.strath.ac.uk> On Fri, 2016-03-11 at 13:19 +0000, Sean Killen wrote: > Hi all, > > So I have finally got my SpectrumScale system installed (well half of > it). But it wasn't without some niggles. > > We have purchased DELL MD3860i disk trays with dual controllers (each > with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a > while getting a nice multipath setup in place with 'friendly' names set > Oh dear. I guess it might work with 10Gb Ethernet but based on my personal experience iSCSI is spectacularly unsuited to GPFS. Either your NSD nodes can overwhelm the storage arrays or the storage arrays can overwhelm the NSD servers and performance falls through the floor. That is unless you have Data Center Ethernet at which point you might as well have gone Fibre Channel in the first place. Though unless you are going to have large physical separation between the storage and NSD servers 12Gb SAS is a cheaper option and you can still have four NSD servers hooked up to each MD3 based storage array. I have in the past implement GPFS on Dell MD3200i's. I did eventually get it working reliably but it was so suboptimal with so many compromises that as soon as the MD3600f came out we purchased these to replaced the MD3200i's. Lets say you have three storage arrays with two paths to each controller and four NSD servers. Basically what happens is that an NSD server issues a bunch of requests for blocks to the storage arrays. Then all 12 paths start answering to your two connections to the NSD server. At this point the Ethernet adaptors on your NSD servers are overwhelmed 802.1D PAUSE frames start being issued which just result in head of line blocking and performance falls through the floor. You need Data Center Ethernet to handle this properly, which is probably why FCoE never took off as you can't just use the Ethernet switches and adaptors you have. Both FC and SAS handle this sort of congestion gracefully unlike ordinary Ethernet. Now the caveat for all this is that it is much easier to overwhelm a 1Gbps link than a 10Gbps link. However with the combination of SSD and larger cache's I can envisage that a 10Gbps link could be overwhelmed and you would then see the same performance issues that I saw. Basically the only way out is a one to one correspondence between ports on the NSD's and the storage controllers. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Fri Mar 11 15:46:46 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 11 Mar 2016 15:46:46 +0000 Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 In-Reply-To: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> References: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> Message-ID: Hi Bob, But on the plus side, I noticed in the release notes: "If you are coming from 4.1.1-X, you must first upgrade to 4.2.0-0. You may use this 4.2.0-2 package to perform a First Time Install or to upgrade from an existing 4.2.0-X level." So it looks like its no longer necessary to install 4.2.0 and then apply PTFs. I remember talking to someone a while ago and they were hoping this might happen, but it seems that it actually has! Nice! Simon From: > on behalf of "Oesterlin, Robert" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 11 March 2016 at 14:30 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 I see this fix is out and IBM still is not providing the pmsensors package for RH6? can we PLEASE get this package posted as part of the normal distribution? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominic.mueller at de.ibm.com Fri Mar 11 16:02:37 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Fri, 11 Mar 2016 17:02:37 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> Message-ID: <201603111502.u2BF2kk6007636@d06av10.portsmouth.uk.ibm.com> Jaime, found the RFE and will discuss it with product management. Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Cc: gpfsug main discussion list , Marc A Kaplan Date: 11.03.2016 01:15 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Hey Dominic Just submitted a new request: Headline: GPFS+TSM+HSM: staging vs. migration priority ID: 85292 Thank you Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jaime, > > I received the same request from other customers as well. > could you please open a RFE for the theme and send me the RFE ID? I will > discuss it with the product management then. RFE Link: > https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 > > Greetings, Dominic. 
> > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: gpfsug main discussion list , > Marc A Kaplan > Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Date: 09.03.2016 16:22 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Interesting perspective Mark. > > I'm inclined to think EBUSY would be more appropriate. > > Jaime > > Quoting Marc A Kaplan : > >> For a write or create operation ENOSPC would make some sense. >> But if the file already exists and I'm just opening for read access I >> would be very confused by ENOSPC. >> How should the system respond: "Sorry, I know about that file, I have it >> safely stored away in HSM, but it is not available right now. Try again >> later!" >> >> EAGAIN or EBUSY might be the closest in ordinary language... >> But EAGAIN is used when a system call is interrupted and can be retried >> right away... >> So EBUSY? >> >> The standard return codes in Linux are: >> >> #define EPERM 1 /* Operation not permitted */ >> #define ENOENT 2 /* No such file or directory */ >> #define ESRCH 3 /* No such process */ >> #define EINTR 4 /* Interrupted system call */ >> #define EIO 5 /* I/O error */ >> #define ENXIO 6 /* No such device or address */ >> #define E2BIG 7 /* Argument list too long */ >> #define ENOEXEC 8 /* Exec format error */ >> #define EBADF 9 /* Bad file number */ >> #define ECHILD 10 /* No child processes */ >> #define EAGAIN 11 /* Try again */ >> #define ENOMEM 12 /* Out of memory */ >> #define EACCES 13 /* Permission denied */ >> #define EFAULT 14 /* Bad address */ >> #define ENOTBLK 15 /* Block device required */ >> #define EBUSY 16 /* Device or resource busy */ >> #define EEXIST 17 /* File exists */ >> #define EXDEV 18 /* Cross-device link */ >> #define ENODEV 19 /* No such device */ >> #define ENOTDIR 20 /* Not a directory */ >> #define EISDIR 21 /* Is a directory */ >> #define EINVAL 22 /* Invalid argument */ >> #define ENFILE 23 /* File table overflow */ >> #define EMFILE 24 /* Too many open files */ >> #define ENOTTY 25 /* Not a typewriter */ >> #define ETXTBSY 26 /* Text file busy */ >> #define EFBIG 27 /* File too large */ >> #define ENOSPC 28 /* No space left on device */ >> #define ESPIPE 29 /* Illegal seek */ >> #define EROFS 30 /* Read-only file system */ >> #define EMLINK 31 /* Too many links */ >> #define EPIPE 32 /* Broken pipe */ >> #define EDOM 33 /* Math argument out of domain of func */ >> #define ERANGE 34 /* Math result not representable */ >> >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From damir.krstic at gmail.com Fri Mar 11 20:55:29 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 11 Mar 2016 20:55:29 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 Message-ID: What is the correct procedure to upgrade from 3.5 to 4.1? What I have tried is uninstalling existing 3.5 version (rpm -e) and installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled kernel extensions: cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages Rebooted the node and have been getting: daemon and kernel extension do not match. I've tried rebuilding extensions again and still could not get it to work. I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting daemon and kernel extension do not match error with 3.5 version on a single node. So, couple of questions: What is the correct way of upgrading from 3.5 to 4.1.0.0? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Mar 11 21:10:14 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 11 Mar 2016 21:10:14 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: That looks pretty much like the right process. Check that all the components upgraded ... rpm -qa | grep gpfs You may need to do an rpm -e on the gpfs.gplbin package and then install the newly built one Are you doing make rpm to build the rpm version of gpfs.gplbin and installing that? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Damir Krstic [damir.krstic at gmail.com] Sent: 11 March 2016 20:55 To: gpfsug main discussion list Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 What is the correct procedure to upgrade from 3.5 to 4.1? What I have tried is uninstalling existing 3.5 version (rpm -e) and installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled kernel extensions: cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages Rebooted the node and have been getting: daemon and kernel extension do not match. I've tried rebuilding extensions again and still could not get it to work. I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting daemon and kernel extension do not match error with 3.5 version on a single node. So, couple of questions: What is the correct way of upgrading from 3.5 to 4.1.0.0? 
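(For reference, before the replies below: a hedged sketch of one upgrade sequence on a RHEL-style node; the package extraction path is an assumption.)

# Sketch only -- run per node, with GPFS stopped on that node.
mmshutdown
rpm -qa | grep gpfs                                 # confirm exactly what is installed now
yum remove 'gpfs*'                                  # keep the RPM database consistent (see Jonathan's note below)
yum localinstall /usr/lpp/mmfs/4.1.0.0/gpfs*.rpm    # path is an assumption; point at the extracted 4.1 packages
cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages
# or 'make rpm' and install the resulting gpfs.gplbin package, as Simon suggests below
mmstartup && mmgetstate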
Thanks, Damir From damir.krstic at gmail.com Fri Mar 11 21:13:47 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 11 Mar 2016 21:13:47 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: Thanks for the reply. Didn't run make rpm just make autoconfig etc. Checked the versions and it all looks good and valid. Will play with it again and see if there is a step missing. Damir On Fri, Mar 11, 2016 at 15:10 Simon Thompson (Research Computing - IT Services) wrote: > > That looks pretty much like the right process. > > Check that all the components upgraded ... rpm -qa | grep gpfs > > You may need to do an rpm -e on the gpfs.gplbin package and then install > the newly built one > > Are you doing make rpm to build the rpm version of gpfs.gplbin and > installing that? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Damir Krstic [ > damir.krstic at gmail.com] > Sent: 11 March 2016 20:55 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs > 3.5.0-21 > > What is the correct procedure to upgrade from 3.5 to 4.1? > > What I have tried is uninstalling existing 3.5 version (rpm -e) and > installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled > kernel extensions: > cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages > > Rebooted the node and have been getting: > daemon and kernel extension do not match. > > I've tried rebuilding extensions again and still could not get it to work. > I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting > daemon and kernel extension do not match error with 3.5 version on a single > node. So, couple of questions: > What is the correct way of upgrading from 3.5 to 4.1.0.0? > > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Fri Mar 11 22:58:08 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 11 Mar 2016 22:58:08 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: <56E34D80.7000703@buzzard.me.uk> On 11/03/16 21:10, Simon Thompson (Research Computing - IT Services) wrote: > > That looks pretty much like the right process. Yes and no. Assuming you are do this on either RHEL 6.x or 7.x (or their derivatives), then they will now complain constantly that you have modified the RPM database outside yum. As such it is recommended by RedHat that you do "yum remove" and "yum install" rather than running rpm directly. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From pavel.pokorny at datera.cz Sat Mar 12 08:23:49 2016 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Sat, 12 Mar 2016 09:23:49 +0100 Subject: [gpfsug-discuss] SMB and NFS limitations? Message-ID: Hello, on Spectrum Scale FAQ page I found following recommendations for SMB and NFS: *A maximum of 3,000 SMB connections is recommended per protocol node with a maximum of 20,000 SMB connections per cluster. A maximum of 4,000 NFS connections per protocol node is recommended. 
A maximum of 2,000 Object connections per protocol nodes is recommended.* Are there any other limits? Like max number of shares? Thanks, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Hadovit? 962/10 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz > -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Mon Mar 14 14:22:20 2016 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Mon, 14 Mar 2016 14:22:20 +0000 Subject: [gpfsug-discuss] Registration now open! Message-ID: <400eedb0a81cd193a694176794f1dc07@webmail.gpfsug.org> Dear members, The registration for the UK Spring 2016 Spectrum Scale (GPFS) User Group meeting is now open. We have a fantastic and full agenda of presentations from users and subject experts. The two-day event is taking place at the IBM Client Centre in London on 17th and 18th May. For the current agenda, further details and to register your place, please visit: http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 Places at the event are limited so it is recommended that you register early to avoid disappointment. Due to capacity restrictions, there is currently a limit of three people per organisation; this will be relaxed if places remain nearer the event date. We'd like to thank our sponsors of this year's User Group as without their support the two-day event would not be possible. Thanks go to Arcastream, DDN, IBM, Lenovo, Mellanox, NetApp, OCF and Seagate for their support. We hope to see you at the May event! Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 15 19:39:51 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 15 Mar 2016 15:39:51 -0400 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? Message-ID: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From damir.krstic at gmail.com Tue Mar 15 20:31:55 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 15 Mar 2016 20:31:55 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Message-ID: We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? 
Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Tue Mar 15 20:33:35 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 15 Mar 2016 20:33:35 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: <56E34D80.7000703@buzzard.me.uk> References: <56E34D80.7000703@buzzard.me.uk> Message-ID: Figured it out - this node had RedHat version of a kernel that was custom patched by RedHat some time ago for the IB issues we were experiencing. I could not build a portability layer on this kernel. After upgrading the node to more recent version of the kernel, I was able to compile portability layer and get it all working. Thanks for suggestions. Damir On Fri, Mar 11, 2016 at 4:58 PM Jonathan Buzzard wrote: > On 11/03/16 21:10, Simon Thompson (Research Computing - IT Services) wrote: > > > > That looks pretty much like the right process. > > Yes and no. Assuming you are do this on either RHEL 6.x or 7.x (or their > derivatives), then they will now complain constantly that you have > modified the RPM database outside yum. > > As such it is recommended by RedHat that you do "yum remove" and "yum > install" rather than running rpm directly. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:42:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:42:59 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: Hi Jamie I have some fairly large clusters (tho not as large as you describe) running on ?roll your own? storage subsystem of various types. You?re asking a broad question here on performance and rebuild times. I can?t speak to a comparison with ESS (I?m sure IBM can comment) but if you want to discuss some of my experiences with larger clusters, HD, performace (multi PB) I?d be happy to do so. You can drop me a note: robert.oesterlin at nuance.com and we can chat at length. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Jaime Pinto > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 2:39 PM To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. 
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=UIC7jY_blq8j34WiQM1a8cheHzbYW0sYS-ofA3if_Hk&s=MtunFkJSGpXWNdEkMqluTY-CYIC4uaMz7LiZ7JFob8c&e= -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:45:05 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:45:05 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Damir Krstic > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Mar 15 21:50:20 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 15 Mar 2016 21:50:20 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: Not sure about cluster features, but at minimum you'll need to create the filesystem with low enough mmcrfs --version string. -jf tir. 15. mar. 2016 kl. 21.32 skrev Damir Krstic : > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. > When looking at GPFS coexistance documents, it is not clear whether GPFS > 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any > issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.arnold at unibas.ch Tue Mar 15 22:22:17 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Tue, 15 Mar 2016 23:22:17 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <56E88B19.4060708@unibas.ch> It's definitely doable, besides --version mentioned byJan-Frode, just a two things to consider (when cluster started as 3.5 or earlier version) we stumbled across: - keys nistCompliance=SP800-131A: we had to regenerate and exchange new keys with nistCompliance before old cluster could talk to new remotecluster - maxblocksize: you would want ESS to run with maxblocksize 16M - cluster with 3.5 probably has set a smaller value (default 1M) and to change that you have to stop GPFS Best Konstantin On 03/15/2016 10:50 PM, Jan-Frode Myklebust wrote: > Not sure about cluster features, but at minimum you'll need to create > the filesystem with low enough mmcrfs --version string. > > > > > -jf > > tir. 15. mar. 2016 kl. 21.32 skrev Damir Krstic >: > > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. 
We will remote cluster mount ESS to our compute > cluster. When looking at GPFS coexistance documents, it is not clear > whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know > if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 > cluster? > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ------------------------------------------------------------------------------------------- Konstantin Arnold | University of Basel & SIB Klingelbergstrasse 50/70 | CH-4056 Basel | Phone: +41 61 267 15 82 Email: konstantin.arnold at unibas.ch From Paul.Sanchez at deshaw.com Wed Mar 16 03:28:59 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 16 Mar 2016 03:28:59 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> You do have to keep an eye out for filesystem version issues as you set this up. If the new filesystem is created with a version higher than the 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. You can specify the version of a new filesystem at creation time with, for example, ?mmcrfs ?version 3.5.?. You can confirm an existing filesystem?s version with ?mmlsfs | grep version?. There are probably a pile of caveats about features that you can never get on the new filesystem though. If you don?t need high-bandwidth, parallel access to the new filesystem from the 3.5 cluster, you could use CES or CNFS for a time, until the 3.5 cluster is upgraded or retired. A possibly better recommendation would be to upgrade the 3.5 cluster to at least 4.1, if not 4.2, instead. It would continue to be able to serve any of your old version filesystems, but not prohibit you from moving forward on the new ones. -Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Oesterlin, Robert Sent: Tuesday, March 15, 2016 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Damir Krstic > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... 
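A minimal sketch of the version handling Paul describes; device names, the stanza file and the values below are placeholders:

# On the existing 3.5 cluster: note the filesystem format version in use.
mmlsfs gpfs0 -V
# On the new 4.x/ESS cluster: create the filesystem at a level the 3.5 clients can mount.
mmcrfs essfs -F ess_nsd.stanza --version 3.5.0.7
mmlsfs essfs -V              # confirm it reports the 3.5-compatible format
# Konstantin's maxblocksize caveat above also applies: check the 3.5 cluster first.
mmlsconfig maxblocksize      # raising it (e.g. mmchconfig maxblocksize=16M) needs GPFS stopped cluster-wide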
URL: From damir.krstic at gmail.com Wed Mar 16 13:08:51 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 13:08:51 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> References: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> Message-ID: Thanks for all replies. Do all of the same restrictions apply to 4.1? We have an option of installing ESS with 4.1. If we install ESS with 4.1 can we then cross mount to 3.5 with FS version of 4.1? Also with 4.1 are there any issues with key exchange? Thanks, Damir On Tue, Mar 15, 2016 at 10:29 PM Sanchez, Paul wrote: > You do have to keep an eye out for filesystem version issues as you set > this up. If the new filesystem is created with a version higher than the > 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. > > > > You can specify the version of a new filesystem at creation time with, for > example, ?mmcrfs ?version 3.5.?. > > You can confirm an existing filesystem?s version with ?mmlsfs > | grep version?. > > > > There are probably a pile of caveats about features that you can never get > on the new filesystem though. If you don?t need high-bandwidth, parallel > access to the new filesystem from the 3.5 cluster, you could use CES or > CNFS for a time, until the 3.5 cluster is upgraded or retired. > > > > A possibly better recommendation would be to upgrade the 3.5 cluster to at > least 4.1, if not 4.2, instead. It would continue to be able to serve any > of your old version filesystems, but not prohibit you from moving forward > on the new ones. > > > > -Paul > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of *Oesterlin, Robert > *Sent:* Tuesday, March 15, 2016 4:45 PM > > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] cross-cluster mounting different versions > of gpfs > > > > I?ve never used ESS, but I state for a fact you can cross mount clusters > at various levels without a problem ? I do it all the time during upgrades. > I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may > be limited on 4.2 features when accessing it via the 3.5 cluster, but data > access should work fine. > > > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > > > > *From: * on behalf of Damir > Krstic > *Reply-To: *gpfsug main discussion list > *Date: *Tuesday, March 15, 2016 at 3:31 PM > *To: *gpfsug main discussion list > *Subject: *[gpfsug-discuss] cross-cluster mounting different versions of > gpfs > > > > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. > When looking at GPFS coexistance documents, it is not clear whether GPFS > 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any > issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? > > > > Thanks, > > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
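On the key-exchange part of the question, a hedged sketch of the usual multi-cluster setup; cluster names, contact nodes, filesystem names and key paths are placeholders:

# On each cluster (owning/ESS and accessing/compute): generate and enable a key.
mmauth genkey new
mmauth update . -l AUTHONLY              # may need GPFS down across the cluster when first enabled
# Copy each cluster's /var/mmfs/ssl/id_rsa.pub to the other side, then on the owning (ESS) cluster:
mmauth add compute.example.com -k /tmp/compute_id_rsa.pub
mmauth grant compute.example.com -f essfs
# On the accessing (compute) cluster:
mmremotecluster add ess.example.com -n ems1,io1,io2 -k /tmp/ess_id_rsa.pub
mmremotefs add essfs -f essfs -C ess.example.com -T /gpfs/essfs
mmmount essfs -a
# Konstantin's nistCompliance note above is worth checking too when mixing 3.5 and 4.x:
mmlsconfig nistCompliance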
URL: From UWEFALKE at de.ibm.com Wed Mar 16 13:29:42 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 16 Mar 2016 14:29:42 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> Message-ID: <201603161329.u2GDTpjP006773@d06av09.portsmouth.uk.ibm.com> Hi, Damir, you cannot mount a 4.x fs level from a 3.5 level cluster / node. You need to create the fs with a sufficiently low level, fs level downgrade is not possible, AFAIK. 3.5 nodes can mount fs from 4.1 cluster (fs at 3.5.0.7 fs level), that I can confirm for sure. Uwe Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 02:09 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks for all replies. Do all of the same restrictions apply to 4.1? We have an option of installing ESS with 4.1. If we install ESS with 4.1 can we then cross mount to 3.5 with FS version of 4.1? Also with 4.1 are there any issues with key exchange? Thanks, Damir On Tue, Mar 15, 2016 at 10:29 PM Sanchez, Paul wrote: You do have to keep an eye out for filesystem version issues as you set this up. If the new filesystem is created with a version higher than the 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. You can specify the version of a new filesystem at creation time with, for example, ?mmcrfs ?version 3.5.?. You can confirm an existing filesystem?s version with ?mmlsfs | grep version?. There are probably a pile of caveats about features that you can never get on the new filesystem though. If you don?t need high-bandwidth, parallel access to the new filesystem from the 3.5 cluster, you could use CES or CNFS for a time, until the 3.5 cluster is upgraded or retired. A possibly better recommendation would be to upgrade the 3.5 cluster to at least 4.1, if not 4.2, instead. It would continue to be able to serve any of your old version filesystems, but not prohibit you from moving forward on the new ones. -Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto: gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Oesterlin, Robert Sent: Tuesday, March 15, 2016 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. 
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of Damir Krstic Reply-To: gpfsug main discussion list Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistence documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Wed Mar 16 15:20:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:20:50 -0500 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> The key point is that you must create the file system so that it "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmlsfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Mar 16 15:32:51 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:32:51 -0500 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: <201603161534.u2GFYR3X029313@d03av02.boulder.ibm.com> IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM raid-in-software technology with advanced striping and error recovery. I just googled some of those terms and hit this summary, not written by IBM: http://www.raidinc.com/file-storage/gss-ess Also, this is now a "mature" technology. IBM has been doing this since before 2008. See pages 9 and 10 of: http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL:
From r.sobey at imperial.ac.uk Wed Mar 16 16:03:27 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 16 Mar 2016 16:03:27 +0000 Subject: [gpfsug-discuss] Perfileset df explanation Message-ID: All, Can someone explain what this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What does this mean, as in what output would I expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Wed Mar 16 16:05:48 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Wed, 16 Mar 2016 16:05:48 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: Hi Richard, I don't think mmdf will tell you the answer you're looking for. If you use df within the fileset, or for the share over NFS, you will get the free space reported for that fileset, not the whole file system. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 16 March 2016 16:03 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Perfileset df explanation All, Can someone explain what this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What does this mean, as in what output would I expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Mar 16 16:12:54 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 16 Mar 2016 16:12:54 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: If you have a fileset quota, 'df' will report the size of the fileset as the max quota defined, and usage as how much of the quota you have used. -jf ons. 16. mar. 2016 kl. 17.03 skrev Sobey, Richard A : > All, > > Can someone explain what this means? :: > > --filesetdf > > Displays a yes or no value indicating whether filesetdf is enabled; if > yes, the mmdf command reports numbers based on the quotas for the fileset > and not for the total file system. > > What does this mean, as in what output would I expect to see from mmdf with > this option set to Yes, and No? I don't think it's supposed to give any > indication of over-provision and cursory tests suggest it doesn't.
> > > > Thanks > > > > Richard > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Mar 16 16:13:11 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 16 Mar 2016 16:13:11 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> Message-ID: Thanks for those slides -- I hadn't realized GNR was that old. The slides projected 120 PB by 2011.. Does anybody know what the largest GPFS filesystems are today? Are there any in that area? How many ESS GLx building blocks in a single cluster? -jf ons. 16. mar. 2016 kl. 16.34 skrev Marc A Kaplan : > IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM > raid-in-software technology with advanced striping and error recovery. > > I just googled some of those terms and hit this not written by IBM summary: > > http://www.raidinc.com/file-storage/gss-ess > > Also, this is now a "mature" technology. IBM has been doing this since > before 2008. See pages 9 and 10 of: > > http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Mar 16 16:24:49 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 16 Mar 2016 16:24:49 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: Ah, I see, thanks for that.
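A minimal sketch of what filesetdf changes in practice (the filesystem "gpfs0", the fileset "projects" and its junction path are hypothetical names, and mmsetquota is the 4.1+ syntax - on older releases the fileset quota would be set with mmedquota -j instead):

  mmchfs gpfs0 --filesetdf                    # report per-fileset numbers to df/statfs
  mmsetquota gpfs0:projects --block 10T:10T   # give the fileset a block quota
  df -h /gpfs/gpfs0/projects                  # now shows 10T as the size, and the fileset's
                                              # quota usage as "used", rather than the whole
                                              # file system's capacity and usage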
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 16 March 2016 16:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfileset df explanation If you have a fileset quota, 'df' will report the size of the fileset as the max quota defined, and usage as how much of the quota you have used. -jf ons. 16. mar. 2016 kl. 17.03 skrev Sobey, Richard A >: All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don?t think it?s supposed to give any indication of over-provision and cursory tests suggest it doesn?t. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Wed Mar 16 17:07:28 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 12:07:28 -0500 Subject: [gpfsug-discuss] 4.2 installer Message-ID: <56E992D0.3050603@genome.wustl.edu> All, Attempting to upgrade our into our dev environment. The update to 4.2 was simple. http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_migratingtoISS4.2fromISS4.1.1.htm But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Anyway to just pull in the current cluster info? http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_configuringgpfs.htm%23configuringgpfs?lang=en Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Robert.Oesterlin at nuance.com Wed Mar 16 17:15:02 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:15:02 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E992D0.3050603@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> Message-ID: Hi Matt I?ve done a fair amount of work (testing) with the installer. It?s great if you want to install a new cluster, not so much if you have one setup. You?ll need to manually define everything. Be careful tho ? do some test runs to verify what it will really do. I?ve found the installer doing a good job in upgrading my CES nodes, but I?ve opted to manually upgrade my NSD server nodes. 
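"Defining everything" to the toolkit for an existing cluster looks roughly like the sketch below. This is only a hedged outline - the path matches the 4.2 packaging, but the hostnames are made up and the exact subcommands and flags should be checked against the 4.2 installation toolkit documentation before use:

  cd /usr/lpp/mmfs/4.2.0.0/installer
  ./spectrumscale setup -s 10.10.10.10           # IP of the node driving the install (flag assumed)
  ./spectrumscale node add ces1.example.com -p   # declare the existing protocol (CES) nodes
  ./spectrumscale node add ces2.example.com -p   #   (-p for protocol node is an assumption)
  ./spectrumscale install --precheck             # check-only pass before it changes anything
  ./spectrumscale upgrade                        # then let it upgrade the nodes it knows about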
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:07 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] 4.2 installer All, Attempting to upgrade our into our dev environment. The update to 4.2 was simple. https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5FmigratingtoISS4.2fromISS4.1.1.htm&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=q0hiYoPBUTo0Rs7bnv_vhYZnrKKY1ypiub2u4Y1RzXQ&e= But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Anyway to just pull in the current cluster info? https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5Fconfiguringgpfs.htm-2523configuringgpfs-3Flang-3Den&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=I5ateHrTk48s7jvy5fbQoN9WAx0JThpderOGCqeU05A&e= Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=1xogEU5qWELakYlmL5snihVa_PjAuf1KMuDM-s1e48c&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Mar 16 17:18:47 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 16 Mar 2016 18:18:47 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). 
while this is heavily dependent on workload and environment its at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: > The key point is that you must create the file system so that is "looks" > like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a > test filesystem back on the 3.5 cluster and look at the version string. > mmslfs xxx -V. Then go to the 4.x system and try to create a file system > with the same version string.... > > > [image: Marc A Kaplan] > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Mar 16 17:20:11 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 16 Mar 2016 17:20:11 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu>, Message-ID: Does the installer manage to make the rpm kernel layer ok on clone oses? Last time I tried mmmakegpl, it falls over as I don't run RedHat enterprise... (I must admit I haven't used the installer, but be have config management recipes to install and upgrade). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 16 March 2016 17:15 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.2 installer Hi Matt I?ve done a fair amount of work (testing) with the installer. It?s great if you want to install a new cluster, not so much if you have one setup. You?ll need to manually define everything. Be careful tho ? do some test runs to verify what it will really do. I?ve found the installer doing a good job in upgrading my CES nodes, but I?ve opted to manually upgrade my NSD server nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:07 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] 4.2 installer All, Attempting to upgrade our into our dev environment. The update to 4.2 was simple. https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5FmigratingtoISS4.2fromISS4.1.1.htm&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=q0hiYoPBUTo0Rs7bnv_vhYZnrKKY1ypiub2u4Y1RzXQ&e= But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Anyway to just pull in the current cluster info? https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5Fconfiguringgpfs.htm-2523configuringgpfs-3Flang-3Den&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=I5ateHrTk48s7jvy5fbQoN9WAx0JThpderOGCqeU05A&e= Thanks Matt ____ This email message is a private communication. 
The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=1xogEU5qWELakYlmL5snihVa_PjAuf1KMuDM-s1e48c&e= From mweil at genome.wustl.edu Wed Mar 16 17:36:26 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 12:36:26 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> Message-ID: <56E9999A.7030902@genome.wustl.edu> We have multiple clusters with thousands of nsd's surely there is an upgrade path. Are you all saying just continue to manually update nsd servers and manage them as we did previously. Is the installer not needed if there are current setups. Just deploy CES manually? On 3/16/16 12:20 PM, Simon Thompson (Research Computing - IT Services) wrote: > Does the installer manage to make the rpm kernel layer ok on clone oses? > > Last time I tried mmmakegpl, it falls over as I don't run RedHat enterprise... > > (I must admit I haven't used the installer, but be have config management recipes to install and upgrade). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] > Sent: 16 March 2016 17:15 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer > > Hi Matt > > I?ve done a fair amount of work (testing) with the installer. It?s great if you want to install a new cluster, not so much if you have one setup. You?ll need to manually define everything. Be careful tho ? do some test runs to verify what it will really do. I?ve found the installer doing a good job in upgrading my CES nodes, but I?ve opted to manually upgrade my NSD server nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > > > > From: > on behalf of Matt Weil > > Reply-To: gpfsug main discussion list > > Date: Wednesday, March 16, 2016 at 12:07 PM > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] 4.2 installer > > All, > > Attempting to upgrade our into our dev environment. The update to 4.2 > was simple. > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5FmigratingtoISS4.2fromISS4.1.1.htm&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=q0hiYoPBUTo0Rs7bnv_vhYZnrKKY1ypiub2u4Y1RzXQ&e= > > But I am confused on the installation toolkit. It seems that it is > going to set it all up and I just want to upgrade a cluster that is > already setup. 
Anyway to just pull in the current cluster info? > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5Fconfiguringgpfs.htm-2523configuringgpfs-3Flang-3Den&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=I5ateHrTk48s7jvy5fbQoN9WAx0JThpderOGCqeU05A&e= > > Thanks > Matt > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=1xogEU5qWELakYlmL5snihVa_PjAuf1KMuDM-s1e48c&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Robert.Oesterlin at nuance.com Wed Mar 16 17:36:37 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:36:37 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> Message-ID: <2097A8FD-3A42-4D36-8DC2-1DDA6BC9984C@nuance.com> Sadly, it fails if the node can?t run mmbuildgpl, also on the clone OS?s of RedHat. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of "Simon Thompson (Research Computing - IT Services)" > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:20 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer Does the installer manage to make the rpm kernel layer ok on clone oses? Last time I tried mmmakegpl, it falls over as I don't run RedHat enterprise... -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Wed Mar 16 17:40:42 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:40:42 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E9999A.7030902@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> Message-ID: <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:36 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 installer We have multiple clusters with thousands of nsd's surely there is an upgrade path. Are you all saying just continue to manually update nsd servers and manage them as we did previously. Is the installer not needed if there are current setups. Just deploy CES manually? -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Mar 16 18:07:59 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 18:07:59 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: > while this is all correct people should think twice about doing this. > if you create a filesystem with older versions, it might prevent you from > using some features like data-in-inode, encryption, adding 4k disks to > existing filesystem, etc even if you will eventually upgrade to the latest > code. > > for some customers its a good point in time to also migrate to larger > blocksizes compared to what they run right now and migrate the data. 
i have > seen customer systems gaining factors of performance improvements even on > existing HW by creating new filesystems with larger blocksize and latest > filesystem layout (that they couldn't before due to small file waste which > is now partly solved by data-in-inode). while this is heavily dependent on > workload and environment its at least worth thinking about. > > sven > > > > On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan > wrote: > >> The key point is that you must create the file system so that is "looks" >> like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a >> test filesystem back on the 3.5 cluster and look at the version string. >> mmslfs xxx -V. Then go to the 4.x system and try to create a file system >> with the same version string.... >> >> >> [image: Marc A Kaplan] >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From jonathan at buzzard.me.uk Wed Mar 16 18:47:06 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 16 Mar 2016 18:47:06 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: <56E9AA2A.3010108@buzzard.me.uk> On 16/03/16 18:07, Damir Krstic wrote: > Sven, > > For us, at least, at this point in time, we have to create new > filesystem with version flag. The reason is we can't take downtime to > upgrade all of our 500+ compute nodes that will cross-cluster mount this > new storage. We can take downtime in June and get all of the nodes up to > 4.2 gpfs version but we have users today that need to start using the > filesystem. > You can upgrade a GPFS file system piece meal. That is there should be no reason to take the whole system off-line to perform the upgrade. So you can upgrade a compute nodes to GPFS 4.2 one by one and they will happily continue to talk to the NSD's running 3.5 while the other nodes continue to use the file system. In a properly designed GPFS cluster you should also be able to take individual NSD nodes out for the upgrade. Though I wouldn't recommend running mixed versions on a long term basis, it is definitely fine for the purposes of upgrading. Then once all nodes in the GPFS cluster are upgraded you issue the mmchfs -V full. How long this will take will depend on the maximum run time you allow for your jobs. You would need to check that you can make a clean jump from 3.5 to 4.2 but IBM support should be able to confirm that for you. This is one of the nicer features of GPFS; its what I refer to as "proper enterprise big iron computing". That is if you have to take the service down at any time for any reason you are doing it wrong. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
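To make that finishing step concrete: once every node in the cluster (and every remote cluster that mounts the filesystem) is on the new code, the sequence is roughly the following. "gpfs0" is a hypothetical filesystem name, and mmchfs -V full is irreversible, so check the migration chapter of the documentation for your exact starting and ending levels first:

  mmchconfig release=LATEST   # commit the cluster to the new release level
  mmlsfs gpfs0 -V             # show the filesystem's current on-disk format version
  mmchfs gpfs0 -V full        # upgrade the on-disk format to the new level (one-way)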
From UWEFALKE at de.ibm.com Wed Mar 16 18:51:59 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 16 Mar 2016 19:51:59 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> Hi, Damir, I have not done that, but a rolling upgrade from 3.5.x to 4.1.x (maybe even to 4.2) is supported. So, as long as you do not need all 500 nodes of your compute cluster permanently active, you might upgrade them in batches without fully-blown downtime. Nicely orchestrated by some scripts it could be done quite smoothly (depending on the percentage of compute nodes which can go down at once and on the run time / wall clocks of your jobs this will take between few hours and many days ...). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 07:08 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. 
i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). while this is heavily dependent on workload and environment its at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: The key point is that you must create the file system so that is "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmslfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "atthrpb5.gif" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From damir.krstic at gmail.com Wed Mar 16 19:06:02 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 19:06:02 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <56E9AA2A.3010108@buzzard.me.uk> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <56E9AA2A.3010108@buzzard.me.uk> Message-ID: Jonathan, Gradual upgrade is indeed a nice feature of GPFS. We are planning to gradually upgrade our clients to 4.2. However, before all, or even most clients are upgraded, we have to be able to mount this new 4.2 filesystem on all our compute nodes that are running version 3.5. Here is our environment today: storage cluster - 14 nsd servers * gpfs3.5 compute cluster - 500+ clients * gpfs3.5 <--- this cluster is mounting storage cluster filesystems new to us ESS cluster * gpfs4.2 ESS will become its own GPFS cluster and we want to mount its filesystems on our compute cluster. So far so good. We understand that we will eventually want to upgrade all our nodes in compute cluster to 4.2 and we know the upgrade path (3.5 --> 4.1 --> 4.2). The reason for this conversation is: with ESS and GPFS 4.2 can we remote mount it on our compute cluster? The answer we got is, yes if you build a new filesystem with --version flag. Sven, however, has just pointed out that this may not be desirable option since there are some features that are permanently lost when building a filesystem with --version. In our case, however, even though we will upgrade our clients to 4.2 (some gradually as pointed elsewhere in this conversation, and most in June), we have to be able to mount the new ESS filesystem on our compute cluster before the clients are upgraded. It seems like, even though Sven is recommending against it, building a filesystem with --version flag is our only option. I guess we have another option, and that is to upgrade all our clients first, but we can't do that until June so I guess it's really not an option at this time. I hope this makes our constraints clear: mainly, without being able to take downtime on our compute cluster, we are forced to build a filesystem on ESS using --version flag. 
Thanks, Damir On Wed, Mar 16, 2016 at 1:47 PM Jonathan Buzzard wrote: > On 16/03/16 18:07, Damir Krstic wrote: > > Sven, > > > > For us, at least, at this point in time, we have to create new > > filesystem with version flag. The reason is we can't take downtime to > > upgrade all of our 500+ compute nodes that will cross-cluster mount this > > new storage. We can take downtime in June and get all of the nodes up to > > 4.2 gpfs version but we have users today that need to start using the > > filesystem. > > > > You can upgrade a GPFS file system piece meal. That is there should be > no reason to take the whole system off-line to perform the upgrade. So > you can upgrade a compute nodes to GPFS 4.2 one by one and they will > happily continue to talk to the NSD's running 3.5 while the other nodes > continue to use the file system. > > In a properly designed GPFS cluster you should also be able to take > individual NSD nodes out for the upgrade. Though I wouldn't recommend > running mixed versions on a long term basis, it is definitely fine for > the purposes of upgrading. > > Then once all nodes in the GPFS cluster are upgraded you issue the > mmchfs -V full. How long this will take will depend on the maximum run > time you allow for your jobs. > > You would need to check that you can make a clean jump from 3.5 to 4.2 > but IBM support should be able to confirm that for you. > > This is one of the nicer features of GPFS; its what I refer to as > "proper enterprise big iron computing". That is if you have to take the > service down at any time for any reason you are doing it wrong. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From volobuev at us.ibm.com Wed Mar 16 19:29:17 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 11:29:17 -0800 Subject: [gpfsug-discuss] cross-cluster mounting different versionsofgpfs In-Reply-To: <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> Message-ID: <201603161929.u2GJTRRf020013@d03av01.boulder.ibm.com> There are two related, but distinctly different issues to consider. 1) File system format and backward compatibility. The format of a given file system is recorded on disk, and determines the level of code required to mount such a file system. GPFS offers backward compatibility for older file system versions stretching for many releases. The oldest file system format we test with in the lab is 2.2 (we don't believe there are file systems using older versions actually present in the field). So if you have a file system formatted using GPFS V3.5 code, you can mount that file system using GPFS V4.1 or V4.2 without a problem. Of course, you don't get to use the new features that depend on the file system format that came out since V3.5. If you're formatting a new file system on a cluster running newer code, but want that file system to be mountable by older code, you have to use --version with mmcrfs. 2) RPC format compatibility, aka nodes being able to talk to each other. As the code evolves, the format of some RPCs sent over the network to other nodes naturally has to evolve as well. 
This of course presents a major problem for code coexistence (running different versions of GPFS on different nodes in the same cluster, or nodes from different clusters mounting the same file system, which effectively means joining a remote cluster), which directly translates into the possibility of a rolling migration (upgrading nodes to newer GPFS level one at a time, without taking all nodes down). Implementing new features while preserving some level of RPC compatibility with older releases is Hard, but this is something GPFS has committed to, long ago. The commitment is not open-ended though, there's a very specific statement of support for what's allowed. GPFS major (meaning 'v' or 'r' is incremented in a v.r.m.f version string) release N stream shall have coexistence with the GPFS major release N - 1 stream. So coexistence of V4.2 with V4.1 is supported, while coexistence of V4.2 with older releases is unsupported (it may or may not work if one tries it, depending on the specific combination of versions, but one would do so entirely on own risk). The reason for limiting the extent of RPC compatibility is prosaic: in order to support something, we have to be able to test this something. We have the resources to test the N / N - 1 combination, for every major release N. If we had to extend this to N, N - 1, N - 2, N - 3, you can do the math on how many combinations to test that would create. That would bust the test budget. So if you want to cross-mount a file system from a home cluster running V4.2, you have to run at least V4.1.x on client nodes, and the file system would have to be formatted using the lowest version used on any node mounting the file system. Hope this clarifies things a bit. yuri From: "Uwe Falke" To: gpfsug main discussion list , Date: 03/16/2016 11:52 AM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Damir, I have not done that, but a rolling upgrade from 3.5.x to 4.1.x (maybe even to 4.2) is supported. So, as long as you do not need all 500 nodes of your compute cluster permanently active, you might upgrade them in batches without fully-blown downtime. Nicely orchestrated by some scripts it could be done quite smoothly (depending on the percentage of compute nodes which can go down at once and on the run time / wall clocks of your jobs this will take between few hours and many days ...). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 07:08 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. 
The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). while this is heavily dependent on workload and environment its at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: The key point is that you must create the file system so that is "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmslfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "atthrpb5.gif" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at genome.wustl.edu Wed Mar 16 19:37:31 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 14:37:31 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> Message-ID: <56E9B5FB.2050105@genome.wustl.edu> any help here? 
> ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 > Error: Multilib version problems found. This often means that the root > cause is something else and multilib version checking is just > pointing out that there is a problem. Eg.: > > 1. You have an upgrade for libcap-ng which is missing some > dependency that another package requires. Yum is trying to > solve this by installing an older version of libcap-ng of the > different architecture. If you exclude the bad architecture > yum will tell you what the root cause is (which package > requires what). You can try redoing the upgrade with > --exclude libcap-ng.otherarch ... this should give you an > error > message showing the root cause of the problem. > > 2. You have multiple architectures of libcap-ng installed, but > yum can only see an upgrade for one of those architectures. > If you don't want/need both architectures anymore then you > can remove the one with the missing update and everything > will work. > > 3. You have duplicate versions of libcap-ng installed already. > You can use "yum check" to get yum show these errors. > > ...you can also use --setopt=protected_multilib=false to remove > this checking, however this is almost never the correct thing to > do as something else is very likely to go wrong (often causing > much more problems). > > Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != > libcap-ng-0.7.5-4.el7.x86_64 On 3/16/16 12:40 PM, Oesterlin, Robert wrote: > My first suggestion is: Don?t deploy the CES nodes manually ? way to > many package dependencies. Get those setup right and the installer > does a good job. > > If you go through and define your cluster nodes to the installer, you > can do a GPFS upgrade that way. I?ve run into some issues, especially > with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a > whole lot of control over what it does ? give it a ty and it may work > well for you. But run it in a test cluster first or on a limited set > of nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > From: > on behalf of Matt > Weil > > Reply-To: gpfsug main discussion list > > > Date: Wednesday, March 16, 2016 at 12:36 PM > To: "gpfsug-discuss at spectrumscale.org > " > > > Subject: Re: [gpfsug-discuss] 4.2 installer > > We have multiple clusters with thousands of nsd's surely there is an > upgrade path. Are you all saying just continue to manually update nsd > servers and manage them as we did previously. Is the installer not > needed if there are current setups. Just deploy CES manually? > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From volobuev at us.ibm.com Wed Mar 16 19:37:53 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 11:37:53 -0800 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: <201603161937.u2GJbwII007184@d03av04.boulder.ibm.com> The 'mmdf' part of the usage string is actually an error, it should actually say 'df'. More specifically, this changes the semantics of statfs (2). On Linux, the statfs syscall takes a path argument, which can be the root directory of a file system, or a subdirectory inside. If the path happens to be a root directory of a fileset, and that fileset has the fileset quota set, and --filesetdf is set to 'yes', the statfs returns utilization numbers based on the fileset quota utilization, as opposed to the overall file system utilization. This is useful when a specific fileset is NFS-exported as a 'share', and it's desirable to see only the space used/available for that 'share' on the NFS client side. yuri From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" , Date: 03/16/2016 09:05 AM Subject: [gpfsug-discuss] Perfileset df explanation Sent by: gpfsug-discuss-bounces at spectrumscale.org All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don?t think it?s supposed to give any indication of over-provision and cursory tests suggest it doesn?t. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan at buzzard.me.uk Wed Mar 16 19:45:35 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 16 Mar 2016 19:45:35 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <56E9AA2A.3010108@buzzard.me.uk> Message-ID: <56E9B7DF.1020007@buzzard.me.uk> On 16/03/16 19:06, Damir Krstic wrote: [SNIP] > > In our case, however, even though we will upgrade our clients to 4.2 > (some gradually as pointed elsewhere in this conversation, and most in > June), we have to be able to mount the new ESS filesystem on our compute > cluster before the clients are upgraded. What is preventing a gradual if not rapid upgrade of the compute clients now? The usual approach is once you have verified the upgrade is to simply to disable the queues on all the nodes and as jobs finish you upgrade them as they become free. Again because the usual approach is to have a maximum run time for jobs (that is jobs can't just run forever and will be culled if they run too long) you can achieve this piece meal upgrade in a relatively short period of time. Most places have a maximum run time of one to two weeks. So if you are within the norm this could be done by the end of the month. It's basically the same procedure as you would use to say push a security update that required a reboot. 
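Per node, that boils down to something like the following sketch. The scheduler commands assume a hypothetical Torque/PBS setup - substitute your own batch system's drain and resume commands - and the package step depends on how GPFS is rolled out locally:

  pbsnodes -o node001    # drain: stop new jobs landing on the node
  # ...wait for the jobs already running on node001 to finish...
  mmshutdown -N node001  # stop GPFS on just that node
  # install the new GPFS packages and rebuild the portability layer on node001
  mmstartup -N node001   # bring GPFS back up
  pbsnodes -c node001    # put the node back in service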
The really neat way is to script it up and then make it a job that you keep dumping in the queue till all nodes are updated :D > > It seems like, even though Sven is recommending against it, building a > filesystem with --version flag is our only option. I guess we have > another option, and that is to upgrade all our clients first, but we > can't do that until June so I guess it's really not an option at this time. > I would add my voice to that. The "this feature is not available because you created the file system as version x.y.z" is likely to cause you problems at some point down the line. Certainly caused me headaches in the past. > I hope this makes our constraints clear: mainly, without being able to > take downtime on our compute cluster, we are forced to build a > filesystem on ESS using --version flag. > Again there is or at least should not be *ANY* requirement for downtime of the compute cluster that the users will notice. Certainly nothing worse that nodes going down due to hardware failures or pushing urgent security patches. Taking a different tack is it not possible for the ESS storage to be added to the existing files system? That is you get a bunch of NSD's on the disk with NSD servers, add them all to the existing cluster and then issue some "mmchdisk suspend" on the existing disks followed by some "mmdeldisk " and have the whole lot move over to the new storage in an a manner utterly transparent to the end users (well other than a performance impact)? This approach certainly works (done it myself) but IBM might have placed restrictions on the ESS offering preventing you doing this while maintaining support that I am not familiar with. If there is I personally would see this a barrier to purchase of ESS but then I am old school when it comes to GPFS and not at all familiar with ESS. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Wed Mar 16 19:51:59 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 16 Mar 2016 19:51:59 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E9B5FB.2050105@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com>, <56E9B5FB.2050105@genome.wustl.edu> Message-ID: Have you got a half updated system maybe? You cant have: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 I.e. 0.7.3-5 and 0.7.5-4 I cant check right now, but are ibm shipping libcap-Ng as part of their package? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] Sent: 16 March 2016 19:37 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.2 installer any help here? ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 Error: Multilib version problems found. This often means that the root cause is something else and multilib version checking is just pointing out that there is a problem. Eg.: 1. You have an upgrade for libcap-ng which is missing some dependency that another package requires. Yum is trying to solve this by installing an older version of libcap-ng of the different architecture. If you exclude the bad architecture yum will tell you what the root cause is (which package requires what). You can try redoing the upgrade with --exclude libcap-ng.otherarch ... 
this should give you an error message showing the root cause of the problem. 2. You have multiple architectures of libcap-ng installed, but yum can only see an upgrade for one of those architectures. If you don't want/need both architectures anymore then you can remove the one with the missing update and everything will work. 3. You have duplicate versions of libcap-ng installed already. You can use "yum check" to get yum show these errors. ...you can also use --setopt=protected_multilib=false to remove this checking, however this is almost never the correct thing to do as something else is very likely to go wrong (often causing much more problems). Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 On 3/16/16 12:40 PM, Oesterlin, Robert wrote: My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:36 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 installer We have multiple clusters with thousands of nsd's surely there is an upgrade path. Are you all saying just continue to manually update nsd servers and manage them as we did previously. Is the installer not needed if there are current setups. Just deploy CES manually? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From volobuev at us.ibm.com Wed Mar 16 20:03:09 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 12:03:09 -0800 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> Message-ID: <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> > Under both 3.2 and 3.3 mmbackup would always lock up our cluster when > using snapshot. I never understood the behavior without snapshot, and > the lock up was intermittent in the carved-out small test cluster, so > I never felt confident enough to deploy over the larger 4000+ clients > cluster. 
Back then, GPFS code had a deficiency: migrating very large files didn't work well with snapshots (and some operation mm commands). In order to create a snapshot, we have to have the file system in a consistent state for a moment, and we get there by performing a "quiesce" operation. This is done by flushing all dirty buffers to disk, stopping any new incoming file system operations at the gates, and waiting for all in-flight operations to finish. This works well when all in-flight operations actually finish reasonably quickly. That assumption was broken if an external utility, e.g. mmapplypolicy, used gpfs_restripe_file API on a very large file, e.g. to migrate the file's blocks to a different storage pool. The quiesce operation would need to wait for that API call to finish, as it's an in-flight operation, but migrating a multi-TB file could take a while, and during this time all new file system ops would be blocked. This was solved several years ago by changing the API and its callers to do the migration one block range at a time, thus making each individual syscall short and allowing quiesce to barge in and do its thing. All currently supported levels of GPFS have this fix. I believe mmbackup was affected by the same GPFS deficiency and benefited from the same fix. yuri -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Wed Mar 16 20:20:21 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 16 Mar 2016 16:20:21 -0400 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> Message-ID: <20160316162021.57513mzxykk7semd@support.scinet.utoronto.ca> OK, that is good to know. I'll give it a try with snapshot then. We already have 3.5 almost everywhere, and planing for 4.2 upgrade (reading the posts with interest) Thanks Jaime Quoting Yuri L Volobuev : > >> Under both 3.2 and 3.3 mmbackup would always lock up our cluster when >> using snapshot. I never understood the behavior without snapshot, and >> the lock up was intermittent in the carved-out small test cluster, so >> I never felt confident enough to deploy over the larger 4000+ clients >> cluster. > > Back then, GPFS code had a deficiency: migrating very large files didn't > work well with snapshots (and some operation mm commands). In order to > create a snapshot, we have to have the file system in a consistent state > for a moment, and we get there by performing a "quiesce" operation. This > is done by flushing all dirty buffers to disk, stopping any new incoming > file system operations at the gates, and waiting for all in-flight > operations to finish. This works well when all in-flight operations > actually finish reasonably quickly. That assumption was broken if an > external utility, e.g. mmapplypolicy, used gpfs_restripe_file API on a very > large file, e.g. to migrate the file's blocks to a different storage pool. > The quiesce operation would need to wait for that API call to finish, as > it's an in-flight operation, but migrating a multi-TB file could take a > while, and during this time all new file system ops would be blocked. 
This > was solved several years ago by changing the API and its callers to do the > migration one block range at a time, thus making each individual syscall > short and allowing quiesce to barge in and do its thing. All currently > supported levels of GPFS have this fix. I believe mmbackup was affected by > the same GPFS deficiency and benefited from the same fix. > > yuri > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From duersch at us.ibm.com Wed Mar 16 20:25:23 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Wed, 16 Mar 2016 16:25:23 -0400 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: Please see question 2.10 in our faq. http://www.ibm.com/support/knowledgecenter/api/content/nl/en-us/STXKQY/gpfsclustersfaq.pdf We only support clusters that are running release n and release n-1 and release n+1. So 4.1 is supported to work with 3.5 and 4.2. Release 4.2 is supported to work with 4.1, but not with gpfs 3.5. It may indeed work, but it is not supported. Steve Duersch Spectrum Scale (GPFS) FVTest 845-433-7902 IBM Poughkeepsie, New York >>Message: 1 >>Date: Wed, 16 Mar 2016 18:07:59 +0000 >>From: Damir Krstic >>To: gpfsug main discussion list >>Subject: Re: [gpfsug-discuss] cross-cluster mounting different >> versions of gpfs >>Message-ID: >> >>Content-Type: text/plain; charset="utf-8" >> >>Sven, >> >>For us, at least, at this point in time, we have to create new filesystem >>with version flag. The reason is we can't take downtime to upgrade all of >>our 500+ compute nodes that will cross-cluster mount this new storage. We >>can take downtime in June and get all of the nodes up to 4.2 gpfs version >>but we have users today that need to start using the filesystem. >> >>So at this point in time, we either have ESS built with 4.1 version and >>cross mount its filesystem (also built with --version flag I assume) to our >>3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems >>with --version flag and then in June when we get all of our clients upgrade >>we run =latest gpfs command and then mmchfs -V to get filesystem back up to >>4.2 features. >> >>It's unfortunate that we are in this bind with the downtime of the compute >>cluster. If we were allowed to upgrade our compute nodes before June, we >>could proceed with 4.2 build without having to worry about filesystem >>versions. >> >>Thanks for your reply. 
>> >>Damir From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 03/16/2016 02:08 PM Subject: gpfsug-discuss Digest, Vol 50, Issue 47 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: cross-cluster mounting different versions of gpfs (Damir Krstic) ---------------------------------------------------------------------- Message: 1 Date: Wed, 16 Mar 2016 18:07:59 +0000 From: Damir Krstic To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Message-ID: Content-Type: text/plain; charset="utf-8" Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: > while this is all correct people should think twice about doing this. > if you create a filesystem with older versions, it might prevent you from > using some features like data-in-inode, encryption, adding 4k disks to > existing filesystem, etc even if you will eventually upgrade to the latest > code. > > for some customers its a good point in time to also migrate to larger > blocksizes compared to what they run right now and migrate the data. i have > seen customer systems gaining factors of performance improvements even on > existing HW by creating new filesystems with larger blocksize and latest > filesystem layout (that they couldn't before due to small file waste which > is now partly solved by data-in-inode). while this is heavily dependent on > workload and environment its at least worth thinking about. > > sven > > > > On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan > wrote: > >> The key point is that you must create the file system so that is "looks" >> like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a >> test filesystem back on the 3.5 cluster and look at the version string. >> mmslfs xxx -V. Then go to the 4.x system and try to create a file system >> with the same version string.... 
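In command terms that tip looks roughly like this (mmlsfs is the display command; the device names and the exact version value below are examples only):

    # On the existing 3.5 cluster: note the format version of a current file system
    mmlsfs gpfs35 -V
    #   -V   13.23 (3.5.0.7)   File system version      <- example output only

    # On the new 4.2/ESS cluster: create the file system at that older format level
    mmcrfs essfs -F nsd.stanzas --version 3.5.0.7

    # Later, once every client mounting it runs 4.2, raise the format level
    mmchfs essfs -V full

As Sven points out above, a file system created at the old level may never be able to use some newer features (data-in-inode, encryption, 4K disks), even after the format level is later raised, so it is a real trade-off.
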
>> >> >> [image: Marc A Kaplan] >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160316/58097bbf/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160316/58097bbf/attachment.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 50, Issue 47 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Mar 16 21:52:34 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 16:52:34 -0500 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com><56E9AA2A.3010108@buzzard.me.uk> Message-ID: <201603162152.u2GLqfvD032745@d03av03.boulder.ibm.com> Considering the last few appends from Yuri and Sven, it seems you might want to (re)consider using Samba and/or NFS... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Thu Mar 17 11:14:03 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 17 Mar 2016 11:14:03 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: (Sorry, just found this in drafts, thought I'd sent it yesterday!) Cheers Luke. Sorry, I wasn't actually wanting to get over-provisioning stats (although it would be great!) just that I thought that might be what it does. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Luke Raimbach Sent: 16 March 2016 16:06 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfileset df explanation Hi Richard, I don't think mmdf will tell you the answer you're looking for. If you use df within the fileset, or for the share over NFS, you will get the free space reported for that fileset, not the whole file system. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 16 March 2016 16:03 To: 'gpfsug-discuss at spectrumscale.org' > Subject: [gpfsug-discuss] Perfileset df explanation All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. 
What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Mar 17 16:03:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 17 Mar 2016 16:03:59 +0000 Subject: [gpfsug-discuss] Experiences with Alluxio/Tachyon ? Message-ID: <18C8D317-16BE-4351-AD8D-0E165FB60511@nuance.com> Anyone have experience with Alluxio? http://www.alluxio.org/ Also http://ibmresearchnews.blogspot.com/2015/08/tachyon-for-ultra-fast-big-data.html Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Fri Mar 18 16:39:42 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 18 Mar 2016 11:39:42 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> <56E9B5FB.2050105@genome.wustl.edu> Message-ID: <56EC2F4E.6010203@genome.wustl.edu> upgrading to 4.2.2 fixed the dependency issue. I now get Unable to access CES shared root. # /usr/lpp/mmfs/bin/mmlsconfig | grep 'cesSharedRoot' cesSharedRoot /vol/system On 3/16/16 2:51 PM, Simon Thompson (Research Computing - IT Services) wrote: > Have you got a half updated system maybe? > > You cant have: > libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 > > I.e. 0.7.3-5 and 0.7.5-4 > > I cant check right now, but are ibm shipping libcap-Ng as part of their package? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] > Sent: 16 March 2016 19:37 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer > > any help here? > ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 > Error: Multilib version problems found. This often means that the root > cause is something else and multilib version checking is just > pointing out that there is a problem. Eg.: > > 1. You have an upgrade for libcap-ng which is missing some > dependency that another package requires. Yum is trying to > solve this by installing an older version of libcap-ng of the > different architecture. If you exclude the bad architecture > yum will tell you what the root cause is (which package > requires what). You can try redoing the upgrade with > --exclude libcap-ng.otherarch ... this should give you an error > message showing the root cause of the problem. > > 2. You have multiple architectures of libcap-ng installed, but > yum can only see an upgrade for one of those architectures. > If you don't want/need both architectures anymore then you > can remove the one with the missing update and everything > will work. > > 3. You have duplicate versions of libcap-ng installed already. > You can use "yum check" to get yum show these errors. 
> > ...you can also use --setopt=protected_multilib=false to remove > this checking, however this is almost never the correct thing to > do as something else is very likely to go wrong (often causing > much more problems). > > Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 > > > On 3/16/16 12:40 PM, Oesterlin, Robert wrote: > My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. > > If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > > Reply-To: gpfsug main discussion list > > Date: Wednesday, March 16, 2016 at 12:36 PM > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] 4.2 installer > > We have multiple clusters with thousands of nsd's surely there is an > upgrade path. Are you all saying just continue to manually update nsd > servers and manage them as we did previously. Is the installer not > needed if there are current setups. Just deploy CES manually? > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
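"Unable to access CES shared root" is worth checking against where that path actually lives. A quick sketch, assuming /gpfs/fs0 is a mounted GPFS file system (path names are examples, and as far as I know cesSharedRoot can only be changed while CES/GPFS is stopped on the protocol nodes):

    # is /vol/system really on GPFS?
    df -T /vol/system                 # the Type column should say gpfs
    mmlsmount all -L                  # which GPFS file systems are mounted, and on which nodes

    # if not, point cesSharedRoot at a directory inside a GPFS file system instead
    mkdir -p /gpfs/fs0/ces
    mmchconfig cesSharedRoot=/gpfs/fs0/ces
    mmlsconfig cesSharedRoot
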
From mweil at genome.wustl.edu Fri Mar 18 16:54:51 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 18 Mar 2016 11:54:51 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56EC2F4E.6010203@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> <56E9B5FB.2050105@genome.wustl.edu> <56EC2F4E.6010203@genome.wustl.edu> Message-ID: <56EC32DB.1000108@genome.wustl.edu> Fri Mar 18 11:50:43 CDT 2016: mmcesop: /vol/system/ found but is not on a GPFS filesystem On 3/18/16 11:39 AM, Matt Weil wrote: > upgrading to 4.2.2 fixed the dependency issue. I now get Unable to > access CES shared root. > > # /usr/lpp/mmfs/bin/mmlsconfig | grep 'cesSharedRoot' > cesSharedRoot /vol/system > > On 3/16/16 2:51 PM, Simon Thompson (Research Computing - IT Services) wrote: >> Have you got a half updated system maybe? >> >> You cant have: >> libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 >> >> I.e. 0.7.3-5 and 0.7.5-4 >> >> I cant check right now, but are ibm shipping libcap-Ng as part of their package? >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] >> Sent: 16 March 2016 19:37 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] 4.2 installer >> >> any help here? >> ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 >> Error: Multilib version problems found. This often means that the root >> cause is something else and multilib version checking is just >> pointing out that there is a problem. Eg.: >> >> 1. You have an upgrade for libcap-ng which is missing some >> dependency that another package requires. Yum is trying to >> solve this by installing an older version of libcap-ng of the >> different architecture. If you exclude the bad architecture >> yum will tell you what the root cause is (which package >> requires what). You can try redoing the upgrade with >> --exclude libcap-ng.otherarch ... this should give you an error >> message showing the root cause of the problem. >> >> 2. You have multiple architectures of libcap-ng installed, but >> yum can only see an upgrade for one of those architectures. >> If you don't want/need both architectures anymore then you >> can remove the one with the missing update and everything >> will work. >> >> 3. You have duplicate versions of libcap-ng installed already. >> You can use "yum check" to get yum show these errors. >> >> ...you can also use --setopt=protected_multilib=false to remove >> this checking, however this is almost never the correct thing to >> do as something else is very likely to go wrong (often causing >> much more problems). >> >> Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 >> >> >> On 3/16/16 12:40 PM, Oesterlin, Robert wrote: >> My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. >> >> If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. 
>> >> Bob Oesterlin >> Sr Storage Engineer, Nuance HPC Grid >> 507-269-0413 >> >> >> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > >> Reply-To: gpfsug main discussion list > >> Date: Wednesday, March 16, 2016 at 12:36 PM >> To: "gpfsug-discuss at spectrumscale.org" > >> Subject: Re: [gpfsug-discuss] 4.2 installer >> >> We have multiple clusters with thousands of nsd's surely there is an >> upgrade path. Are you all saying just continue to manually update nsd >> servers and manage them as we did previously. Is the installer not >> needed if there are current setups. Just deploy CES manually? >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From martin.gasthuber at desy.de Tue Mar 22 09:45:30 2016 From: martin.gasthuber at desy.de (Martin Gasthuber) Date: Tue, 22 Mar 2016 10:45:30 +0100 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server Message-ID: Hi, we're looking for a powerful (and cost efficient) machine config to optimally support the new CES services, especially Ganesha. In more detail, we're wondering if somebody has already got some experience running these services on machines with HAWC and/or LROC enabled HW, resulting in a clearer understanding of the benefits of that config. We will have ~300 client boxes accessing GPFS via NFS and planning for 2 nodes initially. 
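For reference, LROC on a protocol node is just a local SSD declared as an NSD with usage=localCache. A minimal sketch, assuming /dev/sdb is a spare SSD in a CES node (device, NSD and file system names are made up, and the HAWC line is only relevant to the cluster that owns the data, as discussed in the reply below):

    # lroc.stanza - one local SSD per protocol node, used only as LROC cache
    %nsd:
      device=/dev/sdb
      nsd=ces1_lroc1
      servers=ces1
      usage=localCache

    mmcrnsd -F lroc.stanza
    mmdiag --lroc                     # per-node cache statistics once it is active

    # HAWC, by contrast, is a file system / NSD-owning-cluster property
    # (write cache threshold), not something the CES nodes provide on their own:
    mmchfs fs0 --write-cache-threshold 64K
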
best regards, Martin From S.J.Thompson at bham.ac.uk Tue Mar 22 10:05:05 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 22 Mar 2016 10:05:05 +0000 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server In-Reply-To: References: Message-ID: Hi Martin, We have LROC enabled on our CES protocol nodes for SMB: # mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A0A001755E9634D#/dev/sdb;0A0A001755E96350#/dev/sdc;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 486370 MB, currently in use: 1323 MB Statistics from: Thu Feb 25 11:18:25 2016 Total objects stored 338690236 (2953113 MB) recalled 336905443 (1326912 MB) objects failed to store 0 failed to recall 94 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 338719563 (3114191 MB) Inode objects stored 336876572 (1315923 MB) recalled 336884262 (1315948 MB) = 100.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 336910469 (1316052 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 2896 (115 MB) recalled 564 (29 MB) = 19.48 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 2857 (725 MB) Directory objects failed to store 0 failed to recall 2 failed to query 0 failed to inval 0 Data objects stored 1797127 (1636968 MB) recalled 16057 (10907 MB) = 0.89 % Data objects queried 0 (0 MB) = 0.00 % invalidated 1805234 (1797405 MB) Data objects failed to store 0 failed to recall 92 failed to query 0 failed to inval 0 agent inserts=389305528, reads=337261110 response times (usec): insert min/max/avg=1/47705/11 read min/max/avg=1/3145728/54 ssd writeIOs=5906506, writePages=756033024 readIOs=44692016, readPages=44692610 response times (usec): write min/max/avg=3072/1117534/3253 read min/max/avg=56/3145728/364 So mostly it is inode objects being used form the cache. Whether this is small data-in-inode or plain inode (stat) type operations, pass. We don't use HAWC on our protocol nodes, the HAWC pool needs to exist in the cluster where the NSD data is written and we multi-cluster to the protocol nodes (technically this isn't supported, but works fine for us). On HAWC, we did test it out in another of our clusters using SSDs in the nodes, but we er, had a few issues when we should a rack of kit down which included all the HAWC devices which were in nodes. You probably want to think a bit carefully about how HAWC is implemented in your environment. We are about to implement in one of our clusters, but that will be HAWC devices available to the NSD servers rather than on client nodes. Simon On 22/03/2016, 09:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Martin Gasthuber" wrote: >Hi, > > we're looking for a powerful (and cost efficient) machine config to >optimally support the new CES services, especially Ganesha. In more >detail, we're wondering if somebody has already got some experience >running these services on machines with HAWC and/or LROC enabled HW, >resulting in a clearer understanding of the benefits of that config. We >will have ~300 client boxes accessing GPFS via NFS and planning for 2 >nodes initially. 
> >best regards, > Martin > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Tue Mar 22 12:44:57 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Tue, 22 Mar 2016 12:44:57 +0000 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server In-Reply-To: References: Message-ID: <4eec1651b22f40418104a5a44f424b8d@mbxtoa1.winmail.deshaw.com> It's worth sharing that we have seen two problems with CES providing NFS via ganesha in a similar deployment: 1. multicluster cache invalidation: ganesha's FSAL upcall for invalidation of its file descriptor cache by GPFS doesn't appear to work for remote GPFS filesystems. As mentioned by Simon, this is unsupported, though the problem can be worked around with some effort though by disabling ganesha's FD cache entirely. 2. Readdir bad cookie bug: an interaction we're still providing info to IBM about between certain linux NFS clients and ganesha in which readdir calls may sporadically return empty results for directories containing files, without any corresponding error result code. Given our multicluster requirements and the problems associated with the readdir bug, we've reverted to using CNFS for now. Thx Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, March 22, 2016 6:05 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] HAWC/LROC in Ganesha server Hi Martin, We have LROC enabled on our CES protocol nodes for SMB: # mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A0A001755E9634D#/dev/sdb;0A0A001755E96350#/dev/sdc;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 486370 MB, currently in use: 1323 MB Statistics from: Thu Feb 25 11:18:25 2016 Total objects stored 338690236 (2953113 MB) recalled 336905443 (1326912 MB) objects failed to store 0 failed to recall 94 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 338719563 (3114191 MB) Inode objects stored 336876572 (1315923 MB) recalled 336884262 (1315948 MB) = 100.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 336910469 (1316052 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 2896 (115 MB) recalled 564 (29 MB) = 19.48 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 2857 (725 MB) Directory objects failed to store 0 failed to recall 2 failed to query 0 failed to inval 0 Data objects stored 1797127 (1636968 MB) recalled 16057 (10907 MB) = 0.89 % Data objects queried 0 (0 MB) = 0.00 % invalidated 1805234 (1797405 MB) Data objects failed to store 0 failed to recall 92 failed to query 0 failed to inval 0 agent inserts=389305528, reads=337261110 response times (usec): insert min/max/avg=1/47705/11 read min/max/avg=1/3145728/54 ssd writeIOs=5906506, writePages=756033024 readIOs=44692016, readPages=44692610 response times (usec): write min/max/avg=3072/1117534/3253 read min/max/avg=56/3145728/364 So mostly it is inode objects being used form the cache. Whether this is small data-in-inode or plain inode (stat) type operations, pass. 
We don't use HAWC on our protocol nodes, the HAWC pool needs to exist in the cluster where the NSD data is written and we multi-cluster to the protocol nodes (technically this isn't supported, but works fine for us). On HAWC, we did test it out in another of our clusters using SSDs in the nodes, but we er, had a few issues when we should a rack of kit down which included all the HAWC devices which were in nodes. You probably want to think a bit carefully about how HAWC is implemented in your environment. We are about to implement in one of our clusters, but that will be HAWC devices available to the NSD servers rather than on client nodes. Simon On 22/03/2016, 09:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Martin Gasthuber" > wrote: >Hi, > > we're looking for a powerful (and cost efficient) machine config to >optimally support the new CES services, especially Ganesha. In more >detail, we're wondering if somebody has already got some experience >running these services on machines with HAWC and/or LROC enabled HW, >resulting in a clearer understanding of the benefits of that config. We >will have ~300 client boxes accessing GPFS via NFS and planning for 2 >nodes initially. > >best regards, > Martin > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Wed Mar 23 11:31:45 2016 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 23 Mar 2016 11:31:45 +0000 Subject: [gpfsug-discuss] Places are filling up fast! Message-ID: <50eb8657d660d1c8d7714a14b6d69864@webmail.gpfsug.org> Dear members, We've had a fantastic response to the registrations for the next meeting in May. So good in fact that there are only 22 spaces left! If you are thinking of attending I would recommend doing so as soon as you can to avoid missing out. The link to register is: http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 [1] Also, we really like to hear from members on their experiences and are looking for volunteers for a short 15-20 minute presentation on their Spectrum Scale/GPFS installation, the highs and lows of it! If you're interested, please let Simon (chair at spectrumscaleug.org) or I know. Thanks and we look forward to seeing you in May. Claire -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org Links: ------ [1] http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jan.finnerman at load.se Tue Mar 29 23:04:26 2016 From: jan.finnerman at load.se (Jan Finnerman Load) Date: Tue, 29 Mar 2016 22:04:26 +0000 Subject: [gpfsug-discuss] Joined GPFS alias Message-ID: Hi All, I just joined the alias and want to give this short introduction of myself in GPFS terms. I work as a consultant at Load System, an IBM Business Partner based in Sweden. We work mainly in the Media and Finance markets. I support and do installs of GPFS at two customers in the media market in Sweden. Currently, I?m involved in a new customer install with Spectrum Scale 4.2/Red Hat 7.1/PowerKVM/Power 8. 
This is a customer in south of Sweden that do scientific research in Physics on Elementary Particles. My office location is Kista outside of Stockholm in Sweden. Brgds ///Jan [cid:7674672D-7E3F-417F-96F9-89737A1F6AEE] Jan Finnerman Senior Technical consultant [CertTiv_sm] [cid:4D49557E-099B-4799-AD7E-0A103EB45735] Kista Science Tower 164 51 Kista Mobil: +46 (0)70 631 66 26 Kontor: +46 (0)8 633 66 00/26 jan.finnerman at load.se -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: F1EE9474-7BCC-41E6-8237-D949E9DC35D3[9].png Type: image/png Size: 5565 bytes Desc: F1EE9474-7BCC-41E6-8237-D949E9DC35D3[9].png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: E895055E-B11B-47C3-BA29-E12D29D394FA[9].png Type: image/png Size: 8584 bytes Desc: E895055E-B11B-47C3-BA29-E12D29D394FA[9].png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CertPowerSystems_sm[1][9].png Type: image/png Size: 6664 bytes Desc: CertPowerSystems_sm[1][9].png URL: From Luke.Raimbach at crick.ac.uk Tue Mar 1 12:43:54 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 1 Mar 2016 12:43:54 +0000 Subject: [gpfsug-discuss] AFM over NFS vs GPFS Message-ID: HI All, We have two clusters and are using AFM between them to compartmentalise performance. We have the opportunity to run AFM over GPFS protocol (over IB verbs), which I would imagine gives much greater performance than trying to push it over NFS over Ethernet. We will have a whole raft of instrument ingest filesets in one storage cluster which are single-writer caches of the final destination in the analytics cluster. My slight concern with running this relationship over native GPFS is that if the analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the manual which says: "In the case of caches based on native GPFS? protocol, unavailability of the home file system on the cache cluster puts the caches into unmounted state. These caches never enter the disconnected state. For AFM filesets that use GPFS protocol to connect to the home cluster, if the remote mount becomes unresponsive due to issues at the home cluster not related to disconnection (such as a deadlock), operations that require remote mount access such as revalidation or reading un-cached contents also hang until remote mount becomes available again. One way to continue accessing all cached contents without disruption is to temporarily disable all the revalidation intervals until the home mount is accessible again." What I'm unsure of is whether this applies to single-writer caches as they (presumably) never do revalidation. We don't want instrument data capture to be interrupted on our ingest storage cluster if the analytics cluster goes away. Is anyone able to clear this up, please? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. 
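On the revalidation point in that question, the knobs involved are the per-fileset AFM refresh intervals. A sketch of checking a single-writer ingest cache and temporarily disabling revalidation, with made-up device and fileset names and attribute names/values taken from the 4.2 documentation rather than tested here:

    # what mode and state is the ingest cache in?
    mmlsfileset ingestfs instrument01 --afm -L
    mmafmctl ingestfs getstate -j instrument01

    # temporarily stop revalidation against home while it is unavailable
    # (re-enable later by setting the intervals back to their normal values)
    mmchfileset ingestfs instrument01 \
      -p afmDirLookupRefreshInterval=disable \
      -p afmDirOpenRefreshInterval=disable \
      -p afmFileLookupRefreshInterval=disable \
      -p afmFileOpenRefreshInterval=disable
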
From Robert.Oesterlin at nuance.com Wed Mar 2 16:22:35 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 2 Mar 2016 16:22:35 +0000 Subject: [gpfsug-discuss] IBM-Sandisk Announcement Message-ID: Anyone from the IBM side that can comment on this in more detail? (OK if you email me directly) Article is thin on exactly what?s being announced. SanDisk Corporation, a global leader in flash storage solutions, and IBM today announced a collaboration to bring out a unique class of next-generation, software-defined, all-flash storage solutions for the data center. At the core of this collaboration are SanDisk?s InfiniFlash System?a high-capacity and extreme-performance flash-based software defined storage system featuring IBM Spectrum Scale filesystem from IBM. https://www.sandisk.com/about/media-center/press-releases/2016/sandisk-and-ibm-collaborate-to-deliver-software-defined-all-flash-storage-solutions Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Mar 2 16:27:24 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 2 Mar 2016 16:27:24 +0000 Subject: [gpfsug-discuss] IBM-Sandisk Announcement In-Reply-To: References: Message-ID: There's a bit more at: http://www.theregister.co.uk/2016/03/02/ibm_adds_sandisk_flash_colour_to_its_storage_spectrum/ When I looks as infiniflash briefly it appeared to be ip presented, so guess something like and Linux based system in the "controller". So I guess they have installed gpfs in there as part of the appliance. It doesn't appear to be available as block storage/fc attached from what I could see. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 02 March 2016 16:22 To: gpfsug main discussion list Subject: [gpfsug-discuss] IBM-Sandisk Announcement Anyone from the IBM side that can comment on this in more detail? (OK if you email me directly) Article is thin on exactly what?s being announced. SanDisk Corporation, a global leader in flash storage solutions, and IBM today announced a collaboration to bring out a unique class of next-generation, software-defined, all-flash storage solutions for the data center. At the core of this collaboration are SanDisk?s InfiniFlash System?a high-capacity and extreme-performance flash-based software defined storage system featuring IBM Spectrum Scale filesystem from IBM. https://www.sandisk.com/about/media-center/press-releases/2016/sandisk-and-ibm-collaborate-to-deliver-software-defined-all-flash-storage-solutions Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From S.J.Thompson at bham.ac.uk Wed Mar 2 16:29:34 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 2 Mar 2016 16:29:34 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale Message-ID: I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS with out tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? 
Thanks Simon From oehmes at us.ibm.com Wed Mar 2 16:31:12 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 2 Mar 2016 08:31:12 -0800 Subject: [gpfsug-discuss] IBM-Sandisk Announcement In-Reply-To: References: Message-ID: <201603021631.u22GVTh9003605@d03av04.boulder.ibm.com> its direct SAS attached . ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 03/02/2016 08:27 AM Subject: Re: [gpfsug-discuss] IBM-Sandisk Announcement Sent by: gpfsug-discuss-bounces at spectrumscale.org There's a bit more at: http://www.theregister.co.uk/2016/03/02/ibm_adds_sandisk_flash_colour_to_its_storage_spectrum/ When I looks as infiniflash briefly it appeared to be ip presented, so guess something like and Linux based system in the "controller". So I guess they have installed gpfs in there as part of the appliance. It doesn't appear to be available as block storage/fc attached from what I could see. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 02 March 2016 16:22 To: gpfsug main discussion list Subject: [gpfsug-discuss] IBM-Sandisk Announcement Anyone from the IBM side that can comment on this in more detail? (OK if you email me directly) Article is thin on exactly what?s being announced. SanDisk Corporation, a global leader in flash storage solutions, and IBM today announced a collaboration to bring out a unique class of next-generation, software-defined, all-flash storage solutions for the data center. At the core of this collaboration are SanDisk?s InfiniFlash System?a high-capacity and extreme-performance flash-based software defined storage system featuring IBM Spectrum Scale filesystem from IBM. https://www.sandisk.com/about/media-center/press-releases/2016/sandisk-and-ibm-collaborate-to-deliver-software-defined-all-flash-storage-solutions Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Wed Mar 2 16:43:17 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Wed, 2 Mar 2016 16:43:17 +0000 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: Anybody know the answer? > HI All, > > We have two clusters and are using AFM between them to compartmentalise > performance. We have the opportunity to run AFM over GPFS protocol (over IB > verbs), which I would imagine gives much greater performance than trying to > push it over NFS over Ethernet. > > We will have a whole raft of instrument ingest filesets in one storage cluster > which are single-writer caches of the final destination in the analytics cluster. > My slight concern with running this relationship over native GPFS is that if the > analytics cluster goes offline (e.g. 
for maintenance, etc.), there is an entry in the > manual which says: > > "In the case of caches based on native GPFS? protocol, unavailability of the > home file system on the cache cluster puts the caches into unmounted state. > These caches never enter the disconnected state. For AFM filesets that use GPFS > protocol to connect to the home cluster, if the remote mount becomes > unresponsive due to issues at the home cluster not related to disconnection > (such as a deadlock), operations that require remote mount access such as > revalidation or reading un-cached contents also hang until remote mount > becomes available again. One way to continue accessing all cached contents > without disruption is to temporarily disable all the revalidation intervals until the > home mount is accessible again." > > What I'm unsure of is whether this applies to single-writer caches as they > (presumably) never do revalidation. We don't want instrument data capture to > be interrupted on our ingest storage cluster if the analytics cluster goes away. > > Is anyone able to clear this up, please? > > Cheers, > Luke. > > Luke Raimbach? > Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs > Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales > no. 1140062 and a company registered in England and Wales no. 06885462, with > its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From Luke.Raimbach at crick.ac.uk Wed Mar 2 16:43:17 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Wed, 2 Mar 2016 16:43:17 +0000 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: Anybody know the answer? > HI All, > > We have two clusters and are using AFM between them to compartmentalise > performance. We have the opportunity to run AFM over GPFS protocol (over IB > verbs), which I would imagine gives much greater performance than trying to > push it over NFS over Ethernet. > > We will have a whole raft of instrument ingest filesets in one storage cluster > which are single-writer caches of the final destination in the analytics cluster. > My slight concern with running this relationship over native GPFS is that if the > analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the > manual which says: > > "In the case of caches based on native GPFS? protocol, unavailability of the > home file system on the cache cluster puts the caches into unmounted state. > These caches never enter the disconnected state. For AFM filesets that use GPFS > protocol to connect to the home cluster, if the remote mount becomes > unresponsive due to issues at the home cluster not related to disconnection > (such as a deadlock), operations that require remote mount access such as > revalidation or reading un-cached contents also hang until remote mount > becomes available again. 
One way to continue accessing all cached contents > without disruption is to temporarily disable all the revalidation intervals until the > home mount is accessible again." > > What I'm unsure of is whether this applies to single-writer caches as they > (presumably) never do revalidation. We don't want instrument data capture to > be interrupted on our ingest storage cluster if the analytics cluster goes away. > > Is anyone able to clear this up, please? > > Cheers, > Luke. > > Luke Raimbach? > Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs > Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales > no. 1140062 and a company registered in England and Wales no. 06885462, with > its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From Robert.Oesterlin at nuance.com Wed Mar 2 17:04:57 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 2 Mar 2016 17:04:57 +0000 Subject: [gpfsug-discuss] IBM-Sandisk Announcement Message-ID: <37CDF3CF-53AD-45FC-8E0C-582CED5DD99F@nuance.com> The reason I?m asking is that I?m doing a test with an IF100 box, and wanted to know what the IBM plans were for it :-) Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Wed Mar 2 17:23:30 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Wed, 2 Mar 2016 09:23:30 -0800 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: <201603021731.u22HVqeu026048@d03av04.boulder.ibm.com> Hi Luke, Assuming the network between your clusters is reliable, using GPFS with SW-mode (also assuming you aren't ever modifying the data on the home cluster) should work well for you I think. New files can continue to be created in the cache even in unmounted state.... Dean IBM Almaden Research Center From: Luke Raimbach To: gpfsug main discussion list Date: 03/01/2016 04:44 AM Subject: [gpfsug-discuss] AFM over NFS vs GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org HI All, We have two clusters and are using AFM between them to compartmentalise performance. We have the opportunity to run AFM over GPFS protocol (over IB verbs), which I would imagine gives much greater performance than trying to push it over NFS over Ethernet. We will have a whole raft of instrument ingest filesets in one storage cluster which are single-writer caches of the final destination in the analytics cluster. My slight concern with running this relationship over native GPFS is that if the analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the manual which says: "In the case of caches based on native GPFS? protocol, unavailability of the home file system on the cache cluster puts the caches into unmounted state. These caches never enter the disconnected state. 
For AFM filesets that use GPFS protocol to connect to the home cluster, if the remote mount becomes unresponsive due to issues at the home cluster not related to disconnection (such as a deadlock), operations that require remote mount access such as revalidation or reading un-cached contents also hang until remote mount becomes available again. One way to continue accessing all cached contents without disruption is to temporarily disable all the revalidation intervals until the home mount is accessible again." What I'm unsure of is whether this applies to single-writer caches as they (presumably) never do revalidation. We don't want instrument data capture to be interrupted on our ingest storage cluster if the analytics cluster goes away. Is anyone able to clear this up, please? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From dhildeb at us.ibm.com Wed Mar 2 17:23:30 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Wed, 2 Mar 2016 09:23:30 -0800 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: <201603021731.u22HVuTl015056@d01av01.pok.ibm.com> Hi Luke, Assuming the network between your clusters is reliable, using GPFS with SW-mode (also assuming you aren't ever modifying the data on the home cluster) should work well for you I think. New files can continue to be created in the cache even in unmounted state.... Dean IBM Almaden Research Center From: Luke Raimbach To: gpfsug main discussion list Date: 03/01/2016 04:44 AM Subject: [gpfsug-discuss] AFM over NFS vs GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org HI All, We have two clusters and are using AFM between them to compartmentalise performance. We have the opportunity to run AFM over GPFS protocol (over IB verbs), which I would imagine gives much greater performance than trying to push it over NFS over Ethernet. We will have a whole raft of instrument ingest filesets in one storage cluster which are single-writer caches of the final destination in the analytics cluster. My slight concern with running this relationship over native GPFS is that if the analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the manual which says: "In the case of caches based on native GPFS? protocol, unavailability of the home file system on the cache cluster puts the caches into unmounted state. These caches never enter the disconnected state. For AFM filesets that use GPFS protocol to connect to the home cluster, if the remote mount becomes unresponsive due to issues at the home cluster not related to disconnection (such as a deadlock), operations that require remote mount access such as revalidation or reading un-cached contents also hang until remote mount becomes available again. 
One way to continue accessing all cached contents without disruption is to temporarily disable all the revalidation intervals until the home mount is accessible again." What I'm unsure of is whether this applies to single-writer caches as they (presumably) never do revalidation. We don't want instrument data capture to be interrupted on our ingest storage cluster if the analytics cluster goes away. Is anyone able to clear this up, please? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at genome.wustl.edu Wed Mar 2 19:46:48 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 13:46:48 -0600 Subject: [gpfsug-discuss] cpu shielding Message-ID: <56D74328.50507@genome.wustl.edu> All, We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From bbanister at jumptrading.com Wed Mar 2 19:49:50 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 2 Mar 2016 19:49:50 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <56D74328.50507@genome.wustl.edu> References: <56D74328.50507@genome.wustl.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. 
-B -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil Sent: Wednesday, March 02, 2016 1:47 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] cpu shielding All, We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From mweil at genome.wustl.edu Wed Mar 2 19:54:21 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 13:54:21 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D744ED.30307@genome.wustl.edu> Can you share anything more? We are trying all system related items on cpu0 GPFS is on cpu1 and the rest are used for the lsf scheduler. With that setup we still see evictions. Thanks Matt On 3/2/16 1:49 PM, Bryan Banister wrote: > We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. 
> -B > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil > Sent: Wednesday, March 02, 2016 1:47 PM > To: gpfsug main discussion list > Subject: [gpfsug-discuss] cpu shielding > > All, > > We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? > > Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. > > Thanks > > Matt > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
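For anyone wanting to reproduce the cgroup shielding Bryan and Matt describe above, a minimal cpuset-based setup could look something like the sketch below. It assumes cgroup v1 mounted under /sys/fs/cgroup (the usual RHEL 6/7 layout), that the GPFS daemon is mmfsd, and that the core and NUMA node numbers match your topology; the group names, core ranges and memory limit are placeholders to adapt, not recommendations.

    # Dedicated cpuset for GPFS, pinned to one socket (check lscpu for real core IDs).
    mkdir -p /sys/fs/cgroup/cpuset/gpfs
    echo 8-15 > /sys/fs/cgroup/cpuset/gpfs/cpuset.cpus
    echo 1    > /sys/fs/cgroup/cpuset/gpfs/cpuset.mems

    # Move all mmfsd threads into the shielded group.
    for pid in $(pgrep mmfsd); do
        echo "$pid" > /sys/fs/cgroup/cpuset/gpfs/cgroup.procs
    done

    # Confine user jobs to the remaining cores and cap their memory so a
    # runaway job cannot starve mmfsd of CPU or RAM.
    mkdir -p /sys/fs/cgroup/cpuset/userjobs /sys/fs/cgroup/memory/userjobs
    echo 16-31 > /sys/fs/cgroup/cpuset/userjobs/cpuset.cpus
    echo 0     > /sys/fs/cgroup/cpuset/userjobs/cpuset.mems
    echo 96G   > /sys/fs/cgroup/memory/userjobs/memory.limit_in_bytes

LSF can also be configured to place jobs into cgroups itself, which is tidier than doing it by hand, and as the follow-ups below note it is worth ruling out network problems before assuming CPU starvation is what is causing the lease timeouts.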
From viccornell at gmail.com Wed Mar 2 20:15:16 2016 From: viccornell at gmail.com (viccornell at gmail.com) Date: Wed, 2 Mar 2016 21:15:16 +0100 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <56D744ED.30307@genome.wustl.edu> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. > > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Wed Mar 2 20:17:38 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 2 Mar 2016 20:17:38 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com Sent: Wednesday, March 02, 2016 2:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. > > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >> Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? 
>> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From mweil at genome.wustl.edu Wed Mar 2 20:22:05 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 14:22:05 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: <56D74B6D.8050802@genome.wustl.edu> On 3/2/16 2:15 PM, viccornell at gmail.com wrote: > Hi, > > How sure are you that it is cpu scheduling that is your problem? just spotted this maybe it can help spot something. https://software.intel.com/en-us/articles/intel-performance-counter-monitor > > Are you using IB or Ethernet? two 10 gig Intel nics in a LACP bond. links are not saturated. > > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. >> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From S.J.Thompson at bham.ac.uk Wed Mar 2 20:24:44 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 2 Mar 2016 20:24:44 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> , <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. 
We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] Sent: 02 March 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com Sent: Wednesday, March 02, 2016 2:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. > > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >> Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mweil at genome.wustl.edu Wed Mar 2 20:47:24 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 14:47:24 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D7515C.4070102@genome.wustl.edu> GPFS client version 3.5.0-15 any related issues there with timeouts? On 3/2/16 2:24 PM, Simon Thompson (Research Computing - IT Services) wrote: > Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. > > We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] > Sent: 02 March 2016 20:17 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com > Sent: Wednesday, March 02, 2016 2:15 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > Hi, > > How sure are you that it is cpu scheduling that is your problem? > > Are you using IB or Ethernet? > > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. >> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >>> Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? 
>>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Greg.Lehmann at csiro.au Wed Mar 2 22:48:51 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 2 Mar 2016 22:48:51 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale In-Reply-To: References: Message-ID: <304dd806ce6e4488b163676bb5889da2@exch2-mel.nexus.csiro.au> Sitting next to 2 DDN guys doing some gridscaler training. Their opinion is "pure FUD". They are happy for us to run IBM or their Spectrum Scale packages in the DDN hardware. Cheers, Greg -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Thursday, 3 March 2016 2:30 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] GPFS vs Spectrum Scale I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS with out tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From daniel.kidger at uk.ibm.com Wed Mar 2 22:52:55 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 2 Mar 2016 22:52:55 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale In-Reply-To: References: Message-ID: <201603022153.u22Lr0nY015961@d06av10.portsmouth.uk.ibm.com> I work for IBM and in particular support OEMs and other Business Partners I am not sure if Simon is using try true IBM speak here as any OEM purchase of Spectrum Scale inherently has tin included, be it from DDN, Seagate, Lenovo, etc. Remember there are 4 main ways to buy Spectrum Scale: 1. as pure software, direct from IBM or though a business partner. 2. as part of a hardware offering from an OEM 3. as part of a hardware offering from IBM. This is what ESS is. 4. as a cloud service in Softlayer. Spectrum Scale (GPFS) is exactly the same software no matter which route above is used to purchase it. 
What OEMs do do, as IBM do with their ESS appliance product is do extra validation to confirm that the newest release is fully compatible with their hardware solution and has no regressions in performance or otherwise. Hence there is often perhaps 3 months between say the 4.2 official release and when it appears in OEM solutions. ESS is the same here. The two difference to note that make #2 OEM systems different are though are: 1: When bought as part of an OEM through say Lenovo, DDN or Seagate then that OEM owns the actual GFPS licenses rather than the end customer. The practical side of this is that if you later replace the hardware with a different vendors hardware there is no automatic right to transfer over the old licenses, as would be the case if GPFS was bought directly from IBM/ 2. When bought as part of an OEM system, then that OEM is the sole point of contact for the customer for all support. The customer does not first have to triage if it is a hw or sw issue. The OEM in return provides 1st and 2nd line support to the customer, and only escalates in-depth level 3 support issues to IBM's development team. The OEMs then will have gone though extensive training to be able to do such 1st and 2nd line support. (Of course many traditional IBM Business Partners are also very clued up about helping their customers directly.) Daniel Dr.Daniel Kidger No. 1 The Square, Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 02/03/2016 16:30 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS with out tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From volobuev at us.ibm.com Thu Mar 3 00:35:18 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 2 Mar 2016 16:35:18 -0800 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: <201603030035.u230ZNwQ032425@d03av04.boulder.ibm.com> Going way off topic... For reasons that are not entirely understood, Spectrum Scale AFM developers who work from India are unable to subscribe to the gpfsug-discuss mailing list. Their mail servers and gpfsug servers don't want to play nice together. 
So if you want to reach more AFM experts, I recommend going the developerWorks GPFS forum route: https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479&ps=25 yuri From: Luke Raimbach To: gpfsug main discussion list , "gpfsug main discussion list" , Date: 03/02/2016 08:43 AM Subject: Re: [gpfsug-discuss] AFM over NFS vs GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Anybody know the answer? > HI All, > > We have two clusters and are using AFM between them to compartmentalise > performance. We have the opportunity to run AFM over GPFS protocol (over IB > verbs), which I would imagine gives much greater performance than trying to > push it over NFS over Ethernet. > > We will have a whole raft of instrument ingest filesets in one storage cluster > which are single-writer caches of the final destination in the analytics cluster. > My slight concern with running this relationship over native GPFS is that if the > analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the > manual which says: > > "In the case of caches based on native GPFS? protocol, unavailability of the > home file system on the cache cluster puts the caches into unmounted state. > These caches never enter the disconnected state. For AFM filesets that use GPFS > protocol to connect to the home cluster, if the remote mount becomes > unresponsive due to issues at the home cluster not related to disconnection > (such as a deadlock), operations that require remote mount access such as > revalidation or reading un-cached contents also hang until remote mount > becomes available again. One way to continue accessing all cached contents > without disruption is to temporarily disable all the revalidation intervals until the > home mount is accessible again." > > What I'm unsure of is whether this applies to single-writer caches as they > (presumably) never do revalidation. We don't want instrument data capture to > be interrupted on our ingest storage cluster if the analytics cluster goes away. > > Is anyone able to clear this up, please? > > Cheers, > Luke. > > Luke Raimbach? > Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs > Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales > no. 1140062 and a company registered in England and Wales no. 06885462, with > its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Thu Mar 3 09:07:25 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 3 Mar 2016 09:07:25 +0000 Subject: [gpfsug-discuss] Cloning across fileset boundaries Message-ID: Hi All, When I use "mmclone copy" to try and create a clone and the destination is inside a fileset (dependent or independent), I get this: mmclone: Invalid cross-device link I can find no information in any manuals as to why this doesn't work (though I can imagine what the reasons might be). Could somebody explain whether this could be permitted in the future, or if it's technically impossible? Thanks, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From volobuev at us.ibm.com Thu Mar 3 18:13:45 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Thu, 3 Mar 2016 10:13:45 -0800 Subject: [gpfsug-discuss] Cloning across fileset boundaries In-Reply-To: References: Message-ID: <201603031813.u23IDobP010703@d03av04.boulder.ibm.com> This is technically impossible. A clone relationship is semantically similar to a hard link. The basic fileset concept precludes hard links between filesets. A fileset is by definition a self-contained subtree in the namespace. yuri From: Luke Raimbach To: gpfsug main discussion list , Date: 03/03/2016 01:07 AM Subject: [gpfsug-discuss] Cloning across fileset boundaries Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, When I use "mmclone copy" to try and create a clone and the destination is inside a fileset (dependent or independent), I get this: mmclone: Invalid cross-device link I can find no information in any manuals as to why this doesn't work (though I can imagine what the reasons might be). Could somebody explain whether this could be permitted in the future, or if it's technically impossible? Thanks, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL:
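The supported pattern for "mmclone copy" keeps the clone parent and the writable copy inside a single fileset. A rough sketch of that flow, using hypothetical paths that all live in one fileset (check the mmclone documentation for your release for exact options):

  # create a read-only clone parent from an existing file
  mmclone snap /gpfs/fs0/lab1/reference.img /gpfs/fs0/lab1/reference.parent
  # create a writable clone of that parent, still inside the same fileset
  mmclone copy /gpfs/fs0/lab1/reference.parent /gpfs/fs0/lab1/run01.img
  # display clone depth and parent inode information
  mmclone show /gpfs/fs0/lab1/run01.img

Pointing the "mmclone copy" target at a path in a different fileset is what produces the cross-device link error discussed above.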
From Mark.Bush at siriuscom.com Thu Mar 3 21:57:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 3 Mar 2016 21:57:20 +0000 Subject: [gpfsug-discuss] Small cluster Message-ID: I have a client that wants to build small remote sites to sync back to an ESS cluster they purchased. These remote sites are generally <15-20TB. If I build a three node cluster with just internal drives can this work if the drives aren?t shared amongst the cluster without FPO or GNR(since it?s not ESS)? Is it better to have a SAN sharing disks with the three nodes? Assuming all are NSD servers (or two at least). Seems like most of the implementations I?m seeing use shared disks so local drives only would be an odd architecture right? What do I give up by not having shared disks seen by other NSD servers? Mark Bush Storage Architect This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL:
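As a point of reference for the "no shared disks" option asked about above, a hedged sketch of what plain replication across three servers with internal drives could look like; the node names, NSD names and devices are made up, and the replies that follow weigh whether this is a good idea at all:

  # nsd.stanza -- each server serves its own internal disk, one failure group per server
  %nsd: nsd=node1_d1 device=/dev/sdb servers=node1 usage=dataAndMetadata failureGroup=1 pool=system
  %nsd: nsd=node2_d1 device=/dev/sdb servers=node2 usage=dataAndMetadata failureGroup=2 pool=system
  %nsd: nsd=node3_d1 device=/dev/sdb servers=node3 usage=dataAndMetadata failureGroup=3 pool=system

  mmcrnsd -F nsd.stanza
  # two copies of data and metadata by default, with headroom for three
  mmcrfs ingestfs -F nsd.stanza -m 2 -M 3 -r 2 -R 3

Without FPO or GNR this is ordinary GPFS replication, so a failed drive means mmchdisk/mmrestripefs work rather than a RAID rebuild, which is one of the trade-offs raised in the replies below.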
From jonathan at buzzard.me.uk Thu Mar 3 22:23:08 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 3 Mar 2016 22:23:08 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: Message-ID: <56D8B94C.2000303@buzzard.me.uk> On 03/03/16 21:57, Mark.Bush at siriuscom.com wrote: > I have a client that wants to build small remote sites to sync back to > an ESS cluster they purchased. These remote sites are generally > <15-20TB. If I build a three node cluster with just internal drives can > this work if the drives aren?t shared amongst the cluster without FPO or > GNR(since it?s not ESS)? Is it better to have a SAN sharing disks with > the three nodes? Assuming all are NSD servers (or two at least). Seems > like most of the implementations I?m seeing use shared disks so local > drives only would be an odd architecture right? What do I give up by > not having shared disks seen by other NSD servers? > Unless you are doing data and metadata replication on the remote sites then any one server going down is not good at all. To be honest I have only ever seen that sort of setup done once. It was part of a high availability web server system. The idea was GPFS provided the shared storage between the nodes by replicating everything. Suffice it to say, keeping things polite, "don't do that". In reality the swear words coming from the admin trying to get GPFS fixed when disks failed were a lot more colourful. In the end the system was abandoned and migrated to ESX as it was back then. Mind you that was in the days of GPFS 2.3 so it *might* be better now; are you feeling lucky? However a SAS attached Dell MD3 (it's LSI/NetApp Engenio storage so basically the same as a DS3000/4000/5000) is frankly so cheap that it's just not worth going down that route if you ask me. I would do a two server cluster with a tie breaker disk on the MD3 to avoid any split brain issues, and use the saving on the third server to buy the MD3 and SAS cards. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From makaplan at us.ibm.com Fri Mar 4 16:09:03 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 4 Mar 2016 11:09:03 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <56D8B94C.2000303@buzzard.me.uk> References: <56D8B94C.2000303@buzzard.me.uk> Message-ID: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Mark.Bush at siriuscom.com Fri Mar 4 16:21:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Fri, 4 Mar 2016 16:21:20 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Message-ID: <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Mar 4 16:26:15 2016 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 4 Mar 2016 11:26:15 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> Message-ID: You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. 
depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: > I guess this is really my question. Budget is less than $50k per site and > they need around 20TB storage. Two nodes with MD3 or something may work. > But could it work (and be successful) with just servers and internal > drives? Should I do FPO for non hadoop like workloads? I didn?t think I > could get native raid except in the ESS (GSS no longer exists if I remember > correctly). Do I just make replicas and call it good? > > > Mark > > From: on behalf of Marc A > Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the GPFS > 2.3 era. > > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution more > difficult. > > To begin with, as with any provisioning problem, one should try to state: > requirements, goals, budgets, constraints, failure/tolerance > models/assumptions, > expected workloads, desired performance, etc, etc. > > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Fri Mar 4 16:28:52 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 04 Mar 2016 16:28:52 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Message-ID: <1457108932.4251.183.camel@buzzard.phy.strath.ac.uk> On Fri, 2016-03-04 at 11:09 -0500, Marc A Kaplan wrote: > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the > GPFS 2.3 era. Once bitten twice shy. I was offering my experience of that setup, which is not good. I my defense I did note it was it the 2.x era and it might be better now. > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution > more difficult. The other thing I would point out is that replacing a disk in a MD3 or similar is an operator level procedure. Replacing a similar disk up the front with GPFS replication requires a skilled GPFS administrator. Given these are to be on remote sites, I would suspect simpler lower skilled maintenance is better. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Mark.Bush at siriuscom.com Fri Mar 4 16:30:41 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Fri, 4 Mar 2016 16:30:41 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> Message-ID: <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Yes. Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote. I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com > wrote: I guess this is really my question. 
Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Mar 4 16:36:30 2016 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 4 Mar 2016 11:36:30 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: > Yes. 
Really the only other option we have (and not a bad one) is getting > a v7000 Unified in there (if we can get the price down far enough). That?s > not a bad option since all they really want is SMB shares in the remote. I > just keep thinking a set of servers would do the trick and be cheaper. > > > > From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM > > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > You can do FPO for non-Hadoop workloads. It just alters the disks below > the GPFS filesystem layer and looks like a normal GPFS system (mostly). I > do think there were some restrictions on non-FPO nodes mounting FPO > filesystems via multi-cluster.. not sure if those are still there.. any > input on that from IBM? > > If small enough data, and with 3-way replication, it might just be wise to > do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common > throwing out numbers), 3 of those per site would fit in your budget. > > Again.. depending on your requirements, stability balance between 'science > experiment' vs production, GPFS knowledge level, etc etc... > > This is actually an interesting and somewhat missing space for small > enterprises. If you just want 10-20TB active-active online everywhere, say, > for VMware, or NFS, or something else, there arent all that many good > solutions today that scale down far enough and are a decent price. It's > easy with many many PB, but small.. idk. I think the above sounds good as > anything without going SAN-crazy. > > > > On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < > Mark.Bush at siriuscom.com> wrote: > >> I guess this is really my question. Budget is less than $50k per site >> and they need around 20TB storage. Two nodes with MD3 or something may >> work. But could it work (and be successful) with just servers and internal >> drives? Should I do FPO for non hadoop like workloads? I didn?t think I >> could get native raid except in the ESS (GSS no longer exists if I remember >> correctly). Do I just make replicas and call it good? >> >> >> Mark >> >> From: on behalf of Marc A >> Kaplan >> Reply-To: gpfsug main discussion list >> Date: Friday, March 4, 2016 at 10:09 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Small cluster >> >> Jon, I don't doubt your experience, but it's not quite fair or even >> sensible to make a decision today based on what was available in the GPFS >> 2.3 era. >> >> We are now at GPFS 4.2 with support for 3 way replication and FPO. >> Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS >> solutions and more. >> >> So more choices, more options, making finding an "optimal" solution more >> difficult. >> >> To begin with, as with any provisioning problem, one should try to state: >> requirements, goals, budgets, constraints, failure/tolerance >> models/assumptions, >> expected workloads, desired performance, etc, etc. >> >> >> This message (including any attachments) is intended only for the use of >> the individual or entity to which it is addressed and may contain >> information that is non-public, proprietary, privileged, confidential, and >> exempt from disclosure under applicable law. If you are not the intended >> recipient, you are hereby notified that any use, dissemination, >> distribution, or copying of this communication is strictly prohibited. 
This >> message may be viewed by parties at Sirius Computer Solutions other than >> those named in the message header. This message does not contain an >> official representation of Sirius Computer Solutions. If you have received >> this communication in error, notify Sirius Computer Solutions immediately >> and (i) destroy this message if a facsimile or (ii) delete this message >> immediately if this is an electronic communication. Thank you. >> Sirius Computer Solutions >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > > -- > Zach Giles > zgiles at gmail.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Fri Mar 4 16:40:54 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 4 Mar 2016 10:40:54 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D9BA96.8010800@genome.wustl.edu> All, This turned out to be processes copying data from GPFS to local /tmp. Once the system memory was full it started blocking while the data was being flushed to disk. This process was taking long enough to have leases expire. Matt On 3/2/16 2:24 PM, Simon Thompson (Research Computing - IT Services) wrote: > Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. > > We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] > Sent: 02 March 2016 20:17 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com > Sent: Wednesday, March 02, 2016 2:15 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > Hi, > > How sure are you that it is cpu scheduling that is your problem? > > Are you using IB or Ethernet? > > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. 
>> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >>> Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. 
If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Paul.Sanchez at deshaw.com Fri Mar 4 16:54:39 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 4 Mar 2016 16:54:39 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: You wouldn?t be alone in trying to make the ?concurrent CES gateway + NSD server nodes? formula work. That doesn?t mean it will be well-supported initially, but it does mean that others will be finding bugs and interaction issues along with you. On GPFS 4.1.1.2 for example, it?s possible to get a CES protocol node into a state where the mmcesmonitor is dead and requires a mmshutdown/mmstartup to recover from. Since in a shared-nothing disk topology that would require mmchdisk/mmrestripefs to recover and rebalance, it would be operationally intensive to run CES on an NSD server with local disks. With shared SAN disks, this becomes more tractable, in my opinion. 
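A rough sketch of that recovery path on a 4.1.1-era cluster, with a placeholder node name and with the caveat that the mmces options should be checked against the release in use:

  mmces state show -a            # identify the protocol node with the dead monitor
  mmces node suspend -N proto1   # move its CES addresses to the surviving nodes
  mmshutdown -N proto1           # bounce the GPFS daemon to clear the stuck monitor
  mmstartup -N proto1
  mmces node resume -N proto1

As noted above, on a shared-nothing NSD server the mmshutdown step is what would also force the follow-on mmchdisk/mmrestripefs cleanup.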
Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Zachary Giles Sent: Friday, March 04, 2016 11:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com > wrote: Yes. Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote. I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com > wrote: I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. 
To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Fri Mar 4 18:03:16 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 4 Mar 2016 19:03:16 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Hi, a couple of comments to the various infos in this thread. 1. the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. 
replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? 
I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From ceason at us.ibm.com Fri Mar 4 18:20:50 2016 From: ceason at us.ibm.com (Jeffrey M Ceason) Date: Fri, 4 Mar 2016 11:20:50 -0700 Subject: [gpfsug-discuss] Small cluster (Jeff Ceason) In-Reply-To: References: Message-ID: <201603041821.u24IL6S6000328@d01av02.pok.ibm.com> The V7000 Unified type system is made for this application. 
http://www-03.ibm.com/systems/storage/disk/storwize_v7000/ Jeff Ceason Solutions Architect (520) 268-2193 (Mobile) ceason at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 03/04/2016 11:15 AM Subject: gpfsug-discuss Digest, Vol 50, Issue 14 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Small cluster (Sven Oehme) ---------------------------------------------------------------------- Message: 1 Date: Fri, 4 Mar 2016 19:03:16 +0100 From: "Sven Oehme" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Message-ID: <201603041804.u24I4g2R026689 at d03av01.boulder.ibm.com> Content-Type: text/plain; charset="utf-8" Hi, a couple of comments to the various infos in this thread. 1. the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . 
just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. 
Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160304/dd661d27/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160304/dd661d27/attachment.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 50, Issue 14 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From janfrode at tanso.net Sat Mar 5 13:16:54 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sat, 05 Mar 2016 13:16:54 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: Regarding #1, the FAQ has recommendation to not run CES nodes directly attached to storage: """ ? NSD server functionality and storage attached to Protocol node. 
We recommend that Protocol nodes do not take on these functions """ For small CES clusters we're now configuring 2x P822L with one partition on each server owning FC adapters and acting as NSD server/quorum/manager and the other partition being CES node accessing disk via IP. I would much rather have a plain SAN model cluster were all nodes accessed disk directly (probably still with a dedicated quorum/manager partition), but this FAQ entry is preventing that.. -jf fre. 4. mar. 2016 kl. 19.04 skrev Sven Oehme : > Hi, > > a couple of comments to the various infos in this thread. > > 1. the need to run CES on separate nodes is a recommendation, not a > requirement and the recommendation comes from the fact that if you have > heavy loaded NAS traffic that gets the system to its knees, you can take > your NSD service down with you if its on the same box. so as long as you > have a reasonable performance expectation and size the system correct there > is no issue. > > 2. shared vs FPO vs shared nothing (just replication) . the main issue > people overlook in this scenario is the absence of read/write caches in FPO > or shared nothing configurations. every physical disk drive can only do > ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its > pretty much the same effort. particular on metadata this bites you really > badly as every of this tiny i/os eats one of your 100 iops a disk can do > and quickly you used up all your iops on the drives. if you have any form > of raid controller (sw or hw) it typically implements at minimum a read > cache on most systems a read/write cache which will significant increase > the number of logical i/os one can do against a disk , my best example is > always if you have a workload that does 4k seq DIO writes to a single disk, > if you have no raid controller you can do 400k/sec in this workload if you > have a reasonable ok write cache in front of the cache you can do 50 times > that much. so especilly if you use snapshots, CES services or anything > thats metadata intensive you want some type of raid protection with > caching. btw. replication in the FS makes this even worse as now each write > turns into 3 iops for the data + additional iops for the log records so you > eat up your iops very quick . > > 3. instead of shared SAN a shared SAS device is significantly cheaper but > only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 > nodes as you can use the disks as tiebreaker disks. if you also add some > SSD's for the metadata and make use of HAWC and LROC you might get away > from not needing a raid controller with cache as HAWC will solve that issue > for you . > > just a few thoughts :-D > > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > [image: Inactive hide details for Zachary Giles ---03/04/2016 05:36:50 > PM---SMB too, eh? See this is where it starts to get hard to sca]Zachary > Giles ---03/04/2016 05:36:50 PM---SMB too, eh? See this is where it starts > to get hard to scale down. You could do a 3 node GPFS clust > > From: Zachary Giles > > > To: gpfsug main discussion list > > Date: 03/04/2016 05:36 PM > > > Subject: Re: [gpfsug-discuss] Small cluster > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > SMB too, eh? See this is where it starts to get hard to scale down. 
You > could do a 3 node GPFS cluster with replication at remote sites, pulling in > from AFM over the Net. If you want SMB too, you're probably going to need > another pair of servers to act as the Protocol Servers on top of the 3 GPFS > servers. I think running them all together is not recommended, and probably > I'd agree with that. > Though, you could do it anyway. If it's for read-only and updated daily, > eh, who cares. Again, depends on your GPFS experience and the balance > between production, price, and performance :) > > On Fri, Mar 4, 2016 at 11:30 AM, *Mark.Bush at siriuscom.com* > <*Mark.Bush at siriuscom.com* > > wrote: > > Yes. Really the only other option we have (and not a bad one) is > getting a v7000 Unified in there (if we can get the price down far > enough). That?s not a bad option since all they really want is SMB shares > in the remote. I just keep thinking a set of servers would do the trick > and be cheaper. > > > > *From: *Zachary Giles <*zgiles at gmail.com* > > * Reply-To: *gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > * Date: *Friday, March 4, 2016 at 10:26 AM > > * To: *gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > * Subject: *Re: [gpfsug-discuss] Small cluster > > You can do FPO for non-Hadoop workloads. It just alters the disks > below the GPFS filesystem layer and looks like a normal GPFS system > (mostly). I do think there were some restrictions on non-FPO nodes > mounting FPO filesystems via multi-cluster.. not sure if those are still > there.. any input on that from IBM? > > If small enough data, and with 3-way replication, it might just be > wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just > common throwing out numbers), 3 of those per site would fit in your budget. > > Again.. depending on your requirements, stability balance between > 'science experiment' vs production, GPFS knowledge level, etc etc... > > This is actually an interesting and somewhat missing space for small > enterprises. If you just want 10-20TB active-active online everywhere, say, > for VMware, or NFS, or something else, there arent all that many good > solutions today that scale down far enough and are a decent price. It's > easy with many many PB, but small.. idk. I think the above sounds good as > anything without going SAN-crazy. > > > > On Fri, Mar 4, 2016 at 11:21 AM, *Mark.Bush at siriuscom.com* > <*Mark.Bush at siriuscom.com* > > wrote: > I guess this is really my question. Budget is less than $50k per site > and they need around 20TB storage. Two nodes with MD3 or something may > work. But could it work (and be successful) with just servers and internal > drives? Should I do FPO for non hadoop like workloads? I didn?t think I > could get native raid except in the ESS (GSS no longer exists if I remember > correctly). Do I just make replicas and call it good? > > > Mark > > *From: *<*gpfsug-discuss-bounces at spectrumscale.org* > > on behalf of Marc A Kaplan > <*makaplan at us.ibm.com* > > * Reply-To: *gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > * Date: *Friday, March 4, 2016 at 10:09 AM > * To: *gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > * Subject: *Re: [gpfsug-discuss] Small cluster > > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the GPFS > 2.3 era. > > We are now at GPFS 4.2 with support for 3 way replication and FPO. 
> Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution > more difficult. > > To begin with, as with any provisioning problem, one should try to > state: requirements, goals, budgets, constraints, failure/tolerance > models/assumptions, > expected workloads, desired performance, etc, etc. > > This message (including any attachments) is intended only for the use > of the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > > *Sirius Computer Solutions* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > -- > Zach Giles > *zgiles at gmail.com* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > -- > Zach Giles > *zgiles at gmail.com* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at us.ibm.com Sat Mar 5 13:31:40 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Sat, 5 Mar 2016 14:31:40 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> as i stated in my previous post , its a recommendation so people don't overload the NSD servers to have them become non responsive or even forced rebooted (e.g. when you configure cNFS auto reboot on same node), it doesn't mean it doesn't work or is not supported. 
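As an illustration of what running CES directly on the two NSD servers amounts to in practice, the rough sequence below designates the existing nodes as protocol nodes and publishes an SMB share. The shared-root path, node names, addresses and share name are invented placeholders, and the exact option spellings should be verified against the 4.2 documentation:

    # CES keeps its state in a small shared directory inside one of the file systems
    mmchconfig cesSharedRoot=/gpfs/fs1/ces
    mmchnode --ces-enable -N nodeA,nodeB
    mmces service enable SMB
    # floating addresses that clients connect to; they fail over between the two nodes
    mmces address add --ces-ip 192.0.2.10,192.0.2.11
    # authentication (mmuserauth service create ...) is also needed before clients can map the share
    mmsmb export add ingest /gpfs/fs1/ingest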
if all you are using this cluster for is NAS services, then this recommendation makes even less sense as the whole purpose on why the recommendation is there to begin with is that if NFS would overload a node that also serves as NSD server for other nodes it would impact the other nodes that use the NSD protocol, but if there are no NSD clients there is nothing to protect because if NFS is down all clients are not able to access data, even if your NSD servers are perfectly healthy... if you have a fairly large system with many NSD Servers, many clients as well as NAS clients this recommendation is correct, but not in the scenario you described below. i will work with the team to come up with a better wording for this in the FAQ. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jan-Frode Myklebust To: gpfsug main discussion list Cc: Sven Oehme/Almaden/IBM at IBMUS Date: 03/05/2016 02:17 PM Subject: Re: [gpfsug-discuss] Small cluster Regarding #1, the FAQ has recommendation to not run CES nodes directly attached to storage: """ ? NSD server functionality and storage attached to Protocol node. We recommend that Protocol nodes do not take on these functions """ For small CES clusters we're now configuring 2x P822L with one partition on each server owning FC adapters and acting as NSD server/quorum/manager and the other partition being CES node accessing disk via IP. I would much rather have a plain SAN model cluster were all nodes accessed disk directly (probably still with a dedicated quorum/manager partition), but this FAQ entry is preventing that.. -jf fre. 4. mar. 2016 kl. 19.04 skrev Sven Oehme : Hi, a couple of comments to the various infos in this thread. 1. the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. 
instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Zachary Giles ---03/04/2016 05:36:50 PM---SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS clust From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough).? That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles Reply-To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly).? I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? 
I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Sven Oehme/Almaden/IBM] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Sat Mar 5 18:40:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sat, 5 Mar 2016 13:40:50 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> Message-ID: <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Indeed it seems to just add overhead and expense to split what can be done by one node over two nodes! -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Sat Mar 5 18:52:16 2016 From: zgiles at gmail.com (Zachary Giles) Date: Sat, 5 Mar 2016 13:52:16 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Message-ID: Sven, What about the stability of the new protocol nodes vs the old cNFS? If you remember, back in the day, cNFS would sometimes have a problem and reboot the whole server itself. Obviously this was problematic if it's one of the few servers running your cluster. I assume this is different now with the Protocol Servers? On Sat, Mar 5, 2016 at 1:40 PM, Marc A Kaplan wrote: > Indeed it seems to just add overhead and expense to split what can be done > by one node over two nodes! > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Sun Mar 6 13:55:59 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Sun, 6 Mar 2016 14:55:59 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com><201603051331.u25DVjvV017738@d01av01.pok.ibm.com><201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Message-ID: <201603061356.u26Du4Zj014555@d03av05.boulder.ibm.com> the question is what difference does it make ? as i mentioned if all your 2 or 3 nodes do is serving NFS it doesn't matter if the protocol nodes or the NSD services are down in both cases it means no access to data which it makes no sense to separate them in this case (unless load dependent). i haven't seen nodes reboot specifically because of protocol issues lately, the fact that everything is in userspace makes things easier too. 
sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/06/2016 02:31 AM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, What about the stability of the new protocol nodes vs the old cNFS? If you remember, back in the day, cNFS would sometimes have a problem and reboot the whole server itself. Obviously this was problematic if it's one of the few servers running your cluster. I assume this is different now with the Protocol Servers? On Sat, Mar 5, 2016 at 1:40 PM, Marc A Kaplan wrote: Indeed it seems to just add overhead and expense to split what can be done by one node over two nodes! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Sun Mar 6 20:27:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 6 Mar 2016 15:27:50 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> As Sven wrote, the FAQ does not "prevent" anything. It's just a recommendation someone came up with. Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation. But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From janfrode at tanso.net Mon Mar 7 06:11:27 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 07 Mar 2016 06:11:27 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> Message-ID: I agree, but would also normally want to stay within whatever is recommended. What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 
21.28 skrev Marc A Kaplan : > As Sven wrote, the FAQ does not "prevent" anything. It's just a > recommendation someone came up with. Which may or may not apply to your > situation. > > Partitioning a server into two servers might be a good idea if you really > need the protection/isolation. But I expect you are limiting the potential > performance of the overall system, compared to running a single Unix image > with multiple processes that can share resource and communicate more freely. > > > [image: Marc A Kaplan] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From volobuev at us.ibm.com Mon Mar 7 20:58:37 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Mon, 7 Mar 2016 12:58:37 -0800 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com><201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> Message-ID: <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> This use case is a good example of how it's hard to optimize across multiple criteria. If you want a pre-packaged solution that's proven and easy to manage, StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for your requirements as such things get. Price may be an issue though, as usual. If you're OK with rolling your own complex solution, my recommendation would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via the local block device interface. This avoids the pitfalls of data/metadata replication, and offers a decent blend of performance, fault tolerance, and disk management. You can use disk-based quorum if going with 2 nodes, or traditional node majority quorum if using 3 nodes, either way would work. There's no need to do any separation of roles (CES, quorum, managers, etc), provided the nodes are adequately provisioned with memory and aren't routinely overloaded, in which case you just need to add more nodes instead of partitioning what you have. Using internal disks and relying on GPFS data/metadata replication, with or without FPO, would mean taking the hard road. You may be able to spend the least on hardware in such a config (although the 33% disk utilization rate for triplication makes this less clear, if capacity is an issue), but the operational challenges are going to be substantial. 
This would be a viable config, but there are unavoidable tradeoffs caused by replication: (1) writes are very expensive, which limits the overall cluster capability for non-read-only workloads, (2) node and disk failures require a round of re-replication, or "re-protection", which takes time and bandwidth, limiting the overall capability further, (3) disk management can be a challenge, as there's no software/hardware component to assist with identifying failing/failed disks. As far as not going off the beaten path, this is not it... Exporting protocols from a small triplicated file system is not a typical mode of deployment of Spectrum Scale, you'd be blazing some new trails. As stated already in several responses, there's no hard requirement that CES Protocol nodes must be entirely separate from any other roles in the general Spectrum Scale deployment scenario. IBM expressly disallows co-locating Protocol nodes with ESS servers, due to resource consumption complications, but for non-ESS cases it's merely a recommendation to run Protocols on nodes that are not otherwise encumbered by having to provide other services. Of course, the config that's the best for performance is not the cheapest. CES doesn't reboot nodes to recover from NFS problems, unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a complex software stack is a complex software stack, so there's greater potential for things to go sideways, in particular due to the lack of resources. FPO vs plain replication: this only matters if you have apps that are capable of exploiting data locality. FPO changes the way GPFS stripes data across disks. Without FPO, GPFS does traditional wide striping of blocks across all disks in a given storage pool. When FPO is in use, data in large files is divided into large (e.g. 1G) chunks, and there's a node that holds an entire chunk on its internal disks. An application that knows how to query data block layout of a given file can then schedule the job that needs to read from this chunk on the node that holds a local copy. This makes a lot of sense for integrated data analytics workloads, a la Map Reduce with Hadoop, but doesn't make sense for generic apps like Samba. I'm not sure what language in the FAQ creates the impression that the SAN deployment model is somehow incompatible with running Protocol services. This is perfectly fine. yuri From: Jan-Frode Myklebust To: gpfsug main discussion list , Date: 03/06/2016 10:12 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org I agree, but would also normally want to stay within whatever is recommended. What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf søn. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan : As Sven wrote, the FAQ does not "prevent" anything. It's just a recommendation someone came up with. Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation. But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely.
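In stanza-file terms, the difference between plain triplication and the FPO behaviour described above is fairly small: plain 3-way replication is just the replication factors given at file system creation (with NSDs spread over at least three failure groups), while FPO additionally marks a pool as write-affinity aware so that whole chunks land on the writing node's local disks. The names below are placeholders and the attribute list is from memory, so treat it as a starting point rather than a recipe:

    # plain triplication: default and maximum replicas of 3 for data and metadata
    mmcrfs smallfs -F stanzas.txt -m 3 -M 3 -r 3 -R 3

    # FPO-style pool in the same stanza file: big chunks plus write affinity for locality-aware apps
    %pool: pool=fpodata blockSize=2M layoutMap=cluster allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=128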
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Mar 7 21:10:48 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Mon, 7 Mar 2016 21:10:48 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> Message-ID: Thanks Yuri, this solidifies some of the conclusions I?ve drawn from this conversation. Thank you all for your responses. This is a great forum filled with very knowledgeable folks. Mark From: > on behalf of Yuri L Volobuev > Reply-To: gpfsug main discussion list > Date: Monday, March 7, 2016 at 2:58 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster This use case is a good example of how it's hard to optimize across multiple criteria. If you want a pre-packaged solution that's proven and easy to manage, StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for your requirements as such things get. Price may be an issue though, as usual. If you're OK with rolling your own complex solution, my recommendation would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via the local block device interface. This avoids the pitfalls of data/metadata replication, and offers a decent blend of performance, fault tolerance, and disk management. You can use disk-based quorum if going with 2 nodes, or traditional node majority quorum if using 3 nodes, either way would work. There's no need to do any separation of roles (CES, quorum, managers, etc), provided the nodes are adequately provisioned with memory and aren't routinely overloaded, in which case you just need to add more nodes instead of partitioning what you have. Using internal disks and relying on GPFS data/metadata replication, with or without FPO, would mean taking the hard road. You may be able to spend the least on hardware in such a config (although the 33% disk utilization rate for triplication makes this less clear, if capacity is an issue), but the operational challenges are going to be substantial. 
This would be a viable config, but there are unavoidable tradeoffs caused by replication: (1) writes are very expensive, which limits the overall cluster capability for non-read-only workloads, (2) node and disk failures require a round of re-replication, or "re-protection", which takes time and bandwidth, limiting the overall capability further, (3) disk management can be a challenge, as there's no software/hardware component to assist with identifying failing/failed disks. As far as not going off the beaten path, this is not it... Exporting protocols from a small triplicated file system is not a typical mode of deployment of Spectrum Scale, you'd be blazing some new trails. As stated already in several responses, there's no hard requirement that CES Protocol nodes must be entirely separate from any other roles in the general Spectrum Scale deployment scenario. IBM expressly disallows co-locating Protocol nodes with ESS servers, due to resource consumption complications, but for non-ESS cases it's merely a recommendation to run Protocols on nodes that are not otherwise encumbered by having to provide other services. Of course, the config that's the best for performance is not the cheapest. CES doesn't reboot nodes to recover from NFS problems, unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a complex software stack is a complex software stack, so there's greater potential for things to go sideways, in particular due to the lack of resources. FPO vs plain replication: this only matters if you have apps that are capable of exploiting data locality. FPO changes the way GPFS stripes data across disks. Without FPO, GPFS does traditional wide striping of blocks across all disks in a given storage pool. When FPO is in use, data in large files is divided in large (e.g. 1G) chunks, and there's a node that holds an entire chunk on its internal disks. An application that knows how to query data block layout of a given file can then schedule the job that needs to read from this chunk on the node that holds a local copy. This makes a lot of sense for integrated data analytics workloads, a la Map Reduce with Hadoop, but doesn't make sense for generic apps like Samba. I'm not sure what language in the FAQ creates the impression that the SAN deployment model is somehow incompatible with running Procotol services. This is perfectly fine. yuri [Inactive hide details for Jan-Frode Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want to stay within]Jan-Frode Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want to stay within whatever is recommended. From: Jan-Frode Myklebust > To: gpfsug main discussion list >, Date: 03/06/2016 10:12 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I agree, but would also normally want to stay within whatever is recommended. What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan >: As Sven wrote, the FAQ does not "prevent" anything. It's just a recommendation someone came up with. Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation. 
But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[cid:2__=07BBF5FCDFFC0B518f9e8a93df938690918c07B@][cid:2__=07BBF5FCDFFC0B518f9e8a93df938690918c07B@]_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: 0B132319.gif URL: From r.sobey at imperial.ac.uk Tue Mar 8 09:48:01 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 8 Mar 2016 09:48:01 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Message-ID: Morning all, I tried to download the VM to evaluate SS yesterday - more of a chance to play around with commands in a non-prod environment and look at what's in store. We're currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who's already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Tue Mar 8 13:09:21 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 8 Mar 2016 13:09:21 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query In-Reply-To: References: Message-ID: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> Richard, Sounds unusual. When you registered your IBM ID for login - did you choose your country from the drop-down list as North Korea ? ;-) Daniel Dr.Daniel Kidger No. 
1 The Square, Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 08/03/2016 09:48 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Sent by: gpfsug-discuss-bounces at spectrumscale.org Morning all, I tried to download the VM to evaluate SS yesterday ? more of a chance to play around with commands in a non-prod environment and look at what?s in store. We?re currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who?s already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Mar 8 13:16:37 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 8 Mar 2016 13:16:37 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query In-Reply-To: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> References: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> Message-ID: Hah, well now you?ve got me checking just to make sure ? Ok, definitely says United Kingdom. Now it won?t let me download it at all, says page not found. Will persevere! Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 08 March 2016 13:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Eval VM download query Richard, Sounds unusual. When you registered your IBM ID for login - did you choose your country from the drop-down list as North Korea ? ;-) Daniel ________________________________ Dr.Daniel Kidger No. 1 The Square, [cid:image001.gif at 01D1793C.BE2DB440] Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com ________________________________ From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 08/03/2016 09:48 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Morning all, I tried to download the VM to evaluate SS yesterday ? more of a chance to play around with commands in a non-prod environment and look at what?s in store. 
We?re currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who?s already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 360 bytes Desc: image001.gif URL: From Robert.Oesterlin at nuance.com Tue Mar 8 15:53:34 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 8 Mar 2016 15:53:34 +0000 Subject: [gpfsug-discuss] Interpreting "mmlsqos" output Message-ID: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com> So ? I enabled QoS on my file systems using the defaults in 4.2 Running a restripe with a class of ?maintenance? gives me this for mmlsqos output: [root at gpfs-vmd01a ~]# mmlsqos VMdata01 --sum-nodes yes QOS config:: enabled QOS values:: pool=system,other=inf,maintenance=inf QOS status:: throttling active, monitoring active === for pool system 10:36:30 other iops=9754 ioql=12.17 qsdl=0.00022791 et=5 10:36:30 maint iops=55 ioql=0.067331 qsdl=2.7e-05 et=5 10:36:35 other iops=7999.8 ioql=12.613 qsdl=0.00013951 et=5 10:36:35 maint iops=52 ioql=0.10034 qsdl=2.48e-05 et=5 10:36:40 other iops=8890.8 ioql=12.117 qsdl=0.00016095 et=5 10:36:40 maint iops=71.2 ioql=0.13904 qsdl=3.56e-05 et=5 10:36:45 other iops=8303.8 ioql=11.17 qsdl=0.00011438 et=5 10:36:45 maint iops=52.8 ioql=0.08261 qsdl=3.06e-05 et=5 It looks like the ?maintenance? class is getting perhaps 5% of the overall IOP rate? What do ?ioql? and ?qsdl? indicate? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 8 16:36:46 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 11:36:46 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Message-ID: <20160308113646.54314ikzhtedrjby@support.scinet.utoronto.ca> I'm wondering whether the new version of the "Spectrum Suite" will allow us set the priority of the HSM migration to be higher than staging. I ask this because back in 2011 when we were still using Tivoli HSM with GPFS, during mixed requests for migration and staging operations, we had a very annoying behavior in which the staging would always take precedence over migration. The end-result was that the GPFS would fill up to 100% and induce a deadlock on the cluster, unless we identified all the user driven stage requests in time, and killed them all. We contacted IBM support a few times asking for a way fix this, and were told it was built into TSM. 
Back then we gave up IBM's HSM primarily for this reason, although performance was also a consideration (more to this on another post). We are now reconsidering HSM for a new deployment, however only if this issue has been resolved (among a few others). What has been some of the experience out there? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Tue Mar 8 16:54:45 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 11:54:45 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts Message-ID: <20160308115445.10061uekt4pp5kgl@support.scinet.utoronto.ca> For the new Spectrum Suite of products, are there specific references with examples on how to set up gpfs policy rules to integrate TSM so substantially improve the migration performance of HSM? The reason I ask is because I've been reading manuals with 200+ pages where it's very clear this is possible to be accomplished, by builtin lists and feeding those to TSM, however some of the examples and rules are presented out of context, and not integrated onto a single self-contained document. The GPFS past has it own set of manuals, but so do TSM and HSM. For those of you already doing it, what has been your experience, what are the tricks (where can I read about them), how the addition of multiple nodes to the working pool is performing? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Tue Mar 8 17:45:42 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Tue, 8 Mar 2016 18:45:42 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts Message-ID: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Hi, please have a look at this document: http://www-01.ibm.com/support/docview.wss?uid=swg27018848 It describe the how-to setup and provides some hints and tips for migration policies. Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 For the new Spectrum Suite of products, are there specific references with examples on how to set up gpfs policy rules to integrate TSM so substantially improve the migration performance of HSM? The reason I ask is because I've been reading manuals with 200+ pages where it's very clear this is possible to be accomplished, by builtin lists and feeding those to TSM, however some of the examples and rules are presented out of context, and not integrated onto a single self-contained document. The GPFS past has it own set of manuals, but so do TSM and HSM. 
For those of you already doing it, what has been your experience, what are the tricks (where can I read about them), how the addition of multiple nodes to the working pool is performing? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominic.mueller at de.ibm.com Tue Mar 8 17:46:11 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Tue, 8 Mar 2016 18:46:11 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Message-ID: <201603081646.u28GkIXt026930@d06av10.portsmouth.uk.ibm.com> Hi, in all cases a recall request will be handled transparent for the user at the time a migrated files is accessed. This can't be prevented and has two down sides: a) the space used in the file system increases and b) random access to storage media in the Spectrum Protect server happens. With newer versions of Spectrum Protect for Space Management a so called tape optimized recall method is available that can reduce the impact to the system (especially Spectrum Protect server). If the problem was that the file system went out of space at the time the recalls came in I would recommend to reduce the threshold settings for the file system and increase the number of premigrated files. This will allow to free space very quickly if needed. If you didn't use the policy based threshold migration so far I recommend to use it. This method is significant faster compared to the classical HSM based threshold migration approach. Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 ----- From: Jaime Pinto To: gpfsug main discussion list Date: 08.03.2016 17:36 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Sent by: gpfsug-discuss-bounces at spectrumscale.org I'm wondering whether the new version of the "Spectrum Suite" will allow us set the priority of the HSM migration to be higher than staging. I ask this because back in 2011 when we were still using Tivoli HSM with GPFS, during mixed requests for migration and staging operations, we had a very annoying behavior in which the staging would always take precedence over migration. The end-result was that the GPFS would fill up to 100% and induce a deadlock on the cluster, unless we identified all the user driven stage requests in time, and killed them all. We contacted IBM support a few times asking for a way fix this, and were told it was built into TSM. Back then we gave up IBM's HSM primarily for this reason, although performance was also a consideration (more to this on another post). We are now reconsidering HSM for a new deployment, however only if this issue has been resolved (among a few others). What has been some of the experience out there? 
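A note on the policy based threshold migration recommended above: it is normally expressed as GPFS ILM rules of roughly the following shape. This is only a sketch; the pool names, the thresholds and the external-pool interface script path are assumptions that have to match the local Spectrum Protect for Space Management installation.

   /* Hand migration work to TSM/HSM through an external pool.
      The EXEC script path is a placeholder; GPFS ships ILM samples under
      /usr/lpp/mmfs/samples/ilm that can be adapted for this purpose. */
   RULE 'hsmpool' EXTERNAL POOL 'hsm' EXEC '/var/mmfs/etc/mmpolicyExec-hsm' OPTS '-v'

   /* Start migrating when the system pool reaches 90% occupancy, stop at 80%,
      and premigrate down to 70% so space can be freed quickly later on. */
   RULE 'ToHsm' MIGRATE FROM POOL 'system'
        THRESHOLD(90,80,70)
        WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
        TO POOL 'hsm'
        WHERE FILE_SIZE > 0

A policy like this is evaluated by mmapplypolicy; to have it fire automatically when the file system crosses the high threshold, it is usually wired to the lowDiskSpace event with mmaddcallback.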
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Tue Mar 8 18:58:29 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 8 Mar 2016 18:58:29 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> Message-ID: My fantasy solution is 2 servers and a SAS disk shelf from my adopted, cheap x86 vendor running IBM Spectrum Scale with GNR as software only, doing concurrent, supported GNR and CES with maybe an advisory on the performance requirements of such and suggestions on scale out approaches :) Cheers Chris On 7 March 2016 at 21:10, Mark.Bush at siriuscom.com wrote: > Thanks Yuri, this solidifies some of the conclusions I?ve drawn from this > conversation. Thank you all for your responses. This is a great forum > filled with very knowledgeable folks. > > Mark > > From: on behalf of Yuri L > Volobuev > Reply-To: gpfsug main discussion list > Date: Monday, March 7, 2016 at 2:58 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > This use case is a good example of how it's hard to optimize across > multiple criteria. > > If you want a pre-packaged solution that's proven and easy to manage, > StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for > your requirements as such things get. Price may be an issue though, as > usual. > > If you're OK with rolling your own complex solution, my recommendation > would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external > disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via > the local block device interface. This avoids the pitfalls of data/metadata > replication, and offers a decent blend of performance, fault tolerance, and > disk management. You can use disk-based quorum if going with 2 nodes, or > traditional node majority quorum if using 3 nodes, either way would work. > There's no need to do any separation of roles (CES, quorum, managers, etc), > provided the nodes are adequately provisioned with memory and aren't > routinely overloaded, in which case you just need to add more nodes instead > of partitioning what you have. > > Using internal disks and relying on GPFS data/metadata replication, with > or without FPO, would mean taking the hard road. You may be able to spend > the least on hardware in such a config (although the 33% disk utilization > rate for triplication makes this less clear, if capacity is an issue), but > the operational challenges are going to be substantial. 
This would be a > viable config, but there are unavoidable tradeoffs caused by replication: > (1) writes are very expensive, which limits the overall cluster capability > for non-read-only workloads, (2) node and disk failures require a round of > re-replication, or "re-protection", which takes time and bandwidth, > limiting the overall capability further, (3) disk management can be a > challenge, as there's no software/hardware component to assist with > identifying failing/failed disks. As far as not going off the beaten path, > this is not it... Exporting protocols from a small triplicated file system > is not a typical mode of deployment of Spectrum Scale, you'd be blazing > some new trails. > > As stated already in several responses, there's no hard requirement that > CES Protocol nodes must be entirely separate from any other roles in the > general Spectrum Scale deployment scenario. IBM expressly disallows > co-locating Protocol nodes with ESS servers, due to resource consumption > complications, but for non-ESS cases it's merely a recommendation to run > Protocols on nodes that are not otherwise encumbered by having to provide > other services. Of course, the config that's the best for performance is > not the cheapest. CES doesn't reboot nodes to recover from NFS problems, > unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a > complex software stack is a complex software stack, so there's greater > potential for things to go sideways, in particular due to the lack of > resources. > > FPO vs plain replication: this only matters if you have apps that are > capable of exploiting data locality. FPO changes the way GPFS stripes data > across disks. Without FPO, GPFS does traditional wide striping of blocks > across all disks in a given storage pool. When FPO is in use, data in large > files is divided in large (e.g. 1G) chunks, and there's a node that holds > an entire chunk on its internal disks. An application that knows how to > query data block layout of a given file can then schedule the job that > needs to read from this chunk on the node that holds a local copy. This > makes a lot of sense for integrated data analytics workloads, a la Map > Reduce with Hadoop, but doesn't make sense for generic apps like Samba. > > I'm not sure what language in the FAQ creates the impression that the SAN > deployment model is somehow incompatible with running Procotol services. > This is perfectly fine. > > yuri > > [image: Inactive hide details for Jan-Frode Myklebust ---03/06/2016 > 10:12:07 PM---I agree, but would also normally want to stay within]Jan-Frode > Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want > to stay within whatever is recommended. > > From: Jan-Frode Myklebust > To: gpfsug main discussion list , > Date: 03/06/2016 10:12 PM > Subject: Re: [gpfsug-discuss] Small cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I agree, but would also normally want to stay within whatever is > recommended. > > What about quorum/manager functions? Also OK to run these on the CES nodes > in a 2-node cluster, or any reason to partition these out so that we then > have a 4-node cluster running on 2 physical machines? > > > -jf > s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan <*makaplan at us.ibm.com* > >: > > As Sven wrote, the FAQ does not "prevent" anything. It's just a > recommendation someone came up with. Which may or may not apply to your > situation. 
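As an aside, the disk-based quorum Yuri mentions for the two-node case boils down to something like the following (a sketch only; the node names, cluster name and NSD names are made up, the NSDs are assumed to exist already, and the exact steps should be checked against the documentation):

   # two quorum-manager nodes plus tiebreaker disks instead of a third quorum node
   mmcrcluster -N node1:quorum-manager,node2:quorum-manager -C smallcl \
               -r /usr/bin/ssh -R /usr/bin/scp
   mmchconfig tiebreakerDisks="nsd_a;nsd_b;nsd_c"

With tiebreakerDisks set, quorum can be maintained by a single surviving quorum node that still sees a majority of the tiebreaker disks, which is what makes the two-node layout workable.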
> > Partitioning a server into two servers might be a good idea if you > really need the protection/isolation. But I expect you are limiting the > potential performance of the overall system, compared to running a single > Unix image with multiple processes that can share resource and communicate > more freely. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From chrisjscott at gmail.com Tue Mar 8 19:10:25 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 8 Mar 2016 19:10:25 +0000 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts In-Reply-To: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> References: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Message-ID: To add a customer data point, I followed that guide using GPFS 3.4 and TSM 6.4 with HSM and it's been working perfectly since then. I was even able to remove dsmscoutd online, node-at-a-time back when I made the transition. The performance change was revolutionary and so is the file selection. We have large filesystems with millions of files, changing often, that TSM incremental scan wouldn't cope with and Spectrum Scale 4.1.1 and Spectrum Protect 7.1.3 using mmbackup as described in the SS 4.1.1 manual, creating a snapshot for mmbackup also works perfectly for backup. Cheers Chris On 8 March 2016 at 17:45, Dominic Mueller-Wicke01 < dominic.mueller at de.ibm.com> wrote: > Hi, > > please have a look at this document: > http://www-01.ibm.com/support/docview.wss?uid=swg27018848 > It describe the how-to setup and provides some hints and tips for > migration policies. > > Greetings, Dominic. 
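In outline, the snapshot-based mmbackup arrangement Chris describes looks something like the following (a sketch only; the device name, snapshot name, node class and Spectrum Protect server stanza are placeholders):

   # create a global snapshot, back it up, then discard it
   mmcrsnapshot gpfs0 mmbackupSnap
   mmbackup /gpfs/gpfs0 -t incremental -S mmbackupSnap -N backupNodes --tsm-servers TSM1
   mmdelsnapshot gpfs0 mmbackupSnap

mmbackup walks the snapshot with the policy engine across the nodes given with -N, builds the changed-file lists, and hands them to the TSM backup-archive client, which is what replaces the slow per-file incremental scan.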
> > > ______________________________________________________________________________________________________________ > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead > | +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > > > For the new Spectrum Suite of products, are there specific references > with examples on how to set up gpfs policy rules to integrate TSM so > substantially improve the migration performance of HSM? > > The reason I ask is because I've been reading manuals with 200+ pages > where it's very clear this is possible to be accomplished, by builtin > lists and feeding those to TSM, however some of the examples and rules > are presented out of context, and not integrated onto a single > self-contained document. The GPFS past has it own set of manuals, but > so do TSM and HSM. > > For those of you already doing it, what has been your experience, what > are the tricks (where can I read about them), how the addition of > multiple nodes to the working pool is performing? > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Mar 8 19:37:22 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 8 Mar 2016 14:37:22 -0500 Subject: [gpfsug-discuss] Interpreting "mmlsqos" output In-Reply-To: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com> References: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com> Message-ID: <201603081937.u28JbUoj017559@d01av03.pok.ibm.com> Bob, You can read ioql as "IO queue length" (outside of GPFS) and "qsdl" as QOS queue length at the QOS throttle within GPFS, computed from average delay introduced by the QOS subsystem. These "queue lengths" are virtual or fictional -- They are computed by observing average service times and applying Little's Law. That is there is no single actual queue but each IO request spends some time in the OS + network + disk controller + .... For IO bound workloads one can verify that ioql+qsdl is the average number of application threads waiting for IO. Our documentation puts it this way (See 4.2 Admin Guide, mmlsqos command) iops= The performance of the class in I/O operations per second. ioql= The average number of I/O requests in the class that are pending for reasons other than being queued by QoS. This number includes, for example, I/O requests that are waiting for network or storage device servicing. qsdl= The average number of I/O requests in the class that are queued by QoS. When the QoS system receives an I/O request from the file system, QoS first finds the class to which the I/O request belongs. It then finds whether the class has any I/O operations available for consumption. If not, then QoS queues the request until more I/O operations become available for the class. 
The Qsdl value is the average number of I/O requests that are held in this queue. et= The interval in seconds during which the measurement was made. You can calculate the average service time for an I/O operation as (Ioql + Qsdl)/Iops. For a system that is running IO-intensive applications, you can interpret the value (Ioql + Qsdl) as the number of threads in the I/O-intensive applications. This interpretation assumes that each thread spends most of its time in waiting for an I/O operation to complete. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 03/08/2016 10:53 AM Subject: [gpfsug-discuss] Interpreting "mmlsqos" output Sent by: gpfsug-discuss-bounces at spectrumscale.org So ? I enabled QoS on my file systems using the defaults in 4.2 Running a restripe with a class of ?maintenance? gives me this for mmlsqos output: [root at gpfs-vmd01a ~]# mmlsqos VMdata01 --sum-nodes yes QOS config:: enabled QOS values:: pool=system,other=inf,maintenance=inf QOS status:: throttling active, monitoring active === for pool system 10:36:30 other iops=9754 ioql=12.17 qsdl=0.00022791 et=5 10:36:30 maint iops=55 ioql=0.067331 qsdl=2.7e-05 et=5 10:36:35 other iops=7999.8 ioql=12.613 qsdl=0.00013951 et=5 10:36:35 maint iops=52 ioql=0.10034 qsdl=2.48e-05 et=5 10:36:40 other iops=8890.8 ioql=12.117 qsdl=0.00016095 et=5 10:36:40 maint iops=71.2 ioql=0.13904 qsdl=3.56e-05 et=5 10:36:45 other iops=8303.8 ioql=11.17 qsdl=0.00011438 et=5 10:36:45 maint iops=52.8 ioql=0.08261 qsdl=3.06e-05 et=5 It looks like the ?maintenance? class is getting perhaps 5% of the overall IOP rate? What do ?ioql? and ?qsdl? indicate? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Mar 8 19:45:13 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 8 Mar 2016 14:45:13 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts - Success story! In-Reply-To: References: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Message-ID: <201603081945.u28JjKrL008155@d01av01.pok.ibm.com> "I followed that guide using GPFS 3.4 and TSM 6.4 with HSM and it's been working perfectly since then. I was even able to remove dsmscoutd online, node-at-a-time back when I made the transition. The performance change was revolutionary and so is the file selection. We have large filesystems with millions of files, changing often, that TSM incremental scan wouldn't cope with and Spectrum Scale 4.1.1 and Spectrum Protect 7.1.3 using mmbackup as described in the SS 4.1.1 manual, creating a snapshot for mmbackup also works perfectly for backup. Cheers Chris THANKS, SCOTT -- we love to hear/see customer comments and feedback, especially when they are positive ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 8 20:38:52 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 15:38:52 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority In-Reply-To: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> Message-ID: <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> Thanks for the suggestions Dominic I remember playing around with premigrated files at the time, and that was not satisfactory. What we are looking for is a configuration based parameter what will basically break out of the "transparency for the user" mode, and not perform any further recalling, period, if|when the file system occupancy is above a certain threshold (98%). We would not mind if instead gpfs would issue a preemptive "disk full" error message to any user/app/job relying on those files to be recalled, so migration on demand will have a chance to be performance. What we prefer is to swap precedence, ie, any migration requests would be executed ahead of any recalls, at least until a certain amount of free space on the file system has been cleared. It's really important that this type of feature is present, for us to reconsider the TSM version of HSM as a solution. It's not clear from the manual that this can be accomplish in some fashion. Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > > Hi, > > in all cases a recall request will be handled transparent for the user at > the time a migrated files is accessed. This can't be prevented and has two > down sides: a) the space used in the file system increases and b) random > access to storage media in the Spectrum Protect server happens. With newer > versions of Spectrum Protect for Space Management a so called tape > optimized recall method is available that can reduce the impact to the > system (especially Spectrum Protect server). > If the problem was that the file system went out of space at the time the > recalls came in I would recommend to reduce the threshold settings for the > file system and increase the number of premigrated files. This will allow > to free space very quickly if needed. If you didn't use the policy based > threshold migration so far I recommend to use it. This method is > significant faster compared to the classical HSM based threshold migration > approach. > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 > ----- > > From: Jaime Pinto > To: gpfsug main discussion list > Date: 08.03.2016 17:36 > Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I'm wondering whether the new version of the "Spectrum Suite" will > allow us set the priority of the HSM migration to be higher than > staging. > > > I ask this because back in 2011 when we were still using Tivoli HSM > with GPFS, during mixed requests for migration and staging operations, > we had a very annoying behavior in which the staging would always take > precedence over migration. The end-result was that the GPFS would fill > up to 100% and induce a deadlock on the cluster, unless we identified > all the user driven stage requests in time, and killed them all. 
We > contacted IBM support a few times asking for a way fix this, and were > told it was built into TSM. Back then we gave up IBM's HSM primarily > for this reason, although performance was also a consideration (more > to this on another post). > > We are now reconsidering HSM for a new deployment, however only if > this issue has been resolved (among a few others). > > What has been some of the experience out there? > > Thanks > Jaime > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Wed Mar 9 09:35:56 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Wed, 9 Mar 2016 10:35:56 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> Message-ID: <201603090836.u298a1D1017873@d06av10.portsmouth.uk.ibm.com> Hi Jamie, I see. So, the recall-shutdown would be something for a short time period. right? Just for the time it takes to migrate files out and free space. If HSM would allow the recall-shutdown the impact for the users would be that each access to migrated files would lead to an access denied error. Would that be acceptable for the users? Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Cc: gpfsug-discuss at spectrumscale.org Date: 08.03.2016 21:38 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Thanks for the suggestions Dominic I remember playing around with premigrated files at the time, and that was not satisfactory. What we are looking for is a configuration based parameter what will basically break out of the "transparency for the user" mode, and not perform any further recalling, period, if|when the file system occupancy is above a certain threshold (98%). 
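One possible site-side stop-gap for the situation described above, until a real priority control exists, is to watch occupancy and push out an extra migration run before the file system fills up completely. A rough sketch (the file system path, policy file and node list are placeholders, and the 98 percent trigger just mirrors the number mentioned in this thread):

   #!/bin/sh
   # kick off an on-demand migration when the file system is nearly full
   FS=/gpfs/fs0
   occ=$(df -P "$FS" | awk 'NR==2 {sub("%","",$5); print $5}')
   if [ "$occ" -ge 98 ]; then
       mmapplypolicy "$FS" -P /var/mmfs/etc/migrate.pol -N hsmNodes
   fi

The migrate.pol file would carry MIGRATE ... TO POOL 'hsm' rules along the lines of the sketch earlier in the thread; this does not stop recalls, but it keeps migration pressure on while the file system is close to full.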
We would not mind if instead gpfs would issue a preemptive "disk full" error message to any user/app/job relying on those files to be recalled, so migration on demand will have a chance to be performance. What we prefer is to swap precedence, ie, any migration requests would be executed ahead of any recalls, at least until a certain amount of free space on the file system has been cleared. It's really important that this type of feature is present, for us to reconsider the TSM version of HSM as a solution. It's not clear from the manual that this can be accomplish in some fashion. Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > > Hi, > > in all cases a recall request will be handled transparent for the user at > the time a migrated files is accessed. This can't be prevented and has two > down sides: a) the space used in the file system increases and b) random > access to storage media in the Spectrum Protect server happens. With newer > versions of Spectrum Protect for Space Management a so called tape > optimized recall method is available that can reduce the impact to the > system (especially Spectrum Protect server). > If the problem was that the file system went out of space at the time the > recalls came in I would recommend to reduce the threshold settings for the > file system and increase the number of premigrated files. This will allow > to free space very quickly if needed. If you didn't use the policy based > threshold migration so far I recommend to use it. This method is > significant faster compared to the classical HSM based threshold migration > approach. > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 > ----- > > From: Jaime Pinto > To: gpfsug main discussion list > Date: 08.03.2016 17:36 > Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I'm wondering whether the new version of the "Spectrum Suite" will > allow us set the priority of the HSM migration to be higher than > staging. > > > I ask this because back in 2011 when we were still using Tivoli HSM > with GPFS, during mixed requests for migration and staging operations, > we had a very annoying behavior in which the staging would always take > precedence over migration. The end-result was that the GPFS would fill > up to 100% and induce a deadlock on the cluster, unless we identified > all the user driven stage requests in time, and killed them all. We > contacted IBM support a few times asking for a way fix this, and were > told it was built into TSM. Back then we gave up IBM's HSM primarily > for this reason, although performance was also a consideration (more > to this on another post). > > We are now reconsidering HSM for a new deployment, however only if > this issue has been resolved (among a few others). > > What has been some of the experience out there? 
> > Thanks > Jaime > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 12:12:08 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 07:12:08 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> Message-ID: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Yes! A behavior along those lines would be desirable. Users understand very well what it means for a file system to be near full. Are there any customers already doing something similar? Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jamie, > > I see. So, the recall-shutdown would be something for a short time period. > right? Just for the time it takes to migrate files out and free space. If > HSM would allow the recall-shutdown, the impact for the users would be that > each access to migrated files would lead to an access denied error. Would > that be acceptable for the users? > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Cc: gpfsug-discuss at spectrumscale.org > Date: 08.03.2016 21:38 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Thanks for the suggestions Dominic > > I remember playing around with premigrated files at the time, and that > was not satisfactory. > > What we are looking for is a configuration based parameter what will > basically break out of the "transparency for the user" mode, and not > perform any further recalling, period, if|when the file system > occupancy is above a certain threshold (98%). 
We would not mind if > instead gpfs would issue a preemptive "disk full" error message to any > user/app/job relying on those files to be recalled, so migration on > demand will have a chance to be performance. What we prefer is to swap > precedence, ie, any migration requests would be executed ahead of any > recalls, at least until a certain amount of free space on the file > system has been cleared. > > It's really important that this type of feature is present, for us to > reconsider the TSM version of HSM as a solution. It's not clear from > the manual that this can be accomplish in some fashion. > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > >> >> >> Hi, >> >> in all cases a recall request will be handled transparent for the user at >> the time a migrated files is accessed. This can't be prevented and has > two >> down sides: a) the space used in the file system increases and b) random >> access to storage media in the Spectrum Protect server happens. With > newer >> versions of Spectrum Protect for Space Management a so called tape >> optimized recall method is available that can reduce the impact to the >> system (especially Spectrum Protect server). >> If the problem was that the file system went out of space at the time the >> recalls came in I would recommend to reduce the threshold settings for > the >> file system and increase the number of premigrated files. This will allow >> to free space very quickly if needed. If you didn't use the policy based >> threshold migration so far I recommend to use it. This method is >> significant faster compared to the classical HSM based threshold > migration >> approach. >> >> Greetings, Dominic. >> >> > ______________________________________________________________________________________________________________ > >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead > | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 > 18:21 >> ----- >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 08.03.2016 17:36 >> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I'm wondering whether the new version of the "Spectrum Suite" will >> allow us set the priority of the HSM migration to be higher than >> staging. >> >> >> I ask this because back in 2011 when we were still using Tivoli HSM >> with GPFS, during mixed requests for migration and staging operations, >> we had a very annoying behavior in which the staging would always take >> precedence over migration. The end-result was that the GPFS would fill >> up to 100% and induce a deadlock on the cluster, unless we identified >> all the user driven stage requests in time, and killed them all. We >> contacted IBM support a few times asking for a way fix this, and were >> told it was built into TSM. Back then we gave up IBM's HSM primarily >> for this reason, although performance was also a consideration (more >> to this on another post). >> >> We are now reconsidering HSM for a new deployment, however only if >> this issue has been resolved (among a few others). >> >> What has been some of the experience out there? 
>> >> Thanks >> Jaime >> >> >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From chrisjscott at gmail.com Wed Mar 9 14:44:39 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Wed, 9 Mar 2016 14:44:39 +0000 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: Not meaning to hjack the thread but while we're on the topic of transparent recall: I'd like to be able to disable it such that I can use SS ILM policies agreed with the data owners to "archive" their data and recover disk space by migrating files to tape, marking them as immutable to defend against accidental or malicious deletion and have some user interface that would let them "retrieve" the data back to disk as writable again, subject to sufficient free disk space and within any quota limits as applicable. Cheers Chris On 9 March 2016 at 12:12, Jaime Pinto wrote: > Yes! A behavior along those lines would be desirable. Users understand > very well what it means for a file system to be near full. > > Are there any customers already doing something similar? > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > > >> Hi Jamie, >> >> I see. So, the recall-shutdown would be something for a short time period. >> right? Just for the time it takes to migrate files out and free space. If >> HSM would allow the recall-shutdown, the impact for the users would be >> that >> each access to migrated files would lead to an access denied error. Would >> that be acceptable for the users? >> >> Greetings, Dominic. 
>> >> >> ______________________________________________________________________________________________________________ >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead >> | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> >> >> >> From: Jaime Pinto >> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 08.03.2016 21:38 >> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> >> priority >> >> >> >> Thanks for the suggestions Dominic >> >> I remember playing around with premigrated files at the time, and that >> was not satisfactory. >> >> What we are looking for is a configuration based parameter what will >> basically break out of the "transparency for the user" mode, and not >> perform any further recalling, period, if|when the file system >> occupancy is above a certain threshold (98%). We would not mind if >> instead gpfs would issue a preemptive "disk full" error message to any >> user/app/job relying on those files to be recalled, so migration on >> demand will have a chance to be performance. What we prefer is to swap >> precedence, ie, any migration requests would be executed ahead of any >> recalls, at least until a certain amount of free space on the file >> system has been cleared. >> >> It's really important that this type of feature is present, for us to >> reconsider the TSM version of HSM as a solution. It's not clear from >> the manual that this can be accomplish in some fashion. >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >> >>> >>> Hi, >>> >>> in all cases a recall request will be handled transparent for the user at >>> the time a migrated files is accessed. This can't be prevented and has >>> >> two >> >>> down sides: a) the space used in the file system increases and b) random >>> access to storage media in the Spectrum Protect server happens. With >>> >> newer >> >>> versions of Spectrum Protect for Space Management a so called tape >>> optimized recall method is available that can reduce the impact to the >>> system (especially Spectrum Protect server). >>> If the problem was that the file system went out of space at the time the >>> recalls came in I would recommend to reduce the threshold settings for >>> >> the >> >>> file system and increase the number of premigrated files. This will allow >>> to free space very quickly if needed. If you didn't use the policy based >>> threshold migration so far I recommend to use it. This method is >>> significant faster compared to the classical HSM based threshold >>> >> migration >> >>> approach. >>> >>> Greetings, Dominic. 
>>> >>> >>> >> ______________________________________________________________________________________________________________ >> >> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead >>> >> | >> >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >>> HRB 243294 >>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >>> >> 18:21 >> >>> ----- >>> >>> From: Jaime Pinto >>> To: gpfsug main discussion list < >>> gpfsug-discuss at spectrumscale.org> >>> Date: 08.03.2016 17:36 >>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. >>> migration >>> >> priority >> >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I'm wondering whether the new version of the "Spectrum Suite" will >>> allow us set the priority of the HSM migration to be higher than >>> staging. >>> >>> >>> I ask this because back in 2011 when we were still using Tivoli HSM >>> with GPFS, during mixed requests for migration and staging operations, >>> we had a very annoying behavior in which the staging would always take >>> precedence over migration. The end-result was that the GPFS would fill >>> up to 100% and induce a deadlock on the cluster, unless we identified >>> all the user driven stage requests in time, and killed them all. We >>> contacted IBM support a few times asking for a way fix this, and were >>> told it was built into TSM. Back then we gave up IBM's HSM primarily >>> for this reason, although performance was also a consideration (more >>> to this on another post). >>> >>> We are now reconsidering HSM for a new deployment, however only if >>> this issue has been resolved (among a few others). >>> >>> What has been some of the experience out there? >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. 
>> >> >> >> >> > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Mar 9 15:05:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 9 Mar 2016 10:05:31 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> For a write or create operation ENOSPC would make some sense. But if the file already exists and I'm just opening for read access I would be very confused by ENOSPC. How should the system respond: "Sorry, I know about that file, I have it safely stored away in HSM, but it is not available right now. Try again later!" EAGAIN or EBUSY might be the closest in ordinary language... But EAGAIN is used when a system call is interrupted and can be retried right away... So EBUSY? The standard return codes in Linux are: #define EPERM 1 /* Operation not permitted */ #define ENOENT 2 /* No such file or directory */ #define ESRCH 3 /* No such process */ #define EINTR 4 /* Interrupted system call */ #define EIO 5 /* I/O error */ #define ENXIO 6 /* No such device or address */ #define E2BIG 7 /* Argument list too long */ #define ENOEXEC 8 /* Exec format error */ #define EBADF 9 /* Bad file number */ #define ECHILD 10 /* No child processes */ #define EAGAIN 11 /* Try again */ #define ENOMEM 12 /* Out of memory */ #define EACCES 13 /* Permission denied */ #define EFAULT 14 /* Bad address */ #define ENOTBLK 15 /* Block device required */ #define EBUSY 16 /* Device or resource busy */ #define EEXIST 17 /* File exists */ #define EXDEV 18 /* Cross-device link */ #define ENODEV 19 /* No such device */ #define ENOTDIR 20 /* Not a directory */ #define EISDIR 21 /* Is a directory */ #define EINVAL 22 /* Invalid argument */ #define ENFILE 23 /* File table overflow */ #define EMFILE 24 /* Too many open files */ #define ENOTTY 25 /* Not a typewriter */ #define ETXTBSY 26 /* Text file busy */ #define EFBIG 27 /* File too large */ #define ENOSPC 28 /* No space left on device */ #define ESPIPE 29 /* Illegal seek */ #define EROFS 30 /* Read-only file system */ #define EMLINK 31 /* Too many links */ #define EPIPE 32 /* Broken pipe */ #define EDOM 33 /* Math argument out of domain of func */ #define ERANGE 34 /* Math result not representable */ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 15:21:53 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 10:21:53 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> Message-ID: <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> Interesting perspective Mark. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan : > For a write or create operation ENOSPC would make some sense. > But if the file already exists and I'm just opening for read access I > would be very confused by ENOSPC. > How should the system respond: "Sorry, I know about that file, I have it > safely stored away in HSM, but it is not available right now. Try again > later!" > > EAGAIN or EBUSY might be the closest in ordinary language... > But EAGAIN is used when a system call is interrupted and can be retried > right away... > So EBUSY? > > The standard return codes in Linux are: > > #define EPERM 1 /* Operation not permitted */ > #define ENOENT 2 /* No such file or directory */ > #define ESRCH 3 /* No such process */ > #define EINTR 4 /* Interrupted system call */ > #define EIO 5 /* I/O error */ > #define ENXIO 6 /* No such device or address */ > #define E2BIG 7 /* Argument list too long */ > #define ENOEXEC 8 /* Exec format error */ > #define EBADF 9 /* Bad file number */ > #define ECHILD 10 /* No child processes */ > #define EAGAIN 11 /* Try again */ > #define ENOMEM 12 /* Out of memory */ > #define EACCES 13 /* Permission denied */ > #define EFAULT 14 /* Bad address */ > #define ENOTBLK 15 /* Block device required */ > #define EBUSY 16 /* Device or resource busy */ > #define EEXIST 17 /* File exists */ > #define EXDEV 18 /* Cross-device link */ > #define ENODEV 19 /* No such device */ > #define ENOTDIR 20 /* Not a directory */ > #define EISDIR 21 /* Is a directory */ > #define EINVAL 22 /* Invalid argument */ > #define ENFILE 23 /* File table overflow */ > #define EMFILE 24 /* Too many open files */ > #define ENOTTY 25 /* Not a typewriter */ > #define ETXTBSY 26 /* Text file busy */ > #define EFBIG 27 /* File too large */ > #define ENOSPC 28 /* No space left on device */ > #define ESPIPE 29 /* Illegal seek */ > #define EROFS 30 /* Read-only file system */ > #define EMLINK 31 /* Too many links */ > #define EPIPE 32 /* Broken pipe */ > #define EDOM 33 /* Math argument out of domain of func */ > #define ERANGE 34 /* Math result not representable */ > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
From pinto at scinet.utoronto.ca Wed Mar 9 19:56:13 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 14:56:13 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) Message-ID: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> Here is another area where I've been reading material from several sources for years, and in fact trying one solution over the other from time-to-time in a test environment. However, to date I have not been able to find a one-piece-document where all these different IBM alternatives for backup are discussed at length, with the pos and cons well explained, along with the how-to's. I'm currently using TSM(built-in backup client), and over the years I developed a set of tricks to rely on disk based volumes as intermediate cache, and multiple backup client nodes, to split the load and substantially improve the performance of the backup compared to when I first deployed this solution. However I suspect it could still be improved further if I was to apply tools from the GPFS side of the equation. I would appreciate any comments/pointers. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From YARD at il.ibm.com Wed Mar 9 20:16:59 2016 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 9 Mar 2016 22:16:59 +0200 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> Message-ID: <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> Hi Did u use mmbackup with TSM ? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm Please also review this : http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: > From: Jaime Pinto > To: gpfsug main discussion list > Date: 03/09/2016 09:56 PM > Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup > scripts) vs. TSM(backup) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Here is another area where I've been reading material from several > sources for years, and in fact trying one solution over the other from > time-to-time in a test environment. However, to date I have not been > able to find a one-piece-document where all these different IBM > alternatives for backup are discussed at length, with the pos and cons > well explained, along with the how-to's. > > I'm currently using TSM(built-in backup client), and over the years I > developed a set of tricks to rely on disk based volumes as > intermediate cache, and multiple backup client nodes, to split the > load and substantially improve the performance of the backup compared > to when I first deployed this solution. 
However I suspect it could > still be improved further if I was to apply tools from the GPFS side > of the equation. > > I would appreciate any comments/pointers. > > Thanks > Jaime > > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 21:33:49 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 16:33:49 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> Message-ID: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> Quoting Yaron Daniel : > Hi > > Did u use mmbackup with TSM ? > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm I have used mmbackup on test mode a few times before, while under gpfs 3.2 and 3.3, but not under 3.5 yet or 4.x series (not installed in our facility yet). Under both 3.2 and 3.3 mmbackup would always lock up our cluster when using snapshot. I never understood the behavior without snapshot, and the lock up was intermittent in the carved-out small test cluster, so I never felt confident enough to deploy over the larger 4000+ clients cluster. Another issue was that the version of mmbackup then would not let me choose the client environment associated with a particular gpfs file system, fileset or path, and the equivalent storage pool and /or policy on the TSM side. With the native TSM client we can do this by configuring the dsmenv file, and even the NODEMANE/ASNODE, etc, with which to access TSM, so we can keep the backups segregated on different pools/tapes if necessary (by user, by group, by project, etc) The problem we all agree on is that TSM client traversing is VERY SLOW, and can not be parallelized. I always knew that the mmbackup client was supposed to replace the TSM client for the traversing, and then parse the "necessary parameters" and files to the native TSM client, so it could then take over for the remainder of the workflow. Therefore, the remaining problems are as follows: * I never understood the snapshot induced lookup, and how to fix it. Was it due to the size of our cluster or the version of GPFS? Has it been addressed under 3.5 or 4.x series? Without the snapshot how would mmbackup know what was already gone to backup since the previous incremental backup? Does it check each file against what is already on TSM to build the list of candidates? What is the experience out there? * In the v4r2 version of the manual for the mmbackup utility we still don't seem to be able to determine which TSM BA Client dsmenv to use as a parameter. 
All we can do is choose the --tsm-servers TSMServer[,TSMServer...]] . I can only conclude that all the contents of any backup on the GPFS side will always end-up on a default storage pool and use the standard TSM policy if nothing else is done. I'm now wondering if it would be ok to simply 'source dsmenv' from a shell for each instance of the mmbackup we fire up, in addition to setting up the other MMBACKUP_DSMC_MISC, MMBACKUP_DSMC_BACKUP, ..., etc as described on man page. * what about the restore side of things? Most mm* commands can only be executed by root. Should we still have to rely on the TSM BA Client (dsmc|dsmj) if unprivileged users want to restore their own stuff? I guess I'll have to conduct more experiments. > > Please also review this : > > http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf > This is pretty good, as a high level overview. Much better than a few others I've seen with the release of the Spectrum Suite, since it focus entirely on GPFS/TSM/backup|(HSM). It would be nice to have some typical implementation examples. Thanks a lot for the references Yaron, and again thanks for any further comments. Jaime > > > Regards > > > > > > Yaron Daniel > 94 Em Ha'Moshavot Rd > > Server, Storage and Data Services - Team Leader > Petach Tiqva, 49527 > Global Technology Services > Israel > Phone: > +972-3-916-5672 > > > Fax: > +972-3-916-5672 > > > Mobile: > +972-52-8395593 > > > e-mail: > yard at il.ibm.com > > > IBM Israel > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: > >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 03/09/2016 09:56 PM >> Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup >> scripts) vs. TSM(backup) >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> Here is another area where I've been reading material from several >> sources for years, and in fact trying one solution over the other from >> time-to-time in a test environment. However, to date I have not been >> able to find a one-piece-document where all these different IBM >> alternatives for backup are discussed at length, with the pos and cons >> well explained, along with the how-to's. >> >> I'm currently using TSM(built-in backup client), and over the years I >> developed a set of tricks to rely on disk based volumes as >> intermediate cache, and multiple backup client nodes, to split the >> load and substantially improve the performance of the backup compared >> to when I first deployed this solution. However I suspect it could >> still be improved further if I was to apply tools from the GPFS side >> of the equation. >> >> I would appreciate any comments/pointers. >> >> Thanks >> Jaime >> >> >> >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. 
>> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Thu Mar 10 08:17:18 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Thu, 10 Mar 2016 09:17:18 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> Message-ID: <201603100817.u2A8HLXK012633@d06av02.portsmouth.uk.ibm.com> Hi Jaime, I received the same request from other customers as well. could you please open a RFE for the theme and send me the RFE ID? I will discuss it with the product management then. RFE Link: https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: gpfsug main discussion list , Marc A Kaplan Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Date: 09.03.2016 16:22 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Interesting perspective Mark. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan : > For a write or create operation ENOSPC would make some sense. > But if the file already exists and I'm just opening for read access I > would be very confused by ENOSPC. > How should the system respond: "Sorry, I know about that file, I have it > safely stored away in HSM, but it is not available right now. Try again > later!" > > EAGAIN or EBUSY might be the closest in ordinary language... > But EAGAIN is used when a system call is interrupted and can be retried > right away... > So EBUSY? 
> > The standard return codes in Linux are: > > #define EPERM 1 /* Operation not permitted */ > #define ENOENT 2 /* No such file or directory */ > #define ESRCH 3 /* No such process */ > #define EINTR 4 /* Interrupted system call */ > #define EIO 5 /* I/O error */ > #define ENXIO 6 /* No such device or address */ > #define E2BIG 7 /* Argument list too long */ > #define ENOEXEC 8 /* Exec format error */ > #define EBADF 9 /* Bad file number */ > #define ECHILD 10 /* No child processes */ > #define EAGAIN 11 /* Try again */ > #define ENOMEM 12 /* Out of memory */ > #define EACCES 13 /* Permission denied */ > #define EFAULT 14 /* Bad address */ > #define ENOTBLK 15 /* Block device required */ > #define EBUSY 16 /* Device or resource busy */ > #define EEXIST 17 /* File exists */ > #define EXDEV 18 /* Cross-device link */ > #define ENODEV 19 /* No such device */ > #define ENOTDIR 20 /* Not a directory */ > #define EISDIR 21 /* Is a directory */ > #define EINVAL 22 /* Invalid argument */ > #define ENFILE 23 /* File table overflow */ > #define EMFILE 24 /* Too many open files */ > #define ENOTTY 25 /* Not a typewriter */ > #define ETXTBSY 26 /* Text file busy */ > #define EFBIG 27 /* File too large */ > #define ENOSPC 28 /* No space left on device */ > #define ESPIPE 29 /* Illegal seek */ > #define EROFS 30 /* Read-only file system */ > #define EMLINK 31 /* Too many links */ > #define EPIPE 32 /* Broken pipe */ > #define EDOM 33 /* Math argument out of domain of func */ > #define ERANGE 34 /* Math result not representable */ > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From konstantin.arnold at unibas.ch Thu Mar 10 08:56:01 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Thu, 10 Mar 2016 09:56:01 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: <56E136A1.8020202@unibas.ch> Hi Jaime, ... maybe I can give some comments with experience from the field: I would suggest, after reaching a high-watermark threshold, the recall speed could be throttled to a rate that is lower than migration speed (but still high enough to not run into a timeout). I don't think it's a good idea to send access denied while trying to prioritize migration. If non-IT people would see this message they could think the system is broken. It would be unclear what a batch job would do that has to prepare data, in the worst case processing would start with incomplete data. We are currently recalling all out data on tape to be moved to a different system. 
There is 15x more data on tape than what would fit on the disk pool (and there are millions of files before we set inode quota to a low number). We are moving user/project after an other by using tape ordered recalls. For that we had to disable a policy that was aggressively pre-migrating files and allowed to quickly free space on the disk pool. I must admit that it took us a while of tuning thresholds and policies. Best Konstantin On 03/09/2016 01:12 PM, Jaime Pinto wrote: > Yes! A behavior along those lines would be desirable. Users understand > very well what it means for a file system to be near full. > > Are there any customers already doing something similar? > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > >> >> Hi Jamie, >> >> I see. So, the recall-shutdown would be something for a short time >> period. >> right? Just for the time it takes to migrate files out and free space. If >> HSM would allow the recall-shutdown, the impact for the users would be >> that >> each access to migrated files would lead to an access denied error. Would >> that be acceptable for the users? >> >> Greetings, Dominic. >> >> ______________________________________________________________________________________________________________ >> >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >> Lead | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> >> >> >> From: Jaime Pinto >> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 08.03.2016 21:38 >> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> priority >> >> >> >> Thanks for the suggestions Dominic >> >> I remember playing around with premigrated files at the time, and that >> was not satisfactory. >> >> What we are looking for is a configuration based parameter what will >> basically break out of the "transparency for the user" mode, and not >> perform any further recalling, period, if|when the file system >> occupancy is above a certain threshold (98%). We would not mind if >> instead gpfs would issue a preemptive "disk full" error message to any >> user/app/job relying on those files to be recalled, so migration on >> demand will have a chance to be performance. What we prefer is to swap >> precedence, ie, any migration requests would be executed ahead of any >> recalls, at least until a certain amount of free space on the file >> system has been cleared. >> >> It's really important that this type of feature is present, for us to >> reconsider the TSM version of HSM as a solution. It's not clear from >> the manual that this can be accomplish in some fashion. >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >>> >>> >>> Hi, >>> >>> in all cases a recall request will be handled transparent for the >>> user at >>> the time a migrated files is accessed. This can't be prevented and has >> two >>> down sides: a) the space used in the file system increases and b) random >>> access to storage media in the Spectrum Protect server happens. With >> newer >>> versions of Spectrum Protect for Space Management a so called tape >>> optimized recall method is available that can reduce the impact to the >>> system (especially Spectrum Protect server). 
>>> If the problem was that the file system went out of space at the time >>> the >>> recalls came in I would recommend to reduce the threshold settings for >> the >>> file system and increase the number of premigrated files. This will >>> allow >>> to free space very quickly if needed. If you didn't use the policy based >>> threshold migration so far I recommend to use it. This method is >>> significant faster compared to the classical HSM based threshold >> migration >>> approach. >>> >>> Greetings, Dominic. >>> >>> >> ______________________________________________________________________________________________________________ >> >> >>> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>> Lead >> | >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht >>> Stuttgart, >>> HRB 243294 >>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >> 18:21 >>> ----- >>> >>> From: Jaime Pinto >>> To: gpfsug main discussion list >>> >>> Date: 08.03.2016 17:36 >>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> priority >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I'm wondering whether the new version of the "Spectrum Suite" will >>> allow us set the priority of the HSM migration to be higher than >>> staging. >>> >>> >>> I ask this because back in 2011 when we were still using Tivoli HSM >>> with GPFS, during mixed requests for migration and staging operations, >>> we had a very annoying behavior in which the staging would always take >>> precedence over migration. The end-result was that the GPFS would fill >>> up to 100% and induce a deadlock on the cluster, unless we identified >>> all the user driven stage requests in time, and killed them all. We >>> contacted IBM support a few times asking for a way fix this, and were >>> told it was built into TSM. Back then we gave up IBM's HSM primarily >>> for this reason, although performance was also a consideration (more >>> to this on another post). >>> >>> We are now reconsidering HSM for a new deployment, however only if >>> this issue has been resolved (among a few others). >>> >>> What has been some of the experience out there? >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. 
>>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Thu Mar 10 10:55:21 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 05:55:21 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <56E136A1.8020202@unibas.ch> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <56E136A1.8020202@unibas.ch> Message-ID: <20160310055521.85234y7d2m6c97kp@support.scinet.utoronto.ca> Quoting Konstantin Arnold : > Hi Jaime, > > ... maybe I can give some comments with experience from the field: > I would suggest, after reaching a high-watermark threshold, the recall > speed could be throttled to a rate that is lower than migration speed > (but still high enough to not run into a timeout). I don't think it's a > good idea to send access denied while trying to prioritize migration. If > non-IT people would see this message they could think the system is > broken. It would be unclear what a batch job would do that has to > prepare data, in the worst case processing would start with incomplete data. I wouldn't object to any strategy that lets us empty the vase quicker than it's being filled. It may just make the solution more complex for developers, since this feels a lot like a mini-scheduler. On the other hand I don't see much of an issue for non-IT people or batch jobs depending on the data to be recalled: we already enable quotas on our file systems. When quotas are reached the system is supposed to "break" anyway, for that particular user|group or application, and they still have to handle this situation properly. > > We are currently recalling all out data on tape to be moved to a > different system. There is 15x more data on tape than what would fit on > the disk pool (and there are millions of files before we set inode quota > to a low number). 
We are moving user/project after an other by using > tape ordered recalls. For that we had to disable a policy that was > aggressively pre-migrating files and allowed to quickly free space on > the disk pool. I must admit that it took us a while of tuning thresholds > and policies. That is certainly an approach to consider. We still think the application should be able to properly manage occupancy on the same file system. We run a different system which has a disk based cache layer as well, and the strategy is to keep it as full as possible (85-90%), so to avoid retrieving data from tape whenever possible, while still leaving some cushion for newly saved data. Indeed finding the sweet spot is a balancing act. Thanks for the feedback Jaime > > Best > Konstantin > > > > On 03/09/2016 01:12 PM, Jaime Pinto wrote: >> Yes! A behavior along those lines would be desirable. Users understand >> very well what it means for a file system to be near full. >> >> Are there any customers already doing something similar? >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >>> >>> Hi Jamie, >>> >>> I see. So, the recall-shutdown would be something for a short time >>> period. >>> right? Just for the time it takes to migrate files out and free space. If >>> HSM would allow the recall-shutdown, the impact for the users would be >>> that >>> each access to migrated files would lead to an access denied error. Would >>> that be acceptable for the users? >>> >>> Greetings, Dominic. >>> >>> ______________________________________________________________________________________________________________ >>> >>> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>> Lead | >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >>> HRB 243294 >>> >>> >>> >>> From: Jaime Pinto >>> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 08.03.2016 21:38 >>> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >>> priority >>> >>> >>> >>> Thanks for the suggestions Dominic >>> >>> I remember playing around with premigrated files at the time, and that >>> was not satisfactory. >>> >>> What we are looking for is a configuration based parameter what will >>> basically break out of the "transparency for the user" mode, and not >>> perform any further recalling, period, if|when the file system >>> occupancy is above a certain threshold (98%). We would not mind if >>> instead gpfs would issue a preemptive "disk full" error message to any >>> user/app/job relying on those files to be recalled, so migration on >>> demand will have a chance to be performance. What we prefer is to swap >>> precedence, ie, any migration requests would be executed ahead of any >>> recalls, at least until a certain amount of free space on the file >>> system has been cleared. >>> >>> It's really important that this type of feature is present, for us to >>> reconsider the TSM version of HSM as a solution. It's not clear from >>> the manual that this can be accomplish in some fashion. >>> >>> Thanks >>> Jaime >>> >>> Quoting Dominic Mueller-Wicke01 : >>> >>>> >>>> >>>> Hi, >>>> >>>> in all cases a recall request will be handled transparent for the >>>> user at >>>> the time a migrated files is accessed. 
This can't be prevented and has >>> two >>>> down sides: a) the space used in the file system increases and b) random >>>> access to storage media in the Spectrum Protect server happens. With >>> newer >>>> versions of Spectrum Protect for Space Management a so called tape >>>> optimized recall method is available that can reduce the impact to the >>>> system (especially Spectrum Protect server). >>>> If the problem was that the file system went out of space at the time >>>> the >>>> recalls came in I would recommend to reduce the threshold settings for >>> the >>>> file system and increase the number of premigrated files. This will >>>> allow >>>> to free space very quickly if needed. If you didn't use the policy based >>>> threshold migration so far I recommend to use it. This method is >>>> significant faster compared to the classical HSM based threshold >>> migration >>>> approach. >>>> >>>> Greetings, Dominic. >>>> >>>> >>> ______________________________________________________________________________________________________________ >>> >>> >>>> >>>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>>> Lead >>> | >>>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>>> >>>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>>> Wittkopp >>>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht >>>> Stuttgart, >>>> HRB 243294 >>>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >>> 18:21 >>>> ----- >>>> >>>> From: Jaime Pinto >>>> To: gpfsug main discussion list >>>> >>>> Date: 08.03.2016 17:36 >>>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >>> priority >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I'm wondering whether the new version of the "Spectrum Suite" will >>>> allow us set the priority of the HSM migration to be higher than >>>> staging. >>>> >>>> >>>> I ask this because back in 2011 when we were still using Tivoli HSM >>>> with GPFS, during mixed requests for migration and staging operations, >>>> we had a very annoying behavior in which the staging would always take >>>> precedence over migration. The end-result was that the GPFS would fill >>>> up to 100% and induce a deadlock on the cluster, unless we identified >>>> all the user driven stage requests in time, and killed them all. We >>>> contacted IBM support a few times asking for a way fix this, and were >>>> told it was built into TSM. Back then we gave up IBM's HSM primarily >>>> for this reason, although performance was also a consideration (more >>>> to this on another post). >>>> >>>> We are now reconsidering HSM for a new deployment, however only if >>>> this issue has been resolved (among a few others). >>>> >>>> What has been some of the experience out there? >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.org >>>> University of Toronto >>>> 256 McCaul Street, Room 235 >>>> Toronto, ON, M5T1W5 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. 
>>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Thu Mar 10 11:17:41 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 06:17:41 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> Message-ID: <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> Here is some feedback on the use of mmbackup: Last night I decided to test mmbackup again, in the simplest syntax call possible (see below), and it ran like a charm! We have a 15TB GPFS with some 41 million files, running gpfs v 3.5; it certainty behaved better than what I remember when I last tried this under 3.3 or 3.2, however I still didn't specify a snapshot. I guess it didn't really matter. My idea of sourcing the dsmenv file normally used by the TSM BA client before starting mmbackup was just what I needed to land the backup material in the same pool and using the same policies normally used by the TSM BA client for this file system. 
For my surprise, mmbackup was smart enough to query the proper TSM database for all files already there and perform the incremental backup just as the TSM client would on its own. The best of all: it took just under 7 hours, while previously the TSM client was taking over 27 hours: that is nearly 1/4 of the time, using the same node! This is really good, since now I can finally do a true *daily* backup of this FS, so I'll refining and adopting this process moving forward, possibly adding a few more nodes as traversing helpers. Cheers Jaime [root at gpc-f114n016 bin]# mmbackup /sysadmin -t incremental -s /tmp -------------------------------------------------------- mmbackup: Backup of /sysadmin begins at Wed Mar 9 19:45:27 EST 2016. -------------------------------------------------------- Wed Mar 9 19:45:48 2016 mmbackup:Could not restore previous shadow file from TSM server TAPENODE Wed Mar 9 19:45:48 2016 mmbackup:Querying files currently backed up in TSM server:TAPENODE. Wed Mar 9 21:55:59 2016 mmbackup:Built query data file from TSM server: TAPENODE rc = 0 Wed Mar 9 21:56:01 2016 mmbackup:Scanning file system sysadmin Wed Mar 9 23:47:53 2016 mmbackup:Reconstructing previous shadow file /sysadmin/.mmbackupShadow.1.TAPENODE from query data for TAPENODE Thu Mar 10 01:05:06 2016 mmbackup:Determining file system changes for sysadmin [TAPENODE]. Thu Mar 10 01:08:40 2016 mmbackup:changed=26211, expired=30875, unsupported=0 for server [TAPENODE] Thu Mar 10 01:08:40 2016 mmbackup:Sending files to the TSM server [26211 changed, 30875 expired]. Thu Mar 10 01:38:41 2016 mmbackup:Expiring files: 0 backed up, 15500 expired, 0 failed. Thu Mar 10 02:42:08 2016 mmbackup:Backing up files: 10428 backed up, 30875 expired, 72 failed. Thu Mar 10 02:58:40 2016 mmbackup:mmapplypolicy for Backup detected errors (rc=9). Thu Mar 10 02:58:40 2016 mmbackup:Completed policy backup run with 0 policy errors, 72 files failed, 0 severe errors, returning rc=9. Thu Mar 10 02:58:40 2016 mmbackup:Policy for backup returned 9 Highest TSM error 4 mmbackup: TSM Summary Information: Total number of objects inspected: 57086 Total number of objects backed up: 26139 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 30875 Total number of objects failed: 72 Thu Mar 10 02:58:40 2016 mmbackup:Analyzing audit log file /sysadmin/mmbackup.audit.sysadmin.TAPENODE Thu Mar 10 02:58:40 2016 mmbackup:72 files not backed up for this server. ( failed:72 ) Thu Mar 10 02:58:40 2016 mmbackup:Worst TSM exit 4 Thu Mar 10 02:58:41 2016 mmbackup:72 failures were logged. Compensating shadow database... Thu Mar 10 03:06:23 2016 mmbackup:Analysis complete. 72 of 72 failed or excluded paths compensated for in 1 pass(es). Thu Mar 10 03:09:08 2016 mmbackup:TSM server TAPENODE had 72 failures or excluded paths and returned 4. Its shadow database has been updated. Thu Mar 10 03:09:08 2016 mmbackup:Incremental backup completed with some skipped files. TSM had 0 severe errors and returned 4. See the TSM log file for more information. 72 files had errors, TSM audit logs recorded 72 errors from 1 TSM servers, 0 TSM servers skipped. exit 4 ---------------------------------------------------------- mmbackup: Backup of /sysadmin completed with some skipped files at Thu Mar 10 03:09:11 EST 2016. ---------------------------------------------------------- mmbackup: Command failed. Examine previous error messages to determine cause. 
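For reference, a minimal sketch of how a run like the one in the transcript above could be wrapped so that the TSM client environment is picked up explicitly and a few helper nodes share the traversal. The dsmenv path, the extra node names and the dsmc options are placeholders for illustration; --tsm-servers, -N and the MMBACKUP_DSMC_* variables are the knobs already mentioned in this thread.

    #!/bin/sh
    # nightly-mmbackup.sh -- sketch of driving mmbackup with the same TSM
    # client environment the regular BA client uses.  Adjust the paths,
    # the server stanza name and the node list to your site.

    # Pick up DSM_DIR / DSM_CONFIG etc. exactly as an interactive dsmc would.
    . /opt/tivoli/tsm/client/ba/bin/dsmenv

    # Extra options passed through to the dsmc invocations mmbackup makes.
    export MMBACKUP_DSMC_MISC="-quiet"
    export MMBACKUP_DSMC_BACKUP="-quiet"

    # Spread the policy scan and the dsmc sessions over helper nodes
    # (the extra node names here are placeholders).
    /usr/lpp/mmfs/bin/mmbackup /sysadmin -t incremental -s /tmp \
        --tsm-servers TAPENODE \
        -N gpc-f114n016,gpc-f114n017,gpc-f114n018

Whether sourcing dsmenv this way is enough to steer the data into the intended TSM storage pool is exactly the open question raised earlier, so it is something to verify on a test fileset first.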
Quoting Jaime Pinto : > Quoting Yaron Daniel : > >> Hi >> >> Did u use mmbackup with TSM ? >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm > > I have used mmbackup on test mode a few times before, while under gpfs > 3.2 and 3.3, but not under 3.5 yet or 4.x series (not installed in our > facility yet). > > Under both 3.2 and 3.3 mmbackup would always lock up our cluster when > using snapshot. I never understood the behavior without snapshot, and > the lock up was intermittent in the carved-out small test cluster, so I > never felt confident enough to deploy over the larger 4000+ clients > cluster. > > Another issue was that the version of mmbackup then would not let me > choose the client environment associated with a particular gpfs file > system, fileset or path, and the equivalent storage pool and /or policy > on the TSM side. > > With the native TSM client we can do this by configuring the dsmenv > file, and even the NODEMANE/ASNODE, etc, with which to access TSM, so > we can keep the backups segregated on different pools/tapes if > necessary (by user, by group, by project, etc) > > The problem we all agree on is that TSM client traversing is VERY SLOW, > and can not be parallelized. I always knew that the mmbackup client was > supposed to replace the TSM client for the traversing, and then parse > the "necessary parameters" and files to the native TSM client, so it > could then take over for the remainder of the workflow. > > Therefore, the remaining problems are as follows: > * I never understood the snapshot induced lookup, and how to fix it. > Was it due to the size of our cluster or the version of GPFS? Has it > been addressed under 3.5 or 4.x series? Without the snapshot how would > mmbackup know what was already gone to backup since the previous > incremental backup? Does it check each file against what is already on > TSM to build the list of candidates? What is the experience out there? > > * In the v4r2 version of the manual for the mmbackup utility we still > don't seem to be able to determine which TSM BA Client dsmenv to use as > a parameter. All we can do is choose the --tsm-servers > TSMServer[,TSMServer...]] . I can only conclude that all the contents > of any backup on the GPFS side will always end-up on a default storage > pool and use the standard TSM policy if nothing else is done. I'm now > wondering if it would be ok to simply 'source dsmenv' from a shell for > each instance of the mmbackup we fire up, in addition to setting up the > other MMBACKUP_DSMC_MISC, MMBACKUP_DSMC_BACKUP, ..., etc as described > on man page. > > * what about the restore side of things? Most mm* commands can only be > executed by root. Should we still have to rely on the TSM BA Client > (dsmc|dsmj) if unprivileged users want to restore their own stuff? > > I guess I'll have to conduct more experiments. > > > >> >> Please also review this : >> >> http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf >> > > This is pretty good, as a high level overview. Much better than a few > others I've seen with the release of the Spectrum Suite, since it focus > entirely on GPFS/TSM/backup|(HSM). It would be nice to have some > typical implementation examples. > > > > Thanks a lot for the references Yaron, and again thanks for any further > comments. 
> Jaime > > >> >> >> Regards >> >> >> >> >> >> Yaron Daniel >> 94 Em Ha'Moshavot Rd >> >> Server, Storage and Data Services - Team Leader >> Petach Tiqva, 49527 >> Global Technology Services >> Israel >> Phone: >> +972-3-916-5672 >> >> >> Fax: >> +972-3-916-5672 >> >> >> Mobile: >> +972-52-8395593 >> >> >> e-mail: >> yard at il.ibm.com >> >> >> IBM Israel >> >> >> >> >> >> >> >> gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: >> >>> From: Jaime Pinto >>> To: gpfsug main discussion list >>> Date: 03/09/2016 09:56 PM >>> Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup >>> scripts) vs. TSM(backup) >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> Here is another area where I've been reading material from several >>> sources for years, and in fact trying one solution over the other from >>> time-to-time in a test environment. However, to date I have not been >>> able to find a one-piece-document where all these different IBM >>> alternatives for backup are discussed at length, with the pos and cons >>> well explained, along with the how-to's. >>> >>> I'm currently using TSM(built-in backup client), and over the years I >>> developed a set of tricks to rely on disk based volumes as >>> intermediate cache, and multiple backup client nodes, to split the >>> load and substantially improve the performance of the backup compared >>> to when I first deployed this solution. However I suspect it could >>> still be improved further if I was to apply tools from the GPFS side >>> of the equation. >>> >>> I would appreciate any comments/pointers. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
From S.J.Thompson at bham.ac.uk Thu Mar 10 12:00:09 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 10 Mar 2016 12:00:09 +0000 Subject: [gpfsug-discuss] systemd Message-ID: So just picking up this from Feb 2015, have been doing some upgrades to 4.2.0.1, and see that there is now systemd support as part of this... Now I just need to unpick the local hacks we put into the init script (like wait for IB to come up) and implement those as proper systemd deps I guess. Thanks for sorting this though IBM! Simon On 10/02/2015, 15:17, "gpfsug-discuss-bounces at gpfsug.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: >Does any one have a systemd manifest for GPFS which they would share? > >As RedHat EL 7 is now using systemd and Ubuntu is now supported with >4.1p5, it seems sensible for GPFS to have systemd support. > >We're testing some services running off gpfs and it would be useful to >have a manifest so we can make the services dependent on gpfs being up >before they start. > >Or any suggestions on making systemd services dependent on a SysV script? > >Thanks > >Simon >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Thu Mar 10 14:46:12 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 10 Mar 2016 09:46:12 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com><20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> Message-ID: <201603101446.u2AEkJPP018456@d01av02.pok.ibm.com> Jaime, Thanks for the positive feedback and success story on mmbackup. We need criticism to keep improving the product - but we also need encouragement to know we are heading in the right direction and making progress. BTW - (depending on many factors) you may be able to save some significant backup time by running over multiple nodes with the -N option. --marc. (I am Mr. mmapplypolicy and work with Mr. mmbackup.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri Mar 11 00:15:49 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 19:15:49 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> Message-ID: <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> Hey Dominic Just submitted a new request: Headline: GPFS+TSM+HSM: staging vs. migration priority ID: 85292 Thank you Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jaime, > > I received the same request from other customers as well. > could you please open a RFE for the theme and send me the RFE ID? 
I will > discuss it with the product management then. RFE Link: > https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: gpfsug main discussion list , > Marc A Kaplan > Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Date: 09.03.2016 16:22 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Interesting perspective Mark. > > I'm inclined to think EBUSY would be more appropriate. > > Jaime > > Quoting Marc A Kaplan : > >> For a write or create operation ENOSPC would make some sense. >> But if the file already exists and I'm just opening for read access I >> would be very confused by ENOSPC. >> How should the system respond: "Sorry, I know about that file, I have it >> safely stored away in HSM, but it is not available right now. Try again >> later!" >> >> EAGAIN or EBUSY might be the closest in ordinary language... >> But EAGAIN is used when a system call is interrupted and can be retried >> right away... >> So EBUSY? >> >> The standard return codes in Linux are: >> >> #define EPERM 1 /* Operation not permitted */ >> #define ENOENT 2 /* No such file or directory */ >> #define ESRCH 3 /* No such process */ >> #define EINTR 4 /* Interrupted system call */ >> #define EIO 5 /* I/O error */ >> #define ENXIO 6 /* No such device or address */ >> #define E2BIG 7 /* Argument list too long */ >> #define ENOEXEC 8 /* Exec format error */ >> #define EBADF 9 /* Bad file number */ >> #define ECHILD 10 /* No child processes */ >> #define EAGAIN 11 /* Try again */ >> #define ENOMEM 12 /* Out of memory */ >> #define EACCES 13 /* Permission denied */ >> #define EFAULT 14 /* Bad address */ >> #define ENOTBLK 15 /* Block device required */ >> #define EBUSY 16 /* Device or resource busy */ >> #define EEXIST 17 /* File exists */ >> #define EXDEV 18 /* Cross-device link */ >> #define ENODEV 19 /* No such device */ >> #define ENOTDIR 20 /* Not a directory */ >> #define EISDIR 21 /* Is a directory */ >> #define EINVAL 22 /* Invalid argument */ >> #define ENFILE 23 /* File table overflow */ >> #define EMFILE 24 /* Too many open files */ >> #define ENOTTY 25 /* Not a typewriter */ >> #define ETXTBSY 26 /* Text file busy */ >> #define EFBIG 27 /* File too large */ >> #define ENOSPC 28 /* No space left on device */ >> #define ESPIPE 29 /* Illegal seek */ >> #define EROFS 30 /* Read-only file system */ >> #define EMLINK 31 /* Too many links */ >> #define EPIPE 32 /* Broken pipe */ >> #define EDOM 33 /* Math argument out of domain of func */ >> #define ERANGE 34 /* Math result not representable */ >> >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > 
---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From s.m.killen at leeds.ac.uk Fri Mar 11 13:19:41 2016 From: s.m.killen at leeds.ac.uk (Sean Killen) Date: Fri, 11 Mar 2016 13:19:41 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install Message-ID: <56E2C5ED.8060500@leeds.ac.uk> Hi all, So I have finally got my SpectrumScale system installed (well half of it). But it wasn't without some niggles. We have purchased DELL MD3860i disk trays with dual controllers (each with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a while getting a nice multipath setup in place with 'friendly' names set /dev/mapper/ssd1_1 /dev/mapper/t1d1_1 /dev/mapper/t2d1_1 etc, to represent the different tiers/disks/luns. We used the install toolkit and added all the NSDs with the friendly names and it all checked out and verified........ UNTIL we tried to install/deploy! At which point it said, no valid devices in /proc/partitions (I need to use the unfriendly /dev/dm-X name instead) - did I miss something in the toolkit, or is something that needs to be resolved, surely it should have told me when I added the first of the 36 NSDs rather that at the install stage when I then need to correct 36 errors. Secondly, I have installed the GUI, it is constantly complaining of a 'Critical' event MS0297 - Connection failed to node. Wrong Credentials. But all nodes can connect to each other via SSH without passwords. Anyone know how to clear and fix this error; I cannot find anything in the docs! Thanks -- Sean -- ------------------------------------------------------------------- Dr Sean M Killen UNIX Support Officer, IT Faculty of Biological Sciences University of Leeds LEEDS LS2 9JT United Kingdom Tel: +44 (0)113 3433148 Mob: +44 (0)776 8670907 Fax: +44 (0)113 3438465 GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From stschmid at de.ibm.com Fri Mar 11 13:41:54 2016 From: stschmid at de.ibm.com (Stefan Schmidt) Date: Fri, 11 Mar 2016 14:41:54 +0100 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> The message means following and is a warning without an direct affect to the function but an indicator that something is may wrong with the enclosure. Check the maintenance procedure which is shown for the event in the GUI event panel. /** Ambient temperature of power supply "{0}" undercut the lower warning threshold at {1}. */ MS0297("MS0297W",'W'), "Cause": "If the lower warning threshold is undercut a the device operation should not be affected. 
However this might indicate a hardware defect.", "User_action": "Follow the maintenance procedure for the enclosure.", "code": "MS0297", "description": "Ambient temperature of power supply \"{0}\" undercut the lower warning threshold at {1}.", Mit freundlichen Gr??en / Kind regards Stefan Schmidt Scrum Master IBM Spectrum Scale GUI / Senior IT Architect /PMP - Dept. M069 / IBM Spectrum Scale Software Development IBM Systems Group IBM Deutschland Phone: +49-6131-84-3465 IBM Deutschland Mobile: +49-170-6346601 Hechtsheimer Str. 2 E-Mail: stschmid at de.ibm.com 55131 Mainz Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.m.killen at leeds.ac.uk Fri Mar 11 13:50:58 2016 From: s.m.killen at leeds.ac.uk (Sean Killen) Date: Fri, 11 Mar 2016 13:50:58 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> References: <56E2C5ED.8060500@leeds.ac.uk> <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> Message-ID: <0D6C2DBC-4B82-4038-83C0-B0255C8DF9E0@leeds.ac.uk> Hi Stefan Thanks for the quick reply, I appear to have mistyped the error.. It's MS0279. See attached png. -- Sean --? ------------------------------------------------------------------- ??? Dr Sean M Killen ??? UNIX Support Officer, IT ??? Faculty of Biological Sciences ??? University of Leeds ??? LEEDS ??? LS2 9JT ??? United Kingdom ??? Tel: +44 (0)113 3433148 ??? Mob: +44 (0)776 8670907 ??? Fax: +44 (0)113 3438465 ??? GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- On 11 March 2016 13:41:54 GMT+00:00, Stefan Schmidt wrote: >The message means following and is a warning without an direct affect >to >the function but an indicator that something is may wrong with the >enclosure. Check the maintenance procedure which is shown for the event >in >the GUI event panel. > >/** Ambient temperature of power supply "{0}" undercut the lower >warning >threshold at {1}. */ > MS0297("MS0297W",'W'), > "Cause": "If the lower warning threshold is undercut a the >device operation should not be affected. However this might indicate a >hardware defect.", > "User_action": "Follow the maintenance procedure for the >enclosure.", > "code": "MS0297", > "description": "Ambient temperature of power supply \"{0}\" >undercut the lower warning threshold at {1}.", > > >Mit freundlichen Gr??en / Kind regards > >Stefan Schmidt > >Scrum Master IBM Spectrum Scale GUI / Senior IT Architect /PMP - Dept. >M069 / IBM Spectrum Scale Software Development >IBM Systems Group >IBM Deutschland > > > >Phone: >+49-6131-84-3465 > IBM Deutschland > >Mobile: >+49-170-6346601 > Hechtsheimer Str. 2 >E-Mail: >stschmid at de.ibm.com > 55131 Mainz > > > Germany > > >IBM Deutschland Research & Development GmbH / Vorsitzende des >Aufsichtsrats: Martina Koederitz >Gesch?ftsf?hrung: Dirk Wittkopp >Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht >Stuttgart, >HRB 243294 > > > > > > >------------------------------------------------------------------------ > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screenshot-AstburyBSL.absl.prv - Dashboard - Mozilla Firefox.png Type: image/png Size: 144612 bytes Desc: not available URL: From sophie.carsten at uk.ibm.com Fri Mar 11 13:53:36 2016 From: sophie.carsten at uk.ibm.com (Sophie Carsten) Date: Fri, 11 Mar 2016 13:53:36 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111355.u2BDtMBO007426@d06av12.portsmouth.uk.ibm.com> Hi, In terms of the NSDs, you need to run the nsd devices script if they're not in /dev/dmX-, here's the link to the knowledge center: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_nsdpro.htm?lang=en The installer should work as normal after this script has been run. We were hoping to get this solved in the upcoming version of the installer, so the user doesn't have to manually run the script. But the previous install team has been put on a new project in IBM, and I can't really comment any longer on when this could be expected to be delivered by the new team put in place. Hope the link gets you further off the ground though. Sophie Carsten IBM Spectrum Virtualize Development Engineer IBM Systems - Manchester Lab 44-161-9683886 sophie.carsten at uk.ibm.com From: Sean Killen To: gpfsug main discussion list Date: 11/03/2016 13:20 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, So I have finally got my SpectrumScale system installed (well half of it). But it wasn't without some niggles. We have purchased DELL MD3860i disk trays with dual controllers (each with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a while getting a nice multipath setup in place with 'friendly' names set /dev/mapper/ssd1_1 /dev/mapper/t1d1_1 /dev/mapper/t2d1_1 etc, to represent the different tiers/disks/luns. We used the install toolkit and added all the NSDs with the friendly names and it all checked out and verified........ UNTIL we tried to install/deploy! At which point it said, no valid devices in /proc/partitions (I need to use the unfriendly /dev/dm-X name instead) - did I miss something in the toolkit, or is something that needs to be resolved, surely it should have told me when I added the first of the 36 NSDs rather that at the install stage when I then need to correct 36 errors. Secondly, I have installed the GUI, it is constantly complaining of a 'Critical' event MS0297 - Connection failed to node. Wrong Credentials. But all nodes can connect to each other via SSH without passwords. Anyone know how to clear and fix this error; I cannot find anything in the docs! Thanks -- Sean -- ------------------------------------------------------------------- Dr Sean M Killen UNIX Support Officer, IT Faculty of Biological Sciences University of Leeds LEEDS LS2 9JT United Kingdom Tel: +44 (0)113 3433148 Mob: +44 (0)776 8670907 Fax: +44 (0)113 3438465 GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- [attachment "signature.asc" deleted by Sophie Carsten/UK/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
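In case it helps the next person through the same install: the usual way to make GPFS consider /dev/mapper aliases that never show up in /proc/partitions is the nsddevices user exit that the knowledge center link above describes. A minimal sketch, modelled on /usr/lpp/mmfs/samples/nsddevices.sample and assuming multipath alias names like the ones Sean describes (ssd1_1, t1d1_1, t2d1_1, ...), might look like this; the egrep pattern is an assumption and needs adjusting to whatever aliases you actually set in multipath.conf:

#!/bin/ksh
# /var/mmfs/etc/nsddevices -- user exit so GPFS disk discovery also offers
# device-mapper multipath aliases under /dev/mapper (sketch only).
for dev in $(ls /dev/mapper | egrep '^(ssd|t[0-9]+d)[0-9]+_[0-9]+$')
do
    # one line per device: name relative to /dev, then the disk type
    # ("dmm" = device-mapper multipath on Linux)
    echo "mapper/$dev dmm"
done
# return 0 to let GPFS continue with its normal /proc/partitions discovery
# as well, or 1 to use only the devices listed above
return 0

Make it executable (chmod +x /var/mmfs/etc/nsddevices) and place it on every node that can see the LUNs; whether the 4.2.0 install toolkit itself picks it up at deploy time is something I would test against a single NSD first.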
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 6016 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 11422 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 6016 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Fri Mar 11 14:30:24 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 11 Mar 2016 14:30:24 +0000 Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 Message-ID: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> I see this fix is out and IBM still is not providing the pmsensors package for RH6? can we PLEASE get this package posted as part of the normal distribution? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Mar 11 15:27:20 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 11 Mar 2016 10:27:20 -0500 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111522.u2BFMqvG008617@d01av05.pok.ibm.com> You may need/want to set up an nsddevices script to help GPFS find all your disks. Google it! Or ... http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adm.doc/bl1adm_nsddevices.htm -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From jonathan at buzzard.me.uk Fri Mar 11 15:46:39 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 11 Mar 2016 15:46:39 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <1457711199.4251.245.camel@buzzard.phy.strath.ac.uk> On Fri, 2016-03-11 at 13:19 +0000, Sean Killen wrote: > Hi all, > > So I have finally got my SpectrumScale system installed (well half of > it). But it wasn't without some niggles. > > We have purchased DELL MD3860i disk trays with dual controllers (each > with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a > while getting a nice multipath setup in place with 'friendly' names set > Oh dear. I guess it might work with 10Gb Ethernet but based on my personal experience iSCSI is spectacularly unsuited to GPFS. Either your NSD nodes can overwhelm the storage arrays or the storage arrays can overwhelm the NSD servers and performance falls through the floor. That is unless you have Data Center Ethernet at which point you might as well have gone Fibre Channel in the first place. Though unless you are going to have large physical separation between the storage and NSD servers 12Gb SAS is a cheaper option and you can still have four NSD servers hooked up to each MD3 based storage array. I have in the past implement GPFS on Dell MD3200i's. 
I did eventually get it working reliably but it was so suboptimal with so many compromises that as soon as the MD3600f came out we purchased these to replaced the MD3200i's. Lets say you have three storage arrays with two paths to each controller and four NSD servers. Basically what happens is that an NSD server issues a bunch of requests for blocks to the storage arrays. Then all 12 paths start answering to your two connections to the NSD server. At this point the Ethernet adaptors on your NSD servers are overwhelmed 802.1D PAUSE frames start being issued which just result in head of line blocking and performance falls through the floor. You need Data Center Ethernet to handle this properly, which is probably why FCoE never took off as you can't just use the Ethernet switches and adaptors you have. Both FC and SAS handle this sort of congestion gracefully unlike ordinary Ethernet. Now the caveat for all this is that it is much easier to overwhelm a 1Gbps link than a 10Gbps link. However with the combination of SSD and larger cache's I can envisage that a 10Gbps link could be overwhelmed and you would then see the same performance issues that I saw. Basically the only way out is a one to one correspondence between ports on the NSD's and the storage controllers. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Fri Mar 11 15:46:46 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 11 Mar 2016 15:46:46 +0000 Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 In-Reply-To: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> References: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> Message-ID: Hi Bob, But on the plus side, I noticed in the release notes: "If you are coming from 4.1.1-X, you must first upgrade to 4.2.0-0. You may use this 4.2.0-2 package to perform a First Time Install or to upgrade from an existing 4.2.0-X level." So it looks like its no longer necessary to install 4.2.0 and then apply PTFs. I remember talking to someone a while ago and they were hoping this might happen, but it seems that it actually has! Nice! Simon From: > on behalf of "Oesterlin, Robert" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 11 March 2016 at 14:30 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 I see this fix is out and IBM still is not providing the pmsensors package for RH6? can we PLEASE get this package posted as part of the normal distribution? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominic.mueller at de.ibm.com Fri Mar 11 16:02:37 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Fri, 11 Mar 2016 17:02:37 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority In-Reply-To: <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> Message-ID: <201603111502.u2BF2kk6007636@d06av10.portsmouth.uk.ibm.com> Jaime, found the RFE and will discuss it with product management. Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Cc: gpfsug main discussion list , Marc A Kaplan Date: 11.03.2016 01:15 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Hey Dominic Just submitted a new request: Headline: GPFS+TSM+HSM: staging vs. migration priority ID: 85292 Thank you Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jaime, > > I received the same request from other customers as well. > could you please open a RFE for the theme and send me the RFE ID? I will > discuss it with the product management then. RFE Link: > https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: gpfsug main discussion list , > Marc A Kaplan > Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Date: 09.03.2016 16:22 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Interesting perspective Mark. > > I'm inclined to think EBUSY would be more appropriate. > > Jaime > > Quoting Marc A Kaplan : > >> For a write or create operation ENOSPC would make some sense. >> But if the file already exists and I'm just opening for read access I >> would be very confused by ENOSPC. >> How should the system respond: "Sorry, I know about that file, I have it >> safely stored away in HSM, but it is not available right now. Try again >> later!" >> >> EAGAIN or EBUSY might be the closest in ordinary language... >> But EAGAIN is used when a system call is interrupted and can be retried >> right away... >> So EBUSY? 
>> >> The standard return codes in Linux are: >> >> #define EPERM 1 /* Operation not permitted */ >> #define ENOENT 2 /* No such file or directory */ >> #define ESRCH 3 /* No such process */ >> #define EINTR 4 /* Interrupted system call */ >> #define EIO 5 /* I/O error */ >> #define ENXIO 6 /* No such device or address */ >> #define E2BIG 7 /* Argument list too long */ >> #define ENOEXEC 8 /* Exec format error */ >> #define EBADF 9 /* Bad file number */ >> #define ECHILD 10 /* No child processes */ >> #define EAGAIN 11 /* Try again */ >> #define ENOMEM 12 /* Out of memory */ >> #define EACCES 13 /* Permission denied */ >> #define EFAULT 14 /* Bad address */ >> #define ENOTBLK 15 /* Block device required */ >> #define EBUSY 16 /* Device or resource busy */ >> #define EEXIST 17 /* File exists */ >> #define EXDEV 18 /* Cross-device link */ >> #define ENODEV 19 /* No such device */ >> #define ENOTDIR 20 /* Not a directory */ >> #define EISDIR 21 /* Is a directory */ >> #define EINVAL 22 /* Invalid argument */ >> #define ENFILE 23 /* File table overflow */ >> #define EMFILE 24 /* Too many open files */ >> #define ENOTTY 25 /* Not a typewriter */ >> #define ETXTBSY 26 /* Text file busy */ >> #define EFBIG 27 /* File too large */ >> #define ENOSPC 28 /* No space left on device */ >> #define ESPIPE 29 /* Illegal seek */ >> #define EROFS 30 /* Read-only file system */ >> #define EMLINK 31 /* Too many links */ >> #define EPIPE 32 /* Broken pipe */ >> #define EDOM 33 /* Math argument out of domain of func */ >> #define ERANGE 34 /* Math result not representable */ >> >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From damir.krstic at gmail.com Fri Mar 11 20:55:29 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 11 Mar 2016 20:55:29 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 Message-ID: What is the correct procedure to upgrade from 3.5 to 4.1? What I have tried is uninstalling existing 3.5 version (rpm -e) and installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled kernel extensions: cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages Rebooted the node and have been getting: daemon and kernel extension do not match. 
I've tried rebuilding extensions again and still could not get it to work. I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting daemon and kernel extension do not match error with 3.5 version on a single node. So, couple of questions: What is the correct way of upgrading from 3.5 to 4.1.0.0? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Mar 11 21:10:14 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 11 Mar 2016 21:10:14 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: That looks pretty much like the right process. Check that all the components upgraded ... rpm -qa | grep gpfs You may need to do an rpm -e on the gpfs.gplbin package and then install the newly built one Are you doing make rpm to build the rpm version of gpfs.gplbin and installing that? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Damir Krstic [damir.krstic at gmail.com] Sent: 11 March 2016 20:55 To: gpfsug main discussion list Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 What is the correct procedure to upgrade from 3.5 to 4.1? What I have tried is uninstalling existing 3.5 version (rpm -e) and installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled kernel extensions: cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages Rebooted the node and have been getting: daemon and kernel extension do not match. I've tried rebuilding extensions again and still could not get it to work. I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting daemon and kernel extension do not match error with 3.5 version on a single node. So, couple of questions: What is the correct way of upgrading from 3.5 to 4.1.0.0? Thanks, Damir From damir.krstic at gmail.com Fri Mar 11 21:13:47 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 11 Mar 2016 21:13:47 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: Thanks for the reply. Didn't run make rpm just make autoconfig etc. Checked the versions and it all looks good and valid. Will play with it again and see if there is a step missing. Damir On Fri, Mar 11, 2016 at 15:10 Simon Thompson (Research Computing - IT Services) wrote: > > That looks pretty much like the right process. > > Check that all the components upgraded ... rpm -qa | grep gpfs > > You may need to do an rpm -e on the gpfs.gplbin package and then install > the newly built one > > Are you doing make rpm to build the rpm version of gpfs.gplbin and > installing that? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Damir Krstic [ > damir.krstic at gmail.com] > Sent: 11 March 2016 20:55 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs > 3.5.0-21 > > What is the correct procedure to upgrade from 3.5 to 4.1? > > What I have tried is uninstalling existing 3.5 version (rpm -e) and > installing 4.1.0.0 using rpm -hiv *.rpm. 
After the install I've compiled > kernel extensions: > cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages > > Rebooted the node and have been getting: > daemon and kernel extension do not match. > > I've tried rebuilding extensions again and still could not get it to work. > I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting > daemon and kernel extension do not match error with 3.5 version on a single > node. So, couple of questions: > What is the correct way of upgrading from 3.5 to 4.1.0.0? > > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Fri Mar 11 22:58:08 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 11 Mar 2016 22:58:08 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: <56E34D80.7000703@buzzard.me.uk> On 11/03/16 21:10, Simon Thompson (Research Computing - IT Services) wrote: > > That looks pretty much like the right process. Yes and no. Assuming you are do this on either RHEL 6.x or 7.x (or their derivatives), then they will now complain constantly that you have modified the RPM database outside yum. As such it is recommended by RedHat that you do "yum remove" and "yum install" rather than running rpm directly. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From pavel.pokorny at datera.cz Sat Mar 12 08:23:49 2016 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Sat, 12 Mar 2016 09:23:49 +0100 Subject: [gpfsug-discuss] SMB and NFS limitations? Message-ID: Hello, on Spectrum Scale FAQ page I found following recommendations for SMB and NFS: *A maximum of 3,000 SMB connections is recommended per protocol node with a maximum of 20,000 SMB connections per cluster. A maximum of 4,000 NFS connections per protocol node is recommended. A maximum of 2,000 Object connections per protocol nodes is recommended.* Are there any other limits? Like max number of shares? Thanks, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Hadovit? 962/10 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz > -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Mon Mar 14 14:22:20 2016 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Mon, 14 Mar 2016 14:22:20 +0000 Subject: [gpfsug-discuss] Registration now open! Message-ID: <400eedb0a81cd193a694176794f1dc07@webmail.gpfsug.org> Dear members, The registration for the UK Spring 2016 Spectrum Scale (GPFS) User Group meeting is now open. We have a fantastic and full agenda of presentations from users and subject experts. The two-day event is taking place at the IBM Client Centre in London on 17th and 18th May. For the current agenda, further details and to register your place, please visit: http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 Places at the event are limited so it is recommended that you register early to avoid disappointment. Due to capacity restrictions, there is currently a limit of three people per organisation; this will be relaxed if places remain nearer the event date. 
We'd like to thank our sponsors of this year's User Group as without their support the two-day event would not be possible. Thanks go to Arcastream, DDN, IBM, Lenovo, Mellanox, NetApp, OCF and Seagate for their support. We hope to see you at the May event! Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 15 19:39:51 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 15 Mar 2016 15:39:51 -0400 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? Message-ID: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From damir.krstic at gmail.com Tue Mar 15 20:31:55 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 15 Mar 2016 20:31:55 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Message-ID: We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Tue Mar 15 20:33:35 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 15 Mar 2016 20:33:35 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: <56E34D80.7000703@buzzard.me.uk> References: <56E34D80.7000703@buzzard.me.uk> Message-ID: Figured it out - this node had RedHat version of a kernel that was custom patched by RedHat some time ago for the IB issues we were experiencing. I could not build a portability layer on this kernel. After upgrading the node to more recent version of the kernel, I was able to compile portability layer and get it all working. Thanks for suggestions. Damir On Fri, Mar 11, 2016 at 4:58 PM Jonathan Buzzard wrote: > On 11/03/16 21:10, Simon Thompson (Research Computing - IT Services) wrote: > > > > That looks pretty much like the right process. > > Yes and no. Assuming you are do this on either RHEL 6.x or 7.x (or their > derivatives), then they will now complain constantly that you have > modified the RPM database outside yum. > > As such it is recommended by RedHat that you do "yum remove" and "yum > install" rather than running rpm directly. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. 
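For anyone finding this thread later, the suggestions above boil down to something like the following per-node sequence for a 3.5 to 4.1 move. This is a sketch only: RHEL-style packaging is assumed, the package directory is a placeholder, and the official migration chapter for your level is still the reference.

mmshutdown                                 # stop GPFS on the node being upgraded
yum remove gpfs.gplbin-$(uname -r)         # drop the old portability layer package, if one is installed
yum localinstall /root/gpfs-4.1/gpfs.*.rpm # upgrade the gpfs.base/gpl/msg/docs packages (yum, not bare rpm)

# Rebuild the portability layer against the kernel the node is actually running:
cd /usr/lpp/mmfs/src
make Autoconfig && make World && make InstallImages
# ...or "make rpm" as Simon suggests, then install the resulting gpfs.gplbin
# package, which is handy for rolling the module out to identical nodes.

mmstartup                                  # bring GPFS back up on the node
mmdiag --version                           # confirm the node now runs the new daemon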
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:42:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:42:59 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: Hi Jamie I have some fairly large clusters (tho not as large as you describe) running on ?roll your own? storage subsystem of various types. You?re asking a broad question here on performance and rebuild times. I can?t speak to a comparison with ESS (I?m sure IBM can comment) but if you want to discuss some of my experiences with larger clusters, HD, performace (multi PB) I?d be happy to do so. You can drop me a note: robert.oesterlin at nuance.com and we can chat at length. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Jaime Pinto > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 2:39 PM To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=UIC7jY_blq8j34WiQM1a8cheHzbYW0sYS-ofA3if_Hk&s=MtunFkJSGpXWNdEkMqluTY-CYIC4uaMz7LiZ7JFob8c&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:42:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:42:59 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: Hi Jamie I have some fairly large clusters (tho not as large as you describe) running on ?roll your own? storage subsystem of various types. You?re asking a broad question here on performance and rebuild times. I can?t speak to a comparison with ESS (I?m sure IBM can comment) but if you want to discuss some of my experiences with larger clusters, HD, performace (multi PB) I?d be happy to do so. 
You can drop me a note: robert.oesterlin at nuance.com and we can chat at length. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Jaime Pinto > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 2:39 PM To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=UIC7jY_blq8j34WiQM1a8cheHzbYW0sYS-ofA3if_Hk&s=MtunFkJSGpXWNdEkMqluTY-CYIC4uaMz7LiZ7JFob8c&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:45:05 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:45:05 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Damir Krstic > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Mar 15 21:50:20 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 15 Mar 2016 21:50:20 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: Not sure about cluster features, but at minimum you'll need to create the filesystem with low enough mmcrfs --version string. -jf tir. 15. mar. 2016 kl. 21.32 skrev Damir Krstic : > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. 
We will remote cluster mount ESS to our compute cluster. > When looking at GPFS coexistance documents, it is not clear whether GPFS > 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any > issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.arnold at unibas.ch Tue Mar 15 22:22:17 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Tue, 15 Mar 2016 23:22:17 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <56E88B19.4060708@unibas.ch> It's definitely doable, besides --version mentioned byJan-Frode, just a two things to consider (when cluster started as 3.5 or earlier version) we stumbled across: - keys nistCompliance=SP800-131A: we had to regenerate and exchange new keys with nistCompliance before old cluster could talk to new remotecluster - maxblocksize: you would want ESS to run with maxblocksize 16M - cluster with 3.5 probably has set a smaller value (default 1M) and to change that you have to stop GPFS Best Konstantin On 03/15/2016 10:50 PM, Jan-Frode Myklebust wrote: > Not sure about cluster features, but at minimum you'll need to create > the filesystem with low enough mmcrfs --version string. > > > > > -jf > > tir. 15. mar. 2016 kl. 21.32 skrev Damir Krstic >: > > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. We will remote cluster mount ESS to our compute > cluster. When looking at GPFS coexistance documents, it is not clear > whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know > if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 > cluster? > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ------------------------------------------------------------------------------------------- Konstantin Arnold | University of Basel & SIB Klingelbergstrasse 50/70 | CH-4056 Basel | Phone: +41 61 267 15 82 Email: konstantin.arnold at unibas.ch From Paul.Sanchez at deshaw.com Wed Mar 16 03:28:59 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 16 Mar 2016 03:28:59 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> You do have to keep an eye out for filesystem version issues as you set this up. If the new filesystem is created with a version higher than the 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. You can specify the version of a new filesystem at creation time with, for example, ?mmcrfs ?version 3.5.?. You can confirm an existing filesystem?s version with ?mmlsfs | grep version?. There are probably a pile of caveats about features that you can never get on the new filesystem though. 
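To make the moving parts concrete, here is roughly what the two sides look like for a 4.x owning cluster and a 3.5 client cluster. The cluster names, node names, stanza file and mount point below are all made up, and the right version string and maxblocksize depend on your own levels, so treat it as a sketch to check against the mmauth/mmremotecluster documentation rather than a recipe:

# On the new 4.x (ESS) cluster: create the filesystem at a format level the
# 3.5 cluster can still mount, e.g. a 3.5.0.x compatibility version.
mmcrfs essfs -F essfs_nsd.stanza --version 3.5.0.7
mmlsfs essfs -V                          # confirm the file system version string

# Exchange keys between the clusters (regenerate them first with
# "mmauth genkey new" on each side if the old cluster still carries
# pre-SP800-131A keys, per Konstantin's note), then on the ESS cluster:
mmauth add compute.example.org -k /tmp/compute_cluster_key.pub
mmauth grant compute.example.org -f essfs -a rw

# On the 3.5 compute cluster: raise maxblocksize if the new filesystem uses a
# larger block size than the old default (this needs GPFS stopped), then
# define the remote cluster and filesystem and mount it.
mmchconfig maxblocksize=16M
mmremotecluster add ess.example.org -n essio1,essio2 -k /tmp/ess_cluster_key.pub
mmremotefs add essfs_remote -f essfs -C ess.example.org -T /gpfs/essfs
mmmount essfs_remote -a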
If you don?t need high-bandwidth, parallel access to the new filesystem from the 3.5 cluster, you could use CES or CNFS for a time, until the 3.5 cluster is upgraded or retired. A possibly better recommendation would be to upgrade the 3.5 cluster to at least 4.1, if not 4.2, instead. It would continue to be able to serve any of your old version filesystems, but not prohibit you from moving forward on the new ones. -Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Oesterlin, Robert Sent: Tuesday, March 15, 2016 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Damir Krstic > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Mar 16 13:08:51 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 13:08:51 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> References: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> Message-ID: Thanks for all replies. Do all of the same restrictions apply to 4.1? We have an option of installing ESS with 4.1. If we install ESS with 4.1 can we then cross mount to 3.5 with FS version of 4.1? Also with 4.1 are there any issues with key exchange? Thanks, Damir On Tue, Mar 15, 2016 at 10:29 PM Sanchez, Paul wrote: > You do have to keep an eye out for filesystem version issues as you set > this up. If the new filesystem is created with a version higher than the > 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. > > > > You can specify the version of a new filesystem at creation time with, for > example, ?mmcrfs ?version 3.5.?. > > You can confirm an existing filesystem?s version with ?mmlsfs > | grep version?. > > > > There are probably a pile of caveats about features that you can never get > on the new filesystem though. If you don?t need high-bandwidth, parallel > access to the new filesystem from the 3.5 cluster, you could use CES or > CNFS for a time, until the 3.5 cluster is upgraded or retired. > > > > A possibly better recommendation would be to upgrade the 3.5 cluster to at > least 4.1, if not 4.2, instead. It would continue to be able to serve any > of your old version filesystems, but not prohibit you from moving forward > on the new ones. 
> > > > -Paul > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of *Oesterlin, Robert > *Sent:* Tuesday, March 15, 2016 4:45 PM > > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] cross-cluster mounting different versions > of gpfs > > > > I?ve never used ESS, but I state for a fact you can cross mount clusters > at various levels without a problem ? I do it all the time during upgrades. > I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may > be limited on 4.2 features when accessing it via the 3.5 cluster, but data > access should work fine. > > > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > > > > *From: * on behalf of Damir > Krstic > *Reply-To: *gpfsug main discussion list > *Date: *Tuesday, March 15, 2016 at 3:31 PM > *To: *gpfsug main discussion list > *Subject: *[gpfsug-discuss] cross-cluster mounting different versions of > gpfs > > > > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. > When looking at GPFS coexistance documents, it is not clear whether GPFS > 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any > issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? > > > > Thanks, > > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Wed Mar 16 13:29:42 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 16 Mar 2016 14:29:42 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> Message-ID: <201603161329.u2GDTpjP006773@d06av09.portsmouth.uk.ibm.com> Hi, Damir, you cannot mount a 4.x fs level from a 3.5 level cluster / node. You need to create the fs with a sufficiently low level, fs level downgrade is not possible, AFAIK. 3.5 nodes can mount fs from 4.1 cluster (fs at 3.5.0.7 fs level), that I can confirm for sure. Uwe Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 02:09 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks for all replies. Do all of the same restrictions apply to 4.1? We have an option of installing ESS with 4.1. If we install ESS with 4.1 can we then cross mount to 3.5 with FS version of 4.1? Also with 4.1 are there any issues with key exchange? 
Thanks, Damir On Tue, Mar 15, 2016 at 10:29 PM Sanchez, Paul wrote: You do have to keep an eye out for filesystem version issues as you set this up. If the new filesystem is created with a version higher than the 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. You can specify the version of a new filesystem at creation time with, for example, ?mmcrfs ?version 3.5.?. You can confirm an existing filesystem?s version with ?mmlsfs | grep version?. There are probably a pile of caveats about features that you can never get on the new filesystem though. If you don?t need high-bandwidth, parallel access to the new filesystem from the 3.5 cluster, you could use CES or CNFS for a time, until the 3.5 cluster is upgraded or retired. A possibly better recommendation would be to upgrade the 3.5 cluster to at least 4.1, if not 4.2, instead. It would continue to be able to serve any of your old version filesystems, but not prohibit you from moving forward on the new ones. -Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto: gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Oesterlin, Robert Sent: Tuesday, March 15, 2016 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of Damir Krstic Reply-To: gpfsug main discussion list Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Wed Mar 16 15:20:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:20:50 -0500 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> The key point is that you must create the file system so that is "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmslfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Mar 16 15:32:51 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:32:51 -0500 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: <201603161534.u2GFYR3X029313@d03av02.boulder.ibm.com> IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM raid-in-software technology with advanced striping and error recovery. I just googled some of those terms and hit this not written by IBM summary: http://www.raidinc.com/file-storage/gss-ess Also, this is now a "mature" technology. IBM has been doing this since before 2008. See pages 9 and 10 of: http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Mar 16 15:32:51 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:32:51 -0500 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM raid-in-software technology with advanced striping and error recovery. I just googled some of those terms and hit this not written by IBM summary: http://www.raidinc.com/file-storage/gss-ess Also, this is now a "mature" technology. IBM has been doing this since before 2008. See pages 9 and 10 of: http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Mar 16 16:03:27 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 16 Mar 2016 16:03:27 +0000 Subject: [gpfsug-discuss] Perfileset df explanation Message-ID: All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Wed Mar 16 16:05:48 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Wed, 16 Mar 2016 16:05:48 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: Hi Richard, I don't think mmdf will tell you the answer you're looking for. If you use df within the fileset, or for the share over NFS, you will get the free space reported for that fileset, not the whole file system. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 16 March 2016 16:03 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Perfileset df explanation All, Can someone explain that this means? 
:: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Mar 16 16:12:54 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 16 Mar 2016 16:12:54 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: If you have a fileset quota, 'df' will report the size of the fileset as the max quota defined, and usage as how much of the quota you have used. -jf ons. 16. mar. 2016 kl. 17.03 skrev Sobey, Richard A : > All, > > > > Can someone explain what this means? :: > > > > --filesetdf > > Displays a yes or no value indicating whether filesetdf is enabled; if > yes, the mmdf command reports numbers based on the quotas for the fileset > and not for the total file system. > > > > What this means, as in the output I would expect to see from mmdf with > this option set to Yes, and No? I don't think it's supposed to give any > indication of over-provision and cursory tests suggest it doesn't. > > > > Thanks > > > > Richard > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Mar 16 16:13:11 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 16 Mar 2016 16:13:11 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> Message-ID: Thanks for those slides -- I hadn't realized GNR was that old. The slides projected 120 PB by 2011.. Does anybody know what the largest GPFS filesystems are today? Are there any in that area? How many ESS GLx building blocks in a single cluster? -jf ons. 16. mar. 2016 kl. 16.34 skrev Marc A Kaplan : > IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM > raid-in-software technology with advanced striping and error recovery. > > I just googled some of those terms and found this summary, not written by IBM: > > http://www.raidinc.com/file-storage/gss-ess > > Also, this is now a "mature" technology. IBM has been doing this since > before 2008. See pages 9 and 10 of: > > http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Mar 16 16:24:49 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 16 Mar 2016 16:24:49 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: Ah, I see, thanks for that. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 16 March 2016 16:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfileset df explanation If you have a fileset quota, 'df' will report the size of the fileset as the max quota defined, and usage as how much of the quota you have used. -jf ons. 16. mar. 2016 kl. 17.03 skrev Sobey, Richard A >: All, Can someone explain what this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Wed Mar 16 17:07:28 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 12:07:28 -0500 Subject: [gpfsug-discuss] 4.2 installer Message-ID: <56E992D0.3050603@genome.wustl.edu> All, Attempting to upgrade in our dev environment. The update to 4.2 was simple. http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_migratingtoISS4.2fromISS4.1.1.htm But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Any way to just pull in the current cluster info?
http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_configuringgpfs.htm%23configuringgpfs?lang=en Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Robert.Oesterlin at nuance.com Wed Mar 16 17:15:02 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:15:02 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E992D0.3050603@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> Message-ID: Hi Matt I've done a fair amount of work (testing) with the installer. It's great if you want to install a new cluster, not so much if you have one setup. You'll need to manually define everything. Be careful tho - do some test runs to verify what it will really do. I've found the installer doing a good job in upgrading my CES nodes, but I've opted to manually upgrade my NSD server nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:07 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] 4.2 installer All, Attempting to upgrade in our dev environment. The update to 4.2 was simple. http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_migratingtoISS4.2fromISS4.1.1.htm But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Any way to just pull in the current cluster info? http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_configuringgpfs.htm%23configuringgpfs?lang=en Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you.
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Mar 16 17:18:47 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 16 Mar 2016 18:18:47 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: while this is all correct, people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to an existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers it's a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). while this is heavily dependent on workload and environment it's at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: > The key point is that you must create the file system so that it "looks" > like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a > test filesystem back on the 3.5 cluster and look at the version string. > mmlsfs xxx -V. Then go to the 4.x system and try to create a file system > with the same version string.... > > > [image: Marc A Kaplan] > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Mar 16 17:20:11 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 16 Mar 2016 17:20:11 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu>, Message-ID: Does the installer manage to make the rpm kernel layer ok on clone OSes? Last time I tried mmbuildgpl, it falls over as I don't run RedHat enterprise... (I must admit I haven't used the installer, but we have config management recipes to install and upgrade). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 16 March 2016 17:15 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.2 installer Hi Matt I've done a fair amount of work (testing) with the installer. It's great if you want to install a new cluster, not so much if you have one setup. You'll need to manually define everything. Be careful tho -
do some test runs to verify what it will really do. I've found the installer doing a good job in upgrading my CES nodes, but I've opted to manually upgrade my NSD server nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:07 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] 4.2 installer All, Attempting to upgrade in our dev environment. The update to 4.2 was simple. http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_migratingtoISS4.2fromISS4.1.1.htm But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Any way to just pull in the current cluster info? http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_configuringgpfs.htm%23configuringgpfs?lang=en Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Wed Mar 16 17:36:26 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 12:36:26 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> Message-ID: <56E9999A.7030902@genome.wustl.edu> We have multiple clusters with thousands of NSDs, surely there is an upgrade path. Are you all saying to just continue to manually update NSD servers and manage them as we did previously? Is the installer not needed if there are current setups? Just deploy CES manually? On 3/16/16 12:20 PM, Simon Thompson (Research Computing - IT Services) wrote: > Does the installer manage to make the rpm kernel layer ok on clone OSes? > > Last time I tried mmbuildgpl, it falls over as I don't run RedHat enterprise... > > (I must admit I haven't used the installer, but we have config management recipes to install and upgrade).
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] > Sent: 16 March 2016 17:15 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer > > Hi Matt > > I've done a fair amount of work (testing) with the installer. It's great if you want to install a new cluster, not so much if you have one setup. You'll need to manually define everything. Be careful tho - do some test runs to verify what it will really do. I've found the installer doing a good job in upgrading my CES nodes, but I've opted to manually upgrade my NSD server nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > > > > From: > on behalf of Matt Weil > > Reply-To: gpfsug main discussion list > > Date: Wednesday, March 16, 2016 at 12:07 PM > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] 4.2 installer > > All, > > Attempting to upgrade in our dev environment. The update to 4.2 > was simple. > http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_migratingtoISS4.2fromISS4.1.1.htm > > But I am confused on the installation toolkit. It seems that it is > going to set it all up and I just want to upgrade a cluster that is > already setup. Any way to just pull in the current cluster info? > http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_configuringgpfs.htm%23configuringgpfs?lang=en > > Thanks > Matt > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication.
The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Robert.Oesterlin at nuance.com Wed Mar 16 17:36:37 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:36:37 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> Message-ID: <2097A8FD-3A42-4D36-8DC2-1DDA6BC9984C@nuance.com> Sadly, it fails if the node can't run mmbuildgpl, also on the clone OS's of RedHat. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of "Simon Thompson (Research Computing - IT Services)" > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:20 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer Does the installer manage to make the rpm kernel layer ok on clone OSes? Last time I tried mmbuildgpl, it falls over as I don't run RedHat enterprise... -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Mar 16 17:40:42 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:40:42 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E9999A.7030902@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> Message-ID: <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> My first suggestion is: Don't deploy the CES nodes manually - way too many package dependencies. Get those setup right and the installer does a good job. If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I've run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn't give you a whole lot of control over what it does - give it a try and it may work well for you. But run it in a test cluster first or on a limited set of nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:36 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 installer We have multiple clusters with thousands of NSDs, surely there is an upgrade path. Are you all saying to just continue to manually update NSD servers and manage them as we did previously? Is the installer not needed if there are current setups? Just deploy CES manually? -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Mar 16 18:07:59 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 18:07:59 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: Sven, For us, at least, at this point in time, we have to create new filesystem with version flag.
The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: > while this is all correct people should think twice about doing this. > if you create a filesystem with older versions, it might prevent you from > using some features like data-in-inode, encryption, adding 4k disks to > existing filesystem, etc even if you will eventually upgrade to the latest > code. > > for some customers its a good point in time to also migrate to larger > blocksizes compared to what they run right now and migrate the data. i have > seen customer systems gaining factors of performance improvements even on > existing HW by creating new filesystems with larger blocksize and latest > filesystem layout (that they couldn't before due to small file waste which > is now partly solved by data-in-inode). while this is heavily dependent on > workload and environment its at least worth thinking about. > > sven > > > > On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan > wrote: > >> The key point is that you must create the file system so that is "looks" >> like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a >> test filesystem back on the 3.5 cluster and look at the version string. >> mmslfs xxx -V. Then go to the 4.x system and try to create a file system >> with the same version string.... >> >> >> [image: Marc A Kaplan] >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From jonathan at buzzard.me.uk Wed Mar 16 18:47:06 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 16 Mar 2016 18:47:06 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: <56E9AA2A.3010108@buzzard.me.uk> On 16/03/16 18:07, Damir Krstic wrote: > Sven, > > For us, at least, at this point in time, we have to create new > filesystem with version flag. The reason is we can't take downtime to > upgrade all of our 500+ compute nodes that will cross-cluster mount this > new storage. 
We can take downtime in June and get all of the nodes up to > 4.2 gpfs version but we have users today that need to start using the > filesystem. > You can upgrade a GPFS file system piecemeal. That is, there should be no reason to take the whole system off-line to perform the upgrade. So you can upgrade compute nodes to GPFS 4.2 one by one and they will happily continue to talk to the NSDs running 3.5 while the other nodes continue to use the file system. In a properly designed GPFS cluster you should also be able to take individual NSD nodes out for the upgrade. Though I wouldn't recommend running mixed versions on a long term basis, it is definitely fine for the purposes of upgrading. Then once all nodes in the GPFS cluster are upgraded you issue the mmchfs -V full. How long this will take will depend on the maximum run time you allow for your jobs. You would need to check that you can make a clean jump from 3.5 to 4.2 but IBM support should be able to confirm that for you. This is one of the nicer features of GPFS; it's what I refer to as "proper enterprise big iron computing". That is, if you have to take the service down at any time for any reason, you are doing it wrong. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From UWEFALKE at de.ibm.com Wed Mar 16 18:51:59 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 16 Mar 2016 19:51:59 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> Hi, Damir, I have not done that, but a rolling upgrade from 3.5.x to 4.1.x (maybe even to 4.2) is supported. So, as long as you do not need all 500 nodes of your compute cluster permanently active, you might upgrade them in batches without fully-blown downtime. Nicely orchestrated by some scripts it could be done quite smoothly (depending on the percentage of compute nodes which can go down at once and on the run time / wall clocks of your jobs this will take between few hours and many days ...). Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 07:08 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem.
So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). while this is heavily dependent on workload and environment its at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: The key point is that you must create the file system so that is "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmslfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "atthrpb5.gif" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From damir.krstic at gmail.com Wed Mar 16 19:06:02 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 19:06:02 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <56E9AA2A.3010108@buzzard.me.uk> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <56E9AA2A.3010108@buzzard.me.uk> Message-ID: Jonathan, Gradual upgrade is indeed a nice feature of GPFS. We are planning to gradually upgrade our clients to 4.2. However, before all, or even most clients are upgraded, we have to be able to mount this new 4.2 filesystem on all our compute nodes that are running version 3.5. Here is our environment today: storage cluster - 14 nsd servers * gpfs3.5 compute cluster - 500+ clients * gpfs3.5 <--- this cluster is mounting storage cluster filesystems new to us ESS cluster * gpfs4.2 ESS will become its own GPFS cluster and we want to mount its filesystems on our compute cluster. So far so good. We understand that we will eventually want to upgrade all our nodes in compute cluster to 4.2 and we know the upgrade path (3.5 --> 4.1 --> 4.2). 
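For reference, the multi-cluster mount itself would be set up along these lines; this is only a sketch, and the cluster, node, and file system names below are placeholders:

On the owning ESS cluster:
mmauth genkey new
mmauth update . -l AUTHONLY
mmauth add compute.cluster -k compute_id_rsa.pub    # public key generated on the compute cluster
mmauth grant compute.cluster -f essfs -a rw

On the accessing compute cluster:
mmremotecluster add ess.cluster -n ems1,essio1,essio2 -k ess_id_rsa.pub
mmremotefs add essfs -f essfs -C ess.cluster -T /ess
mmmount essfs -a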
The reason for this conversation is: with ESS and GPFS 4.2 can we remote mount it on our compute cluster? The answer we got is, yes if you build a new filesystem with --version flag. Sven, however, has just pointed out that this may not be desirable option since there are some features that are permanently lost when building a filesystem with --version. In our case, however, even though we will upgrade our clients to 4.2 (some gradually as pointed elsewhere in this conversation, and most in June), we have to be able to mount the new ESS filesystem on our compute cluster before the clients are upgraded. It seems like, even though Sven is recommending against it, building a filesystem with --version flag is our only option. I guess we have another option, and that is to upgrade all our clients first, but we can't do that until June so I guess it's really not an option at this time. I hope this makes our constraints clear: mainly, without being able to take downtime on our compute cluster, we are forced to build a filesystem on ESS using --version flag. Thanks, Damir On Wed, Mar 16, 2016 at 1:47 PM Jonathan Buzzard wrote: > On 16/03/16 18:07, Damir Krstic wrote: > > Sven, > > > > For us, at least, at this point in time, we have to create new > > filesystem with version flag. The reason is we can't take downtime to > > upgrade all of our 500+ compute nodes that will cross-cluster mount this > > new storage. We can take downtime in June and get all of the nodes up to > > 4.2 gpfs version but we have users today that need to start using the > > filesystem. > > > > You can upgrade a GPFS file system piece meal. That is there should be > no reason to take the whole system off-line to perform the upgrade. So > you can upgrade a compute nodes to GPFS 4.2 one by one and they will > happily continue to talk to the NSD's running 3.5 while the other nodes > continue to use the file system. > > In a properly designed GPFS cluster you should also be able to take > individual NSD nodes out for the upgrade. Though I wouldn't recommend > running mixed versions on a long term basis, it is definitely fine for > the purposes of upgrading. > > Then once all nodes in the GPFS cluster are upgraded you issue the > mmchfs -V full. How long this will take will depend on the maximum run > time you allow for your jobs. > > You would need to check that you can make a clean jump from 3.5 to 4.2 > but IBM support should be able to confirm that for you. > > This is one of the nicer features of GPFS; its what I refer to as > "proper enterprise big iron computing". That is if you have to take the > service down at any time for any reason you are doing it wrong. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From volobuev at us.ibm.com Wed Mar 16 19:29:17 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 11:29:17 -0800 Subject: [gpfsug-discuss] cross-cluster mounting different versionsofgpfs In-Reply-To: <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> Message-ID: <201603161929.u2GJTRRf020013@d03av01.boulder.ibm.com> There are two related, but distinctly different issues to consider. 1) File system format and backward compatibility. The format of a given file system is recorded on disk, and determines the level of code required to mount such a file system. GPFS offers backward compatibility for older file system versions stretching for many releases. The oldest file system format we test with in the lab is 2.2 (we don't believe there are file systems using older versions actually present in the field). So if you have a file system formatted using GPFS V3.5 code, you can mount that file system using GPFS V4.1 or V4.2 without a problem. Of course, you don't get to use the new features that depend on the file system format that came out since V3.5. If you're formatting a new file system on a cluster running newer code, but want that file system to be mountable by older code, you have to use --version with mmcrfs. 2) RPC format compatibility, aka nodes being able to talk to each other. As the code evolves, the format of some RPCs sent over the network to other nodes naturally has to evolve as well. This of course presents a major problem for code coexistence (running different versions of GPFS on different nodes in the same cluster, or nodes from different clusters mounting the same file system, which effectively means joining a remote cluster), which directly translates into the possibility of a rolling migration (upgrading nodes to newer GPFS level one at a time, without taking all nodes down). Implementing new features while preserving some level of RPC compatibility with older releases is Hard, but this is something GPFS has committed to, long ago. The commitment is not open-ended though, there's a very specific statement of support for what's allowed. GPFS major (meaning 'v' or 'r' is incremented in a v.r.m.f version string) release N stream shall have coexistence with the GPFS major release N - 1 stream. So coexistence of V4.2 with V4.1 is supported, while coexistence of V4.2 with older releases is unsupported (it may or may not work if one tries it, depending on the specific combination of versions, but one would do so entirely on own risk). The reason for limiting the extent of RPC compatibility is prosaic: in order to support something, we have to be able to test this something. We have the resources to test the N / N - 1 combination, for every major release N. If we had to extend this to N, N - 1, N - 2, N - 3, you can do the math on how many combinations to test that would create. That would bust the test budget. So if you want to cross-mount a file system from a home cluster running V4.2, you have to run at least V4.1.x on client nodes, and the file system would have to be formatted using the lowest version used on any node mounting the file system. Hope this clarifies things a bit. 
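To make that concrete, a rough sketch of the commands involved (the file system names and the 3.5.0.7 version string are only examples; as suggested elsewhere in the thread, take the real string from an existing 3.5 file system):

mmlsfs oldfs -V                                      # on the old cluster: show the current file system format version
mmcrfs essfs -F nsd_stanzas.txt --version 3.5.0.7    # on the new cluster: create the file system in the older format
mmchfs essfs -V full                                 # later, once every mounting node runs the newer code, raise the format version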
yuri From: "Uwe Falke" To: gpfsug main discussion list , Date: 03/16/2016 11:52 AM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Damir, I have not done that, but a rolling upgrade from 3.5.x to 4.1.x (maybe even to 4.2) is supported. So, as long as you do not need all 500 nodes of your compute cluster permanently active, you might upgrade them in batches without fully-blown downtime. Nicely orchestrated by some scripts it could be done quite smoothly (depending on the percentage of compute nodes which can go down at once and on the run time / wall clocks of your jobs this will take between few hours and many days ...). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 07:08 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). while this is heavily dependent on workload and environment its at least worth thinking about. 
sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: The key point is that you must create the file system so that it "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmlsfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [attachment "atthrpb5.gif" deleted by Uwe Falke/Germany/IBM] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at genome.wustl.edu Wed Mar 16 19:37:31 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 14:37:31 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> Message-ID: <56E9B5FB.2050105@genome.wustl.edu> any help here? > ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 > Error: Multilib version problems found. This often means that the root > cause is something else and multilib version checking is just > pointing out that there is a problem. Eg.: > > 1. You have an upgrade for libcap-ng which is missing some > dependency that another package requires. Yum is trying to > solve this by installing an older version of libcap-ng of the > different architecture. If you exclude the bad architecture > yum will tell you what the root cause is (which package > requires what). You can try redoing the upgrade with > --exclude libcap-ng.otherarch ... this should give you an > error > message showing the root cause of the problem. > > 2. You have multiple architectures of libcap-ng installed, but > yum can only see an upgrade for one of those architectures. > If you don't want/need both architectures anymore then you > can remove the one with the missing update and everything > will work. > > 3. You have duplicate versions of libcap-ng installed already. > You can use "yum check" to get yum show these errors. > > ...you can also use --setopt=protected_multilib=false to remove > this checking, however this is almost never the correct thing to > do as something else is very likely to go wrong (often causing > much more problems). > > Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != > libcap-ng-0.7.5-4.el7.x86_64 On 3/16/16 12:40 PM, Oesterlin, Robert wrote: > My first suggestion is: Don't deploy the CES nodes manually - way too > many package dependencies. Get those setup right and the installer > does a good job. > > If you go through and define your cluster nodes to the installer, you > can do a GPFS upgrade that way.
I've run into some issues, especially > with clone OS versions of RedHat. (ie, CentOS) It doesn't give you a > whole lot of control over what it does - give it a try and it may work > well for you. But run it in a test cluster first or on a limited set > of nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > From: > on behalf of Matt > Weil > > Reply-To: gpfsug main discussion list > > > Date: Wednesday, March 16, 2016 at 12:36 PM > To: "gpfsug-discuss at spectrumscale.org > " > > > Subject: Re: [gpfsug-discuss] 4.2 installer > > We have multiple clusters with thousands of NSDs, surely there is an > upgrade path. Are you all saying to just continue to manually update NSD > servers and manage them as we did previously? Is the installer not > needed if there are current setups? Just deploy CES manually? > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From volobuev at us.ibm.com Wed Mar 16 19:37:53 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 11:37:53 -0800 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: <201603161937.u2GJbwII007184@d03av04.boulder.ibm.com> The 'mmdf' part of the usage string is actually an error; it should say 'df'. More specifically, this changes the semantics of statfs(2). On Linux, the statfs syscall takes a path argument, which can be the root directory of a file system, or a subdirectory inside. If the path happens to be the root directory of a fileset, and that fileset has a fileset quota set, and --filesetdf is set to 'yes', then statfs returns utilization numbers based on the fileset quota utilization, as opposed to the overall file system utilization. This is useful when a specific fileset is NFS-exported as a 'share', and it's desirable to see only the space used/available for that 'share' on the NFS client side. yuri From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" , Date: 03/16/2016 09:05 AM Subject: [gpfsug-discuss] Perfileset df explanation Sent by: gpfsug-discuss-bounces at spectrumscale.org All, Can someone explain what this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't.
Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan at buzzard.me.uk Wed Mar 16 19:45:35 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 16 Mar 2016 19:45:35 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <56E9AA2A.3010108@buzzard.me.uk> Message-ID: <56E9B7DF.1020007@buzzard.me.uk> On 16/03/16 19:06, Damir Krstic wrote: [SNIP] > > In our case, however, even though we will upgrade our clients to 4.2 > (some gradually as pointed elsewhere in this conversation, and most in > June), we have to be able to mount the new ESS filesystem on our compute > cluster before the clients are upgraded. What is preventing a gradual if not rapid upgrade of the compute clients now? The usual approach, once you have verified the upgrade, is to simply disable the queues on all the nodes and, as jobs finish, upgrade the nodes as they become free. Again, because the usual approach is to have a maximum run time for jobs (that is, jobs can't just run forever and will be culled if they run too long), you can achieve this piecemeal upgrade in a relatively short period of time. Most places have a maximum run time of one to two weeks. So if you are within the norm this could be done by the end of the month. It's basically the same procedure as you would use to, say, push a security update that required a reboot. The really neat way is to script it up and then make it a job that you keep dumping in the queue till all nodes are updated :D > > It seems like, even though Sven is recommending against it, building a > filesystem with --version flag is our only option. I guess we have > another option, and that is to upgrade all our clients first, but we > can't do that until June so I guess it's really not an option at this time. > I would add my voice to that. The "this feature is not available because you created the file system as version x.y.z" issue is likely to cause you problems at some point down the line. It certainly caused me headaches in the past. > I hope this makes our constraints clear: mainly, without being able to > take downtime on our compute cluster, we are forced to build a > filesystem on ESS using --version flag. > Again, there is not, or at least should not be, *ANY* requirement for downtime of the compute cluster that the users will notice. Certainly nothing worse than nodes going down due to hardware failures or pushing urgent security patches. Taking a different tack, is it not possible for the ESS storage to be added to the existing file system? That is, you get a bunch of NSDs on the disk with NSD servers, add them all to the existing cluster and then issue some "mmchdisk suspend" on the existing disks followed by some "mmdeldisk " and have the whole lot move over to the new storage in a manner utterly transparent to the end users (well other than a performance impact)? This approach certainly works (done it myself) but IBM might have placed restrictions on the ESS offering preventing you doing this while maintaining support that I am not familiar with.
If there is, I personally would see this as a barrier to purchase of ESS, but then I am old school when it comes to GPFS and not at all familiar with ESS. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Wed Mar 16 19:51:59 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 16 Mar 2016 19:51:59 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E9B5FB.2050105@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com>, <56E9B5FB.2050105@genome.wustl.edu> Message-ID: Have you got a half-updated system maybe? You can't have: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 I.e. 0.7.3-5 and 0.7.5-4 I can't check right now, but are IBM shipping libcap-ng as part of their package? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] Sent: 16 March 2016 19:37 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.2 installer any help here? ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 Error: Multilib version problems found. This often means that the root cause is something else and multilib version checking is just pointing out that there is a problem. Eg.: 1. You have an upgrade for libcap-ng which is missing some dependency that another package requires. Yum is trying to solve this by installing an older version of libcap-ng of the different architecture. If you exclude the bad architecture yum will tell you what the root cause is (which package requires what). You can try redoing the upgrade with --exclude libcap-ng.otherarch ... this should give you an error message showing the root cause of the problem. 2. You have multiple architectures of libcap-ng installed, but yum can only see an upgrade for one of those architectures. If you don't want/need both architectures anymore then you can remove the one with the missing update and everything will work. 3. You have duplicate versions of libcap-ng installed already. You can use "yum check" to get yum show these errors. ...you can also use --setopt=protected_multilib=false to remove this checking, however this is almost never the correct thing to do as something else is very likely to go wrong (often causing much more problems). Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 On 3/16/16 12:40 PM, Oesterlin, Robert wrote: My first suggestion is: Don't deploy the CES nodes manually - way too many package dependencies. Get those setup right and the installer does a good job. If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I've run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn't give you a whole lot of control over what it does - give it a try and it may work well for you. But run it in a test cluster first or on a limited set of nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:36 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 installer We have multiple clusters with thousands of NSDs, surely there is an upgrade path.
Are you all saying just continue to manually update nsd servers and manage them as we did previously. Is the installer not needed if there are current setups. Just deploy CES manually? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From volobuev at us.ibm.com Wed Mar 16 20:03:09 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 12:03:09 -0800 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> Message-ID: <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> > Under both 3.2 and 3.3 mmbackup would always lock up our cluster when > using snapshot. I never understood the behavior without snapshot, and > the lock up was intermittent in the carved-out small test cluster, so > I never felt confident enough to deploy over the larger 4000+ clients > cluster. Back then, GPFS code had a deficiency: migrating very large files didn't work well with snapshots (and some operation mm commands). In order to create a snapshot, we have to have the file system in a consistent state for a moment, and we get there by performing a "quiesce" operation. This is done by flushing all dirty buffers to disk, stopping any new incoming file system operations at the gates, and waiting for all in-flight operations to finish. This works well when all in-flight operations actually finish reasonably quickly. That assumption was broken if an external utility, e.g. mmapplypolicy, used gpfs_restripe_file API on a very large file, e.g. to migrate the file's blocks to a different storage pool. The quiesce operation would need to wait for that API call to finish, as it's an in-flight operation, but migrating a multi-TB file could take a while, and during this time all new file system ops would be blocked. This was solved several years ago by changing the API and its callers to do the migration one block range at a time, thus making each individual syscall short and allowing quiesce to barge in and do its thing. All currently supported levels of GPFS have this fix. I believe mmbackup was affected by the same GPFS deficiency and benefited from the same fix. yuri -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Wed Mar 16 20:20:21 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 16 Mar 2016 16:20:21 -0400 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. 
TSM(backup) In-Reply-To: <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> Message-ID: <20160316162021.57513mzxykk7semd@support.scinet.utoronto.ca> OK, that is good to know. I'll give it a try with snapshot then. We already have 3.5 almost everywhere, and planing for 4.2 upgrade (reading the posts with interest) Thanks Jaime Quoting Yuri L Volobuev : > >> Under both 3.2 and 3.3 mmbackup would always lock up our cluster when >> using snapshot. I never understood the behavior without snapshot, and >> the lock up was intermittent in the carved-out small test cluster, so >> I never felt confident enough to deploy over the larger 4000+ clients >> cluster. > > Back then, GPFS code had a deficiency: migrating very large files didn't > work well with snapshots (and some operation mm commands). In order to > create a snapshot, we have to have the file system in a consistent state > for a moment, and we get there by performing a "quiesce" operation. This > is done by flushing all dirty buffers to disk, stopping any new incoming > file system operations at the gates, and waiting for all in-flight > operations to finish. This works well when all in-flight operations > actually finish reasonably quickly. That assumption was broken if an > external utility, e.g. mmapplypolicy, used gpfs_restripe_file API on a very > large file, e.g. to migrate the file's blocks to a different storage pool. > The quiesce operation would need to wait for that API call to finish, as > it's an in-flight operation, but migrating a multi-TB file could take a > while, and during this time all new file system ops would be blocked. This > was solved several years ago by changing the API and its callers to do the > migration one block range at a time, thus making each individual syscall > short and allowing quiesce to barge in and do its thing. All currently > supported levels of GPFS have this fix. I believe mmbackup was affected by > the same GPFS deficiency and benefited from the same fix. > > yuri > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From duersch at us.ibm.com Wed Mar 16 20:25:23 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Wed, 16 Mar 2016 16:25:23 -0400 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: Please see question 2.10 in our faq. http://www.ibm.com/support/knowledgecenter/api/content/nl/en-us/STXKQY/gpfsclustersfaq.pdf We only support clusters that are running release n and release n-1 and release n+1. So 4.1 is supported to work with 3.5 and 4.2. Release 4.2 is supported to work with 4.1, but not with gpfs 3.5. It may indeed work, but it is not supported. 
Steve Duersch Spectrum Scale (GPFS) FVTest 845-433-7902 IBM Poughkeepsie, New York >>Message: 1 >>Date: Wed, 16 Mar 2016 18:07:59 +0000 >>From: Damir Krstic >>To: gpfsug main discussion list >>Subject: Re: [gpfsug-discuss] cross-cluster mounting different >> versions of gpfs >>Message-ID: >> >>Content-Type: text/plain; charset="utf-8" >> >>Sven, >> >>For us, at least, at this point in time, we have to create new filesystem >>with version flag. The reason is we can't take downtime to upgrade all of >>our 500+ compute nodes that will cross-cluster mount this new storage. We >>can take downtime in June and get all of the nodes up to 4.2 gpfs version >>but we have users today that need to start using the filesystem. >> >>So at this point in time, we either have ESS built with 4.1 version and >>cross mount its filesystem (also built with --version flag I assume) to our >>3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems >>with --version flag and then in June when we get all of our clients upgrade >>we run =latest gpfs command and then mmchfs -V to get filesystem back up to >>4.2 features. >> >>It's unfortunate that we are in this bind with the downtime of the compute >>cluster. If we were allowed to upgrade our compute nodes before June, we >>could proceed with 4.2 build without having to worry about filesystem >>versions. >> >>Thanks for your reply. >> >>Damir From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 03/16/2016 02:08 PM Subject: gpfsug-discuss Digest, Vol 50, Issue 47 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: cross-cluster mounting different versions of gpfs (Damir Krstic) ---------------------------------------------------------------------- Message: 1 Date: Wed, 16 Mar 2016 18:07:59 +0000 From: Damir Krstic To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Message-ID: Content-Type: text/plain; charset="utf-8" Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. 
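The version-pinning steps being described look roughly like this (device names, the stanza file and the exact version string are placeholders; the format version that mmlsfs reports on the old cluster is what actually matters):

  # on the existing GPFS 3.5 cluster: note the file system format version
  mmlsfs oldfs -V

  # on the new 4.2 cluster: create the file system at the older format so
  # that the 3.5 clients can remote-mount it
  mmcrfs newfs -F nsd.stanza -T /gpfs/newfs --version 3.5.0.7

  # later, once every client is upgraded, raise the format to the running level
  mmchfs newfs -V full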
Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: > while this is all correct people should think twice about doing this. > if you create a filesystem with older versions, it might prevent you from > using some features like data-in-inode, encryption, adding 4k disks to > existing filesystem, etc even if you will eventually upgrade to the latest > code. > > for some customers its a good point in time to also migrate to larger > blocksizes compared to what they run right now and migrate the data. i have > seen customer systems gaining factors of performance improvements even on > existing HW by creating new filesystems with larger blocksize and latest > filesystem layout (that they couldn't before due to small file waste which > is now partly solved by data-in-inode). while this is heavily dependent on > workload and environment its at least worth thinking about. > > sven > > > > On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan > wrote: > >> The key point is that you must create the file system so that is "looks" >> like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a >> test filesystem back on the 3.5 cluster and look at the version string. >> mmslfs xxx -V. Then go to the 4.x system and try to create a file system >> with the same version string.... >> >> >> [image: Marc A Kaplan] >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160316/58097bbf/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160316/58097bbf/attachment.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 50, Issue 47 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Mar 16 21:52:34 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 16:52:34 -0500 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com><56E9AA2A.3010108@buzzard.me.uk> Message-ID: <201603162152.u2GLqfvD032745@d03av03.boulder.ibm.com> Considering the last few appends from Yuri and Sven, it seems you might want to (re)consider using Samba and/or NFS... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Thu Mar 17 11:14:03 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 17 Mar 2016 11:14:03 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: (Sorry, just found this in drafts, thought I'd sent it yesterday!) Cheers Luke. Sorry, I wasn't actually wanting to get over-provisioning stats (although it would be great!) just that I thought that might be what it does. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Luke Raimbach Sent: 16 March 2016 16:06 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfileset df explanation Hi Richard, I don't think mmdf will tell you the answer you're looking for. If you use df within the fileset, or for the share over NFS, you will get the free space reported for that fileset, not the whole file system. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 16 March 2016 16:03 To: 'gpfsug-discuss at spectrumscale.org' > Subject: [gpfsug-discuss] Perfileset df explanation All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Mar 17 16:03:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 17 Mar 2016 16:03:59 +0000 Subject: [gpfsug-discuss] Experiences with Alluxio/Tachyon ? Message-ID: <18C8D317-16BE-4351-AD8D-0E165FB60511@nuance.com> Anyone have experience with Alluxio? http://www.alluxio.org/ Also http://ibmresearchnews.blogspot.com/2015/08/tachyon-for-ultra-fast-big-data.html Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Fri Mar 18 16:39:42 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 18 Mar 2016 11:39:42 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> <56E9B5FB.2050105@genome.wustl.edu> Message-ID: <56EC2F4E.6010203@genome.wustl.edu> upgrading to 4.2.2 fixed the dependency issue. I now get Unable to access CES shared root. # /usr/lpp/mmfs/bin/mmlsconfig | grep 'cesSharedRoot' cesSharedRoot /vol/system On 3/16/16 2:51 PM, Simon Thompson (Research Computing - IT Services) wrote: > Have you got a half updated system maybe? > > You cant have: > libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 > > I.e. 0.7.3-5 and 0.7.5-4 > > I cant check right now, but are ibm shipping libcap-Ng as part of their package? 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] > Sent: 16 March 2016 19:37 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer > > any help here? > ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 > Error: Multilib version problems found. This often means that the root > cause is something else and multilib version checking is just > pointing out that there is a problem. Eg.: > > 1. You have an upgrade for libcap-ng which is missing some > dependency that another package requires. Yum is trying to > solve this by installing an older version of libcap-ng of the > different architecture. If you exclude the bad architecture > yum will tell you what the root cause is (which package > requires what). You can try redoing the upgrade with > --exclude libcap-ng.otherarch ... this should give you an error > message showing the root cause of the problem. > > 2. You have multiple architectures of libcap-ng installed, but > yum can only see an upgrade for one of those architectures. > If you don't want/need both architectures anymore then you > can remove the one with the missing update and everything > will work. > > 3. You have duplicate versions of libcap-ng installed already. > You can use "yum check" to get yum show these errors. > > ...you can also use --setopt=protected_multilib=false to remove > this checking, however this is almost never the correct thing to > do as something else is very likely to go wrong (often causing > much more problems). > > Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 > > > On 3/16/16 12:40 PM, Oesterlin, Robert wrote: > My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. > > If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > > Reply-To: gpfsug main discussion list > > Date: Wednesday, March 16, 2016 at 12:36 PM > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] 4.2 installer > > We have multiple clusters with thousands of nsd's surely there is an > upgrade path. Are you all saying just continue to manually update nsd > servers and manage them as we did previously. Is the installer not > needed if there are current setups. Just deploy CES manually? > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. 
Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From mweil at genome.wustl.edu Fri Mar 18 16:54:51 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 18 Mar 2016 11:54:51 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56EC2F4E.6010203@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> <56E9B5FB.2050105@genome.wustl.edu> <56EC2F4E.6010203@genome.wustl.edu> Message-ID: <56EC32DB.1000108@genome.wustl.edu> Fri Mar 18 11:50:43 CDT 2016: mmcesop: /vol/system/ found but is not on a GPFS filesystem On 3/18/16 11:39 AM, Matt Weil wrote: > upgrading to 4.2.2 fixed the dependency issue. I now get Unable to > access CES shared root. > > # /usr/lpp/mmfs/bin/mmlsconfig | grep 'cesSharedRoot' > cesSharedRoot /vol/system > > On 3/16/16 2:51 PM, Simon Thompson (Research Computing - IT Services) wrote: >> Have you got a half updated system maybe? >> >> You cant have: >> libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 >> >> I.e. 0.7.3-5 and 0.7.5-4 >> >> I cant check right now, but are ibm shipping libcap-Ng as part of their package? >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] >> Sent: 16 March 2016 19:37 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] 4.2 installer >> >> any help here? >> ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 >> Error: Multilib version problems found. This often means that the root >> cause is something else and multilib version checking is just >> pointing out that there is a problem. Eg.: >> >> 1. You have an upgrade for libcap-ng which is missing some >> dependency that another package requires. Yum is trying to >> solve this by installing an older version of libcap-ng of the >> different architecture. If you exclude the bad architecture >> yum will tell you what the root cause is (which package >> requires what). You can try redoing the upgrade with >> --exclude libcap-ng.otherarch ... this should give you an error >> message showing the root cause of the problem. >> >> 2. You have multiple architectures of libcap-ng installed, but >> yum can only see an upgrade for one of those architectures. 
>> If you don't want/need both architectures anymore then you >> can remove the one with the missing update and everything >> will work. >> >> 3. You have duplicate versions of libcap-ng installed already. >> You can use "yum check" to get yum show these errors. >> >> ...you can also use --setopt=protected_multilib=false to remove >> this checking, however this is almost never the correct thing to >> do as something else is very likely to go wrong (often causing >> much more problems). >> >> Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 >> >> >> On 3/16/16 12:40 PM, Oesterlin, Robert wrote: >> My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. >> >> If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. >> >> Bob Oesterlin >> Sr Storage Engineer, Nuance HPC Grid >> 507-269-0413 >> >> >> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > >> Reply-To: gpfsug main discussion list > >> Date: Wednesday, March 16, 2016 at 12:36 PM >> To: "gpfsug-discuss at spectrumscale.org" > >> Subject: Re: [gpfsug-discuss] 4.2 installer >> >> We have multiple clusters with thousands of nsd's surely there is an >> upgrade path. Are you all saying just continue to manually update nsd >> servers and manage them as we did previously. Is the installer not >> needed if there are current setups. Just deploy CES manually? >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
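The "not on a GPFS filesystem" error above usually just means cesSharedRoot points at a path outside GPFS; a rough sketch of repointing it (the path below is only an example, and CES/GPFS should be down on the protocol nodes while it is changed):

  # cesSharedRoot must be a directory inside a mounted GPFS file system
  mmchconfig cesSharedRoot=/gpfs/fs0/ces
  # confirm the new setting
  mmlsconfig cesSharedRoot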
From martin.gasthuber at desy.de Tue Mar 22 09:45:30 2016 From: martin.gasthuber at desy.de (Martin Gasthuber) Date: Tue, 22 Mar 2016 10:45:30 +0100 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server Message-ID: Hi, we're looking for a powerful (and cost efficient) machine config to optimally support the new CES services, especially Ganesha. In more detail, we're wondering if somebody has already got some experience running these services on machines with HAWC and/or LROC enabled HW, resulting in a clearer understanding of the benefits of that config. We will have ~300 client boxes accessing GPFS via NFS and planning for 2 nodes initially. best regards, Martin From S.J.Thompson at bham.ac.uk Tue Mar 22 10:05:05 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 22 Mar 2016 10:05:05 +0000 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server In-Reply-To: References: Message-ID: Hi Martin, We have LROC enabled on our CES protocol nodes for SMB: # mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A0A001755E9634D#/dev/sdb;0A0A001755E96350#/dev/sdc;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 486370 MB, currently in use: 1323 MB Statistics from: Thu Feb 25 11:18:25 2016 Total objects stored 338690236 (2953113 MB) recalled 336905443 (1326912 MB) objects failed to store 0 failed to recall 94 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 338719563 (3114191 MB) Inode objects stored 336876572 (1315923 MB) recalled 336884262 (1315948 MB) = 100.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 336910469 (1316052 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 2896 (115 MB) recalled 564 (29 MB) = 19.48 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 2857 (725 MB) Directory objects failed to store 0 failed to recall 2 failed to query 0 failed to inval 0 Data objects stored 1797127 (1636968 MB) recalled 16057 (10907 MB) = 0.89 % Data objects queried 0 (0 MB) = 0.00 % invalidated 1805234 (1797405 MB) Data objects failed to store 0 failed to recall 92 failed to query 0 failed to inval 0 agent inserts=389305528, reads=337261110 response times (usec): insert min/max/avg=1/47705/11 read min/max/avg=1/3145728/54 ssd writeIOs=5906506, writePages=756033024 readIOs=44692016, readPages=44692610 response times (usec): write min/max/avg=3072/1117534/3253 read min/max/avg=56/3145728/364 So mostly it is inode objects being used form the cache. Whether this is small data-in-inode or plain inode (stat) type operations, pass. We don't use HAWC on our protocol nodes, the HAWC pool needs to exist in the cluster where the NSD data is written and we multi-cluster to the protocol nodes (technically this isn't supported, but works fine for us). On HAWC, we did test it out in another of our clusters using SSDs in the nodes, but we er, had a few issues when we should a rack of kit down which included all the HAWC devices which were in nodes. You probably want to think a bit carefully about how HAWC is implemented in your environment. We are about to implement in one of our clusters, but that will be HAWC devices available to the NSD servers rather than on client nodes. 
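For reference, an LROC device is normally defined with a localCache NSD stanza along these lines (device and node names are just placeholders):

  %nsd:
    device=/dev/sdb
    nsd=ces1_lroc1
    servers=ces1
    usage=localCache

  # create the NSD; the node should then start using it as LROC
  mmcrnsd -F lroc.stanza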
Simon On 22/03/2016, 09:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Martin Gasthuber" wrote: >Hi, > > we're looking for a powerful (and cost efficient) machine config to >optimally support the new CES services, especially Ganesha. In more >detail, we're wondering if somebody has already got some experience >running these services on machines with HAWC and/or LROC enabled HW, >resulting in a clearer understanding of the benefits of that config. We >will have ~300 client boxes accessing GPFS via NFS and planning for 2 >nodes initially. > >best regards, > Martin > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Tue Mar 22 12:44:57 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Tue, 22 Mar 2016 12:44:57 +0000 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server In-Reply-To: References: Message-ID: <4eec1651b22f40418104a5a44f424b8d@mbxtoa1.winmail.deshaw.com> It's worth sharing that we have seen two problems with CES providing NFS via ganesha in a similar deployment: 1. multicluster cache invalidation: ganesha's FSAL upcall for invalidation of its file descriptor cache by GPFS doesn't appear to work for remote GPFS filesystems. As mentioned by Simon, this is unsupported, though the problem can be worked around with some effort though by disabling ganesha's FD cache entirely. 2. Readdir bad cookie bug: an interaction we're still providing info to IBM about between certain linux NFS clients and ganesha in which readdir calls may sporadically return empty results for directories containing files, without any corresponding error result code. Given our multicluster requirements and the problems associated with the readdir bug, we've reverted to using CNFS for now. 
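For completeness, the CNFS fallback is set up with something like the following (paths, addresses and node names are placeholders):

  # shared state directory for CNFS, inside a GPFS file system
  mmchconfig cnfsSharedRoot=/gpfs/fs0/cnfs
  # add a node to the CNFS pool with its NFS service IP
  mmchnode --cnfs-interface=10.10.10.100 -N nfsserver01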
Thx Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, March 22, 2016 6:05 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] HAWC/LROC in Ganesha server Hi Martin, We have LROC enabled on our CES protocol nodes for SMB: # mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A0A001755E9634D#/dev/sdb;0A0A001755E96350#/dev/sdc;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 486370 MB, currently in use: 1323 MB Statistics from: Thu Feb 25 11:18:25 2016 Total objects stored 338690236 (2953113 MB) recalled 336905443 (1326912 MB) objects failed to store 0 failed to recall 94 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 338719563 (3114191 MB) Inode objects stored 336876572 (1315923 MB) recalled 336884262 (1315948 MB) = 100.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 336910469 (1316052 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 2896 (115 MB) recalled 564 (29 MB) = 19.48 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 2857 (725 MB) Directory objects failed to store 0 failed to recall 2 failed to query 0 failed to inval 0 Data objects stored 1797127 (1636968 MB) recalled 16057 (10907 MB) = 0.89 % Data objects queried 0 (0 MB) = 0.00 % invalidated 1805234 (1797405 MB) Data objects failed to store 0 failed to recall 92 failed to query 0 failed to inval 0 agent inserts=389305528, reads=337261110 response times (usec): insert min/max/avg=1/47705/11 read min/max/avg=1/3145728/54 ssd writeIOs=5906506, writePages=756033024 readIOs=44692016, readPages=44692610 response times (usec): write min/max/avg=3072/1117534/3253 read min/max/avg=56/3145728/364 So mostly it is inode objects being used form the cache. Whether this is small data-in-inode or plain inode (stat) type operations, pass. We don't use HAWC on our protocol nodes, the HAWC pool needs to exist in the cluster where the NSD data is written and we multi-cluster to the protocol nodes (technically this isn't supported, but works fine for us). On HAWC, we did test it out in another of our clusters using SSDs in the nodes, but we er, had a few issues when we should a rack of kit down which included all the HAWC devices which were in nodes. You probably want to think a bit carefully about how HAWC is implemented in your environment. We are about to implement in one of our clusters, but that will be HAWC devices available to the NSD servers rather than on client nodes. Simon On 22/03/2016, 09:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Martin Gasthuber" > wrote: >Hi, > > we're looking for a powerful (and cost efficient) machine config to >optimally support the new CES services, especially Ganesha. In more >detail, we're wondering if somebody has already got some experience >running these services on machines with HAWC and/or LROC enabled HW, >resulting in a clearer understanding of the benefits of that config. We >will have ~300 client boxes accessing GPFS via NFS and planning for 2 >nodes initially. 
> >best regards, > Martin > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Wed Mar 23 11:31:45 2016 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 23 Mar 2016 11:31:45 +0000 Subject: [gpfsug-discuss] Places are filling up fast! Message-ID: <50eb8657d660d1c8d7714a14b6d69864@webmail.gpfsug.org> Dear members, We've had a fantastic response to the registrations for the next meeting in May. So good in fact that there are only 22 spaces left! If you are thinking of attending I would recommend doing so as soon as you can to avoid missing out. The link to register is: http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 [1] Also, we really like to hear from members on their experiences and are looking for volunteers for a short 15-20 minute presentation on their Spectrum Scale/GPFS installation, the highs and lows of it! If you're interested, please let Simon (chair at spectrumscaleug.org) or I know. Thanks and we look forward to seeing you in May. Claire -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org Links: ------ [1] http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jan.finnerman at load.se Tue Mar 29 23:04:26 2016 From: jan.finnerman at load.se (Jan Finnerman Load) Date: Tue, 29 Mar 2016 22:04:26 +0000 Subject: [gpfsug-discuss] Joined GPFS alias Message-ID: Hi All, I just joined the alias and want to give this short introduction of myself in GPFS terms. I work as a consultant at Load System, an IBM Business Partner based in Sweden. We work mainly in the Media and Finance markets. I support and do installs of GPFS at two customers in the media market in Sweden. Currently, I?m involved in a new customer install with Spectrum Scale 4.2/Red Hat 7.1/PowerKVM/Power 8. This is a customer in south of Sweden that do scientific research in Physics on Elementary Particles. My office location is Kista outside of Stockholm in Sweden. Brgds ///Jan [cid:7674672D-7E3F-417F-96F9-89737A1F6AEE] Jan Finnerman Senior Technical consultant [CertTiv_sm] [cid:4D49557E-099B-4799-AD7E-0A103EB45735] Kista Science Tower 164 51 Kista Mobil: +46 (0)70 631 66 26 Kontor: +46 (0)8 633 66 00/26 jan.finnerman at load.se -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: F1EE9474-7BCC-41E6-8237-D949E9DC35D3[9].png Type: image/png Size: 5565 bytes Desc: F1EE9474-7BCC-41E6-8237-D949E9DC35D3[9].png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: E895055E-B11B-47C3-BA29-E12D29D394FA[9].png Type: image/png Size: 8584 bytes Desc: E895055E-B11B-47C3-BA29-E12D29D394FA[9].png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: CertPowerSystems_sm[1][9].png Type: image/png Size: 6664 bytes Desc: CertPowerSystems_sm[1][9].png URL: 

From Luke.Raimbach at crick.ac.uk Wed Mar 2 16:43:17 2016
From: Luke.Raimbach at crick.ac.uk (Luke Raimbach)
Date: Wed, 2 Mar 2016 16:43:17 +0000
Subject: [gpfsug-discuss] AFM over NFS vs GPFS
In-Reply-To:
References:
Message-ID:

Anybody know the answer?

> HI All,
>
> We have two clusters and are using AFM between them to compartmentalise
> performance. We have the opportunity to run AFM over GPFS protocol (over IB
> verbs), which I would imagine gives much greater performance than trying to
> push it over NFS over Ethernet.
>
> We will have a whole raft of instrument ingest filesets in one storage cluster
> which are single-writer caches of the final destination in the analytics cluster.
> My slight concern with running this relationship over native GPFS is that if the
> analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the
> manual which says:
>
> "In the case of caches based on native GPFS? protocol, unavailability of the
> home file system on the cache cluster puts the caches into unmounted state.
> These caches never enter the disconnected state. For AFM filesets that use GPFS
> protocol to connect to the home cluster, if the remote mount becomes
> unresponsive due to issues at the home cluster not related to disconnection
> (such as a deadlock), operations that require remote mount access such as
> revalidation or reading un-cached contents also hang until remote mount
> becomes available again. One way to continue accessing all cached contents
> without disruption is to temporarily disable all the revalidation intervals until the
> home mount is accessible again."
>
> What I'm unsure of is whether this applies to single-writer caches as they
> (presumably) never do revalidation. We don't want instrument data capture to
> be interrupted on our ingest storage cluster if the analytics cluster goes away.
>
> Is anyone able to clear this up, please?
>
> Cheers,
> Luke.
>
> Luke Raimbach?
> Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs
> Building,
> 215 Euston Road,
> London NW1 2BE.
>
> E: luke.raimbach at crick.ac.uk
> W: www.crick.ac.uk
>
> The Francis Crick Institute Limited is a registered charity in England and Wales
> no. 1140062 and a company registered in England and Wales no. 06885462, with
> its registered office at 215 Euston Road, London NW1 2BE.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.
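For context, a single-writer cache fileset of the kind described above is created along these lines (file system, fileset and target path names are placeholders, and the exact gpfs:/// target syntax for the native protocol should be checked against the documentation for your release):

  # single-writer AFM cache fileset using the native GPFS protocol to home
  mmcrfileset ingestfs instruments -p afmMode=sw,afmTarget=gpfs:///gpfs/analytics/instruments --inode-space new
  # link it into the ingest file system namespace
  mmlinkfileset ingestfs instruments -J /gpfs/ingest/instruments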
From Robert.Oesterlin at nuance.com Wed Mar 2 17:04:57 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 2 Mar 2016 17:04:57 +0000 Subject: [gpfsug-discuss] IBM-Sandisk Announcement Message-ID: <37CDF3CF-53AD-45FC-8E0C-582CED5DD99F@nuance.com> The reason I?m asking is that I?m doing a test with an IF100 box, and wanted to know what the IBM plans were for it :-) Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhildeb at us.ibm.com Wed Mar 2 17:23:30 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Wed, 2 Mar 2016 09:23:30 -0800 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: <201603021731.u22HVqeu026048@d03av04.boulder.ibm.com> Hi Luke, Assuming the network between your clusters is reliable, using GPFS with SW-mode (also assuming you aren't ever modifying the data on the home cluster) should work well for you I think. New files can continue to be created in the cache even in unmounted state.... Dean IBM Almaden Research Center From: Luke Raimbach To: gpfsug main discussion list Date: 03/01/2016 04:44 AM Subject: [gpfsug-discuss] AFM over NFS vs GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org HI All, We have two clusters and are using AFM between them to compartmentalise performance. We have the opportunity to run AFM over GPFS protocol (over IB verbs), which I would imagine gives much greater performance than trying to push it over NFS over Ethernet. We will have a whole raft of instrument ingest filesets in one storage cluster which are single-writer caches of the final destination in the analytics cluster. My slight concern with running this relationship over native GPFS is that if the analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the manual which says: "In the case of caches based on native GPFS? protocol, unavailability of the home file system on the cache cluster puts the caches into unmounted state. These caches never enter the disconnected state. For AFM filesets that use GPFS protocol to connect to the home cluster, if the remote mount becomes unresponsive due to issues at the home cluster not related to disconnection (such as a deadlock), operations that require remote mount access such as revalidation or reading un-cached contents also hang until remote mount becomes available again. One way to continue accessing all cached contents without disruption is to temporarily disable all the revalidation intervals until the home mount is accessible again." What I'm unsure of is whether this applies to single-writer caches as they (presumably) never do revalidation. We don't want instrument data capture to be interrupted on our ingest storage cluster if the analytics cluster goes away. Is anyone able to clear this up, please? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at genome.wustl.edu Wed Mar 2 19:46:48 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 13:46:48 -0600 Subject: [gpfsug-discuss] cpu shielding Message-ID: <56D74328.50507@genome.wustl.edu> All, We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From bbanister at jumptrading.com Wed Mar 2 19:49:50 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 2 Mar 2016 19:49:50 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <56D74328.50507@genome.wustl.edu> References: <56D74328.50507@genome.wustl.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. -B -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil Sent: Wednesday, March 02, 2016 1:47 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] cpu shielding All, We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
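A rough sketch of the kind of cpuset isolation described above (cgroup name, CPU range, memory node and PID are illustrative only):

  # confine user jobs to a subset of cores, leaving the rest for the OS and GPFS
  mkdir /sys/fs/cgroup/cpuset/userjobs
  echo 8-15 > /sys/fs/cgroup/cpuset/userjobs/cpuset.cpus
  echo 1 > /sys/fs/cgroup/cpuset/userjobs/cpuset.mems
  # the batch system (or a wrapper) then drops each user process into the group
  echo $JOBPID > /sys/fs/cgroup/cpuset/userjobs/tasks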
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From mweil at genome.wustl.edu Wed Mar 2 19:54:21 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 13:54:21 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D744ED.30307@genome.wustl.edu> Can you share anything more? We are trying all system related items on cpu0 GPFS is on cpu1 and the rest are used for the lsf scheduler. With that setup we still see evictions. Thanks Matt On 3/2/16 1:49 PM, Bryan Banister wrote: > We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. > -B > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil > Sent: Wednesday, March 02, 2016 1:47 PM > To: gpfsug main discussion list > Subject: [gpfsug-discuss] cpu shielding > > All, > > We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? > > Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. > > Thanks > > Matt > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From viccornell at gmail.com Wed Mar 2 20:15:16 2016 From: viccornell at gmail.com (viccornell at gmail.com) Date: Wed, 2 Mar 2016 21:15:16 +0100 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <56D744ED.30307@genome.wustl.edu> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. > > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. 
Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Wed Mar 2 20:17:38 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 2 Mar 2016 20:17:38 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> I would agree with Vic that in most cases the issues are with the underlying network communication. 
We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com Sent: Wednesday, March 02, 2016 2:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. > > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >> Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From mweil at genome.wustl.edu Wed Mar 2 20:22:05 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 14:22:05 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> Message-ID: <56D74B6D.8050802@genome.wustl.edu> On 3/2/16 2:15 PM, viccornell at gmail.com wrote: > Hi, > > How sure are you that it is cpu scheduling that is your problem? just spotted this maybe it can help spot something. https://software.intel.com/en-us/articles/intel-performance-counter-monitor > > Are you using IB or Ethernet? two 10 gig Intel nics in a LACP bond. links are not saturated. > > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. 
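Since the original post mentions that all PCI goes through the first socket and the RAID controller and NICs share that bus, it can be worth confirming which NUMA node each device (and its interrupts) actually lands on before blaming the scheduler. A rough sketch, with eth0 and the PCI address as placeholders:

# NUMA node of the NIC (-1 means the platform exposes no locality info)
cat /sys/class/net/eth0/device/numa_node
# NUMA node of another PCI function, e.g. the RAID controller
cat /sys/bus/pci/devices/0000:03:00.0/numa_node
# overall node/core/memory layout, to compare with where mmfsd is allowed to run
numactl --hardware
# which CPUs are actually servicing the NIC interrupts
grep -i eth0 /proc/interrupts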
>> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. 
If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From S.J.Thompson at bham.ac.uk Wed Mar 2 20:24:44 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 2 Mar 2016 20:24:44 +0000 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> , <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] Sent: 02 March 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com Sent: Wednesday, March 02, 2016 2:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cpu shielding Hi, How sure are you that it is cpu scheduling that is your problem? Are you using IB or Ethernet? I have seen problems that look like yours in the past with single-network Ethernet setups. Regards, Vic Sent from my iPhone > On 2 Mar 2016, at 20:54, Matt Weil wrote: > > Can you share anything more? > We are trying all system related items on cpu0 GPFS is on cpu1 and the > rest are used for the lsf scheduler. With that setup we still see > evictions. 
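A minimal sketch of the oom_score adjustment Simon describes above (modern kernels expose /proc/<pid>/oom_score_adj; older ones used oom_adj), purely illustrative:

# make mmfsd a very unattractive target for the OOM killer
for p in $(pidof mmfsd); do
    echo -1000 > "/proc/$p/oom_score_adj"   # -1000 effectively exempts the process
done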
> > Thanks > Matt > >> On 3/2/16 1:49 PM, Bryan Banister wrote: >> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >> -B >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >> Weil >> Sent: Wednesday, March 02, 2016 1:47 PM >> To: gpfsug main discussion list >> Subject: [gpfsug-discuss] cpu shielding >> >> All, >> >> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >> >> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >> >> Thanks >> >> Matt >> >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. 
If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mweil at genome.wustl.edu Wed Mar 2 20:47:24 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 2 Mar 2016 14:47:24 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D7515C.4070102@genome.wustl.edu> GPFS client version 3.5.0-15 any related issues there with timeouts? On 3/2/16 2:24 PM, Simon Thompson (Research Computing - IT Services) wrote: > Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. > > We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] > Sent: 02 March 2016 20:17 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com > Sent: Wednesday, March 02, 2016 2:15 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > Hi, > > How sure are you that it is cpu scheduling that is your problem? > > Are you using IB or Ethernet? 
> > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. >> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >>> Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. 
The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Greg.Lehmann at csiro.au Wed Mar 2 22:48:51 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 2 Mar 2016 22:48:51 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale In-Reply-To: References: Message-ID: <304dd806ce6e4488b163676bb5889da2@exch2-mel.nexus.csiro.au> Sitting next to 2 DDN guys doing some gridscaler training. Their opinion is "pure FUD". They are happy for us to run IBM or their Spectrum Scale packages in the DDN hardware. 
Cheers, Greg -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Thursday, 3 March 2016 2:30 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] GPFS vs Spectrum Scale I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS with out tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From daniel.kidger at uk.ibm.com Wed Mar 2 22:52:55 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 2 Mar 2016 22:52:55 +0000 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale In-Reply-To: References: Message-ID: <201603022153.u22Lr0nY015961@d06av10.portsmouth.uk.ibm.com> I work for IBM and in particular support OEMs and other Business Partners. I am not sure if Simon is using true IBM speak here, as any OEM purchase of Spectrum Scale inherently has tin included, be it from DDN, Seagate, Lenovo, etc. Remember there are 4 main ways to buy Spectrum Scale: 1. as pure software, direct from IBM or through a business partner. 2. as part of a hardware offering from an OEM. 3. as part of a hardware offering from IBM. This is what ESS is. 4. as a cloud service in Softlayer. Spectrum Scale (GPFS) is exactly the same software no matter which route above is used to purchase it. What OEMs do do, as IBM do with their ESS appliance product, is extra validation to confirm that the newest release is fully compatible with their hardware solution and has no regressions in performance or otherwise. Hence there is often perhaps 3 months between say the 4.2 official release and when it appears in OEM solutions. ESS is the same here. The two differences to note that make #2 OEM systems different, though, are: 1: When bought as part of an OEM through say Lenovo, DDN or Seagate, that OEM owns the actual GPFS licenses rather than the end customer. The practical side of this is that if you later replace the hardware with a different vendor's hardware there is no automatic right to transfer over the old licenses, as would be the case if GPFS was bought directly from IBM. 2. When bought as part of an OEM system, that OEM is the sole point of contact for the customer for all support. The customer does not first have to triage if it is a hw or sw issue. The OEM in return provides 1st and 2nd line support to the customer, and only escalates in-depth level 3 support issues to IBM's development team. The OEMs will have gone through extensive training to be able to do such 1st and 2nd line support. (Of course many traditional IBM Business Partners are also very clued up about helping their customers directly.) Daniel Dr.Daniel Kidger No.
1 The Square, Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 02/03/2016 16:30 Subject: [gpfsug-discuss] GPFS vs Spectrum Scale Sent by: gpfsug-discuss-bounces at spectrumscale.org I had a slightly strange discussion with IBM this morning... We typically buy OEM GPFS with out tin. The discussion went along the lines that spectrum scale is different somehow from gpfs via the oem route. Is this just a marketing thing? Red herring? Or is there something more to this? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From volobuev at us.ibm.com Thu Mar 3 00:35:18 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 2 Mar 2016 16:35:18 -0800 Subject: [gpfsug-discuss] AFM over NFS vs GPFS In-Reply-To: References: Message-ID: <201603030035.u230ZNwQ032425@d03av04.boulder.ibm.com> Going way off topic... For reasons that are not entirely understood, Spectrum Scale AFM developers who work from India are unable to subscribe to the gpfsug-discuss mailing list. Their mail servers and gpfsug servers don't want to play nice together. So if you want to reach more AFM experts, I recommend going the developerWorks GPFS forum route: https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479&ps=25 yuri From: Luke Raimbach To: gpfsug main discussion list , "gpfsug main discussion list" , Date: 03/02/2016 08:43 AM Subject: Re: [gpfsug-discuss] AFM over NFS vs GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Anybody know the answer? > HI All, > > We have two clusters and are using AFM between them to compartmentalise > performance. We have the opportunity to run AFM over GPFS protocol (over IB > verbs), which I would imagine gives much greater performance than trying to > push it over NFS over Ethernet. > > We will have a whole raft of instrument ingest filesets in one storage cluster > which are single-writer caches of the final destination in the analytics cluster. > My slight concern with running this relationship over native GPFS is that if the > analytics cluster goes offline (e.g. for maintenance, etc.), there is an entry in the > manual which says: > > "In the case of caches based on native GPFS? protocol, unavailability of the > home file system on the cache cluster puts the caches into unmounted state. > These caches never enter the disconnected state. For AFM filesets that use GPFS > protocol to connect to the home cluster, if the remote mount becomes > unresponsive due to issues at the home cluster not related to disconnection > (such as a deadlock), operations that require remote mount access such as > revalidation or reading un-cached contents also hang until remote mount > becomes available again. 
One way to continue accessing all cached contents > without disruption is to temporarily disable all the revalidation intervals until the > home mount is accessible again." > > What I'm unsure of is whether this applies to single-writer caches as they > (presumably) never do revalidation. We don't want instrument data capture to > be interrupted on our ingest storage cluster if the analytics cluster goes away. > > Is anyone able to clear this up, please? > > Cheers, > Luke. > > Luke Raimbach > Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs > Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales > no. 1140062 and a company registered in England and Wales no. 06885462, with > its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL:
From Luke.Raimbach at crick.ac.uk Thu Mar 3 09:07:25 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 3 Mar 2016 09:07:25 +0000 Subject: [gpfsug-discuss] Cloning across fileset boundaries Message-ID: Hi All, When I use "mmclone copy" to try and create a clone and the destination is inside a fileset (dependent or independent), I get this: mmclone: Invalid cross-device link I can find no information in any manuals as to why this doesn't work (though I can imagine what the reasons might be). Could somebody explain whether this could be permitted in the future, or if it's technically impossible? Thanks, Luke. Luke Raimbach Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From volobuev at us.ibm.com Thu Mar 3 18:13:45 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Thu, 3 Mar 2016 10:13:45 -0800 Subject: [gpfsug-discuss] Cloning across fileset boundaries In-Reply-To: References: Message-ID: <201603031813.u23IDobP010703@d03av04.boulder.ibm.com> This is technically impossible. A clone relationship is semantically similar to a hard link. The basic fileset concept precludes hard links between filesets. A fileset is by definition a self-contained subtree in the namespace. yuri
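To make the cross-fileset restriction concrete, a small illustration of the behaviour described above; paths and file names are only placeholders, and the clone succeeds when source and target live in the same fileset but fails across a fileset junction with the same error a hard link would give:

# clone within one fileset: works
mmclone snap /gpfs/fs0/projects/data.img /gpfs/fs0/projects/data.img.snap
mmclone copy /gpfs/fs0/projects/data.img.snap /gpfs/fs0/projects/data.img.clone
mmclone show /gpfs/fs0/projects/data.img.clone

# clone target inside a different fileset (junction at /gpfs/fs0/ingest): fails
mmclone copy /gpfs/fs0/projects/data.img.snap /gpfs/fs0/ingest/data.img.clone
# mmclone: Invalid cross-device link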
From: Luke Raimbach To: gpfsug main discussion list , Date: 03/03/2016 01:07 AM Subject: [gpfsug-discuss] Cloning across fileset boundaries Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, When I use "mmclone copy" to try and create a clone and the destination is inside a fileset (dependent or independent), I get this: mmclone: Invalid cross-device link I can find no information in any manuals as to why this doesn't work (though I can imagine what the reasons might be). Could somebody explain whether this could be permitted in the future, or if it's technically impossible? Thanks, Luke. Luke Raimbach Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Thu Mar 3 21:57:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 3 Mar 2016 21:57:20 +0000 Subject: [gpfsug-discuss] Small cluster Message-ID: I have a client that wants to build small remote sites to sync back to an ESS cluster they purchased. These remote sites are generally <15-20TB. If I build a three node cluster with just internal drives can this work if the drives aren?t shared amongst the cluster without FPO or GNR(since it?s not ESS)? Is it better to have a SAN sharing disks with the three nodes? Assuming all are NSD servers (or two at least). Seems like most of the implementations I?m seeing use shared disks so local drives only would be an odd architecture right? What do I give up by not having shared disks seen by other NSD servers? Mark Bush Storage Architect This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Thu Mar 3 22:23:08 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 3 Mar 2016 22:23:08 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: Message-ID: <56D8B94C.2000303@buzzard.me.uk> On 03/03/16 21:57, Mark.Bush at siriuscom.com wrote: > I have a client that wants to build small remote sites to sync back to > an ESS cluster they purchased. These remote sites are generally > <15-20TB. If I build a three node cluster with just internal drives can > this work if the drives aren?t shared amongst the cluster without FPO or > GNR(since it?s not ESS)? Is it better to have a SAN sharing disks with > the three nodes? Assuming all are NSD servers (or two at least). Seems > like most of the implementations I?m seeing use shared disks so local > drives only would be an odd architecture right? What do I give up by > not having shared disks seen by other NSD servers? > Unless you are doing data and metadata replication on the remote sites then any one server going down is not good at all. To be honest I have only ever seen that sort of setup done once. It was part of a high availability web server system. The idea was GPFS provided the shared storage between the nodes by replicating everything. Suffice as to say keeping things polite "don't do that". In reality the swear words coming from the admin trying to get GPFS fixed when disks failed where a lot more colourful. In the end the system was abandoned and migrated to ESX as it was back then. Mind you that was in the days of GPFS 2.3 so it *might* be better now; are you feeling lucky? 
However a SAS attached Dell MD3 (it's LSI/NetApp Engenio storage so basically the same as a DS3000/4000/5000) is frankly so cheap that it's just not worth going down that route if you ask me. I would do a two-server cluster with a tiebreaker disk on the MD3 to avoid any split-brain issues, and use the saving on the third server to buy the MD3 and SAS cards. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom.
From makaplan at us.ibm.com Fri Mar 4 16:09:03 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 4 Mar 2016 11:09:03 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <56D8B94C.2000303@buzzard.me.uk> References: <56D8B94C.2000303@buzzard.me.uk> Message-ID: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From Mark.Bush at siriuscom.com Fri Mar 4 16:21:20 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Fri, 4 Mar 2016 16:21:20 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Message-ID: <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non-Hadoop-like workloads? I didn't think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good?
Mark
From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster
Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc.
This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited.
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Mar 4 16:26:15 2016 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 4 Mar 2016 11:26:15 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> Message-ID: You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: > I guess this is really my question. Budget is less than $50k per site and > they need around 20TB storage. Two nodes with MD3 or something may work. > But could it work (and be successful) with just servers and internal > drives? Should I do FPO for non hadoop like workloads? I didn?t think I > could get native raid except in the ESS (GSS no longer exists if I remember > correctly). Do I just make replicas and call it good? > > > Mark > > From: on behalf of Marc A > Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the GPFS > 2.3 era. > > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution more > difficult. > > To begin with, as with any provisioning problem, one should try to state: > requirements, goals, budgets, constraints, failure/tolerance > models/assumptions, > expected workloads, desired performance, etc, etc. 
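
To make the internal-drives-plus-replication option sketched above a little more concrete, here is a rough outline of a three-node, shared-nothing layout with GPFS replication across per-node failure groups. The node names, device paths, NSD names and replication counts below are placeholders rather than anything from this thread, and the stanza values would need checking against the release actually deployed:

    # nsd.stanza -- one internal disk per server, each in its own failure group
    %nsd:
      device=/dev/sdb
      nsd=nsd_node1
      servers=node1
      usage=dataAndMetadata
      failureGroup=1
    %nsd:
      device=/dev/sdb
      nsd=nsd_node2
      servers=node2
      usage=dataAndMetadata
      failureGroup=2
    %nsd:
      device=/dev/sdb
      nsd=nsd_node3
      servers=node3
      usage=dataAndMetadata
      failureGroup=3

    mmcrnsd -F nsd.stanza

    # default two copies of data and metadata, with headroom for 3-way later
    mmcrfs gpfs01 -F nsd.stanza -m 2 -M 3 -r 2 -R 3 -T /gpfs01
    mmmount gpfs01 -a

Losing one server keeps the file system available, but as noted elsewhere in the thread, re-protecting the data afterwards (mmchdisk start plus mmrestripefs) is an administrator-level job rather than a simple disk swap.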
> > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Fri Mar 4 16:28:52 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 04 Mar 2016 16:28:52 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> Message-ID: <1457108932.4251.183.camel@buzzard.phy.strath.ac.uk> On Fri, 2016-03-04 at 11:09 -0500, Marc A Kaplan wrote: > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the > GPFS 2.3 era. Once bitten twice shy. I was offering my experience of that setup, which is not good. I my defense I did note it was it the 2.x era and it might be better now. > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution > more difficult. The other thing I would point out is that replacing a disk in a MD3 or similar is an operator level procedure. Replacing a similar disk up the front with GPFS replication requires a skilled GPFS administrator. Given these are to be on remote sites, I would suspect simpler lower skilled maintenance is better. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Mark.Bush at siriuscom.com Fri Mar 4 16:30:41 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Fri, 4 Mar 2016 16:30:41 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> Message-ID: <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Yes. Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote. I just keep thinking a set of servers would do the trick and be cheaper. 
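
For comparison, a minimal sketch of the two-server-plus-tiebreaker layout suggested earlier in the thread, assuming a SAS- or FC-attached array that both servers can see; the cluster name, node names, device path and NSD name are again placeholders:

    # nsd.stanza -- shared LUN, both servers registered as NSD servers
    %nsd:
      device=/dev/mapper/md3_lun0
      nsd=md3_nsd01
      servers=server1,server2
      usage=dataAndMetadata
      failureGroup=1

    mmcrcluster -N server1:quorum-manager,server2:quorum-manager \
                -r /usr/bin/ssh -R /usr/bin/scp -C remote-site1
    mmcrnsd -F nsd.stanza
    # on older releases this may need GPFS stopped across the cluster
    mmchconfig tiebreakerDisks="md3_nsd01"
    mmcrfs gpfs01 -F nsd.stanza -T /gpfs01
    mmmount gpfs01 -a

Because both servers see the same LUNs there is no need for GPFS-level replication here; the array's RAID handles disk failures, which is where the operator-level disk swap argument above comes from.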
From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com > wrote: I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Mar 4 16:36:30 2016 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 4 Mar 2016 11:36:30 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: > Yes. Really the only other option we have (and not a bad one) is getting > a v7000 Unified in there (if we can get the price down far enough). That?s > not a bad option since all they really want is SMB shares in the remote. I > just keep thinking a set of servers would do the trick and be cheaper. > > > > From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM > > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > You can do FPO for non-Hadoop workloads. It just alters the disks below > the GPFS filesystem layer and looks like a normal GPFS system (mostly). I > do think there were some restrictions on non-FPO nodes mounting FPO > filesystems via multi-cluster.. not sure if those are still there.. any > input on that from IBM? > > If small enough data, and with 3-way replication, it might just be wise to > do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common > throwing out numbers), 3 of those per site would fit in your budget. > > Again.. depending on your requirements, stability balance between 'science > experiment' vs production, GPFS knowledge level, etc etc... > > This is actually an interesting and somewhat missing space for small > enterprises. If you just want 10-20TB active-active online everywhere, say, > for VMware, or NFS, or something else, there arent all that many good > solutions today that scale down far enough and are a decent price. It's > easy with many many PB, but small.. idk. I think the above sounds good as > anything without going SAN-crazy. > > > > On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < > Mark.Bush at siriuscom.com> wrote: > >> I guess this is really my question. Budget is less than $50k per site >> and they need around 20TB storage. Two nodes with MD3 or something may >> work. But could it work (and be successful) with just servers and internal >> drives? Should I do FPO for non hadoop like workloads? I didn?t think I >> could get native raid except in the ESS (GSS no longer exists if I remember >> correctly). 
Do I just make replicas and call it good? >> >> >> Mark >> >> From: on behalf of Marc A >> Kaplan >> Reply-To: gpfsug main discussion list >> Date: Friday, March 4, 2016 at 10:09 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Small cluster >> >> Jon, I don't doubt your experience, but it's not quite fair or even >> sensible to make a decision today based on what was available in the GPFS >> 2.3 era. >> >> We are now at GPFS 4.2 with support for 3 way replication and FPO. >> Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS >> solutions and more. >> >> So more choices, more options, making finding an "optimal" solution more >> difficult. >> >> To begin with, as with any provisioning problem, one should try to state: >> requirements, goals, budgets, constraints, failure/tolerance >> models/assumptions, >> expected workloads, desired performance, etc, etc. >> >> >> This message (including any attachments) is intended only for the use of >> the individual or entity to which it is addressed and may contain >> information that is non-public, proprietary, privileged, confidential, and >> exempt from disclosure under applicable law. If you are not the intended >> recipient, you are hereby notified that any use, dissemination, >> distribution, or copying of this communication is strictly prohibited. This >> message may be viewed by parties at Sirius Computer Solutions other than >> those named in the message header. This message does not contain an >> official representation of Sirius Computer Solutions. If you have received >> this communication in error, notify Sirius Computer Solutions immediately >> and (i) destroy this message if a facsimile or (ii) delete this message >> immediately if this is an electronic communication. Thank you. >> Sirius Computer Solutions >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > > -- > Zach Giles > zgiles at gmail.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Fri Mar 4 16:40:54 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 4 Mar 2016 10:40:54 -0600 Subject: [gpfsug-discuss] cpu shielding In-Reply-To: References: <56D74328.50507@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FEFF9D@CHI-EXCHANGEW1.w2k.jumptrading.com> <56D744ED.30307@genome.wustl.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB05FF010A@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <56D9BA96.8010800@genome.wustl.edu> All, This turned out to be processes copying data from GPFS to local /tmp. Once the system memory was full it started blocking while the data was being flushed to disk. This process was taking long enough to have leases expire. Matt On 3/2/16 2:24 PM, Simon Thompson (Research Computing - IT Services) wrote: > Vaguely related, we used to see the out of memory killer regularly go for mmfsd, which should kill user process and pbs_mom which ran from gpfs. > > We modified the gpfs init script to set the score for mmfsd for oom to help prevent this. (we also modified it to wait for ib to come up as well, need to revisit this now I guess as there is systemd support in 4.2.0.1 so we should be able to set a .wants there). 
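
For anyone wanting to reproduce the OOM protection described above, a rough sketch of both approaches follows. The score value and the gpfs.service unit name (for the systemd-based releases) are assumptions here, not something quoted from the thread:

    # init-script era: lower mmfsd's OOM score once GPFS is up
    for pid in $(pgrep -x mmfsd); do
        echo -1000 > /proc/"$pid"/oom_score_adj
    done

    # systemd era (4.2.0.1 and later): make it persistent with a drop-in,
    # which processes started under the unit inherit
    mkdir -p /etc/systemd/system/gpfs.service.d
    cat > /etc/systemd/system/gpfs.service.d/oom.conf <<'EOF'
    [Service]
    OOMScoreAdjust=-1000
    EOF
    systemctl daemon-reload

A score of -1000 effectively exempts the daemon from the OOM killer, so a runaway user process gets killed instead of mmfsd losing its lease.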
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Bryan Banister [bbanister at jumptrading.com] > Sent: 02 March 2016 20:17 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > I would agree with Vic that in most cases the issues are with the underlying network communication. We are using the cgroups to mainly protect against runaway processes that attempt to consume all memory on the system, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of viccornell at gmail.com > Sent: Wednesday, March 02, 2016 2:15 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] cpu shielding > > Hi, > > How sure are you that it is cpu scheduling that is your problem? > > Are you using IB or Ethernet? > > I have seen problems that look like yours in the past with single-network Ethernet setups. > > Regards, > > Vic > > Sent from my iPhone > >> On 2 Mar 2016, at 20:54, Matt Weil wrote: >> >> Can you share anything more? >> We are trying all system related items on cpu0 GPFS is on cpu1 and the >> rest are used for the lsf scheduler. With that setup we still see >> evictions. >> >> Thanks >> Matt >> >>> On 3/2/16 1:49 PM, Bryan Banister wrote: >>> We do use cgroups to isolate user applications into a separate cgroup which provides some headroom of CPU and memory resources for the rest of the system services including GPFS and its required components such SSH, etc. >>> -B >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Matt >>> Weil >>> Sent: Wednesday, March 02, 2016 1:47 PM >>> To: gpfsug main discussion list >>> Subject: [gpfsug-discuss] cpu shielding >>> >>> All, >>> >>> We are seeing issues on our GPFS clients where mmfsd is not able to respond in time to renew its lease. Once that happens the file system is unmounted. We are experimenting with c groups to tie mmfsd and others to specified cpu's. Any recommendations out there on how to shield GPFS from other process? >>> >>> Our system design has all PCI going through the first socket and that seems to be some contention there as the RAID controller with SSD's and nics are on that same bus. >>> >>> Thanks >>> >>> Matt >>> >>> >>> ____ >>> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ____ >> This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. 
If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Paul.Sanchez at deshaw.com Fri Mar 4 16:54:39 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 4 Mar 2016 16:54:39 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: You wouldn?t be alone in trying to make the ?concurrent CES gateway + NSD server nodes? formula work. That doesn?t mean it will be well-supported initially, but it does mean that others will be finding bugs and interaction issues along with you. On GPFS 4.1.1.2 for example, it?s possible to get a CES protocol node into a state where the mmcesmonitor is dead and requires a mmshutdown/mmstartup to recover from. Since in a shared-nothing disk topology that would require mmchdisk/mmrestripefs to recover and rebalance, it would be operationally intensive to run CES on an NSD server with local disks. With shared SAN disks, this becomes more tractable, in my opinion. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Zachary Giles Sent: Friday, March 04, 2016 11:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com > wrote: Yes. Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote. I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. 
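
As a rough illustration of the recovery path Paul describes above (the file system and node names are placeholders):

    # restart GPFS on the wedged protocol node to bring mmcesmonitor back
    mmshutdown -N ces-node1
    mmstartup  -N ces-node1

    # in a shared-nothing layout the node's local NSDs went down with it,
    # so they have to be restarted and the data re-protected as well
    mmchdisk gpfs01 start -a
    mmrestripefs gpfs01 -r

With SAN- or SAS-shared disks the second half disappears, which is essentially the argument for shared disks under protocol-serving NSD nodes.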
If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com > wrote: I guess this is really my question. Budget is less than $50k per site and they need around 20TB storage. Two nodes with MD3 or something may work. But could it work (and be successful) with just servers and internal drives? Should I do FPO for non hadoop like workloads? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly). Do I just make replicas and call it good? Mark From: > on behalf of Marc A Kaplan > Reply-To: gpfsug main discussion list > Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Fri Mar 4 18:03:16 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 4 Mar 2016 19:03:16 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> Message-ID: <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Hi, a couple of comments to the various infos in this thread. 1. 
the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. 
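
A loose sketch of what point 3 above can look like in practice -- shared SAS LUNs for data, shared SSDs for metadata, plus HAWC and LROC. The stanza values, the 64K write-cache threshold and the localCache usage shown here are from memory and should be checked against the documentation for the release in use:

    # nsd.stanza -- SAS LUNs for data, SSDs for metadata
    %nsd:
      device=/dev/mapper/sas_lun0
      nsd=data01
      servers=server1,server2
      usage=dataOnly
      failureGroup=1
    %nsd:
      device=/dev/mapper/ssd_lun0
      nsd=meta01
      servers=server1,server2
      usage=metadataOnly
      failureGroup=1

    # HAWC: absorb small synchronous writes in the recovery log first
    mmchfs gpfs01 --write-cache-threshold 64K

    # LROC: a node-local SSD defined as a localCache NSD acts as a read
    # cache on that server only (it is not added to any file system)
    %nsd:
      device=/dev/nvme0n1
      nsd=server1_lroc
      servers=server1
      usage=localCache
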
From: Zachary Giles Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. 
Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From ceason at us.ibm.com Fri Mar 4 18:20:50 2016 From: ceason at us.ibm.com (Jeffrey M Ceason) Date: Fri, 4 Mar 2016 11:20:50 -0700 Subject: [gpfsug-discuss] Small cluster (Jeff Ceason) In-Reply-To: References: Message-ID: <201603041821.u24IL6S6000328@d01av02.pok.ibm.com> The V7000 Unified type system is made for this application. http://www-03.ibm.com/systems/storage/disk/storwize_v7000/ Jeff Ceason Solutions Architect (520) 268-2193 (Mobile) ceason at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 03/04/2016 11:15 AM Subject: gpfsug-discuss Digest, Vol 50, Issue 14 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Small cluster (Sven Oehme) ---------------------------------------------------------------------- Message: 1 Date: Fri, 4 Mar 2016 19:03:16 +0100 From: "Sven Oehme" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Message-ID: <201603041804.u24I4g2R026689 at d03av01.boulder.ibm.com> Content-Type: text/plain; charset="utf-8" Hi, a couple of comments to the various infos in this thread. 1. the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. 
if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough). That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly). I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. 
If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160304/dd661d27/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160304/dd661d27/attachment.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 50, Issue 14 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From janfrode at tanso.net Sat Mar 5 13:16:54 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Sat, 05 Mar 2016 13:16:54 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: Regarding #1, the FAQ has recommendation to not run CES nodes directly attached to storage: """ ? NSD server functionality and storage attached to Protocol node. We recommend that Protocol nodes do not take on these functions """ For small CES clusters we're now configuring 2x P822L with one partition on each server owning FC adapters and acting as NSD server/quorum/manager and the other partition being CES node accessing disk via IP. I would much rather have a plain SAN model cluster were all nodes accessed disk directly (probably still with a dedicated quorum/manager partition), but this FAQ entry is preventing that.. -jf fre. 4. mar. 2016 kl. 19.04 skrev Sven Oehme : > Hi, > > a couple of comments to the various infos in this thread. > > 1. the need to run CES on separate nodes is a recommendation, not a > requirement and the recommendation comes from the fact that if you have > heavy loaded NAS traffic that gets the system to its knees, you can take > your NSD service down with you if its on the same box. so as long as you > have a reasonable performance expectation and size the system correct there > is no issue. > > 2. shared vs FPO vs shared nothing (just replication) . the main issue > people overlook in this scenario is the absence of read/write caches in FPO > or shared nothing configurations. every physical disk drive can only do > ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its > pretty much the same effort. particular on metadata this bites you really > badly as every of this tiny i/os eats one of your 100 iops a disk can do > and quickly you used up all your iops on the drives. if you have any form > of raid controller (sw or hw) it typically implements at minimum a read > cache on most systems a read/write cache which will significant increase > the number of logical i/os one can do against a disk , my best example is > always if you have a workload that does 4k seq DIO writes to a single disk, > if you have no raid controller you can do 400k/sec in this workload if you > have a reasonable ok write cache in front of the cache you can do 50 times > that much. so especilly if you use snapshots, CES services or anything > thats metadata intensive you want some type of raid protection with > caching. btw. 
replication in the FS makes this even worse as now each write > turns into 3 iops for the data + additional iops for the log records so you > eat up your iops very quick . > > 3. instead of shared SAN a shared SAS device is significantly cheaper but > only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 > nodes as you can use the disks as tiebreaker disks. if you also add some > SSD's for the metadata and make use of HAWC and LROC you might get away > from not needing a raid controller with cache as HAWC will solve that issue > for you . > > just a few thoughts :-D > > sven > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > [image: Inactive hide details for Zachary Giles ---03/04/2016 05:36:50 > PM---SMB too, eh? See this is where it starts to get hard to sca]Zachary > Giles ---03/04/2016 05:36:50 PM---SMB too, eh? See this is where it starts > to get hard to scale down. You could do a 3 node GPFS clust > > From: Zachary Giles > > > To: gpfsug main discussion list > > Date: 03/04/2016 05:36 PM > > > Subject: Re: [gpfsug-discuss] Small cluster > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > SMB too, eh? See this is where it starts to get hard to scale down. You > could do a 3 node GPFS cluster with replication at remote sites, pulling in > from AFM over the Net. If you want SMB too, you're probably going to need > another pair of servers to act as the Protocol Servers on top of the 3 GPFS > servers. I think running them all together is not recommended, and probably > I'd agree with that. > Though, you could do it anyway. If it's for read-only and updated daily, > eh, who cares. Again, depends on your GPFS experience and the balance > between production, price, and performance :) > > On Fri, Mar 4, 2016 at 11:30 AM, *Mark.Bush at siriuscom.com* > <*Mark.Bush at siriuscom.com* > > wrote: > > Yes. Really the only other option we have (and not a bad one) is > getting a v7000 Unified in there (if we can get the price down far > enough). That?s not a bad option since all they really want is SMB shares > in the remote. I just keep thinking a set of servers would do the trick > and be cheaper. > > > > *From: *Zachary Giles <*zgiles at gmail.com* > > * Reply-To: *gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > * Date: *Friday, March 4, 2016 at 10:26 AM > > * To: *gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > * Subject: *Re: [gpfsug-discuss] Small cluster > > You can do FPO for non-Hadoop workloads. It just alters the disks > below the GPFS filesystem layer and looks like a normal GPFS system > (mostly). I do think there were some restrictions on non-FPO nodes > mounting FPO filesystems via multi-cluster.. not sure if those are still > there.. any input on that from IBM? > > If small enough data, and with 3-way replication, it might just be > wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just > common throwing out numbers), 3 of those per site would fit in your budget. > > Again.. depending on your requirements, stability balance between > 'science experiment' vs production, GPFS knowledge level, etc etc... > > This is actually an interesting and somewhat missing space for small > enterprises. 
If you just want 10-20TB active-active online everywhere, say, > for VMware, or NFS, or something else, there arent all that many good > solutions today that scale down far enough and are a decent price. It's > easy with many many PB, but small.. idk. I think the above sounds good as > anything without going SAN-crazy. > > > > On Fri, Mar 4, 2016 at 11:21 AM, *Mark.Bush at siriuscom.com* > <*Mark.Bush at siriuscom.com* > > wrote: > I guess this is really my question. Budget is less than $50k per site > and they need around 20TB storage. Two nodes with MD3 or something may > work. But could it work (and be successful) with just servers and internal > drives? Should I do FPO for non hadoop like workloads? I didn?t think I > could get native raid except in the ESS (GSS no longer exists if I remember > correctly). Do I just make replicas and call it good? > > > Mark > > *From: *<*gpfsug-discuss-bounces at spectrumscale.org* > > on behalf of Marc A Kaplan > <*makaplan at us.ibm.com* > > * Reply-To: *gpfsug main discussion list < > *gpfsug-discuss at spectrumscale.org* > > * Date: *Friday, March 4, 2016 at 10:09 AM > * To: *gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > * Subject: *Re: [gpfsug-discuss] Small cluster > > Jon, I don't doubt your experience, but it's not quite fair or even > sensible to make a decision today based on what was available in the GPFS > 2.3 era. > > We are now at GPFS 4.2 with support for 3 way replication and FPO. > Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS > solutions and more. > > So more choices, more options, making finding an "optimal" solution > more difficult. > > To begin with, as with any provisioning problem, one should try to > state: requirements, goals, budgets, constraints, failure/tolerance > models/assumptions, > expected workloads, desired performance, etc, etc. > > This message (including any attachments) is intended only for the use > of the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. 
> > *Sirius Computer Solutions* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > -- > Zach Giles > *zgiles at gmail.com* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > -- > Zach Giles > *zgiles at gmail.com* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at us.ibm.com Sat Mar 5 13:31:40 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Sat, 5 Mar 2016 14:31:40 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> as i stated in my previous post , its a recommendation so people don't overload the NSD servers to have them become non responsive or even forced rebooted (e.g. when you configure cNFS auto reboot on same node), it doesn't mean it doesn't work or is not supported. if all you are using this cluster for is NAS services, then this recommendation makes even less sense as the whole purpose on why the recommendation is there to begin with is that if NFS would overload a node that also serves as NSD server for other nodes it would impact the other nodes that use the NSD protocol, but if there are no NSD clients there is nothing to protect because if NFS is down all clients are not able to access data, even if your NSD servers are perfectly healthy... if you have a fairly large system with many NSD Servers, many clients as well as NAS clients this recommendation is correct, but not in the scenario you described below. i will work with the team to come up with a better wording for this in the FAQ. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jan-Frode Myklebust To: gpfsug main discussion list Cc: Sven Oehme/Almaden/IBM at IBMUS Date: 03/05/2016 02:17 PM Subject: Re: [gpfsug-discuss] Small cluster Regarding #1, the FAQ has recommendation to not run CES nodes directly attached to storage: """ ? NSD server functionality and storage attached to Protocol node. We recommend that Protocol nodes do not take on these functions """ For small CES clusters we're now configuring 2x P822L with one partition on each server owning FC adapters and acting as NSD server/quorum/manager and the other partition being CES node accessing disk via IP. 
I would much rather have a plain SAN model cluster were all nodes accessed disk directly (probably still with a dedicated quorum/manager partition), but this FAQ entry is preventing that.. -jf fre. 4. mar. 2016 kl. 19.04 skrev Sven Oehme : Hi, a couple of comments to the various infos in this thread. 1. the need to run CES on separate nodes is a recommendation, not a requirement and the recommendation comes from the fact that if you have heavy loaded NAS traffic that gets the system to its knees, you can take your NSD service down with you if its on the same box. so as long as you have a reasonable performance expectation and size the system correct there is no issue. 2. shared vs FPO vs shared nothing (just replication) . the main issue people overlook in this scenario is the absence of read/write caches in FPO or shared nothing configurations. every physical disk drive can only do ~100 iops and thats independent if the io size is 1 byte or 1 megabyte its pretty much the same effort. particular on metadata this bites you really badly as every of this tiny i/os eats one of your 100 iops a disk can do and quickly you used up all your iops on the drives. if you have any form of raid controller (sw or hw) it typically implements at minimum a read cache on most systems a read/write cache which will significant increase the number of logical i/os one can do against a disk , my best example is always if you have a workload that does 4k seq DIO writes to a single disk, if you have no raid controller you can do 400k/sec in this workload if you have a reasonable ok write cache in front of the cache you can do 50 times that much. so especilly if you use snapshots, CES services or anything thats metadata intensive you want some type of raid protection with caching. btw. replication in the FS makes this even worse as now each write turns into 3 iops for the data + additional iops for the log records so you eat up your iops very quick . 3. instead of shared SAN a shared SAS device is significantly cheaper but only scales to 2-4 nodes , the benefit is you only need 2 instead of 3 nodes as you can use the disks as tiebreaker disks. if you also add some SSD's for the metadata and make use of HAWC and LROC you might get away from not needing a raid controller with cache as HAWC will solve that issue for you . just a few thoughts :-D sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Zachary Giles ---03/04/2016 05:36:50 PM---SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS clust From: Zachary Giles To: gpfsug main discussion list Date: 03/04/2016 05:36 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org SMB too, eh? See this is where it starts to get hard to scale down. You could do a 3 node GPFS cluster with replication at remote sites, pulling in from AFM over the Net. If you want SMB too, you're probably going to need another pair of servers to act as the Protocol Servers on top of the 3 GPFS servers. I think running them all together is not recommended, and probably I'd agree with that. Though, you could do it anyway. If it's for read-only and updated daily, eh, who cares. 
Again, depends on your GPFS experience and the balance between production, price, and performance :) On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: Yes.? Really the only other option we have (and not a bad one) is getting a v7000 Unified in there (if we can get the price down far enough).? That?s not a bad option since all they really want is SMB shares in the remote.? I just keep thinking a set of servers would do the trick and be cheaper. From: Zachary Giles Reply-To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: Friday, March 4, 2016 at 10:26 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster You can do FPO for non-Hadoop workloads. It just alters the disks below the GPFS filesystem layer and looks like a normal GPFS system (mostly).? I do think there were some restrictions on non-FPO nodes mounting FPO filesystems via multi-cluster.. not sure if those are still there.. any input on that from IBM? If small enough data, and with 3-way replication, it might just be wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just common throwing out numbers), 3 of those per site would fit in your budget. Again.. depending on your requirements, stability balance between 'science experiment' vs production, GPFS knowledge level, etc etc... This is actually an interesting and somewhat missing space for small enterprises. If you just want 10-20TB active-active online everywhere, say, for VMware, or NFS, or something else, there arent all that many good solutions today that scale down far enough and are a decent price. It's easy with many many PB, but small.. idk. I think the above sounds good as anything without going SAN-crazy. On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com < Mark.Bush at siriuscom.com> wrote: I guess this is really my question.? Budget is less than $50k per site and they need around 20TB storage.? Two nodes with MD3 or something may work.? But could it work (and be successful) with just servers and internal drives?? Should I do FPO for non hadoop like workloads?? I didn?t think I could get native raid except in the ESS (GSS no longer exists if I remember correctly).? Do I just make replicas and call it good? Mark From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: Friday, March 4, 2016 at 10:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Small cluster Jon, I don't doubt your experience, but it's not quite fair or even sensible to make a decision today based on what was available in the GPFS 2.3 era. We are now at GPFS 4.2 with support for 3 way replication and FPO. Also we have Raid controllers, IB, and "Native Raid" and ESS, GSS solutions and more. So more choices, more options, making finding an "optimal" solution more difficult. To begin with, as with any provisioning problem, one should try to state: requirements, goals, budgets, constraints, failure/tolerance models/assumptions, expected workloads, desired performance, etc, etc. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "graycol.gif" deleted by Sven Oehme/Almaden/IBM] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Sat Mar 5 18:40:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sat, 5 Mar 2016 13:40:50 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> Message-ID: <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Indeed it seems to just add overhead and expense to split what can be done by one node over two nodes! -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Sat Mar 5 18:52:16 2016 From: zgiles at gmail.com (Zachary Giles) Date: Sat, 5 Mar 2016 13:52:16 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603051331.u25DVjvV017738@d01av01.pok.ibm.com> <201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Message-ID: Sven, What about the stability of the new protocol nodes vs the old cNFS? If you remember, back in the day, cNFS would sometimes have a problem and reboot the whole server itself. Obviously this was problematic if it's one of the few servers running your cluster. I assume this is different now with the Protocol Servers? On Sat, Mar 5, 2016 at 1:40 PM, Marc A Kaplan wrote: > Indeed it seems to just add overhead and expense to split what can be done > by one node over two nodes! 
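(A minimal sketch of the two-node shared-disk layout discussed in this thread, with tiebreaker disks standing in for a third quorum node. The node names, NSD names and stanza file are hypothetical and the NSD stanza contents are not shown, so treat this as an outline rather than a recipe.)

   # two nodes, both quorum-manager, both directly attached to the twin-tailed disks
   mmcrcluster -N "node1:quorum-manager,node2:quorum-manager" -C smallclust -r /usr/bin/ssh -R /usr/bin/scp
   mmchlicense server --accept -N node1,node2
   # NSDs described in a stanza file covering the shared LUNs
   mmcrnsd -F /tmp/nsd.stanza
   # three of the shared NSDs double as tiebreaker disks so two nodes can keep quorum
   mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"
   # disks are shared, so no GPFS data/metadata replication is needed in this layout
   mmcrfs smallfs -F /tmp/nsd.stanza -B 1M -m 1 -r 1 -T /gpfs/smallfs
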
> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- Zach Giles zgiles at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Sun Mar 6 13:55:59 2016 From: oehmes at us.ibm.com (Sven Oehme) Date: Sun, 6 Mar 2016 14:55:59 +0100 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com><201603051331.u25DVjvV017738@d01av01.pok.ibm.com><201603051840.u25Iet6K017732@d01av03.pok.ibm.com> Message-ID: <201603061356.u26Du4Zj014555@d03av05.boulder.ibm.com> the question is what difference does it make ? as i mentioned if all your 2 or 3 nodes do is serving NFS it doesn't matter if the protocol nodes or the NSD services are down in both cases it means no access to data which it makes no sense to separate them in this case (unless load dependent). i haven't seen nodes reboot specifically because of protocol issues lately, the fact that everything is in userspace makes things easier too. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 03/06/2016 02:31 AM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, What about the stability of the new protocol nodes vs the old cNFS? If you remember, back in the day, cNFS would sometimes have a problem and reboot the whole server itself. Obviously this was problematic if it's one of the few servers running your cluster. I assume this is different now with the Protocol Servers? On Sat, Mar 5, 2016 at 1:40 PM, Marc A Kaplan wrote: Indeed it seems to just add overhead and expense to split what can be done by one node over two nodes! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Sun Mar 6 20:27:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 6 Mar 2016 15:27:50 -0500 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> Message-ID: <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> As Sven wrote, the FAQ does not "prevent" anything. It's just a recommendation someone came up with. Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation. 
But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From janfrode at tanso.net Mon Mar 7 06:11:27 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 07 Mar 2016 06:11:27 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> Message-ID: I agree, but would also normally want to stay within whatever is recommended. What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan : > As Sven wrote, the FAQ does not "prevent" anything. It's just a > recommendation someone came up with. Which may or may not apply to your > situation. > > Partitioning a server into two servers might be a good idea if you really > need the protection/isolation. But I expect you are limiting the potential > performance of the overall system, compared to running a single Unix image > with multiple processes that can share resource and communicate more freely. > > > [image: Marc A Kaplan] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From volobuev at us.ibm.com Mon Mar 7 20:58:37 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Mon, 7 Mar 2016 12:58:37 -0800 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk><201603041609.u24G98Yw022449@d03av02.boulder.ibm.com><789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com><4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com><201603041804.u24I4g2R026689@d03av01.boulder.ibm.com><201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> Message-ID: <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> This use case is a good example of how it's hard to optimize across multiple criteria. If you want a pre-packaged solution that's proven and easy to manage, StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for your requirements as such things get. Price may be an issue though, as usual. If you're OK with rolling your own complex solution, my recommendation would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via the local block device interface. 
This avoids the pitfalls of data/metadata replication, and offers a decent blend of performance, fault tolerance, and disk management. You can use disk-based quorum if going with 2 nodes, or traditional node majority quorum if using 3 nodes, either way would work. There's no need to do any separation of roles (CES, quorum, managers, etc), provided the nodes are adequately provisioned with memory and aren't routinely overloaded, in which case you just need to add more nodes instead of partitioning what you have. Using internal disks and relying on GPFS data/metadata replication, with or without FPO, would mean taking the hard road. You may be able to spend the least on hardware in such a config (although the 33% disk utilization rate for triplication makes this less clear, if capacity is an issue), but the operational challenges are going to be substantial. This would be a viable config, but there are unavoidable tradeoffs caused by replication: (1) writes are very expensive, which limits the overall cluster capability for non-read-only workloads, (2) node and disk failures require a round of re-replication, or "re-protection", which takes time and bandwidth, limiting the overall capability further, (3) disk management can be a challenge, as there's no software/hardware component to assist with identifying failing/failed disks. As far as not going off the beaten path, this is not it... Exporting protocols from a small triplicated file system is not a typical mode of deployment of Spectrum Scale, you'd be blazing some new trails. As stated already in several responses, there's no hard requirement that CES Protocol nodes must be entirely separate from any other roles in the general Spectrum Scale deployment scenario. IBM expressly disallows co-locating Protocol nodes with ESS servers, due to resource consumption complications, but for non-ESS cases it's merely a recommendation to run Protocols on nodes that are not otherwise encumbered by having to provide other services. Of course, the config that's the best for performance is not the cheapest. CES doesn't reboot nodes to recover from NFS problems, unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a complex software stack is a complex software stack, so there's greater potential for things to go sideways, in particular due to the lack of resources. FPO vs plain replication: this only matters if you have apps that are capable of exploiting data locality. FPO changes the way GPFS stripes data across disks. Without FPO, GPFS does traditional wide striping of blocks across all disks in a given storage pool. When FPO is in use, data in large files is divided in large (e.g. 1G) chunks, and there's a node that holds an entire chunk on its internal disks. An application that knows how to query data block layout of a given file can then schedule the job that needs to read from this chunk on the node that holds a local copy. This makes a lot of sense for integrated data analytics workloads, a la Map Reduce with Hadoop, but doesn't make sense for generic apps like Samba. I'm not sure what language in the FAQ creates the impression that the SAN deployment model is somehow incompatible with running Procotol services. This is perfectly fine. yuri From: Jan-Frode Myklebust To: gpfsug main discussion list , Date: 03/06/2016 10:12 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org I agree, but would also normally want to stay within whatever is recommended. 
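(To make the FPO remarks above concrete: write affinity is switched on per storage pool through the pool stanza used at file system creation time, not per file. A rough, hypothetical stanza fragment follows; pool name and values would need tuning for a real deployment.)

   # %pool stanza enabling FPO-style chunked, write-affine data layout (illustrative values)
   %pool:
     pool=fpodata
     blockSize=1M
     usage=dataOnly
     layoutMap=cluster
     allowWriteAffinity=yes
     writeAffinityDepth=1
     blockGroupFactor=128

As noted above, this only pays off for applications that actually query and exploit block locality; for generic NFS/SMB serving the plain wide-striped layout is the simpler choice.
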
What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan : As Sven wrote, the FAQ does not "prevent" anything.? It's just a recommendation someone came up with.? Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation.? But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Mon Mar 7 21:10:48 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Mon, 7 Mar 2016 21:10:48 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> Message-ID: Thanks Yuri, this solidifies some of the conclusions I?ve drawn from this conversation. Thank you all for your responses. This is a great forum filled with very knowledgeable folks. Mark From: > on behalf of Yuri L Volobuev > Reply-To: gpfsug main discussion list > Date: Monday, March 7, 2016 at 2:58 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster This use case is a good example of how it's hard to optimize across multiple criteria. If you want a pre-packaged solution that's proven and easy to manage, StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for your requirements as such things get. Price may be an issue though, as usual. If you're OK with rolling your own complex solution, my recommendation would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via the local block device interface. This avoids the pitfalls of data/metadata replication, and offers a decent blend of performance, fault tolerance, and disk management. You can use disk-based quorum if going with 2 nodes, or traditional node majority quorum if using 3 nodes, either way would work. There's no need to do any separation of roles (CES, quorum, managers, etc), provided the nodes are adequately provisioned with memory and aren't routinely overloaded, in which case you just need to add more nodes instead of partitioning what you have. 
Using internal disks and relying on GPFS data/metadata replication, with or without FPO, would mean taking the hard road. You may be able to spend the least on hardware in such a config (although the 33% disk utilization rate for triplication makes this less clear, if capacity is an issue), but the operational challenges are going to be substantial. This would be a viable config, but there are unavoidable tradeoffs caused by replication: (1) writes are very expensive, which limits the overall cluster capability for non-read-only workloads, (2) node and disk failures require a round of re-replication, or "re-protection", which takes time and bandwidth, limiting the overall capability further, (3) disk management can be a challenge, as there's no software/hardware component to assist with identifying failing/failed disks. As far as not going off the beaten path, this is not it... Exporting protocols from a small triplicated file system is not a typical mode of deployment of Spectrum Scale, you'd be blazing some new trails. As stated already in several responses, there's no hard requirement that CES Protocol nodes must be entirely separate from any other roles in the general Spectrum Scale deployment scenario. IBM expressly disallows co-locating Protocol nodes with ESS servers, due to resource consumption complications, but for non-ESS cases it's merely a recommendation to run Protocols on nodes that are not otherwise encumbered by having to provide other services. Of course, the config that's the best for performance is not the cheapest. CES doesn't reboot nodes to recover from NFS problems, unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a complex software stack is a complex software stack, so there's greater potential for things to go sideways, in particular due to the lack of resources. FPO vs plain replication: this only matters if you have apps that are capable of exploiting data locality. FPO changes the way GPFS stripes data across disks. Without FPO, GPFS does traditional wide striping of blocks across all disks in a given storage pool. When FPO is in use, data in large files is divided in large (e.g. 1G) chunks, and there's a node that holds an entire chunk on its internal disks. An application that knows how to query data block layout of a given file can then schedule the job that needs to read from this chunk on the node that holds a local copy. This makes a lot of sense for integrated data analytics workloads, a la Map Reduce with Hadoop, but doesn't make sense for generic apps like Samba. I'm not sure what language in the FAQ creates the impression that the SAN deployment model is somehow incompatible with running Procotol services. This is perfectly fine. yuri [Inactive hide details for Jan-Frode Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want to stay within]Jan-Frode Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want to stay within whatever is recommended. From: Jan-Frode Myklebust > To: gpfsug main discussion list >, Date: 03/06/2016 10:12 PM Subject: Re: [gpfsug-discuss] Small cluster Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I agree, but would also normally want to stay within whatever is recommended. What about quorum/manager functions? Also OK to run these on the CES nodes in a 2-node cluster, or any reason to partition these out so that we then have a 4-node cluster running on 2 physical machines? -jf s?n. 6. mar. 2016 kl. 
21.28 skrev Marc A Kaplan >: As Sven wrote, the FAQ does not "prevent" anything. It's just a recommendation someone came up with. Which may or may not apply to your situation. Partitioning a server into two servers might be a good idea if you really need the protection/isolation. But I expect you are limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resource and communicate more freely. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[cid:2__=07BBF5FCDFFC0B518f9e8a93df938690918c07B@][cid:2__=07BBF5FCDFFC0B518f9e8a93df938690918c07B@]_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: 0B132319.gif URL: From r.sobey at imperial.ac.uk Tue Mar 8 09:48:01 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 8 Mar 2016 09:48:01 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Message-ID: Morning all, I tried to download the VM to evaluate SS yesterday - more of a chance to play around with commands in a non-prod environment and look at what's in store. We're currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who's already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniel.kidger at uk.ibm.com Tue Mar 8 13:09:21 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 8 Mar 2016 13:09:21 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query In-Reply-To: References: Message-ID: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> Richard, Sounds unusual. When you registered your IBM ID for login - did you choose your country from the drop-down list as North Korea ? ;-) Daniel Dr.Daniel Kidger No. 1 The Square, Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 08/03/2016 09:48 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Sent by: gpfsug-discuss-bounces at spectrumscale.org Morning all, I tried to download the VM to evaluate SS yesterday ? more of a chance to play around with commands in a non-prod environment and look at what?s in store. We?re currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who?s already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Mar 8 13:16:37 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 8 Mar 2016 13:16:37 +0000 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query In-Reply-To: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> References: <201603081309.u28D9TAZ026081@d06av03.portsmouth.uk.ibm.com> Message-ID: Hah, well now you?ve got me checking just to make sure ? Ok, definitely says United Kingdom. Now it won?t let me download it at all, says page not found. Will persevere! Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 08 March 2016 13:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Eval VM download query Richard, Sounds unusual. When you registered your IBM ID for login - did you choose your country from the drop-down list as North Korea ? ;-) Daniel ________________________________ Dr.Daniel Kidger No. 
1 The Square, [cid:image001.gif at 01D1793C.BE2DB440] Technical Specialist SDI (formerly Platform Computing) Temple Quay, Bristol BS1 6DG Mobile: +44-07818 522 266 United Kingdom Landline: +44-02392 564 121 (Internal ITN 3726 9250) e-mail: daniel.kidger at uk.ibm.com ________________________________ From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 08/03/2016 09:48 Subject: [gpfsug-discuss] Spectrum Scale Eval VM download query Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Morning all, I tried to download the VM to evaluate SS yesterday ? more of a chance to play around with commands in a non-prod environment and look at what?s in store. We?re currently running 3.5 and upgrading in the new few months. Anyway, I registered for the download, and then got greeted with a message as follows: This product is subject to strict US export control laws. Prior to providing access, we must validate whether you are eligible to receive it under an available US export authorization. Your request is being reviewed. Upon completion of this review, you will be contacted if we are able to give access. We apologize for any inconvenience. So, how long does this normally take? Who?s already done it? Thanks Richard Richard Sobey Technical Operations, ICT Imperial College London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 360 bytes Desc: image001.gif URL: From Robert.Oesterlin at nuance.com Tue Mar 8 15:53:34 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 8 Mar 2016 15:53:34 +0000 Subject: [gpfsug-discuss] Interpreting "mmlsqos" output Message-ID: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com> So ? I enabled QoS on my file systems using the defaults in 4.2 Running a restripe with a class of ?maintenance? gives me this for mmlsqos output: [root at gpfs-vmd01a ~]# mmlsqos VMdata01 --sum-nodes yes QOS config:: enabled QOS values:: pool=system,other=inf,maintenance=inf QOS status:: throttling active, monitoring active === for pool system 10:36:30 other iops=9754 ioql=12.17 qsdl=0.00022791 et=5 10:36:30 maint iops=55 ioql=0.067331 qsdl=2.7e-05 et=5 10:36:35 other iops=7999.8 ioql=12.613 qsdl=0.00013951 et=5 10:36:35 maint iops=52 ioql=0.10034 qsdl=2.48e-05 et=5 10:36:40 other iops=8890.8 ioql=12.117 qsdl=0.00016095 et=5 10:36:40 maint iops=71.2 ioql=0.13904 qsdl=3.56e-05 et=5 10:36:45 other iops=8303.8 ioql=11.17 qsdl=0.00011438 et=5 10:36:45 maint iops=52.8 ioql=0.08261 qsdl=3.06e-05 et=5 It looks like the ?maintenance? class is getting perhaps 5% of the overall IOP rate? What do ?ioql? and ?qsdl? indicate? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 8 16:36:46 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 11:36:46 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority Message-ID: <20160308113646.54314ikzhtedrjby@support.scinet.utoronto.ca> I'm wondering whether the new version of the "Spectrum Suite" will allow us set the priority of the HSM migration to be higher than staging. I ask this because back in 2011 when we were still using Tivoli HSM with GPFS, during mixed requests for migration and staging operations, we had a very annoying behavior in which the staging would always take precedence over migration. The end-result was that the GPFS would fill up to 100% and induce a deadlock on the cluster, unless we identified all the user driven stage requests in time, and killed them all. We contacted IBM support a few times asking for a way fix this, and were told it was built into TSM. Back then we gave up IBM's HSM primarily for this reason, although performance was also a consideration (more to this on another post). We are now reconsidering HSM for a new deployment, however only if this issue has been resolved (among a few others). What has been some of the experience out there? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Tue Mar 8 16:54:45 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 11:54:45 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts Message-ID: <20160308115445.10061uekt4pp5kgl@support.scinet.utoronto.ca> For the new Spectrum Suite of products, are there specific references with examples on how to set up gpfs policy rules to integrate TSM so substantially improve the migration performance of HSM? The reason I ask is because I've been reading manuals with 200+ pages where it's very clear this is possible to be accomplished, by builtin lists and feeding those to TSM, however some of the examples and rules are presented out of context, and not integrated onto a single self-contained document. The GPFS past has it own set of manuals, but so do TSM and HSM. For those of you already doing it, what has been your experience, what are the tricks (where can I read about them), how the addition of multiple nodes to the working pool is performing? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Tue Mar 8 17:45:42 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Tue, 8 Mar 2016 18:45:42 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts Message-ID: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Hi, please have a look at this document: http://www-01.ibm.com/support/docview.wss?uid=swg27018848 It describe the how-to setup and provides some hints and tips for migration policies. Greetings, Dominic. 
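
(For a feel of what the linked document covers, a heavily simplified threshold-migration policy is sketched below. The exec script path, file system name, callback helper and exclude pattern are placeholders and should be checked against the linked document and the installed HSM client; this is an outline, not a working policy.)

   # hypothetical policy file that hands cold files to the HSM client as an external pool
   cat > /tmp/hsm.policy <<'EOF'
   /* external pool backed by the HSM client; the EXEC path is a placeholder */
   RULE 'hsmpool' EXTERNAL POOL 'hsm' EXEC '/path/to/hsm/exec-script' OPTS '-v'
   /* start migrating at 90% pool occupancy, stop at 80%, coldest files first */
   RULE 'coldToHsm' MIGRATE FROM POOL 'system' THRESHOLD(90,80)
        WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
        TO POOL 'hsm'
        WHERE NOT (PATH_NAME LIKE '%/.SpaceMan/%')
   EOF
   mmchpolicy gpfsdev /tmp/hsm.policy
   # have GPFS launch the policy run when the lowDiskSpace event fires
   mmaddcallback hsmMigrate --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName"
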
______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 For the new Spectrum Suite of products, are there specific references with examples on how to set up gpfs policy rules to integrate TSM so substantially improve the migration performance of HSM? The reason I ask is because I've been reading manuals with 200+ pages where it's very clear this is possible to be accomplished, by builtin lists and feeding those to TSM, however some of the examples and rules are presented out of context, and not integrated onto a single self-contained document. The GPFS past has it own set of manuals, but so do TSM and HSM. For those of you already doing it, what has been your experience, what are the tricks (where can I read about them), how the addition of multiple nodes to the working pool is performing? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominic.mueller at de.ibm.com Tue Mar 8 17:46:11 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Tue, 8 Mar 2016 18:46:11 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Message-ID: <201603081646.u28GkIXt026930@d06av10.portsmouth.uk.ibm.com> Hi, in all cases a recall request will be handled transparent for the user at the time a migrated files is accessed. This can't be prevented and has two down sides: a) the space used in the file system increases and b) random access to storage media in the Spectrum Protect server happens. With newer versions of Spectrum Protect for Space Management a so called tape optimized recall method is available that can reduce the impact to the system (especially Spectrum Protect server). If the problem was that the file system went out of space at the time the recalls came in I would recommend to reduce the threshold settings for the file system and increase the number of premigrated files. This will allow to free space very quickly if needed. If you didn't use the policy based threshold migration so far I recommend to use it. This method is significant faster compared to the classical HSM based threshold migration approach. Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 ----- From: Jaime Pinto To: gpfsug main discussion list Date: 08.03.2016 17:36 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority Sent by: gpfsug-discuss-bounces at spectrumscale.org I'm wondering whether the new version of the "Spectrum Suite" will allow us set the priority of the HSM migration to be higher than staging. I ask this because back in 2011 when we were still using Tivoli HSM with GPFS, during mixed requests for migration and staging operations, we had a very annoying behavior in which the staging would always take precedence over migration. The end-result was that the GPFS would fill up to 100% and induce a deadlock on the cluster, unless we identified all the user driven stage requests in time, and killed them all. We contacted IBM support a few times asking for a way fix this, and were told it was built into TSM. Back then we gave up IBM's HSM primarily for this reason, although performance was also a consideration (more to this on another post). We are now reconsidering HSM for a new deployment, however only if this issue has been resolved (among a few others). What has been some of the experience out there? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Tue Mar 8 18:58:29 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 8 Mar 2016 18:58:29 +0000 Subject: [gpfsug-discuss] Small cluster In-Reply-To: References: <56D8B94C.2000303@buzzard.me.uk> <201603041609.u24G98Yw022449@d03av02.boulder.ibm.com> <789D698B-FC75-48E2-B510-2D647173B150@siriuscom.com> <4427540E-2BEC-4B3A-A664-1837C9EACCDF@siriuscom.com> <201603041804.u24I4g2R026689@d03av01.boulder.ibm.com> <201603062027.u26KRwkC026320@d03av04.boulder.ibm.com> <201603072058.u27Kwiql018712@d03av05.boulder.ibm.com> Message-ID: My fantasy solution is 2 servers and a SAS disk shelf from my adopted, cheap x86 vendor running IBM Spectrum Scale with GNR as software only, doing concurrent, supported GNR and CES with maybe an advisory on the performance requirements of such and suggestions on scale out approaches :) Cheers Chris On 7 March 2016 at 21:10, Mark.Bush at siriuscom.com wrote: > Thanks Yuri, this solidifies some of the conclusions I?ve drawn from this > conversation. Thank you all for your responses. This is a great forum > filled with very knowledgeable folks. > > Mark > > From: on behalf of Yuri L > Volobuev > Reply-To: gpfsug main discussion list > Date: Monday, March 7, 2016 at 2:58 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Small cluster > > This use case is a good example of how it's hard to optimize across > multiple criteria. > > If you want a pre-packaged solution that's proven and easy to manage, > StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for > your requirements as such things get. Price may be an issue though, as > usual. > > If you're OK with rolling your own complex solution, my recommendation > would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external > disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. 
via > the local block device interface. This avoids the pitfalls of data/metadata > replication, and offers a decent blend of performance, fault tolerance, and > disk management. You can use disk-based quorum if going with 2 nodes, or > traditional node majority quorum if using 3 nodes, either way would work. > There's no need to do any separation of roles (CES, quorum, managers, etc), > provided the nodes are adequately provisioned with memory and aren't > routinely overloaded, in which case you just need to add more nodes instead > of partitioning what you have. > > Using internal disks and relying on GPFS data/metadata replication, with > or without FPO, would mean taking the hard road. You may be able to spend > the least on hardware in such a config (although the 33% disk utilization > rate for triplication makes this less clear, if capacity is an issue), but > the operational challenges are going to be substantial. This would be a > viable config, but there are unavoidable tradeoffs caused by replication: > (1) writes are very expensive, which limits the overall cluster capability > for non-read-only workloads, (2) node and disk failures require a round of > re-replication, or "re-protection", which takes time and bandwidth, > limiting the overall capability further, (3) disk management can be a > challenge, as there's no software/hardware component to assist with > identifying failing/failed disks. As far as not going off the beaten path, > this is not it... Exporting protocols from a small triplicated file system > is not a typical mode of deployment of Spectrum Scale, you'd be blazing > some new trails. > > As stated already in several responses, there's no hard requirement that > CES Protocol nodes must be entirely separate from any other roles in the > general Spectrum Scale deployment scenario. IBM expressly disallows > co-locating Protocol nodes with ESS servers, due to resource consumption > complications, but for non-ESS cases it's merely a recommendation to run > Protocols on nodes that are not otherwise encumbered by having to provide > other services. Of course, the config that's the best for performance is > not the cheapest. CES doesn't reboot nodes to recover from NFS problems, > unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a > complex software stack is a complex software stack, so there's greater > potential for things to go sideways, in particular due to the lack of > resources. > > FPO vs plain replication: this only matters if you have apps that are > capable of exploiting data locality. FPO changes the way GPFS stripes data > across disks. Without FPO, GPFS does traditional wide striping of blocks > across all disks in a given storage pool. When FPO is in use, data in large > files is divided in large (e.g. 1G) chunks, and there's a node that holds > an entire chunk on its internal disks. An application that knows how to > query data block layout of a given file can then schedule the job that > needs to read from this chunk on the node that holds a local copy. This > makes a lot of sense for integrated data analytics workloads, a la Map > Reduce with Hadoop, but doesn't make sense for generic apps like Samba. > > I'm not sure what language in the FAQ creates the impression that the SAN > deployment model is somehow incompatible with running Procotol services. > This is perfectly fine. 
> > yuri > > [image: Inactive hide details for Jan-Frode Myklebust ---03/06/2016 > 10:12:07 PM---I agree, but would also normally want to stay within]Jan-Frode > Myklebust ---03/06/2016 10:12:07 PM---I agree, but would also normally want > to stay within whatever is recommended. > > From: Jan-Frode Myklebust > To: gpfsug main discussion list , > Date: 03/06/2016 10:12 PM > Subject: Re: [gpfsug-discuss] Small cluster > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I agree, but would also normally want to stay within whatever is > recommended. > > What about quorum/manager functions? Also OK to run these on the CES nodes > in a 2-node cluster, or any reason to partition these out so that we then > have a 4-node cluster running on 2 physical machines? > > > -jf > s?n. 6. mar. 2016 kl. 21.28 skrev Marc A Kaplan <*makaplan at us.ibm.com* > >: > > As Sven wrote, the FAQ does not "prevent" anything. It's just a > recommendation someone came up with. Which may or may not apply to your > situation. > > Partitioning a server into two servers might be a good idea if you > really need the protection/isolation. But I expect you are limiting the > potential performance of the overall system, compared to running a single > Unix image with multiple processes that can share resource and communicate > more freely. > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > This message (including any attachments) is intended only for the use of > the individual or entity to which it is addressed and may contain > information that is non-public, proprietary, privileged, confidential, and > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any use, dissemination, > distribution, or copying of this communication is strictly prohibited. This > message may be viewed by parties at Sirius Computer Solutions other than > those named in the message header. This message does not contain an > official representation of Sirius Computer Solutions. If you have received > this communication in error, notify Sirius Computer Solutions immediately > and (i) destroy this message if a facsimile or (ii) delete this message > immediately if this is an electronic communication. Thank you. > Sirius Computer Solutions > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0B132319.gif Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From chrisjscott at gmail.com Tue Mar 8 19:10:25 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 8 Mar 2016 19:10:25 +0000 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts In-Reply-To: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> References: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Message-ID: To add a customer data point, I followed that guide using GPFS 3.4 and TSM 6.4 with HSM and it's been working perfectly since then. I was even able to remove dsmscoutd online, node-at-a-time back when I made the transition. The performance change was revolutionary and so is the file selection. We have large filesystems with millions of files, changing often, that TSM incremental scan wouldn't cope with and Spectrum Scale 4.1.1 and Spectrum Protect 7.1.3 using mmbackup as described in the SS 4.1.1 manual, creating a snapshot for mmbackup also works perfectly for backup. Cheers Chris On 8 March 2016 at 17:45, Dominic Mueller-Wicke01 < dominic.mueller at de.ibm.com> wrote: > Hi, > > please have a look at this document: > http://www-01.ibm.com/support/docview.wss?uid=swg27018848 > It describe the how-to setup and provides some hints and tips for > migration policies. > > Greetings, Dominic. > > > ______________________________________________________________________________________________________________ > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead > | +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > > > For the new Spectrum Suite of products, are there specific references > with examples on how to set up gpfs policy rules to integrate TSM so > substantially improve the migration performance of HSM? > > The reason I ask is because I've been reading manuals with 200+ pages > where it's very clear this is possible to be accomplished, by builtin > lists and feeding those to TSM, however some of the examples and rules > are presented out of context, and not integrated onto a single > self-contained document. The GPFS past has it own set of manuals, but > so do TSM and HSM. > > For those of you already doing it, what has been your experience, what > are the tricks (where can I read about them), how the addition of > multiple nodes to the working pool is performing? > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From makaplan at us.ibm.com Tue Mar 8 19:37:22 2016
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Tue, 8 Mar 2016 14:37:22 -0500
Subject: [gpfsug-discuss] Interpreting "mmlsqos" output
In-Reply-To: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com>
References: <88E34108-309D-4B10-AF88-0FAE6626B191@nuance.com>
Message-ID: <201603081937.u28JbUoj017559@d01av03.pok.ibm.com>

Bob,

You can read ioql as "IO queue length" (outside of GPFS) and "qsdl" as QOS queue length at the QOS throttle within GPFS, computed from average delay introduced by the QOS subsystem. These "queue lengths" are virtual or fictional -- they are computed by observing average service times and applying Little's Law. That is, there is no single actual queue, but each IO request spends some time in the OS + network + disk controller + .... For IO bound workloads one can verify that ioql+qsdl is the average number of application threads waiting for IO.

Our documentation puts it this way (see 4.2 Admin Guide, mmlsqos command):

iops= The performance of the class in I/O operations per second.
ioql= The average number of I/O requests in the class that are pending for reasons other than being queued by QoS. This number includes, for example, I/O requests that are waiting for network or storage device servicing.
qsdl= The average number of I/O requests in the class that are queued by QoS. When the QoS system receives an I/O request from the file system, QoS first finds the class to which the I/O request belongs. It then finds whether the class has any I/O operations available for consumption. If not, then QoS queues the request until more I/O operations become available for the class. The Qsdl value is the average number of I/O requests that are held in this queue.
et= The interval in seconds during which the measurement was made.

You can calculate the average service time for an I/O operation as (Ioql + Qsdl)/Iops. For a system that is running IO-intensive applications, you can interpret the value (Ioql + Qsdl) as the number of threads in the I/O-intensive applications. This interpretation assumes that each thread spends most of its time in waiting for an I/O operation to complete.

From: "Oesterlin, Robert"
To: gpfsug main discussion list
Date: 03/08/2016 10:53 AM
Subject: [gpfsug-discuss] Interpreting "mmlsqos" output
Sent by: gpfsug-discuss-bounces at spectrumscale.org

So - I enabled QoS on my file systems using the defaults in 4.2.
Running a restripe with a class of 'maintenance' gives me this for mmlsqos output:

[root at gpfs-vmd01a ~]# mmlsqos VMdata01 --sum-nodes yes
QOS config:: enabled
QOS values:: pool=system,other=inf,maintenance=inf
QOS status:: throttling active, monitoring active
=== for pool system
10:36:30 other iops=9754   ioql=12.17    qsdl=0.00022791 et=5
10:36:30 maint iops=55     ioql=0.067331 qsdl=2.7e-05    et=5
10:36:35 other iops=7999.8 ioql=12.613   qsdl=0.00013951 et=5
10:36:35 maint iops=52     ioql=0.10034  qsdl=2.48e-05   et=5
10:36:40 other iops=8890.8 ioql=12.117   qsdl=0.00016095 et=5
10:36:40 maint iops=71.2   ioql=0.13904  qsdl=3.56e-05   et=5
10:36:45 other iops=8303.8 ioql=11.17    qsdl=0.00011438 et=5
10:36:45 maint iops=52.8   ioql=0.08261  qsdl=3.06e-05   et=5

It looks like the 'maintenance' class is getting perhaps 5% of the overall IOP rate? What do 'ioql' and 'qsdl' indicate?
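Marc's service-time formula can be checked against the numbers quoted above; a minimal sketch in Python, using the 10:36:30 'other' sample:

   # values copied from the first mmlsqos interval above
   iops, ioql, qsdl = 9754.0, 12.17, 0.00022791
   service_time = (ioql + qsdl) / iops         # average seconds per I/O
   print(round(service_time * 1000, 2), "ms")  # ~1.25 ms per operation
   print(round(ioql + qsdl, 1))                # ~12.2 I/O requests in flight on average

The tiny qsdl values also show that, with both classes left at 'inf', QoS itself is adding essentially no queueing delay in this trace.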
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Mar 8 19:45:13 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 8 Mar 2016 14:45:13 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: migration via GPFS policy scripts - Success story! In-Reply-To: References: <201603081745.u28HjsuI010585@d06av12.portsmouth.uk.ibm.com> Message-ID: <201603081945.u28JjKrL008155@d01av01.pok.ibm.com> "I followed that guide using GPFS 3.4 and TSM 6.4 with HSM and it's been working perfectly since then. I was even able to remove dsmscoutd online, node-at-a-time back when I made the transition. The performance change was revolutionary and so is the file selection. We have large filesystems with millions of files, changing often, that TSM incremental scan wouldn't cope with and Spectrum Scale 4.1.1 and Spectrum Protect 7.1.3 using mmbackup as described in the SS 4.1.1 manual, creating a snapshot for mmbackup also works perfectly for backup. Cheers Chris THANKS, SCOTT -- we love to hear/see customer comments and feedback, especially when they are positive ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 8 20:38:52 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 08 Mar 2016 15:38:52 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> Message-ID: <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> Thanks for the suggestions Dominic I remember playing around with premigrated files at the time, and that was not satisfactory. What we are looking for is a configuration based parameter what will basically break out of the "transparency for the user" mode, and not perform any further recalling, period, if|when the file system occupancy is above a certain threshold (98%). We would not mind if instead gpfs would issue a preemptive "disk full" error message to any user/app/job relying on those files to be recalled, so migration on demand will have a chance to be performance. What we prefer is to swap precedence, ie, any migration requests would be executed ahead of any recalls, at least until a certain amount of free space on the file system has been cleared. It's really important that this type of feature is present, for us to reconsider the TSM version of HSM as a solution. It's not clear from the manual that this can be accomplish in some fashion. Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > > Hi, > > in all cases a recall request will be handled transparent for the user at > the time a migrated files is accessed. This can't be prevented and has two > down sides: a) the space used in the file system increases and b) random > access to storage media in the Spectrum Protect server happens. With newer > versions of Spectrum Protect for Space Management a so called tape > optimized recall method is available that can reduce the impact to the > system (especially Spectrum Protect server). 
> If the problem was that the file system went out of space at the time the > recalls came in I would recommend to reduce the threshold settings for the > file system and increase the number of premigrated files. This will allow > to free space very quickly if needed. If you didn't use the policy based > threshold migration so far I recommend to use it. This method is > significant faster compared to the classical HSM based threshold migration > approach. > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 > ----- > > From: Jaime Pinto > To: gpfsug main discussion list > Date: 08.03.2016 17:36 > Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I'm wondering whether the new version of the "Spectrum Suite" will > allow us set the priority of the HSM migration to be higher than > staging. > > > I ask this because back in 2011 when we were still using Tivoli HSM > with GPFS, during mixed requests for migration and staging operations, > we had a very annoying behavior in which the staging would always take > precedence over migration. The end-result was that the GPFS would fill > up to 100% and induce a deadlock on the cluster, unless we identified > all the user driven stage requests in time, and killed them all. We > contacted IBM support a few times asking for a way fix this, and were > told it was built into TSM. Back then we gave up IBM's HSM primarily > for this reason, although performance was also a consideration (more > to this on another post). > > We are now reconsidering HSM for a new deployment, however only if > this issue has been resolved (among a few others). > > What has been some of the experience out there? > > Thanks > Jaime > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Wed Mar 9 09:35:56 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Wed, 9 Mar 2016 10:35:56 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority In-Reply-To: <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> Message-ID: <201603090836.u298a1D1017873@d06av10.portsmouth.uk.ibm.com> Hi Jamie, I see. So, the recall-shutdown would be something for a short time period. right? Just for the time it takes to migrate files out and free space. If HSM would allow the recall-shutdown the impact for the users would be that each access to migrated files would lead to an access denied error. Would that be acceptable for the users? Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Cc: gpfsug-discuss at spectrumscale.org Date: 08.03.2016 21:38 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Thanks for the suggestions Dominic I remember playing around with premigrated files at the time, and that was not satisfactory. What we are looking for is a configuration based parameter what will basically break out of the "transparency for the user" mode, and not perform any further recalling, period, if|when the file system occupancy is above a certain threshold (98%). We would not mind if instead gpfs would issue a preemptive "disk full" error message to any user/app/job relying on those files to be recalled, so migration on demand will have a chance to be performance. What we prefer is to swap precedence, ie, any migration requests would be executed ahead of any recalls, at least until a certain amount of free space on the file system has been cleared. It's really important that this type of feature is present, for us to reconsider the TSM version of HSM as a solution. It's not clear from the manual that this can be accomplish in some fashion. Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > > Hi, > > in all cases a recall request will be handled transparent for the user at > the time a migrated files is accessed. This can't be prevented and has two > down sides: a) the space used in the file system increases and b) random > access to storage media in the Spectrum Protect server happens. With newer > versions of Spectrum Protect for Space Management a so called tape > optimized recall method is available that can reduce the impact to the > system (especially Spectrum Protect server). > If the problem was that the file system went out of space at the time the > recalls came in I would recommend to reduce the threshold settings for the > file system and increase the number of premigrated files. This will allow > to free space very quickly if needed. If you didn't use the policy based > threshold migration so far I recommend to use it. This method is > significant faster compared to the classical HSM based threshold migration > approach. > > Greetings, Dominic. 
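For anyone who has not used the policy-based threshold migration mentioned above, a minimal sketch of what such a policy can look like; the pool names, thresholds and the external-pool script path are illustrative assumptions (the sample wrapper name may differ by release), and the docview link posted earlier in the thread has the complete recipe:

   /* illustrative only: an external pool handled by an HSM wrapper script */
   RULE EXTERNAL POOL 'hsm'
        EXEC '/var/mmfs/etc/mmpolicyExec-hsm.sample'  /* assumed path to the sample script */
        OPTS '-v'

   /* start migrating at 90% occupancy, stop at 80%, oldest files first */
   RULE 'ToTape' MIGRATE FROM POOL 'system'
        THRESHOLD(90,80)
        WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
        TO POOL 'hsm'
        WHERE FILE_SIZE > 0

Such a policy is installed with mmchpolicy and is typically driven either by periodic mmapplypolicy runs or by the lowDiskSpace callback event, which is what makes it react faster than the classical HSM-internal threshold migration it is being contrasted with above.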
> > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 > ----- > > From: Jaime Pinto > To: gpfsug main discussion list > Date: 08.03.2016 17:36 > Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I'm wondering whether the new version of the "Spectrum Suite" will > allow us set the priority of the HSM migration to be higher than > staging. > > > I ask this because back in 2011 when we were still using Tivoli HSM > with GPFS, during mixed requests for migration and staging operations, > we had a very annoying behavior in which the staging would always take > precedence over migration. The end-result was that the GPFS would fill > up to 100% and induce a deadlock on the cluster, unless we identified > all the user driven stage requests in time, and killed them all. We > contacted IBM support a few times asking for a way fix this, and were > told it was built into TSM. Back then we gave up IBM's HSM primarily > for this reason, although performance was also a consideration (more > to this on another post). > > We are now reconsidering HSM for a new deployment, however only if > this issue has been resolved (among a few others). > > What has been some of the experience out there? > > Thanks > Jaime > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 12:12:08 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 07:12:08 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. 
migration priority In-Reply-To: <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> Message-ID: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Yes! A behavior along those lines would be desirable. Users understand very well what it means for a file system to be near full. Are there any customers already doing something similar? Thanks Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jamie, > > I see. So, the recall-shutdown would be something for a short time period. > right? Just for the time it takes to migrate files out and free space. If > HSM would allow the recall-shutdown, the impact for the users would be that > each access to migrated files would lead to an access denied error. Would > that be acceptable for the users? > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Cc: gpfsug-discuss at spectrumscale.org > Date: 08.03.2016 21:38 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Thanks for the suggestions Dominic > > I remember playing around with premigrated files at the time, and that > was not satisfactory. > > What we are looking for is a configuration based parameter what will > basically break out of the "transparency for the user" mode, and not > perform any further recalling, period, if|when the file system > occupancy is above a certain threshold (98%). We would not mind if > instead gpfs would issue a preemptive "disk full" error message to any > user/app/job relying on those files to be recalled, so migration on > demand will have a chance to be performance. What we prefer is to swap > precedence, ie, any migration requests would be executed ahead of any > recalls, at least until a certain amount of free space on the file > system has been cleared. > > It's really important that this type of feature is present, for us to > reconsider the TSM version of HSM as a solution. It's not clear from > the manual that this can be accomplish in some fashion. > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > >> >> >> Hi, >> >> in all cases a recall request will be handled transparent for the user at >> the time a migrated files is accessed. This can't be prevented and has > two >> down sides: a) the space used in the file system increases and b) random >> access to storage media in the Spectrum Protect server happens. With > newer >> versions of Spectrum Protect for Space Management a so called tape >> optimized recall method is available that can reduce the impact to the >> system (especially Spectrum Protect server). >> If the problem was that the file system went out of space at the time the >> recalls came in I would recommend to reduce the threshold settings for > the >> file system and increase the number of premigrated files. This will allow >> to free space very quickly if needed. 
If you didn't use the policy based >> threshold migration so far I recommend to use it. This method is >> significant faster compared to the classical HSM based threshold > migration >> approach. >> >> Greetings, Dominic. >> >> > ______________________________________________________________________________________________________________ > >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead > | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 > 18:21 >> ----- >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 08.03.2016 17:36 >> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I'm wondering whether the new version of the "Spectrum Suite" will >> allow us set the priority of the HSM migration to be higher than >> staging. >> >> >> I ask this because back in 2011 when we were still using Tivoli HSM >> with GPFS, during mixed requests for migration and staging operations, >> we had a very annoying behavior in which the staging would always take >> precedence over migration. The end-result was that the GPFS would fill >> up to 100% and induce a deadlock on the cluster, unless we identified >> all the user driven stage requests in time, and killed them all. We >> contacted IBM support a few times asking for a way fix this, and were >> told it was built into TSM. Back then we gave up IBM's HSM primarily >> for this reason, although performance was also a consideration (more >> to this on another post). >> >> We are now reconsidering HSM for a new deployment, however only if >> this issue has been resolved (among a few others). >> >> What has been some of the experience out there? >> >> Thanks >> Jaime >> >> >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From chrisjscott at gmail.com Wed Mar 9 14:44:39 2016 From: chrisjscott at gmail.com (Chris Scott) Date: Wed, 9 Mar 2016 14:44:39 +0000 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: Not meaning to hjack the thread but while we're on the topic of transparent recall: I'd like to be able to disable it such that I can use SS ILM policies agreed with the data owners to "archive" their data and recover disk space by migrating files to tape, marking them as immutable to defend against accidental or malicious deletion and have some user interface that would let them "retrieve" the data back to disk as writable again, subject to sufficient free disk space and within any quota limits as applicable. Cheers Chris On 9 March 2016 at 12:12, Jaime Pinto wrote: > Yes! A behavior along those lines would be desirable. Users understand > very well what it means for a file system to be near full. > > Are there any customers already doing something similar? > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > > >> Hi Jamie, >> >> I see. So, the recall-shutdown would be something for a short time period. >> right? Just for the time it takes to migrate files out and free space. If >> HSM would allow the recall-shutdown, the impact for the users would be >> that >> each access to migrated files would lead to an access denied error. Would >> that be acceptable for the users? >> >> Greetings, Dominic. >> >> >> ______________________________________________________________________________________________________________ >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead >> | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> >> >> >> From: Jaime Pinto >> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 08.03.2016 21:38 >> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> >> priority >> >> >> >> Thanks for the suggestions Dominic >> >> I remember playing around with premigrated files at the time, and that >> was not satisfactory. >> >> What we are looking for is a configuration based parameter what will >> basically break out of the "transparency for the user" mode, and not >> perform any further recalling, period, if|when the file system >> occupancy is above a certain threshold (98%). 
We would not mind if >> instead gpfs would issue a preemptive "disk full" error message to any >> user/app/job relying on those files to be recalled, so migration on >> demand will have a chance to be performance. What we prefer is to swap >> precedence, ie, any migration requests would be executed ahead of any >> recalls, at least until a certain amount of free space on the file >> system has been cleared. >> >> It's really important that this type of feature is present, for us to >> reconsider the TSM version of HSM as a solution. It's not clear from >> the manual that this can be accomplish in some fashion. >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >> >>> >>> Hi, >>> >>> in all cases a recall request will be handled transparent for the user at >>> the time a migrated files is accessed. This can't be prevented and has >>> >> two >> >>> down sides: a) the space used in the file system increases and b) random >>> access to storage media in the Spectrum Protect server happens. With >>> >> newer >> >>> versions of Spectrum Protect for Space Management a so called tape >>> optimized recall method is available that can reduce the impact to the >>> system (especially Spectrum Protect server). >>> If the problem was that the file system went out of space at the time the >>> recalls came in I would recommend to reduce the threshold settings for >>> >> the >> >>> file system and increase the number of premigrated files. This will allow >>> to free space very quickly if needed. If you didn't use the policy based >>> threshold migration so far I recommend to use it. This method is >>> significant faster compared to the classical HSM based threshold >>> >> migration >> >>> approach. >>> >>> Greetings, Dominic. >>> >>> >>> >> ______________________________________________________________________________________________________________ >> >> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead >>> >> | >> >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >>> HRB 243294 >>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >>> >> 18:21 >> >>> ----- >>> >>> From: Jaime Pinto >>> To: gpfsug main discussion list < >>> gpfsug-discuss at spectrumscale.org> >>> Date: 08.03.2016 17:36 >>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. >>> migration >>> >> priority >> >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I'm wondering whether the new version of the "Spectrum Suite" will >>> allow us set the priority of the HSM migration to be higher than >>> staging. >>> >>> >>> I ask this because back in 2011 when we were still using Tivoli HSM >>> with GPFS, during mixed requests for migration and staging operations, >>> we had a very annoying behavior in which the staging would always take >>> precedence over migration. The end-result was that the GPFS would fill >>> up to 100% and induce a deadlock on the cluster, unless we identified >>> all the user driven stage requests in time, and killed them all. We >>> contacted IBM support a few times asking for a way fix this, and were >>> told it was built into TSM. Back then we gave up IBM's HSM primarily >>> for this reason, although performance was also a consideration (more >>> to this on another post). 
>>> >>> We are now reconsidering HSM for a new deployment, however only if >>> this issue has been resolved (among a few others). >>> >>> What has been some of the experience out there? >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Mar 9 15:05:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 9 Mar 2016 10:05:31 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> For a write or create operation ENOSPC would make some sense. But if the file already exists and I'm just opening for read access I would be very confused by ENOSPC. How should the system respond: "Sorry, I know about that file, I have it safely stored away in HSM, but it is not available right now. Try again later!" EAGAIN or EBUSY might be the closest in ordinary language... But EAGAIN is used when a system call is interrupted and can be retried right away... So EBUSY? 
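Whichever return code were chosen, a batch application would still have to treat it as a transient condition rather than a hard failure. A minimal sketch in Python of what that could look like, assuming the hypothetical "recalls temporarily disabled" behavior being discussed here (this is not something GPFS/HSM does today):

   import errno, os, time

   def open_with_retry(path, tries=10, wait_s=60):
       """Retry open() while the file is reported busy/unavailable."""
       for _ in range(tries):
           try:
               return os.open(path, os.O_RDONLY)
           except OSError as e:
               if e.errno not in (errno.EBUSY, errno.EAGAIN):
                   raise               # a real error: give up immediately
               time.sleep(wait_s)      # recalls paused: back off and try again
       raise TimeoutError(path + " still unavailable after " + str(tries) + " attempts")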
The standard return codes in Linux are: #define EPERM 1 /* Operation not permitted */ #define ENOENT 2 /* No such file or directory */ #define ESRCH 3 /* No such process */ #define EINTR 4 /* Interrupted system call */ #define EIO 5 /* I/O error */ #define ENXIO 6 /* No such device or address */ #define E2BIG 7 /* Argument list too long */ #define ENOEXEC 8 /* Exec format error */ #define EBADF 9 /* Bad file number */ #define ECHILD 10 /* No child processes */ #define EAGAIN 11 /* Try again */ #define ENOMEM 12 /* Out of memory */ #define EACCES 13 /* Permission denied */ #define EFAULT 14 /* Bad address */ #define ENOTBLK 15 /* Block device required */ #define EBUSY 16 /* Device or resource busy */ #define EEXIST 17 /* File exists */ #define EXDEV 18 /* Cross-device link */ #define ENODEV 19 /* No such device */ #define ENOTDIR 20 /* Not a directory */ #define EISDIR 21 /* Is a directory */ #define EINVAL 22 /* Invalid argument */ #define ENFILE 23 /* File table overflow */ #define EMFILE 24 /* Too many open files */ #define ENOTTY 25 /* Not a typewriter */ #define ETXTBSY 26 /* Text file busy */ #define EFBIG 27 /* File too large */ #define ENOSPC 28 /* No space left on device */ #define ESPIPE 29 /* Illegal seek */ #define EROFS 30 /* Read-only file system */ #define EMLINK 31 /* Too many links */ #define EPIPE 32 /* Broken pipe */ #define EDOM 33 /* Math argument out of domain of func */ #define ERANGE 34 /* Math result not representable */ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 15:21:53 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 10:21:53 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> Message-ID: <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> Interesting perspective Mark. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan : > For a write or create operation ENOSPC would make some sense. > But if the file already exists and I'm just opening for read access I > would be very confused by ENOSPC. > How should the system respond: "Sorry, I know about that file, I have it > safely stored away in HSM, but it is not available right now. Try again > later!" > > EAGAIN or EBUSY might be the closest in ordinary language... > But EAGAIN is used when a system call is interrupted and can be retried > right away... > So EBUSY? 
> > The standard return codes in Linux are: > > #define EPERM 1 /* Operation not permitted */ > #define ENOENT 2 /* No such file or directory */ > #define ESRCH 3 /* No such process */ > #define EINTR 4 /* Interrupted system call */ > #define EIO 5 /* I/O error */ > #define ENXIO 6 /* No such device or address */ > #define E2BIG 7 /* Argument list too long */ > #define ENOEXEC 8 /* Exec format error */ > #define EBADF 9 /* Bad file number */ > #define ECHILD 10 /* No child processes */ > #define EAGAIN 11 /* Try again */ > #define ENOMEM 12 /* Out of memory */ > #define EACCES 13 /* Permission denied */ > #define EFAULT 14 /* Bad address */ > #define ENOTBLK 15 /* Block device required */ > #define EBUSY 16 /* Device or resource busy */ > #define EEXIST 17 /* File exists */ > #define EXDEV 18 /* Cross-device link */ > #define ENODEV 19 /* No such device */ > #define ENOTDIR 20 /* Not a directory */ > #define EISDIR 21 /* Is a directory */ > #define EINVAL 22 /* Invalid argument */ > #define ENFILE 23 /* File table overflow */ > #define EMFILE 24 /* Too many open files */ > #define ENOTTY 25 /* Not a typewriter */ > #define ETXTBSY 26 /* Text file busy */ > #define EFBIG 27 /* File too large */ > #define ENOSPC 28 /* No space left on device */ > #define ESPIPE 29 /* Illegal seek */ > #define EROFS 30 /* Read-only file system */ > #define EMLINK 31 /* Too many links */ > #define EPIPE 32 /* Broken pipe */ > #define EDOM 33 /* Math argument out of domain of func */ > #define ERANGE 34 /* Math result not representable */ > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Wed Mar 9 19:56:13 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 14:56:13 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) Message-ID: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> Here is another area where I've been reading material from several sources for years, and in fact trying one solution over the other from time-to-time in a test environment. However, to date I have not been able to find a one-piece-document where all these different IBM alternatives for backup are discussed at length, with the pos and cons well explained, along with the how-to's. I'm currently using TSM(built-in backup client), and over the years I developed a set of tricks to rely on disk based volumes as intermediate cache, and multiple backup client nodes, to split the load and substantially improve the performance of the backup compared to when I first deployed this solution. However I suspect it could still be improved further if I was to apply tools from the GPFS side of the equation. I would appreciate any comments/pointers. 
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From YARD at il.ibm.com Wed Mar 9 20:16:59 2016 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 9 Mar 2016 22:16:59 +0200 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> Message-ID: <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> Hi Did u use mmbackup with TSM ? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm Please also review this : http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: > From: Jaime Pinto > To: gpfsug main discussion list > Date: 03/09/2016 09:56 PM > Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup > scripts) vs. TSM(backup) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Here is another area where I've been reading material from several > sources for years, and in fact trying one solution over the other from > time-to-time in a test environment. However, to date I have not been > able to find a one-piece-document where all these different IBM > alternatives for backup are discussed at length, with the pos and cons > well explained, along with the how-to's. > > I'm currently using TSM(built-in backup client), and over the years I > developed a set of tricks to rely on disk based volumes as > intermediate cache, and multiple backup client nodes, to split the > load and substantially improve the performance of the backup compared > to when I first deployed this solution. However I suspect it could > still be improved further if I was to apply tools from the GPFS side > of the equation. > > I would appreciate any comments/pointers. > > Thanks > Jaime > > > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed Mar 9 21:33:49 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 09 Mar 2016 16:33:49 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. 
TSM(backup) In-Reply-To: <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com>
References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com>
Message-ID: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca>

Quoting Yaron Daniel :

> Hi
>
> Did u use mmbackup with TSM ?
>
> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm

I have used mmbackup in test mode a few times before, while under gpfs 3.2 and 3.3, but not under 3.5 yet or the 4.x series (not installed in our facility yet). Under both 3.2 and 3.3, mmbackup would always lock up our cluster when using a snapshot. I never understood the behavior without the snapshot, and the lock-up was intermittent in the carved-out small test cluster, so I never felt confident enough to deploy it over the larger 4000+ client cluster.

Another issue was that the version of mmbackup then would not let me choose the client environment associated with a particular gpfs file system, fileset or path, and the equivalent storage pool and/or policy on the TSM side. With the native TSM client we can do this by configuring the dsmenv file, and even the NODENAME/ASNODE, etc, with which to access TSM, so we can keep the backups segregated on different pools/tapes if necessary (by user, by group, by project, etc).

The problem we all agree on is that TSM client traversing is VERY SLOW, and cannot be parallelized. I always knew that the mmbackup client was supposed to replace the TSM client for the traversing, and then pass the "necessary parameters" and files to the native TSM client, so it could then take over for the remainder of the workflow.

Therefore, the remaining problems are as follows:

* I never understood the snapshot-induced lock-up, and how to fix it. Was it due to the size of our cluster or the version of GPFS? Has it been addressed under the 3.5 or 4.x series? Without the snapshot, how would mmbackup know what has already gone to backup since the previous incremental backup? Does it check each file against what is already on TSM to build the list of candidates? What is the experience out there?
Jaime > > > Regards > > > > > > Yaron Daniel > 94 Em Ha'Moshavot Rd > > Server, Storage and Data Services - Team Leader > Petach Tiqva, 49527 > Global Technology Services > Israel > Phone: > +972-3-916-5672 > > > Fax: > +972-3-916-5672 > > > Mobile: > +972-52-8395593 > > > e-mail: > yard at il.ibm.com > > > IBM Israel > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: > >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 03/09/2016 09:56 PM >> Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup >> scripts) vs. TSM(backup) >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> Here is another area where I've been reading material from several >> sources for years, and in fact trying one solution over the other from >> time-to-time in a test environment. However, to date I have not been >> able to find a one-piece-document where all these different IBM >> alternatives for backup are discussed at length, with the pos and cons >> well explained, along with the how-to's. >> >> I'm currently using TSM(built-in backup client), and over the years I >> developed a set of tricks to rely on disk based volumes as >> intermediate cache, and multiple backup client nodes, to split the >> load and substantially improve the performance of the backup compared >> to when I first deployed this solution. However I suspect it could >> still be improved further if I was to apply tools from the GPFS side >> of the equation. >> >> I would appreciate any comments/pointers. >> >> Thanks >> Jaime >> >> >> >> >> >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Thu Mar 10 08:17:18 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Thu, 10 Mar 2016 09:17:18 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> Message-ID: <201603100817.u2A8HLXK012633@d06av02.portsmouth.uk.ibm.com> Hi Jaime, I received the same request from other customers as well. could you please open a RFE for the theme and send me the RFE ID? 
I will discuss it with the product management then. RFE Link: https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: gpfsug main discussion list , Marc A Kaplan Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Date: 09.03.2016 16:22 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Interesting perspective Mark. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan : > For a write or create operation ENOSPC would make some sense. > But if the file already exists and I'm just opening for read access I > would be very confused by ENOSPC. > How should the system respond: "Sorry, I know about that file, I have it > safely stored away in HSM, but it is not available right now. Try again > later!" > > EAGAIN or EBUSY might be the closest in ordinary language... > But EAGAIN is used when a system call is interrupted and can be retried > right away... > So EBUSY? > > The standard return codes in Linux are: > > #define EPERM 1 /* Operation not permitted */ > #define ENOENT 2 /* No such file or directory */ > #define ESRCH 3 /* No such process */ > #define EINTR 4 /* Interrupted system call */ > #define EIO 5 /* I/O error */ > #define ENXIO 6 /* No such device or address */ > #define E2BIG 7 /* Argument list too long */ > #define ENOEXEC 8 /* Exec format error */ > #define EBADF 9 /* Bad file number */ > #define ECHILD 10 /* No child processes */ > #define EAGAIN 11 /* Try again */ > #define ENOMEM 12 /* Out of memory */ > #define EACCES 13 /* Permission denied */ > #define EFAULT 14 /* Bad address */ > #define ENOTBLK 15 /* Block device required */ > #define EBUSY 16 /* Device or resource busy */ > #define EEXIST 17 /* File exists */ > #define EXDEV 18 /* Cross-device link */ > #define ENODEV 19 /* No such device */ > #define ENOTDIR 20 /* Not a directory */ > #define EISDIR 21 /* Is a directory */ > #define EINVAL 22 /* Invalid argument */ > #define ENFILE 23 /* File table overflow */ > #define EMFILE 24 /* Too many open files */ > #define ENOTTY 25 /* Not a typewriter */ > #define ETXTBSY 26 /* Text file busy */ > #define EFBIG 27 /* File too large */ > #define ENOSPC 28 /* No space left on device */ > #define ESPIPE 29 /* Illegal seek */ > #define EROFS 30 /* Read-only file system */ > #define EMLINK 31 /* Too many links */ > #define EPIPE 32 /* Broken pipe */ > #define EDOM 33 /* Math argument out of domain of func */ > #define ERANGE 34 /* Math result not representable */ > > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From konstantin.arnold at unibas.ch Thu Mar 10 08:56:01 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Thu, 10 Mar 2016 09:56:01 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> Message-ID: <56E136A1.8020202@unibas.ch> Hi Jaime, ... maybe I can give some comments with experience from the field: I would suggest, after reaching a high-watermark threshold, the recall speed could be throttled to a rate that is lower than migration speed (but still high enough to not run into a timeout). I don't think it's a good idea to send access denied while trying to prioritize migration. If non-IT people would see this message they could think the system is broken. It would be unclear what a batch job would do that has to prepare data, in the worst case processing would start with incomplete data. We are currently recalling all out data on tape to be moved to a different system. There is 15x more data on tape than what would fit on the disk pool (and there are millions of files before we set inode quota to a low number). We are moving user/project after an other by using tape ordered recalls. For that we had to disable a policy that was aggressively pre-migrating files and allowed to quickly free space on the disk pool. I must admit that it took us a while of tuning thresholds and policies. Best Konstantin On 03/09/2016 01:12 PM, Jaime Pinto wrote: > Yes! A behavior along those lines would be desirable. Users understand > very well what it means for a file system to be near full. > > Are there any customers already doing something similar? > > Thanks > Jaime > > Quoting Dominic Mueller-Wicke01 : > >> >> Hi Jamie, >> >> I see. So, the recall-shutdown would be something for a short time >> period. >> right? Just for the time it takes to migrate files out and free space. If >> HSM would allow the recall-shutdown, the impact for the users would be >> that >> each access to migrated files would lead to an access denied error. Would >> that be acceptable for the users? >> >> Greetings, Dominic. >> >> ______________________________________________________________________________________________________________ >> >> >> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >> Lead | >> +49 7034 64 32794 | dominic.mueller at de.ibm.com >> >> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >> Wittkopp >> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >> HRB 243294 >> >> >> >> From: Jaime Pinto >> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 08.03.2016 21:38 >> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> priority >> >> >> >> Thanks for the suggestions Dominic >> >> I remember playing around with premigrated files at the time, and that >> was not satisfactory. 
>> >> What we are looking for is a configuration based parameter what will >> basically break out of the "transparency for the user" mode, and not >> perform any further recalling, period, if|when the file system >> occupancy is above a certain threshold (98%). We would not mind if >> instead gpfs would issue a preemptive "disk full" error message to any >> user/app/job relying on those files to be recalled, so migration on >> demand will have a chance to be performance. What we prefer is to swap >> precedence, ie, any migration requests would be executed ahead of any >> recalls, at least until a certain amount of free space on the file >> system has been cleared. >> >> It's really important that this type of feature is present, for us to >> reconsider the TSM version of HSM as a solution. It's not clear from >> the manual that this can be accomplish in some fashion. >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >>> >>> >>> Hi, >>> >>> in all cases a recall request will be handled transparent for the >>> user at >>> the time a migrated files is accessed. This can't be prevented and has >> two >>> down sides: a) the space used in the file system increases and b) random >>> access to storage media in the Spectrum Protect server happens. With >> newer >>> versions of Spectrum Protect for Space Management a so called tape >>> optimized recall method is available that can reduce the impact to the >>> system (especially Spectrum Protect server). >>> If the problem was that the file system went out of space at the time >>> the >>> recalls came in I would recommend to reduce the threshold settings for >> the >>> file system and increase the number of premigrated files. This will >>> allow >>> to free space very quickly if needed. If you didn't use the policy based >>> threshold migration so far I recommend to use it. This method is >>> significant faster compared to the classical HSM based threshold >> migration >>> approach. >>> >>> Greetings, Dominic. >>> >>> >> ______________________________________________________________________________________________________________ >> >> >>> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>> Lead >> | >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht >>> Stuttgart, >>> HRB 243294 >>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >> 18:21 >>> ----- >>> >>> From: Jaime Pinto >>> To: gpfsug main discussion list >>> >>> Date: 08.03.2016 17:36 >>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >> priority >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I'm wondering whether the new version of the "Spectrum Suite" will >>> allow us set the priority of the HSM migration to be higher than >>> staging. >>> >>> >>> I ask this because back in 2011 when we were still using Tivoli HSM >>> with GPFS, during mixed requests for migration and staging operations, >>> we had a very annoying behavior in which the staging would always take >>> precedence over migration. The end-result was that the GPFS would fill >>> up to 100% and induce a deadlock on the cluster, unless we identified >>> all the user driven stage requests in time, and killed them all. We >>> contacted IBM support a few times asking for a way fix this, and were >>> told it was built into TSM. 
Back then we gave up IBM's HSM primarily >>> for this reason, although performance was also a consideration (more >>> to this on another post). >>> >>> We are now reconsidering HSM for a new deployment, however only if >>> this issue has been resolved (among a few others). >>> >>> What has been some of the experience out there? >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Thu Mar 10 10:55:21 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 05:55:21 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <56E136A1.8020202@unibas.ch> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com> <20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca> <201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <56E136A1.8020202@unibas.ch> Message-ID: <20160310055521.85234y7d2m6c97kp@support.scinet.utoronto.ca> Quoting Konstantin Arnold : > Hi Jaime, > > ... maybe I can give some comments with experience from the field: > I would suggest, after reaching a high-watermark threshold, the recall > speed could be throttled to a rate that is lower than migration speed > (but still high enough to not run into a timeout). I don't think it's a > good idea to send access denied while trying to prioritize migration. If > non-IT people would see this message they could think the system is > broken. It would be unclear what a batch job would do that has to > prepare data, in the worst case processing would start with incomplete data. 
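(As a point of reference for this thread: the "policy based threshold migration" Dominic recommends in the quoted text is normally expressed as an external HSM pool plus a MIGRATE rule carrying a THRESHOLD clause. A minimal sketch -- the pool names, percentages and interface-script path below are placeholders, not taken from anyone's actual setup -- would look something like:

RULE EXTERNAL POOL 'hsm'
     EXEC '/var/mmfs/etc/mmpolicyExec-hsm' /* placeholder: your site's HSM interface script */
     OPTS '-v'

/* start migrating at 90% full, stop at 80%, premigrate down to 70% */
RULE 'ToHSM' MIGRATE FROM POOL 'system'
     THRESHOLD(90,80,70)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'hsm'
     WHERE FILE_SIZE > 0

The usual way to have this fire automatically is to register the lowDiskSpace event with mmaddcallback pointing at mmstartpolicy, so migration kicks in as soon as the high threshold is crossed rather than waiting for a manual mmapplypolicy run.)
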
I wouldn't object to any strategy that lets us empty the vase quicker than it's being filled. It may just make the solution more complex for developers, since this feels a lot like a mini-scheduler. On the other hand I don't see much of an issue for non-IT people or batch jobs depending on the data to be recalled: we already enable quotas on our file systems. When quotas are reached the system is supposed to "break" anyway, for that particular user|group or application, and they still have to handle this situation properly. > > We are currently recalling all out data on tape to be moved to a > different system. There is 15x more data on tape than what would fit on > the disk pool (and there are millions of files before we set inode quota > to a low number). We are moving user/project after an other by using > tape ordered recalls. For that we had to disable a policy that was > aggressively pre-migrating files and allowed to quickly free space on > the disk pool. I must admit that it took us a while of tuning thresholds > and policies. That is certainly an approach to consider. We still think the application should be able to properly manage occupancy on the same file system. We run a different system which has a disk based cache layer as well, and the strategy is to keep it as full as possible (85-90%), so to avoid retrieving data from tape whenever possible, while still leaving some cushion for newly saved data. Indeed finding the sweet spot is a balancing act. Thanks for the feedback Jaime > > Best > Konstantin > > > > On 03/09/2016 01:12 PM, Jaime Pinto wrote: >> Yes! A behavior along those lines would be desirable. Users understand >> very well what it means for a file system to be near full. >> >> Are there any customers already doing something similar? >> >> Thanks >> Jaime >> >> Quoting Dominic Mueller-Wicke01 : >> >>> >>> Hi Jamie, >>> >>> I see. So, the recall-shutdown would be something for a short time >>> period. >>> right? Just for the time it takes to migrate files out and free space. If >>> HSM would allow the recall-shutdown, the impact for the users would be >>> that >>> each access to migrated files would lead to an access denied error. Would >>> that be acceptable for the users? >>> >>> Greetings, Dominic. >>> >>> ______________________________________________________________________________________________________________ >>> >>> >>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>> Lead | >>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>> >>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>> Wittkopp >>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, >>> HRB 243294 >>> >>> >>> >>> From: Jaime Pinto >>> To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 08.03.2016 21:38 >>> Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >>> priority >>> >>> >>> >>> Thanks for the suggestions Dominic >>> >>> I remember playing around with premigrated files at the time, and that >>> was not satisfactory. >>> >>> What we are looking for is a configuration based parameter what will >>> basically break out of the "transparency for the user" mode, and not >>> perform any further recalling, period, if|when the file system >>> occupancy is above a certain threshold (98%). 
We would not mind if >>> instead gpfs would issue a preemptive "disk full" error message to any >>> user/app/job relying on those files to be recalled, so migration on >>> demand will have a chance to be performance. What we prefer is to swap >>> precedence, ie, any migration requests would be executed ahead of any >>> recalls, at least until a certain amount of free space on the file >>> system has been cleared. >>> >>> It's really important that this type of feature is present, for us to >>> reconsider the TSM version of HSM as a solution. It's not clear from >>> the manual that this can be accomplish in some fashion. >>> >>> Thanks >>> Jaime >>> >>> Quoting Dominic Mueller-Wicke01 : >>> >>>> >>>> >>>> Hi, >>>> >>>> in all cases a recall request will be handled transparent for the >>>> user at >>>> the time a migrated files is accessed. This can't be prevented and has >>> two >>>> down sides: a) the space used in the file system increases and b) random >>>> access to storage media in the Spectrum Protect server happens. With >>> newer >>>> versions of Spectrum Protect for Space Management a so called tape >>>> optimized recall method is available that can reduce the impact to the >>>> system (especially Spectrum Protect server). >>>> If the problem was that the file system went out of space at the time >>>> the >>>> recalls came in I would recommend to reduce the threshold settings for >>> the >>>> file system and increase the number of premigrated files. This will >>>> allow >>>> to free space very quickly if needed. If you didn't use the policy based >>>> threshold migration so far I recommend to use it. This method is >>>> significant faster compared to the classical HSM based threshold >>> migration >>>> approach. >>>> >>>> Greetings, Dominic. >>>> >>>> >>> ______________________________________________________________________________________________________________ >>> >>> >>>> >>>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical >>>> Lead >>> | >>>> +49 7034 64 32794 | dominic.mueller at de.ibm.com >>>> >>>> Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk >>>> Wittkopp >>>> Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht >>>> Stuttgart, >>>> HRB 243294 >>>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 >>> 18:21 >>>> ----- >>>> >>>> From: Jaime Pinto >>>> To: gpfsug main discussion list >>>> >>>> Date: 08.03.2016 17:36 >>>> Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration >>> priority >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I'm wondering whether the new version of the "Spectrum Suite" will >>>> allow us set the priority of the HSM migration to be higher than >>>> staging. >>>> >>>> >>>> I ask this because back in 2011 when we were still using Tivoli HSM >>>> with GPFS, during mixed requests for migration and staging operations, >>>> we had a very annoying behavior in which the staging would always take >>>> precedence over migration. The end-result was that the GPFS would fill >>>> up to 100% and induce a deadlock on the cluster, unless we identified >>>> all the user driven stage requests in time, and killed them all. We >>>> contacted IBM support a few times asking for a way fix this, and were >>>> told it was built into TSM. Back then we gave up IBM's HSM primarily >>>> for this reason, although performance was also a consideration (more >>>> to this on another post). 
>>>> >>>> We are now reconsidering HSM for a new deployment, however only if >>>> this issue has been resolved (among a few others). >>>> >>>> What has been some of the experience out there? >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.org >>>> University of Toronto >>>> 256 McCaul Street, Room 235 >>>> Toronto, ON, M5T1W5 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.org >> University of Toronto >> 256 McCaul Street, Room 235 >> Toronto, ON, M5T1W5 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Thu Mar 10 11:17:41 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 06:17:41 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. 
TSM(backup) In-Reply-To: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca> <201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> Message-ID: <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> Here is some feedback on the use of mmbackup: Last night I decided to test mmbackup again, in the simplest syntax call possible (see below), and it ran like a charm! We have a 15TB GPFS with some 41 million files, running gpfs v 3.5; it certainty behaved better than what I remember when I last tried this under 3.3 or 3.2, however I still didn't specify a snapshot. I guess it didn't really matter. My idea of sourcing the dsmenv file normally used by the TSM BA client before starting mmbackup was just what I needed to land the backup material in the same pool and using the same policies normally used by the TSM BA client for this file system. For my surprise, mmbackup was smart enough to query the proper TSM database for all files already there and perform the incremental backup just as the TSM client would on its own. The best of all: it took just under 7 hours, while previously the TSM client was taking over 27 hours: that is nearly 1/4 of the time, using the same node! This is really good, since now I can finally do a true *daily* backup of this FS, so I'll refining and adopting this process moving forward, possibly adding a few more nodes as traversing helpers. Cheers Jaime [root at gpc-f114n016 bin]# mmbackup /sysadmin -t incremental -s /tmp -------------------------------------------------------- mmbackup: Backup of /sysadmin begins at Wed Mar 9 19:45:27 EST 2016. -------------------------------------------------------- Wed Mar 9 19:45:48 2016 mmbackup:Could not restore previous shadow file from TSM server TAPENODE Wed Mar 9 19:45:48 2016 mmbackup:Querying files currently backed up in TSM server:TAPENODE. Wed Mar 9 21:55:59 2016 mmbackup:Built query data file from TSM server: TAPENODE rc = 0 Wed Mar 9 21:56:01 2016 mmbackup:Scanning file system sysadmin Wed Mar 9 23:47:53 2016 mmbackup:Reconstructing previous shadow file /sysadmin/.mmbackupShadow.1.TAPENODE from query data for TAPENODE Thu Mar 10 01:05:06 2016 mmbackup:Determining file system changes for sysadmin [TAPENODE]. Thu Mar 10 01:08:40 2016 mmbackup:changed=26211, expired=30875, unsupported=0 for server [TAPENODE] Thu Mar 10 01:08:40 2016 mmbackup:Sending files to the TSM server [26211 changed, 30875 expired]. Thu Mar 10 01:38:41 2016 mmbackup:Expiring files: 0 backed up, 15500 expired, 0 failed. Thu Mar 10 02:42:08 2016 mmbackup:Backing up files: 10428 backed up, 30875 expired, 72 failed. Thu Mar 10 02:58:40 2016 mmbackup:mmapplypolicy for Backup detected errors (rc=9). Thu Mar 10 02:58:40 2016 mmbackup:Completed policy backup run with 0 policy errors, 72 files failed, 0 severe errors, returning rc=9. Thu Mar 10 02:58:40 2016 mmbackup:Policy for backup returned 9 Highest TSM error 4 mmbackup: TSM Summary Information: Total number of objects inspected: 57086 Total number of objects backed up: 26139 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 30875 Total number of objects failed: 72 Thu Mar 10 02:58:40 2016 mmbackup:Analyzing audit log file /sysadmin/mmbackup.audit.sysadmin.TAPENODE Thu Mar 10 02:58:40 2016 mmbackup:72 files not backed up for this server. 
( failed:72 ) Thu Mar 10 02:58:40 2016 mmbackup:Worst TSM exit 4 Thu Mar 10 02:58:41 2016 mmbackup:72 failures were logged. Compensating shadow database... Thu Mar 10 03:06:23 2016 mmbackup:Analysis complete. 72 of 72 failed or excluded paths compensated for in 1 pass(es). Thu Mar 10 03:09:08 2016 mmbackup:TSM server TAPENODE had 72 failures or excluded paths and returned 4. Its shadow database has been updated. Thu Mar 10 03:09:08 2016 mmbackup:Incremental backup completed with some skipped files. TSM had 0 severe errors and returned 4. See the TSM log file for more information. 72 files had errors, TSM audit logs recorded 72 errors from 1 TSM servers, 0 TSM servers skipped. exit 4 ---------------------------------------------------------- mmbackup: Backup of /sysadmin completed with some skipped files at Thu Mar 10 03:09:11 EST 2016. ---------------------------------------------------------- mmbackup: Command failed. Examine previous error messages to determine cause. Quoting Jaime Pinto : > Quoting Yaron Daniel : > >> Hi >> >> Did u use mmbackup with TSM ? >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm > > I have used mmbackup on test mode a few times before, while under gpfs > 3.2 and 3.3, but not under 3.5 yet or 4.x series (not installed in our > facility yet). > > Under both 3.2 and 3.3 mmbackup would always lock up our cluster when > using snapshot. I never understood the behavior without snapshot, and > the lock up was intermittent in the carved-out small test cluster, so I > never felt confident enough to deploy over the larger 4000+ clients > cluster. > > Another issue was that the version of mmbackup then would not let me > choose the client environment associated with a particular gpfs file > system, fileset or path, and the equivalent storage pool and /or policy > on the TSM side. > > With the native TSM client we can do this by configuring the dsmenv > file, and even the NODEMANE/ASNODE, etc, with which to access TSM, so > we can keep the backups segregated on different pools/tapes if > necessary (by user, by group, by project, etc) > > The problem we all agree on is that TSM client traversing is VERY SLOW, > and can not be parallelized. I always knew that the mmbackup client was > supposed to replace the TSM client for the traversing, and then parse > the "necessary parameters" and files to the native TSM client, so it > could then take over for the remainder of the workflow. > > Therefore, the remaining problems are as follows: > * I never understood the snapshot induced lookup, and how to fix it. > Was it due to the size of our cluster or the version of GPFS? Has it > been addressed under 3.5 or 4.x series? Without the snapshot how would > mmbackup know what was already gone to backup since the previous > incremental backup? Does it check each file against what is already on > TSM to build the list of candidates? What is the experience out there? > > * In the v4r2 version of the manual for the mmbackup utility we still > don't seem to be able to determine which TSM BA Client dsmenv to use as > a parameter. All we can do is choose the --tsm-servers > TSMServer[,TSMServer...]] . I can only conclude that all the contents > of any backup on the GPFS side will always end-up on a default storage > pool and use the standard TSM policy if nothing else is done. 
I'm now > wondering if it would be ok to simply 'source dsmenv' from a shell for > each instance of the mmbackup we fire up, in addition to setting up the > other MMBACKUP_DSMC_MISC, MMBACKUP_DSMC_BACKUP, ..., etc as described > on man page. > > * what about the restore side of things? Most mm* commands can only be > executed by root. Should we still have to rely on the TSM BA Client > (dsmc|dsmj) if unprivileged users want to restore their own stuff? > > I guess I'll have to conduct more experiments. > > > >> >> Please also review this : >> >> http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf >> > > This is pretty good, as a high level overview. Much better than a few > others I've seen with the release of the Spectrum Suite, since it focus > entirely on GPFS/TSM/backup|(HSM). It would be nice to have some > typical implementation examples. > > > > Thanks a lot for the references Yaron, and again thanks for any further > comments. > Jaime > > >> >> >> Regards >> >> >> >> >> >> Yaron Daniel >> 94 Em Ha'Moshavot Rd >> >> Server, Storage and Data Services - Team Leader >> Petach Tiqva, 49527 >> Global Technology Services >> Israel >> Phone: >> +972-3-916-5672 >> >> >> Fax: >> +972-3-916-5672 >> >> >> Mobile: >> +972-52-8395593 >> >> >> e-mail: >> yard at il.ibm.com >> >> >> IBM Israel >> >> >> >> >> >> >> >> gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM: >> >>> From: Jaime Pinto >>> To: gpfsug main discussion list >>> Date: 03/09/2016 09:56 PM >>> Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup >>> scripts) vs. TSM(backup) >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> Here is another area where I've been reading material from several >>> sources for years, and in fact trying one solution over the other from >>> time-to-time in a test environment. However, to date I have not been >>> able to find a one-piece-document where all these different IBM >>> alternatives for backup are discussed at length, with the pos and cons >>> well explained, along with the how-to's. >>> >>> I'm currently using TSM(built-in backup client), and over the years I >>> developed a set of tricks to rely on disk based volumes as >>> intermediate cache, and multiple backup client nodes, to split the >>> load and substantially improve the performance of the backup compared >>> to when I first deployed this solution. However I suspect it could >>> still be improved further if I was to apply tools from the GPFS side >>> of the equation. >>> >>> I would appreciate any comments/pointers. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>> >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.org >>> University of Toronto >>> 256 McCaul Street, Room 235 >>> Toronto, ON, M5T1W5 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. 
>>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From S.J.Thompson at bham.ac.uk Thu Mar 10 12:00:09 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 10 Mar 2016 12:00:09 +0000 Subject: [gpfsug-discuss] systemd Message-ID: So just picking up this from Feb 2015, have been doing some upgrades to 4.2.0.1, and see that there is now systemd support as part of this... Now I just need to unpick the local hacks we put into the init script (like wait for IB to come up) and implement those as proper systemd deps I guess. Thanks for sorting this though IBM! Simon On 10/02/2015, 15:17, "gpfsug-discuss-bounces at gpfsug.org on behalf of Simon Thompson (Research Computing - IT Services)" wrote: >Does any one have a systemd manifest for GPFS which they would share? > >As RedHat EL 7 is now using systemd and Ubuntu is now supported with >4.1p5, it seems sensible for GPFS to have systemd support. > >We're testing some services running off gpfs and it would be useful to >have a manifest so we can make the services dependent on gpfs being up >before they start. > >Or any suggestions on making systemd services dependent on a SysV script? > >Thanks > >Simon >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at gpfsug.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Thu Mar 10 14:46:12 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 10 Mar 2016 09:46:12 -0500 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com><20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> <20160310061741.91687adp8pr6po0l@support.scinet.utoronto.ca> Message-ID: <201603101446.u2AEkJPP018456@d01av02.pok.ibm.com> Jaime, Thanks for the positive feedback and success story on mmbackup. 
We need criticism to keep improving the product - but we also need encouragement to know we are heading in the right direction and making progress. BTW - (depending on many factors) you may be able to save some significant backup time by running over multiple nodes with the -N option. --marc. (I am Mr. mmapplypolicy and work with Mr. mmbackup.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri Mar 11 00:15:49 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 10 Mar 2016 19:15:49 -0500 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> Message-ID: <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> Hey Dominic Just submitted a new request: Headline: GPFS+TSM+HSM: staging vs. migration priority ID: 85292 Thank you Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jaime, > > I received the same request from other customers as well. > could you please open a RFE for the theme and send me the RFE ID? I will > discuss it with the product management then. RFE Link: > https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 > > Greetings, Dominic. > > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: gpfsug main discussion list , > Marc A Kaplan > Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Date: 09.03.2016 16:22 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Interesting perspective Mark. > > I'm inclined to think EBUSY would be more appropriate. > > Jaime > > Quoting Marc A Kaplan : > >> For a write or create operation ENOSPC would make some sense. >> But if the file already exists and I'm just opening for read access I >> would be very confused by ENOSPC. >> How should the system respond: "Sorry, I know about that file, I have it >> safely stored away in HSM, but it is not available right now. Try again >> later!" >> >> EAGAIN or EBUSY might be the closest in ordinary language... >> But EAGAIN is used when a system call is interrupted and can be retried >> right away... >> So EBUSY? 
>> >> The standard return codes in Linux are: >> >> #define EPERM 1 /* Operation not permitted */ >> #define ENOENT 2 /* No such file or directory */ >> #define ESRCH 3 /* No such process */ >> #define EINTR 4 /* Interrupted system call */ >> #define EIO 5 /* I/O error */ >> #define ENXIO 6 /* No such device or address */ >> #define E2BIG 7 /* Argument list too long */ >> #define ENOEXEC 8 /* Exec format error */ >> #define EBADF 9 /* Bad file number */ >> #define ECHILD 10 /* No child processes */ >> #define EAGAIN 11 /* Try again */ >> #define ENOMEM 12 /* Out of memory */ >> #define EACCES 13 /* Permission denied */ >> #define EFAULT 14 /* Bad address */ >> #define ENOTBLK 15 /* Block device required */ >> #define EBUSY 16 /* Device or resource busy */ >> #define EEXIST 17 /* File exists */ >> #define EXDEV 18 /* Cross-device link */ >> #define ENODEV 19 /* No such device */ >> #define ENOTDIR 20 /* Not a directory */ >> #define EISDIR 21 /* Is a directory */ >> #define EINVAL 22 /* Invalid argument */ >> #define ENFILE 23 /* File table overflow */ >> #define EMFILE 24 /* Too many open files */ >> #define ENOTTY 25 /* Not a typewriter */ >> #define ETXTBSY 26 /* Text file busy */ >> #define EFBIG 27 /* File too large */ >> #define ENOSPC 28 /* No space left on device */ >> #define ESPIPE 29 /* Illegal seek */ >> #define EROFS 30 /* Read-only file system */ >> #define EMLINK 31 /* Too many links */ >> #define EPIPE 32 /* Broken pipe */ >> #define EDOM 33 /* Math argument out of domain of func */ >> #define ERANGE 34 /* Math result not representable */ >> >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From s.m.killen at leeds.ac.uk Fri Mar 11 13:19:41 2016 From: s.m.killen at leeds.ac.uk (Sean Killen) Date: Fri, 11 Mar 2016 13:19:41 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install Message-ID: <56E2C5ED.8060500@leeds.ac.uk> Hi all, So I have finally got my SpectrumScale system installed (well half of it). But it wasn't without some niggles. We have purchased DELL MD3860i disk trays with dual controllers (each with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a while getting a nice multipath setup in place with 'friendly' names set /dev/mapper/ssd1_1 /dev/mapper/t1d1_1 /dev/mapper/t2d1_1 etc, to represent the different tiers/disks/luns. We used the install toolkit and added all the NSDs with the friendly names and it all checked out and verified........ UNTIL we tried to install/deploy! 
At which point it said, no valid devices in /proc/partitions (I need to use the unfriendly /dev/dm-X name instead) - did I miss something in the toolkit, or is something that needs to be resolved, surely it should have told me when I added the first of the 36 NSDs rather that at the install stage when I then need to correct 36 errors. Secondly, I have installed the GUI, it is constantly complaining of a 'Critical' event MS0297 - Connection failed to node. Wrong Credentials. But all nodes can connect to each other via SSH without passwords. Anyone know how to clear and fix this error; I cannot find anything in the docs! Thanks -- Sean -- ------------------------------------------------------------------- Dr Sean M Killen UNIX Support Officer, IT Faculty of Biological Sciences University of Leeds LEEDS LS2 9JT United Kingdom Tel: +44 (0)113 3433148 Mob: +44 (0)776 8670907 Fax: +44 (0)113 3438465 GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From stschmid at de.ibm.com Fri Mar 11 13:41:54 2016 From: stschmid at de.ibm.com (Stefan Schmidt) Date: Fri, 11 Mar 2016 14:41:54 +0100 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> The message means following and is a warning without an direct affect to the function but an indicator that something is may wrong with the enclosure. Check the maintenance procedure which is shown for the event in the GUI event panel. /** Ambient temperature of power supply "{0}" undercut the lower warning threshold at {1}. */ MS0297("MS0297W",'W'), "Cause": "If the lower warning threshold is undercut a the device operation should not be affected. However this might indicate a hardware defect.", "User_action": "Follow the maintenance procedure for the enclosure.", "code": "MS0297", "description": "Ambient temperature of power supply \"{0}\" undercut the lower warning threshold at {1}.", Mit freundlichen Gr??en / Kind regards Stefan Schmidt Scrum Master IBM Spectrum Scale GUI / Senior IT Architect /PMP - Dept. M069 / IBM Spectrum Scale Software Development IBM Systems Group IBM Deutschland Phone: +49-6131-84-3465 IBM Deutschland Mobile: +49-170-6346601 Hechtsheimer Str. 2 E-Mail: stschmid at de.ibm.com 55131 Mainz Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.m.killen at leeds.ac.uk Fri Mar 11 13:50:58 2016 From: s.m.killen at leeds.ac.uk (Sean Killen) Date: Fri, 11 Mar 2016 13:50:58 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> References: <56E2C5ED.8060500@leeds.ac.uk> <201603111342.u2BDg3Jn003896@d06av03.portsmouth.uk.ibm.com> Message-ID: <0D6C2DBC-4B82-4038-83C0-B0255C8DF9E0@leeds.ac.uk> Hi Stefan Thanks for the quick reply, I appear to have mistyped the error.. It's MS0279. See attached png. -- Sean --? ------------------------------------------------------------------- ??? Dr Sean M Killen ??? UNIX Support Officer, IT ??? 
Faculty of Biological Sciences ??? University of Leeds ??? LEEDS ??? LS2 9JT ??? United Kingdom ??? Tel: +44 (0)113 3433148 ??? Mob: +44 (0)776 8670907 ??? Fax: +44 (0)113 3438465 ??? GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- On 11 March 2016 13:41:54 GMT+00:00, Stefan Schmidt wrote: >The message means following and is a warning without an direct affect >to >the function but an indicator that something is may wrong with the >enclosure. Check the maintenance procedure which is shown for the event >in >the GUI event panel. > >/** Ambient temperature of power supply "{0}" undercut the lower >warning >threshold at {1}. */ > MS0297("MS0297W",'W'), > "Cause": "If the lower warning threshold is undercut a the >device operation should not be affected. However this might indicate a >hardware defect.", > "User_action": "Follow the maintenance procedure for the >enclosure.", > "code": "MS0297", > "description": "Ambient temperature of power supply \"{0}\" >undercut the lower warning threshold at {1}.", > > >Mit freundlichen Gr??en / Kind regards > >Stefan Schmidt > >Scrum Master IBM Spectrum Scale GUI / Senior IT Architect /PMP - Dept. >M069 / IBM Spectrum Scale Software Development >IBM Systems Group >IBM Deutschland > > > >Phone: >+49-6131-84-3465 > IBM Deutschland > >Mobile: >+49-170-6346601 > Hechtsheimer Str. 2 >E-Mail: >stschmid at de.ibm.com > 55131 Mainz > > > Germany > > >IBM Deutschland Research & Development GmbH / Vorsitzende des >Aufsichtsrats: Martina Koederitz >Gesch?ftsf?hrung: Dirk Wittkopp >Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht >Stuttgart, >HRB 243294 > > > > > > >------------------------------------------------------------------------ > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot-AstburyBSL.absl.prv - Dashboard - Mozilla Firefox.png Type: image/png Size: 144612 bytes Desc: not available URL: From sophie.carsten at uk.ibm.com Fri Mar 11 13:53:36 2016 From: sophie.carsten at uk.ibm.com (Sophie Carsten) Date: Fri, 11 Mar 2016 13:53:36 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111355.u2BDtMBO007426@d06av12.portsmouth.uk.ibm.com> Hi, In terms of the NSDs, you need to run the nsd devices script if they're not in /dev/dmX-, here's the link to the knowledge center: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_nsdpro.htm?lang=en The installer should work as normal after this script has been run. We were hoping to get this solved in the upcoming version of the installer, so the user doesn't have to manually run the script. But the previous install team has been put on a new project in IBM, and I can't really comment any longer on when this could be expected to be delivered by the new team put in place. Hope the link gets you further off the ground though. 
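If it helps, a cut-down user exit modelled on the nsddevices.sample shipped with GPFS might look like the sketch below -- the egrep pattern is only a guess at your ssd*/t*d* alias naming, so adjust it to whatever your /dev/mapper entries are actually called:

#!/bin/ksh
# /var/mmfs/etc/nsddevices -- user exit so GPFS device discovery picks up
# the multipath aliases instead of only what it finds in /proc/partitions
osName=$(/bin/uname -s)
if [[ $osName = Linux ]]
then
  for dev in $(ls /dev/mapper 2>/dev/null | egrep '^(ssd|t[0-9]+d)[0-9]+_[0-9]+$')
  do
    echo "mapper/$dev dmm"   # names are relative to /dev, dmm = device-mapper multipath
  done
fi
# 0 = use only the devices echoed above; non-zero would append the
# built-in discovery as well (see the comments in nsddevices.sample)
return 0

Once that's in place on the NSD servers, the discovery side of the toolkit/mmcrnsd should be able to see the friendly names rather than forcing you back to /dev/dm-X.
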
Sophie Carsten IBM Spectrum Virtualize Development Engineer IBM Systems - Manchester Lab 44-161-9683886 sophie.carsten at uk.ibm.com From: Sean Killen To: gpfsug main discussion list Date: 11/03/2016 13:20 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, So I have finally got my SpectrumScale system installed (well half of it). But it wasn't without some niggles. We have purchased DELL MD3860i disk trays with dual controllers (each with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a while getting a nice multipath setup in place with 'friendly' names set /dev/mapper/ssd1_1 /dev/mapper/t1d1_1 /dev/mapper/t2d1_1 etc, to represent the different tiers/disks/luns. We used the install toolkit and added all the NSDs with the friendly names and it all checked out and verified........ UNTIL we tried to install/deploy! At which point it said, no valid devices in /proc/partitions (I need to use the unfriendly /dev/dm-X name instead) - did I miss something in the toolkit, or is something that needs to be resolved, surely it should have told me when I added the first of the 36 NSDs rather that at the install stage when I then need to correct 36 errors. Secondly, I have installed the GUI, it is constantly complaining of a 'Critical' event MS0297 - Connection failed to node. Wrong Credentials. But all nodes can connect to each other via SSH without passwords. Anyone know how to clear and fix this error; I cannot find anything in the docs! Thanks -- Sean -- ------------------------------------------------------------------- Dr Sean M Killen UNIX Support Officer, IT Faculty of Biological Sciences University of Leeds LEEDS LS2 9JT United Kingdom Tel: +44 (0)113 3433148 Mob: +44 (0)776 8670907 Fax: +44 (0)113 3438465 GnuPG Key ID: ee0d36f0 ------------------------------------------------------------------- [attachment "signature.asc" deleted by Sophie Carsten/UK/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 6016 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 11422 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 6016 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Fri Mar 11 14:30:24 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 11 Mar 2016 14:30:24 +0000 Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 Message-ID: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> I see this fix is out and IBM still is not providing the pmsensors package for RH6? can we PLEASE get this package posted as part of the normal distribution? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Fri Mar 11 15:27:20 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 11 Mar 2016 10:27:20 -0500 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <201603111522.u2BFMqvG008617@d01av05.pok.ibm.com> You may need/want to set up an nsddevices script to help GPFS find all your disks. Google it! Or ... http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adm.doc/bl1adm_nsddevices.htm -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From jonathan at buzzard.me.uk Fri Mar 11 15:46:39 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 11 Mar 2016 15:46:39 +0000 Subject: [gpfsug-discuss] Niggles in the 4.2.0 Install In-Reply-To: <56E2C5ED.8060500@leeds.ac.uk> References: <56E2C5ED.8060500@leeds.ac.uk> Message-ID: <1457711199.4251.245.camel@buzzard.phy.strath.ac.uk> On Fri, 2016-03-11 at 13:19 +0000, Sean Killen wrote: > Hi all, > > So I have finally got my SpectrumScale system installed (well half of > it). But it wasn't without some niggles. > > We have purchased DELL MD3860i disk trays with dual controllers (each > with 2x 10Gbit NICs), to Linux this appears as 4 paths, I spent quite a > while getting a nice multipath setup in place with 'friendly' names set > Oh dear. I guess it might work with 10Gb Ethernet but based on my personal experience iSCSI is spectacularly unsuited to GPFS. Either your NSD nodes can overwhelm the storage arrays or the storage arrays can overwhelm the NSD servers and performance falls through the floor. That is unless you have Data Center Ethernet at which point you might as well have gone Fibre Channel in the first place. Though unless you are going to have large physical separation between the storage and NSD servers 12Gb SAS is a cheaper option and you can still have four NSD servers hooked up to each MD3 based storage array. I have in the past implement GPFS on Dell MD3200i's. I did eventually get it working reliably but it was so suboptimal with so many compromises that as soon as the MD3600f came out we purchased these to replaced the MD3200i's. Lets say you have three storage arrays with two paths to each controller and four NSD servers. Basically what happens is that an NSD server issues a bunch of requests for blocks to the storage arrays. Then all 12 paths start answering to your two connections to the NSD server. At this point the Ethernet adaptors on your NSD servers are overwhelmed 802.1D PAUSE frames start being issued which just result in head of line blocking and performance falls through the floor. You need Data Center Ethernet to handle this properly, which is probably why FCoE never took off as you can't just use the Ethernet switches and adaptors you have. Both FC and SAS handle this sort of congestion gracefully unlike ordinary Ethernet. Now the caveat for all this is that it is much easier to overwhelm a 1Gbps link than a 10Gbps link. However with the combination of SSD and larger cache's I can envisage that a 10Gbps link could be overwhelmed and you would then see the same performance issues that I saw. Basically the only way out is a one to one correspondence between ports on the NSD's and the storage controllers. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Fri Mar 11 15:46:46 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 11 Mar 2016 15:46:46 +0000 Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 In-Reply-To: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> References: <1AD12A69-0EC6-4892-BB45-F8AC3CC74BDB@nuance.com> Message-ID: Hi Bob, But on the plus side, I noticed in the release notes: "If you are coming from 4.1.1-X, you must first upgrade to 4.2.0-0. You may use this 4.2.0-2 package to perform a First Time Install or to upgrade from an existing 4.2.0-X level." So it looks like its no longer necessary to install 4.2.0 and then apply PTFs. I remember talking to someone a while ago and they were hoping this might happen, but it seems that it actually has! Nice! Simon From: > on behalf of "Oesterlin, Robert" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Friday, 11 March 2016 at 14:30 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SpectrumScale 4.2.0-2 is out and STILL NO pmsensors-4.2.0-2.el6 I see this fix is out and IBM still is not providing the pmsensors package for RH6? can we PLEASE get this package posted as part of the normal distribution? Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominic.mueller at de.ibm.com Fri Mar 11 16:02:37 2016 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Fri, 11 Mar 2016 17:02:37 +0100 Subject: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority In-Reply-To: <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> References: <201603081746.u28HkHaj010808@d06av12.portsmouth.uk.ibm.com><20160308153852.61184hwsk76jmt7g@support.scinet.utoronto.ca><201603090936.u299a0SH015224@d06av02.portsmouth.uk.ibm.com> <20160309071208.1722107mgrewcpaw@support.scinet.utoronto.ca> <201603091501.u29F1DbL009860@d01av05.pok.ibm.com> <20160309102153.174547pnrz8zny4x@support.scinet.utoronto.ca> <201603100817.u2A8HLcd019753@d06av04.portsmouth.uk.ibm.com> <20160310191549.20137ilh6fuiqss5@support.scinet.utoronto.ca> Message-ID: <201603111502.u2BF2kk6007636@d06av10.portsmouth.uk.ibm.com> Jaime, found the RFE and will discuss it with product management. Greetings, Dominic. ______________________________________________________________________________________________________________ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.mueller at de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto To: Dominic Mueller-Wicke01/Germany/IBM at IBMDE Cc: gpfsug main discussion list , Marc A Kaplan Date: 11.03.2016 01:15 Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Hey Dominic Just submitted a new request: Headline: GPFS+TSM+HSM: staging vs. migration priority ID: 85292 Thank you Jaime Quoting Dominic Mueller-Wicke01 : > > Hi Jaime, > > I received the same request from other customers as well. > could you please open a RFE for the theme and send me the RFE ID? I will > discuss it with the product management then. RFE Link: > https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding&BRAND_ID=0&PROD_ID=360&x=11&y=12 > > Greetings, Dominic. 
> > ______________________________________________________________________________________________________________ > > Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | > +49 7034 64 32794 | dominic.mueller at de.ibm.com > > Vorsitzende des Aufsichtsrats: Martina Koederitz; Gesch?ftsf?hrung: Dirk > Wittkopp > Sitz der Gesellschaft: B?blingen; Registergericht: Amtsgericht Stuttgart, > HRB 243294 > > > > From: Jaime Pinto > To: gpfsug main discussion list , > Marc A Kaplan > Cc: Dominic Mueller-Wicke01/Germany/IBM at IBMDE > Date: 09.03.2016 16:22 > Subject: Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration > priority > > > > Interesting perspective Mark. > > I'm inclined to think EBUSY would be more appropriate. > > Jaime > > Quoting Marc A Kaplan : > >> For a write or create operation ENOSPC would make some sense. >> But if the file already exists and I'm just opening for read access I >> would be very confused by ENOSPC. >> How should the system respond: "Sorry, I know about that file, I have it >> safely stored away in HSM, but it is not available right now. Try again >> later!" >> >> EAGAIN or EBUSY might be the closest in ordinary language... >> But EAGAIN is used when a system call is interrupted and can be retried >> right away... >> So EBUSY? >> >> The standard return codes in Linux are: >> >> #define EPERM 1 /* Operation not permitted */ >> #define ENOENT 2 /* No such file or directory */ >> #define ESRCH 3 /* No such process */ >> #define EINTR 4 /* Interrupted system call */ >> #define EIO 5 /* I/O error */ >> #define ENXIO 6 /* No such device or address */ >> #define E2BIG 7 /* Argument list too long */ >> #define ENOEXEC 8 /* Exec format error */ >> #define EBADF 9 /* Bad file number */ >> #define ECHILD 10 /* No child processes */ >> #define EAGAIN 11 /* Try again */ >> #define ENOMEM 12 /* Out of memory */ >> #define EACCES 13 /* Permission denied */ >> #define EFAULT 14 /* Bad address */ >> #define ENOTBLK 15 /* Block device required */ >> #define EBUSY 16 /* Device or resource busy */ >> #define EEXIST 17 /* File exists */ >> #define EXDEV 18 /* Cross-device link */ >> #define ENODEV 19 /* No such device */ >> #define ENOTDIR 20 /* Not a directory */ >> #define EISDIR 21 /* Is a directory */ >> #define EINVAL 22 /* Invalid argument */ >> #define ENFILE 23 /* File table overflow */ >> #define EMFILE 24 /* Too many open files */ >> #define ENOTTY 25 /* Not a typewriter */ >> #define ETXTBSY 26 /* Text file busy */ >> #define EFBIG 27 /* File too large */ >> #define ENOSPC 28 /* No space left on device */ >> #define ESPIPE 29 /* Illegal seek */ >> #define EROFS 30 /* Read-only file system */ >> #define EMLINK 31 /* Too many links */ >> #define EPIPE 32 /* Broken pipe */ >> #define EDOM 33 /* Math argument out of domain of func */ >> #define ERANGE 34 /* Math result not representable */ >> >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.org > University of Toronto > 256 McCaul Street, Room 235 > Toronto, ON, M5T1W5 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From damir.krstic at gmail.com Fri Mar 11 20:55:29 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 11 Mar 2016 20:55:29 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 Message-ID: What is the correct procedure to upgrade from 3.5 to 4.1? What I have tried is uninstalling existing 3.5 version (rpm -e) and installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled kernel extensions: cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages Rebooted the node and have been getting: daemon and kernel extension do not match. I've tried rebuilding extensions again and still could not get it to work. I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting daemon and kernel extension do not match error with 3.5 version on a single node. So, couple of questions: What is the correct way of upgrading from 3.5 to 4.1.0.0? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Mar 11 21:10:14 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 11 Mar 2016 21:10:14 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: That looks pretty much like the right process. Check that all the components upgraded ... rpm -qa | grep gpfs You may need to do an rpm -e on the gpfs.gplbin package and then install the newly built one Are you doing make rpm to build the rpm version of gpfs.gplbin and installing that? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Damir Krstic [damir.krstic at gmail.com] Sent: 11 March 2016 20:55 To: gpfsug main discussion list Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 What is the correct procedure to upgrade from 3.5 to 4.1? What I have tried is uninstalling existing 3.5 version (rpm -e) and installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled kernel extensions: cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages Rebooted the node and have been getting: daemon and kernel extension do not match. I've tried rebuilding extensions again and still could not get it to work. I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting daemon and kernel extension do not match error with 3.5 version on a single node. So, couple of questions: What is the correct way of upgrading from 3.5 to 4.1.0.0? 
Thanks, Damir From damir.krstic at gmail.com Fri Mar 11 21:13:47 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 11 Mar 2016 21:13:47 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: Thanks for the reply. Didn't run make rpm just make autoconfig etc. Checked the versions and it all looks good and valid. Will play with it again and see if there is a step missing. Damir On Fri, Mar 11, 2016 at 15:10 Simon Thompson (Research Computing - IT Services) wrote: > > That looks pretty much like the right process. > > Check that all the components upgraded ... rpm -qa | grep gpfs > > You may need to do an rpm -e on the gpfs.gplbin package and then install > the newly built one > > Are you doing make rpm to build the rpm version of gpfs.gplbin and > installing that? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Damir Krstic [ > damir.krstic at gmail.com] > Sent: 11 March 2016 20:55 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs > 3.5.0-21 > > What is the correct procedure to upgrade from 3.5 to 4.1? > > What I have tried is uninstalling existing 3.5 version (rpm -e) and > installing 4.1.0.0 using rpm -hiv *.rpm. After the install I've compiled > kernel extensions: > cd /usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages > > Rebooted the node and have been getting: > daemon and kernel extension do not match. > > I've tried rebuilding extensions again and still could not get it to work. > I've uninstalled 4.1 packages and reinstalled 3.5 and I am not getting > daemon and kernel extension do not match error with 3.5 version on a single > node. So, couple of questions: > What is the correct way of upgrading from 3.5 to 4.1.0.0? > > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Fri Mar 11 22:58:08 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 11 Mar 2016 22:58:08 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: References: Message-ID: <56E34D80.7000703@buzzard.me.uk> On 11/03/16 21:10, Simon Thompson (Research Computing - IT Services) wrote: > > That looks pretty much like the right process. Yes and no. Assuming you are do this on either RHEL 6.x or 7.x (or their derivatives), then they will now complain constantly that you have modified the RPM database outside yum. As such it is recommended by RedHat that you do "yum remove" and "yum install" rather than running rpm directly. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From pavel.pokorny at datera.cz Sat Mar 12 08:23:49 2016 From: pavel.pokorny at datera.cz (Pavel Pokorny) Date: Sat, 12 Mar 2016 09:23:49 +0100 Subject: [gpfsug-discuss] SMB and NFS limitations? Message-ID: Hello, on Spectrum Scale FAQ page I found following recommendations for SMB and NFS: *A maximum of 3,000 SMB connections is recommended per protocol node with a maximum of 20,000 SMB connections per cluster. A maximum of 4,000 NFS connections per protocol node is recommended. 
A maximum of 2,000 Object connections per protocol nodes is recommended.* Are there any other limits? Like max number of shares? Thanks, Pavel -- Ing. Pavel Pokorn? DATERA s.r.o. | Hadovit? 962/10 | Praha | Czech Republic www.datera.cz | Mobil: +420 602 357 194 | E-mail: pavel.pokorny at datera.cz > -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Mon Mar 14 14:22:20 2016 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Mon, 14 Mar 2016 14:22:20 +0000 Subject: [gpfsug-discuss] Registration now open! Message-ID: <400eedb0a81cd193a694176794f1dc07@webmail.gpfsug.org> Dear members, The registration for the UK Spring 2016 Spectrum Scale (GPFS) User Group meeting is now open. We have a fantastic and full agenda of presentations from users and subject experts. The two-day event is taking place at the IBM Client Centre in London on 17th and 18th May. For the current agenda, further details and to register your place, please visit: http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 Places at the event are limited so it is recommended that you register early to avoid disappointment. Due to capacity restrictions, there is currently a limit of three people per organisation; this will be relaxed if places remain nearer the event date. We'd like to thank our sponsors of this year's User Group as without their support the two-day event would not be possible. Thanks go to Arcastream, DDN, IBM, Lenovo, Mellanox, NetApp, OCF and Seagate for their support. We hope to see you at the May event! Best wishes, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Mar 15 19:39:51 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 15 Mar 2016 15:39:51 -0400 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? Message-ID: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From damir.krstic at gmail.com Tue Mar 15 20:31:55 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 15 Mar 2016 20:31:55 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Message-ID: We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? 
Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Tue Mar 15 20:33:35 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Tue, 15 Mar 2016 20:33:35 +0000 Subject: [gpfsug-discuss] upgrading to spectrum scale 4.1 from gpfs 3.5.0-21 In-Reply-To: <56E34D80.7000703@buzzard.me.uk> References: <56E34D80.7000703@buzzard.me.uk> Message-ID: Figured it out - this node had RedHat version of a kernel that was custom patched by RedHat some time ago for the IB issues we were experiencing. I could not build a portability layer on this kernel. After upgrading the node to more recent version of the kernel, I was able to compile portability layer and get it all working. Thanks for suggestions. Damir On Fri, Mar 11, 2016 at 4:58 PM Jonathan Buzzard wrote: > On 11/03/16 21:10, Simon Thompson (Research Computing - IT Services) wrote: > > > > That looks pretty much like the right process. > > Yes and no. Assuming you are do this on either RHEL 6.x or 7.x (or their > derivatives), then they will now complain constantly that you have > modified the RPM database outside yum. > > As such it is recommended by RedHat that you do "yum remove" and "yum > install" rather than running rpm directly. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:42:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:42:59 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: Hi Jamie I have some fairly large clusters (tho not as large as you describe) running on ?roll your own? storage subsystem of various types. You?re asking a broad question here on performance and rebuild times. I can?t speak to a comparison with ESS (I?m sure IBM can comment) but if you want to discuss some of my experiences with larger clusters, HD, performace (multi PB) I?d be happy to do so. You can drop me a note: robert.oesterlin at nuance.com and we can chat at length. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Jaime Pinto > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 2:39 PM To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. 
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=UIC7jY_blq8j34WiQM1a8cheHzbYW0sYS-ofA3if_Hk&s=MtunFkJSGpXWNdEkMqluTY-CYIC4uaMz7LiZ7JFob8c&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:42:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:42:59 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: Hi Jamie I have some fairly large clusters (tho not as large as you describe) running on ?roll your own? storage subsystem of various types. You?re asking a broad question here on performance and rebuild times. I can?t speak to a comparison with ESS (I?m sure IBM can comment) but if you want to discuss some of my experiences with larger clusters, HD, performace (multi PB) I?d be happy to do so. You can drop me a note: robert.oesterlin at nuance.com and we can chat at length. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Jaime Pinto > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 2:39 PM To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? I'd like to hear about performance consideration from sites that may be using "non-IBM sanctioned" storage hardware or appliance, such as DDN, GSS, ESS (we have all of these). For instance, how could that compare with ESS, which I understand has some sort of "dispersed parity" feature, that substantially diminishes rebuilt time in case of HD failures. I'm particularly interested on HPC sites with 5000+ clients mounting such commodity NSD's+HD's setup. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=UIC7jY_blq8j34WiQM1a8cheHzbYW0sYS-ofA3if_Hk&s=MtunFkJSGpXWNdEkMqluTY-CYIC4uaMz7LiZ7JFob8c&e= -------------- next part -------------- An HTML attachment was scrubbed... 
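A minimal sketch of the version handling Jan-Frode describes, with the filesystem name, stanza file and version string invented for illustration (take the real string from mmlsfs on the existing 3.5 filesystem, not from this example):

   # on the existing 3.5 cluster: note the current filesystem format version
   mmlsfs oldfs -V

   # on the new ESS/4.x cluster: create the filesystem at that older format so
   # the 3.5 clients can still remote mount it
   mmcrfs essfs -F essfs.stanza --version 3.5.0.7

   # later, once every node that mounts it has been upgraded, move the cluster
   # and the filesystem to the full current format
   mmchconfig release=LATEST
   mmchfs essfs -V full

Note that -V full is one-way; the filesystem format cannot be downgraded afterwards.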
URL: From Robert.Oesterlin at nuance.com Tue Mar 15 20:45:05 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 15 Mar 2016 20:45:05 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Damir Krstic > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Mar 15 21:50:20 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 15 Mar 2016 21:50:20 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: Not sure about cluster features, but at minimum you'll need to create the filesystem with low enough mmcrfs --version string. -jf tir. 15. mar. 2016 kl. 21.32 skrev Damir Krstic : > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. > When looking at GPFS coexistance documents, it is not clear whether GPFS > 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any > issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.arnold at unibas.ch Tue Mar 15 22:22:17 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Tue, 15 Mar 2016 23:22:17 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <56E88B19.4060708@unibas.ch> It's definitely doable, besides --version mentioned byJan-Frode, just a two things to consider (when cluster started as 3.5 or earlier version) we stumbled across: - keys nistCompliance=SP800-131A: we had to regenerate and exchange new keys with nistCompliance before old cluster could talk to new remotecluster - maxblocksize: you would want ESS to run with maxblocksize 16M - cluster with 3.5 probably has set a smaller value (default 1M) and to change that you have to stop GPFS Best Konstantin On 03/15/2016 10:50 PM, Jan-Frode Myklebust wrote: > Not sure about cluster features, but at minimum you'll need to create > the filesystem with low enough mmcrfs --version string. > > > > > -jf > > tir. 15. mar. 2016 kl. 21.32 skrev Damir Krstic >: > > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. 
We will remote cluster mount ESS to our compute > cluster. When looking at GPFS coexistance documents, it is not clear > whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know > if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 > cluster? > > Thanks, > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ------------------------------------------------------------------------------------------- Konstantin Arnold | University of Basel & SIB Klingelbergstrasse 50/70 | CH-4056 Basel | Phone: +41 61 267 15 82 Email: konstantin.arnold at unibas.ch From Paul.Sanchez at deshaw.com Wed Mar 16 03:28:59 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 16 Mar 2016 03:28:59 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> You do have to keep an eye out for filesystem version issues as you set this up. If the new filesystem is created with a version higher than the 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. You can specify the version of a new filesystem at creation time with, for example, ?mmcrfs ?version 3.5.?. You can confirm an existing filesystem?s version with ?mmlsfs | grep version?. There are probably a pile of caveats about features that you can never get on the new filesystem though. If you don?t need high-bandwidth, parallel access to the new filesystem from the 3.5 cluster, you could use CES or CNFS for a time, until the 3.5 cluster is upgraded or retired. A possibly better recommendation would be to upgrade the 3.5 cluster to at least 4.1, if not 4.2, instead. It would continue to be able to serve any of your old version filesystems, but not prohibit you from moving forward on the new ones. -Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Oesterlin, Robert Sent: Tuesday, March 15, 2016 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Damir Krstic > Reply-To: gpfsug main discussion list > Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir -------------- next part -------------- An HTML attachment was scrubbed... 
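A rough sketch of the two knobs Konstantin flags, as they are typically handled. Exact ordering and where each command may be run depend on the code levels involved (nistCompliance only exists from 4.1 on), so treat this as a starting point rather than a procedure:

   # keys: regenerate and commit a new key pair, then re-exchange the public key
   # with the remote side (their mmauth/mmremotecluster entries need updating,
   # e.g. mmremotecluster update <cluster> -k <new public key file>)
   mmauth genkey new
   mmauth genkey commit
   mmchconfig nistCompliance=SP800-131A      # on clusters at 4.1 or later

   # block size: maxblocksize on the client cluster must be at least the ESS
   # filesystem block size; changing it requires GPFS to be down cluster-wide
   mmchconfig maxblocksize=16M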
URL: From damir.krstic at gmail.com Wed Mar 16 13:08:51 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 13:08:51 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> References: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> Message-ID: Thanks for all replies. Do all of the same restrictions apply to 4.1? We have an option of installing ESS with 4.1. If we install ESS with 4.1 can we then cross mount to 3.5 with FS version of 4.1? Also with 4.1 are there any issues with key exchange? Thanks, Damir On Tue, Mar 15, 2016 at 10:29 PM Sanchez, Paul wrote: > You do have to keep an eye out for filesystem version issues as you set > this up. If the new filesystem is created with a version higher than the > 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. > > > > You can specify the version of a new filesystem at creation time with, for > example, ?mmcrfs ?version 3.5.?. > > You can confirm an existing filesystem?s version with ?mmlsfs > | grep version?. > > > > There are probably a pile of caveats about features that you can never get > on the new filesystem though. If you don?t need high-bandwidth, parallel > access to the new filesystem from the 3.5 cluster, you could use CES or > CNFS for a time, until the 3.5 cluster is upgraded or retired. > > > > A possibly better recommendation would be to upgrade the 3.5 cluster to at > least 4.1, if not 4.2, instead. It would continue to be able to serve any > of your old version filesystems, but not prohibit you from moving forward > on the new ones. > > > > -Paul > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of *Oesterlin, Robert > *Sent:* Tuesday, March 15, 2016 4:45 PM > > > *To:* gpfsug main discussion list > > *Subject:* Re: [gpfsug-discuss] cross-cluster mounting different versions > of gpfs > > > > I?ve never used ESS, but I state for a fact you can cross mount clusters > at various levels without a problem ? I do it all the time during upgrades. > I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may > be limited on 4.2 features when accessing it via the 3.5 cluster, but data > access should work fine. > > > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > > > > *From: * on behalf of Damir > Krstic > *Reply-To: *gpfsug main discussion list > *Date: *Tuesday, March 15, 2016 at 3:31 PM > *To: *gpfsug main discussion list > *Subject: *[gpfsug-discuss] cross-cluster mounting different versions of > gpfs > > > > We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is > running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. > When looking at GPFS coexistance documents, it is not clear whether GPFS > 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any > issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? > > > > Thanks, > > Damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
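On the key-exchange question: the remote-mount plumbing is the same whether the ESS side runs 4.1 or 4.2. With all cluster, node and filesystem names invented for illustration (and assuming each side has already run mmauth genkey and swapped public key files), the setup looks something like:

   # on the ESS (owning) cluster: allow the compute cluster to mount the filesystem
   mmauth add compute.example.com -k compute_id_rsa.pub
   mmauth grant compute.example.com -f essfs -a rw

   # on the compute (client) cluster: define the remote cluster and filesystem
   mmremotecluster add ess.example.com -n essio1,essio2 -k ess_id_rsa.pub
   mmremotefs add essfs -f essfs -C ess.example.com -T /gpfs/essfs
   mmmount essfs -a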
URL: From UWEFALKE at de.ibm.com Wed Mar 16 13:29:42 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 16 Mar 2016 14:29:42 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <386ecf44315f4434a5a895a0e94dca37@mbxtoa3.winmail.deshaw.com> Message-ID: <201603161329.u2GDTpjP006773@d06av09.portsmouth.uk.ibm.com> Hi, Damir, you cannot mount a 4.x fs level from a 3.5 level cluster / node. You need to create the fs with a sufficiently low level, fs level downgrade is not possible, AFAIK. 3.5 nodes can mount fs from 4.1 cluster (fs at 3.5.0.7 fs level), that I can confirm for sure. Uwe Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 02:09 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks for all replies. Do all of the same restrictions apply to 4.1? We have an option of installing ESS with 4.1. If we install ESS with 4.1 can we then cross mount to 3.5 with FS version of 4.1? Also with 4.1 are there any issues with key exchange? Thanks, Damir On Tue, Mar 15, 2016 at 10:29 PM Sanchez, Paul wrote: You do have to keep an eye out for filesystem version issues as you set this up. If the new filesystem is created with a version higher than the 3.5 cluster?s version, then the 3.5 cluster will not be able to mount it. You can specify the version of a new filesystem at creation time with, for example, ?mmcrfs ?version 3.5.?. You can confirm an existing filesystem?s version with ?mmlsfs | grep version?. There are probably a pile of caveats about features that you can never get on the new filesystem though. If you don?t need high-bandwidth, parallel access to the new filesystem from the 3.5 cluster, you could use CES or CNFS for a time, until the 3.5 cluster is upgraded or retired. A possibly better recommendation would be to upgrade the 3.5 cluster to at least 4.1, if not 4.2, instead. It would continue to be able to serve any of your old version filesystems, but not prohibit you from moving forward on the new ones. -Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto: gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Oesterlin, Robert Sent: Tuesday, March 15, 2016 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs I?ve never used ESS, but I state for a fact you can cross mount clusters at various levels without a problem ? I do it all the time during upgrades. I?m not aware of any co-exisitance problems with the 3.5 and above. Yo may be limited on 4.2 features when accessing it via the 3.5 cluster, but data access should work fine. 
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of Damir Krstic Reply-To: gpfsug main discussion list Date: Tuesday, March 15, 2016 at 3:31 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs We are deploying ESS with Spectrum Scale 4.2. Our compute cluster is running GPFS 3.5. We will remote cluster mount ESS to our compute cluster. When looking at GPFS coexistance documents, it is not clear whether GPFS 3.5 cluster can remote mount GPFS 4.2. Does anyone know if there are any issues in remote mounting GPFS 4.2 cluster on 3.5 cluster? Thanks, Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Wed Mar 16 15:20:50 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:20:50 -0500 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> The key point is that you must create the file system so that is "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmslfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Mar 16 15:32:51 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:32:51 -0500 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: <201603161534.u2GFYR3X029313@d03av02.boulder.ibm.com> IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM raid-in-software technology with advanced striping and error recovery. I just googled some of those terms and hit this not written by IBM summary: http://www.raidinc.com/file-storage/gss-ess Also, this is now a "mature" technology. IBM has been doing this since before 2008. See pages 9 and 10 of: http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Mar 16 15:32:51 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 10:32:51 -0500 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> Message-ID: <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM raid-in-software technology with advanced striping and error recovery. I just googled some of those terms and hit this not written by IBM summary: http://www.raidinc.com/file-storage/gss-ess Also, this is now a "mature" technology. IBM has been doing this since before 2008. 
See pages 9 and 10 of: http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Mar 16 16:03:27 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 16 Mar 2016 16:03:27 +0000 Subject: [gpfsug-discuss] Perfileset df explanation Message-ID: All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Wed Mar 16 16:05:48 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Wed, 16 Mar 2016 16:05:48 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: Hi Richard, I don't think mmdf will tell you the answer you're looking for. If you use df within the fileset, or for the share over NFS, you will get the free space reported for that fileset, not the whole file system. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 16 March 2016 16:03 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Perfileset df explanation All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Mar 16 16:12:54 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 16 Mar 2016 16:12:54 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: If you have a fileset quota, 'df' will report the size of the fileset as the max quota defined, and usage as how much of the quota you have used. -jf ons. 16. mar. 2016 kl. 17.03 skrev Sobey, Richard A : > All, > > > > Can someone explain that this means? :: > > > > --filesetdf > > Displays a yes or no value indicating whether filesetdf is enabled; if > yes, the mmdf command reports numbers based on the quotas for the fileset > and not for the total file system. > > > > What this means, as in the output I would expect to see from mmdf with > this option set to Yes, and No? I don?t think it?s supposed to give any > indication of over-provision and cursory tests suggest it doesn?t. 
> > > > Thanks > > > > Richard > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Mar 16 16:13:11 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 16 Mar 2016 16:13:11 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> Message-ID: Thanks for those slides -- I hadn't realized GNR was that old. The slides projected 120 PB by 2011.. Does anybody know what the largest GPFS filesystems are today? Are there any in that area? How many ESS GLx building blocks in a single cluster? -jf ons. 16. mar. 2016 kl. 16.34 skrev Marc A Kaplan : > IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM > raid-in-software technology with advanced striping and error recovery. > > I just googled some of those terms and hit this not written by IBM summary: > > http://www.raidinc.com/file-storage/gss-ess > > Also, this is now a "mature" technology. IBM has been doing this since > before 2008. See pages 9 and 10 of: > > http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Mar 16 16:13:11 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 16 Mar 2016 16:13:11 +0000 Subject: [gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters? In-Reply-To: <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> References: <20160315153951.55646e2asg3eon47@support.scinet.utoronto.ca> <201603161534.u2GFYSJo010813@d03av04.boulder.ibm.com> Message-ID: Thanks for those slides -- I hadn't realized GNR was that old. The slides projected 120 PB by 2011.. Does anybody know what the largest GPFS filesystems are today? Are there any in that area? How many ESS GLx building blocks in a single cluster? -jf ons. 16. mar. 2016 kl. 16.34 skrev Marc A Kaplan : > IBM ESS, GSS, GNR, and Perseus refer to the same "declustered" IBM > raid-in-software technology with advanced striping and error recovery. > > I just googled some of those terms and hit this not written by IBM summary: > > http://www.raidinc.com/file-storage/gss-ess > > Also, this is now a "mature" technology. IBM has been doing this since > before 2008. See pages 9 and 10 of: > > http://storageconference.us/2008/presentations/2.Tuesday/6.Haskin.pdf > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Mar 16 16:24:49 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 16 Mar 2016 16:24:49 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: Ah, I see, thanks for that. 
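To see the behaviour end to end, a small sketch; the filesystem, fileset and quota values are invented, and the mmsetquota syntax shown is the newer (4.1+) form:

   # report df per fileset rather than for the whole filesystem
   mmchfs gpfs01 --filesetdf

   # the fileset's block quota is what df will then report against
   mmsetquota gpfs01:projects --block 10T:10T

   # df at the fileset junction (locally or via an NFS export of it) now shows
   # roughly 10T total and the fileset's own usage, not the filesystem totals
   df -h /gpfs/gpfs01/projects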
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 16 March 2016 16:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfileset df explanation If you have a fileset quota, 'df' will report the size of the fileset as the max quota defined, and usage as how much of the quota you have used. -jf ons. 16. mar. 2016 kl. 17.03 skrev Sobey, Richard A >: All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don?t think it?s supposed to give any indication of over-provision and cursory tests suggest it doesn?t. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Wed Mar 16 17:07:28 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 12:07:28 -0500 Subject: [gpfsug-discuss] 4.2 installer Message-ID: <56E992D0.3050603@genome.wustl.edu> All, Attempting to upgrade our into our dev environment. The update to 4.2 was simple. http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_migratingtoISS4.2fromISS4.1.1.htm But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Anyway to just pull in the current cluster info? http://www.ibm.com/support/knowledgecenter/STXKQY/420/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_configuringgpfs.htm%23configuringgpfs?lang=en Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Robert.Oesterlin at nuance.com Wed Mar 16 17:15:02 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:15:02 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E992D0.3050603@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> Message-ID: Hi Matt I?ve done a fair amount of work (testing) with the installer. It?s great if you want to install a new cluster, not so much if you have one setup. You?ll need to manually define everything. Be careful tho ? do some test runs to verify what it will really do. I?ve found the installer doing a good job in upgrading my CES nodes, but I?ve opted to manually upgrade my NSD server nodes. 
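For reference, driving an upgrade through the toolkit looks roughly like this once the existing nodes have been described to it. Hostnames and the IP are placeholders, the path follows the default package extraction location, and flags differ slightly between toolkit releases, so check its help output and do a test run first:

   cd /usr/lpp/mmfs/4.2.0.0/installer
   ./spectrumscale setup -s 10.10.0.1        # node the toolkit drives things from
   ./spectrumscale node add ces1 -p          # existing protocol node
   ./spectrumscale node add nsd1 -n          # existing NSD server
   ./spectrumscale node list                 # sanity-check the toolkit's view first
   ./spectrumscale upgrade                   # rolling upgrade of the nodes listed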
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:07 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] 4.2 installer All, Attempting to upgrade our into our dev environment. The update to 4.2 was simple. https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5FmigratingtoISS4.2fromISS4.1.1.htm&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=q0hiYoPBUTo0Rs7bnv_vhYZnrKKY1ypiub2u4Y1RzXQ&e= But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Anyway to just pull in the current cluster info? https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5Fconfiguringgpfs.htm-2523configuringgpfs-3Flang-3Den&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=I5ateHrTk48s7jvy5fbQoN9WAx0JThpderOGCqeU05A&e= Thanks Matt ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=1xogEU5qWELakYlmL5snihVa_PjAuf1KMuDM-s1e48c&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Mar 16 17:18:47 2016 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 16 Mar 2016 18:18:47 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). 
while this is heavily dependent on workload and environment its at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: > The key point is that you must create the file system so that is "looks" > like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a > test filesystem back on the 3.5 cluster and look at the version string. > mmslfs xxx -V. Then go to the 4.x system and try to create a file system > with the same version string.... > > > [image: Marc A Kaplan] > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Mar 16 17:20:11 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 16 Mar 2016 17:20:11 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu>, Message-ID: Does the installer manage to make the rpm kernel layer ok on clone oses? Last time I tried mmmakegpl, it falls over as I don't run RedHat enterprise... (I must admit I haven't used the installer, but be have config management recipes to install and upgrade). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 16 March 2016 17:15 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.2 installer Hi Matt I?ve done a fair amount of work (testing) with the installer. It?s great if you want to install a new cluster, not so much if you have one setup. You?ll need to manually define everything. Be careful tho ? do some test runs to verify what it will really do. I?ve found the installer doing a good job in upgrading my CES nodes, but I?ve opted to manually upgrade my NSD server nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:07 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] 4.2 installer All, Attempting to upgrade our into our dev environment. The update to 4.2 was simple. https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5FmigratingtoISS4.2fromISS4.1.1.htm&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=q0hiYoPBUTo0Rs7bnv_vhYZnrKKY1ypiub2u4Y1RzXQ&e= But I am confused on the installation toolkit. It seems that it is going to set it all up and I just want to upgrade a cluster that is already setup. Anyway to just pull in the current cluster info? https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5Fconfiguringgpfs.htm-2523configuringgpfs-3Flang-3Den&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=I5ateHrTk48s7jvy5fbQoN9WAx0JThpderOGCqeU05A&e= Thanks Matt ____ This email message is a private communication. 
The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=1xogEU5qWELakYlmL5snihVa_PjAuf1KMuDM-s1e48c&e= From mweil at genome.wustl.edu Wed Mar 16 17:36:26 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 12:36:26 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> Message-ID: <56E9999A.7030902@genome.wustl.edu> We have multiple clusters with thousands of nsd's surely there is an upgrade path. Are you all saying just continue to manually update nsd servers and manage them as we did previously. Is the installer not needed if there are current setups. Just deploy CES manually? On 3/16/16 12:20 PM, Simon Thompson (Research Computing - IT Services) wrote: > Does the installer manage to make the rpm kernel layer ok on clone oses? > > Last time I tried mmmakegpl, it falls over as I don't run RedHat enterprise... > > (I must admit I haven't used the installer, but be have config management recipes to install and upgrade). > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] > Sent: 16 March 2016 17:15 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer > > Hi Matt > > I?ve done a fair amount of work (testing) with the installer. It?s great if you want to install a new cluster, not so much if you have one setup. You?ll need to manually define everything. Be careful tho ? do some test runs to verify what it will really do. I?ve found the installer doing a good job in upgrading my CES nodes, but I?ve opted to manually upgrade my NSD server nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > > > > From: > on behalf of Matt Weil > > Reply-To: gpfsug main discussion list > > Date: Wednesday, March 16, 2016 at 12:07 PM > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] 4.2 installer > > All, > > Attempting to upgrade our into our dev environment. The update to 4.2 > was simple. > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5FmigratingtoISS4.2fromISS4.1.1.htm&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=q0hiYoPBUTo0Rs7bnv_vhYZnrKKY1ypiub2u4Y1RzXQ&e= > > But I am confused on the installation toolkit. It seems that it is > going to set it all up and I just want to upgrade a cluster that is > already setup. 
Anyway to just pull in the current cluster info? > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ibm.com_support_knowledgecenter_STXKQY_420_com.ibm.spectrum.scale.v4r2.ins.doc_bl1ins-5Fconfiguringgpfs.htm-2523configuringgpfs-3Flang-3Den&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=I5ateHrTk48s7jvy5fbQoN9WAx0JThpderOGCqeU05A&e= > > Thanks > Matt > > > ____ > This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=CwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=K6ADNiCE2cv7oQrkLfwkSHIv2XDj2QC_YLYYGPX53gU&s=1xogEU5qWELakYlmL5snihVa_PjAuf1KMuDM-s1e48c&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From Robert.Oesterlin at nuance.com Wed Mar 16 17:36:37 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:36:37 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> Message-ID: <2097A8FD-3A42-4D36-8DC2-1DDA6BC9984C@nuance.com> Sadly, it fails if the node can?t run mmbuildgpl, also on the clone OS?s of RedHat. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of "Simon Thompson (Research Computing - IT Services)" > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:20 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer Does the installer manage to make the rpm kernel layer ok on clone oses? Last time I tried mmmakegpl, it falls over as I don't run RedHat enterprise... -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Wed Mar 16 17:40:42 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 16 Mar 2016 17:40:42 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E9999A.7030902@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> Message-ID: <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: > on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:36 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 installer We have multiple clusters with thousands of nsd's surely there is an upgrade path. Are you all saying just continue to manually update nsd servers and manage them as we did previously. Is the installer not needed if there are current setups. Just deploy CES manually? -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Mar 16 18:07:59 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 18:07:59 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: > while this is all correct people should think twice about doing this. > if you create a filesystem with older versions, it might prevent you from > using some features like data-in-inode, encryption, adding 4k disks to > existing filesystem, etc even if you will eventually upgrade to the latest > code. > > for some customers its a good point in time to also migrate to larger > blocksizes compared to what they run right now and migrate the data. 
i have > seen customer systems gaining factors of performance improvements even on > existing HW by creating new filesystems with larger blocksize and latest > filesystem layout (that they couldn't before due to small file waste which > is now partly solved by data-in-inode). while this is heavily dependent on > workload and environment its at least worth thinking about. > > sven > > > > On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan > wrote: > >> The key point is that you must create the file system so that it "looks" >> like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a >> test filesystem back on the 3.5 cluster and look at the version string. >> mmlsfs xxx -V. Then go to the 4.x system and try to create a file system >> with the same version string.... >> >> >> [image: Marc A Kaplan] >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >

From jonathan at buzzard.me.uk Wed Mar 16 18:47:06 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 16 Mar 2016 18:47:06 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: <56E9AA2A.3010108@buzzard.me.uk>

On 16/03/16 18:07, Damir Krstic wrote: > Sven, > > For us, at least, at this point in time, we have to create new > filesystem with version flag. The reason is we can't take downtime to > upgrade all of our 500+ compute nodes that will cross-cluster mount this > new storage. We can take downtime in June and get all of the nodes up to > 4.2 gpfs version but we have users today that need to start using the > filesystem. >

You can upgrade a GPFS file system piecemeal. That is, there should be no reason to take the whole system off-line to perform the upgrade. So you can upgrade compute nodes to GPFS 4.2 one by one and they will happily continue to talk to the NSD's running 3.5 while the other nodes continue to use the file system.

In a properly designed GPFS cluster you should also be able to take individual NSD nodes out for the upgrade. Though I wouldn't recommend running mixed versions on a long-term basis, it is definitely fine for the purposes of upgrading.

Then once all nodes in the GPFS cluster are upgraded you issue mmchfs -V full. How long this will take will depend on the maximum run time you allow for your jobs.

You would need to check that you can make a clean jump from 3.5 to 4.2, but IBM support should be able to confirm that for you.

This is one of the nicer features of GPFS; it's what I refer to as "proper enterprise big iron computing". That is, if you have to take the service down at any time for any reason you are doing it wrong.

JAB.

-- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom.
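As an illustration of the piecemeal upgrade described above, the loop would look roughly like the sketch below. The node names, staging path and wildcarded package list are invented, the exact RPM set depends on the edition in use, and you may need to step through an intermediate release rather than jumping straight to the target level, so treat it as a sketch rather than a procedure:

  FS=gpfs0                                   # assumed file system name
  for NODE in compute001 compute002; do      # one drained node at a time
      mmshutdown -N $NODE
      ssh $NODE 'rpm -Uvh /stage/gpfs-next/gpfs.*.rpm'   # upgrade the GPFS packages
      ssh $NODE '/usr/lpp/mmfs/bin/mmbuildgpl'           # rebuild the portability layer
      mmstartup -N $NODE
      mmgetstate -N $NODE                    # wait for 'active' before moving on
  done

  # only once every node in the cluster (and every remote cluster that mounts
  # the file system) is on the new level:
  mmchconfig release=LATEST
  mmchfs $FS -V full     # irreversible: enables the new on-disk format features
  mmlsfs $FS -V          # shows the current and original file system versions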
From UWEFALKE at de.ibm.com Wed Mar 16 18:51:59 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Wed, 16 Mar 2016 19:51:59 +0100 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> Message-ID: <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> Hi, Damir, I have not done that, but a rolling upgrade from 3.5.x to 4.1.x (maybe even to 4.2) is supported. So, as long as you do not need all 500 nodes of your compute cluster permanently active, you might upgrade them in batches without fully-blown downtime. Nicely orchestrated by some scripts it could be done quite smoothly (depending on the percentage of compute nodes which can go down at once and on the run time / wall clocks of your jobs this will take between few hours and many days ...). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 07:08 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. 
i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). while this is heavily dependent on workload and environment its at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: The key point is that you must create the file system so that is "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmslfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "atthrpb5.gif" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From damir.krstic at gmail.com Wed Mar 16 19:06:02 2016 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 16 Mar 2016 19:06:02 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: <56E9AA2A.3010108@buzzard.me.uk> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <56E9AA2A.3010108@buzzard.me.uk> Message-ID: Jonathan, Gradual upgrade is indeed a nice feature of GPFS. We are planning to gradually upgrade our clients to 4.2. However, before all, or even most clients are upgraded, we have to be able to mount this new 4.2 filesystem on all our compute nodes that are running version 3.5. Here is our environment today: storage cluster - 14 nsd servers * gpfs3.5 compute cluster - 500+ clients * gpfs3.5 <--- this cluster is mounting storage cluster filesystems new to us ESS cluster * gpfs4.2 ESS will become its own GPFS cluster and we want to mount its filesystems on our compute cluster. So far so good. We understand that we will eventually want to upgrade all our nodes in compute cluster to 4.2 and we know the upgrade path (3.5 --> 4.1 --> 4.2). The reason for this conversation is: with ESS and GPFS 4.2 can we remote mount it on our compute cluster? The answer we got is, yes if you build a new filesystem with --version flag. Sven, however, has just pointed out that this may not be desirable option since there are some features that are permanently lost when building a filesystem with --version. In our case, however, even though we will upgrade our clients to 4.2 (some gradually as pointed elsewhere in this conversation, and most in June), we have to be able to mount the new ESS filesystem on our compute cluster before the clients are upgraded. It seems like, even though Sven is recommending against it, building a filesystem with --version flag is our only option. I guess we have another option, and that is to upgrade all our clients first, but we can't do that until June so I guess it's really not an option at this time. I hope this makes our constraints clear: mainly, without being able to take downtime on our compute cluster, we are forced to build a filesystem on ESS using --version flag. 
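The mechanics of creating the file system that way are small. The names below (oldfs, essfs, the stanza file) are placeholders, and the version string should be taken from mmlsfs on the existing 3.5 file system rather than copied from here:

  # on the existing 3.5 storage cluster: note the file system format version
  mmlsfs oldfs -V
  #   prints something like "13.23 (3.5.0.7)  Current file system version"

  # on the ESS / 4.2 cluster: create the new file system so it looks like 3.5
  mmcrfs essfs -F essfs_nsd.stanza -B 1M --version 3.5.0.7

  # confirm what was actually created
  mmlsfs essfs -V

  # later, once every node that mounts it (locally or remotely) has been
  # upgraded, the format can be brought forward:
  #   mmchconfig release=LATEST
  #   mmchfs essfs -V full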
Thanks, Damir On Wed, Mar 16, 2016 at 1:47 PM Jonathan Buzzard wrote: > On 16/03/16 18:07, Damir Krstic wrote: > > Sven, > > > > For us, at least, at this point in time, we have to create new > > filesystem with version flag. The reason is we can't take downtime to > > upgrade all of our 500+ compute nodes that will cross-cluster mount this > > new storage. We can take downtime in June and get all of the nodes up to > > 4.2 gpfs version but we have users today that need to start using the > > filesystem. > > > > You can upgrade a GPFS file system piece meal. That is there should be > no reason to take the whole system off-line to perform the upgrade. So > you can upgrade a compute nodes to GPFS 4.2 one by one and they will > happily continue to talk to the NSD's running 3.5 while the other nodes > continue to use the file system. > > In a properly designed GPFS cluster you should also be able to take > individual NSD nodes out for the upgrade. Though I wouldn't recommend > running mixed versions on a long term basis, it is definitely fine for > the purposes of upgrading. > > Then once all nodes in the GPFS cluster are upgraded you issue the > mmchfs -V full. How long this will take will depend on the maximum run > time you allow for your jobs. > > You would need to check that you can make a clean jump from 3.5 to 4.2 > but IBM support should be able to confirm that for you. > > This is one of the nicer features of GPFS; its what I refer to as > "proper enterprise big iron computing". That is if you have to take the > service down at any time for any reason you are doing it wrong. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From volobuev at us.ibm.com Wed Mar 16 19:29:17 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 11:29:17 -0800 Subject: [gpfsug-discuss] cross-cluster mounting different versionsofgpfs In-Reply-To: <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <201603161852.u2GIq6Vd028321@d06av08.portsmouth.uk.ibm.com> Message-ID: <201603161929.u2GJTRRf020013@d03av01.boulder.ibm.com> There are two related, but distinctly different issues to consider. 1) File system format and backward compatibility. The format of a given file system is recorded on disk, and determines the level of code required to mount such a file system. GPFS offers backward compatibility for older file system versions stretching for many releases. The oldest file system format we test with in the lab is 2.2 (we don't believe there are file systems using older versions actually present in the field). So if you have a file system formatted using GPFS V3.5 code, you can mount that file system using GPFS V4.1 or V4.2 without a problem. Of course, you don't get to use the new features that depend on the file system format that came out since V3.5. If you're formatting a new file system on a cluster running newer code, but want that file system to be mountable by older code, you have to use --version with mmcrfs. 2) RPC format compatibility, aka nodes being able to talk to each other. As the code evolves, the format of some RPCs sent over the network to other nodes naturally has to evolve as well. 
This of course presents a major problem for code coexistence (running different versions of GPFS on different nodes in the same cluster, or nodes from different clusters mounting the same file system, which effectively means joining a remote cluster), which directly translates into the possibility of a rolling migration (upgrading nodes to newer GPFS level one at a time, without taking all nodes down). Implementing new features while preserving some level of RPC compatibility with older releases is Hard, but this is something GPFS has committed to, long ago. The commitment is not open-ended though, there's a very specific statement of support for what's allowed. GPFS major (meaning 'v' or 'r' is incremented in a v.r.m.f version string) release N stream shall have coexistence with the GPFS major release N - 1 stream. So coexistence of V4.2 with V4.1 is supported, while coexistence of V4.2 with older releases is unsupported (it may or may not work if one tries it, depending on the specific combination of versions, but one would do so entirely on own risk). The reason for limiting the extent of RPC compatibility is prosaic: in order to support something, we have to be able to test this something. We have the resources to test the N / N - 1 combination, for every major release N. If we had to extend this to N, N - 1, N - 2, N - 3, you can do the math on how many combinations to test that would create. That would bust the test budget. So if you want to cross-mount a file system from a home cluster running V4.2, you have to run at least V4.1.x on client nodes, and the file system would have to be formatted using the lowest version used on any node mounting the file system. Hope this clarifies things a bit. yuri From: "Uwe Falke" To: gpfsug main discussion list , Date: 03/16/2016 11:52 AM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Damir, I have not done that, but a rolling upgrade from 3.5.x to 4.1.x (maybe even to 4.2) is supported. So, as long as you do not need all 500 nodes of your compute cluster permanently active, you might upgrade them in batches without fully-blown downtime. Nicely orchestrated by some scripts it could be done quite smoothly (depending on the percentage of compute nodes which can go down at once and on the run time / wall clocks of your jobs this will take between few hours and many days ...). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Damir Krstic To: gpfsug main discussion list Date: 03/16/2016 07:08 PM Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Sent by: gpfsug-discuss-bounces at spectrumscale.org Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. 
The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: while this is all correct people should think twice about doing this. if you create a filesystem with older versions, it might prevent you from using some features like data-in-inode, encryption, adding 4k disks to existing filesystem, etc even if you will eventually upgrade to the latest code. for some customers its a good point in time to also migrate to larger blocksizes compared to what they run right now and migrate the data. i have seen customer systems gaining factors of performance improvements even on existing HW by creating new filesystems with larger blocksize and latest filesystem layout (that they couldn't before due to small file waste which is now partly solved by data-in-inode). while this is heavily dependent on workload and environment its at least worth thinking about. sven On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan wrote: The key point is that you must create the file system so that is "looks" like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a test filesystem back on the 3.5 cluster and look at the version string. mmslfs xxx -V. Then go to the 4.x system and try to create a file system with the same version string.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "atthrpb5.gif" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mweil at genome.wustl.edu Wed Mar 16 19:37:31 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Wed, 16 Mar 2016 14:37:31 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> Message-ID: <56E9B5FB.2050105@genome.wustl.edu> any help here? 
> ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 > Error: Multilib version problems found. This often means that the root > cause is something else and multilib version checking is just > pointing out that there is a problem. Eg.: > > 1. You have an upgrade for libcap-ng which is missing some > dependency that another package requires. Yum is trying to > solve this by installing an older version of libcap-ng of the > different architecture. If you exclude the bad architecture > yum will tell you what the root cause is (which package > requires what). You can try redoing the upgrade with > --exclude libcap-ng.otherarch ... this should give you an > error > message showing the root cause of the problem. > > 2. You have multiple architectures of libcap-ng installed, but > yum can only see an upgrade for one of those architectures. > If you don't want/need both architectures anymore then you > can remove the one with the missing update and everything > will work. > > 3. You have duplicate versions of libcap-ng installed already. > You can use "yum check" to get yum show these errors. > > ...you can also use --setopt=protected_multilib=false to remove > this checking, however this is almost never the correct thing to > do as something else is very likely to go wrong (often causing > much more problems). > > Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != > libcap-ng-0.7.5-4.el7.x86_64 On 3/16/16 12:40 PM, Oesterlin, Robert wrote: > My first suggestion is: Don?t deploy the CES nodes manually ? way to > many package dependencies. Get those setup right and the installer > does a good job. > > If you go through and define your cluster nodes to the installer, you > can do a GPFS upgrade that way. I?ve run into some issues, especially > with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a > whole lot of control over what it does ? give it a ty and it may work > well for you. But run it in a test cluster first or on a limited set > of nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > From: > on behalf of Matt > Weil > > Reply-To: gpfsug main discussion list > > > Date: Wednesday, March 16, 2016 at 12:36 PM > To: "gpfsug-discuss at spectrumscale.org > " > > > Subject: Re: [gpfsug-discuss] 4.2 installer > > We have multiple clusters with thousands of nsd's surely there is an > upgrade path. Are you all saying just continue to manually update nsd > servers and manage them as we did previously. Is the installer not > needed if there are current setups. Just deploy CES manually? > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From volobuev at us.ibm.com Wed Mar 16 19:37:53 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 11:37:53 -0800 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: <201603161937.u2GJbwII007184@d03av04.boulder.ibm.com> The 'mmdf' part of the usage string is actually an error, it should actually say 'df'. More specifically, this changes the semantics of statfs (2). On Linux, the statfs syscall takes a path argument, which can be the root directory of a file system, or a subdirectory inside. If the path happens to be a root directory of a fileset, and that fileset has the fileset quota set, and --filesetdf is set to 'yes', the statfs returns utilization numbers based on the fileset quota utilization, as opposed to the overall file system utilization. This is useful when a specific fileset is NFS-exported as a 'share', and it's desirable to see only the space used/available for that 'share' on the NFS client side. yuri From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" , Date: 03/16/2016 09:05 AM Subject: [gpfsug-discuss] Perfileset df explanation Sent by: gpfsug-discuss-bounces at spectrumscale.org All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don?t think it?s supposed to give any indication of over-provision and cursory tests suggest it doesn?t. Thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan at buzzard.me.uk Wed Mar 16 19:45:35 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 16 Mar 2016 19:45:35 +0000 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com> <56E9AA2A.3010108@buzzard.me.uk> Message-ID: <56E9B7DF.1020007@buzzard.me.uk> On 16/03/16 19:06, Damir Krstic wrote: [SNIP] > > In our case, however, even though we will upgrade our clients to 4.2 > (some gradually as pointed elsewhere in this conversation, and most in > June), we have to be able to mount the new ESS filesystem on our compute > cluster before the clients are upgraded. What is preventing a gradual if not rapid upgrade of the compute clients now? The usual approach is once you have verified the upgrade is to simply to disable the queues on all the nodes and as jobs finish you upgrade them as they become free. Again because the usual approach is to have a maximum run time for jobs (that is jobs can't just run forever and will be culled if they run too long) you can achieve this piece meal upgrade in a relatively short period of time. Most places have a maximum run time of one to two weeks. So if you are within the norm this could be done by the end of the month. It's basically the same procedure as you would use to say push a security update that required a reboot. 
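A sketch of that drain-and-upgrade approach, using Slurm commands purely as an example (the thread does not say which scheduler is in use, so substitute your own equivalents):

  NODE=cn001   # assumed node name
  scontrol update nodename=$NODE state=drain reason="GPFS upgrade"

  # wait for the running jobs on the node to finish
  while squeue --nodelist=$NODE --noheader | grep -q . ; do
      sleep 300
  done

  # node is now idle: upgrade GPFS on it (mmshutdown, update the packages,
  # mmbuildgpl, mmstartup), then hand it back to the scheduler
  scontrol update nodename=$NODE state=resume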
The really neat way is to script it up and then make it a job that you keep dumping in the queue till all nodes are updated :D > > It seems like, even though Sven is recommending against it, building a > filesystem with --version flag is our only option. I guess we have > another option, and that is to upgrade all our clients first, but we > can't do that until June so I guess it's really not an option at this time. > I would add my voice to that. The "this feature is not available because you created the file system as version x.y.z" is likely to cause you problems at some point down the line. Certainly caused me headaches in the past. > I hope this makes our constraints clear: mainly, without being able to > take downtime on our compute cluster, we are forced to build a > filesystem on ESS using --version flag. > Again there is or at least should not be *ANY* requirement for downtime of the compute cluster that the users will notice. Certainly nothing worse that nodes going down due to hardware failures or pushing urgent security patches. Taking a different tack is it not possible for the ESS storage to be added to the existing files system? That is you get a bunch of NSD's on the disk with NSD servers, add them all to the existing cluster and then issue some "mmchdisk suspend" on the existing disks followed by some "mmdeldisk " and have the whole lot move over to the new storage in an a manner utterly transparent to the end users (well other than a performance impact)? This approach certainly works (done it myself) but IBM might have placed restrictions on the ESS offering preventing you doing this while maintaining support that I am not familiar with. If there is I personally would see this a barrier to purchase of ESS but then I am old school when it comes to GPFS and not at all familiar with ESS. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Wed Mar 16 19:51:59 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 16 Mar 2016 19:51:59 +0000 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56E9B5FB.2050105@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com>, <56E9B5FB.2050105@genome.wustl.edu> Message-ID: Have you got a half updated system maybe? You cant have: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 I.e. 0.7.3-5 and 0.7.5-4 I cant check right now, but are ibm shipping libcap-Ng as part of their package? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] Sent: 16 March 2016 19:37 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] 4.2 installer any help here? ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 Error: Multilib version problems found. This often means that the root cause is something else and multilib version checking is just pointing out that there is a problem. Eg.: 1. You have an upgrade for libcap-ng which is missing some dependency that another package requires. Yum is trying to solve this by installing an older version of libcap-ng of the different architecture. If you exclude the bad architecture yum will tell you what the root cause is (which package requires what). You can try redoing the upgrade with --exclude libcap-ng.otherarch ... 
this should give you an error message showing the root cause of the problem. 2. You have multiple architectures of libcap-ng installed, but yum can only see an upgrade for one of those architectures. If you don't want/need both architectures anymore then you can remove the one with the missing update and everything will work. 3. You have duplicate versions of libcap-ng installed already. You can use "yum check" to get yum show these errors. ...you can also use --setopt=protected_multilib=false to remove this checking, however this is almost never the correct thing to do as something else is very likely to go wrong (often causing much more problems). Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 On 3/16/16 12:40 PM, Oesterlin, Robert wrote: My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > Reply-To: gpfsug main discussion list > Date: Wednesday, March 16, 2016 at 12:36 PM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 installer We have multiple clusters with thousands of nsd's surely there is an upgrade path. Are you all saying just continue to manually update nsd servers and manage them as we did previously. Is the installer not needed if there are current setups. Just deploy CES manually? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From volobuev at us.ibm.com Wed Mar 16 20:03:09 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Wed, 16 Mar 2016 12:03:09 -0800 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> Message-ID: <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> > Under both 3.2 and 3.3 mmbackup would always lock up our cluster when > using snapshot. I never understood the behavior without snapshot, and > the lock up was intermittent in the carved-out small test cluster, so > I never felt confident enough to deploy over the larger 4000+ clients > cluster. 
Back then, GPFS code had a deficiency: migrating very large files didn't work well with snapshots (and some operation mm commands). In order to create a snapshot, we have to have the file system in a consistent state for a moment, and we get there by performing a "quiesce" operation. This is done by flushing all dirty buffers to disk, stopping any new incoming file system operations at the gates, and waiting for all in-flight operations to finish. This works well when all in-flight operations actually finish reasonably quickly. That assumption was broken if an external utility, e.g. mmapplypolicy, used gpfs_restripe_file API on a very large file, e.g. to migrate the file's blocks to a different storage pool. The quiesce operation would need to wait for that API call to finish, as it's an in-flight operation, but migrating a multi-TB file could take a while, and during this time all new file system ops would be blocked. This was solved several years ago by changing the API and its callers to do the migration one block range at a time, thus making each individual syscall short and allowing quiesce to barge in and do its thing. All currently supported levels of GPFS have this fix. I believe mmbackup was affected by the same GPFS deficiency and benefited from the same fix. yuri -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Wed Mar 16 20:20:21 2016 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 16 Mar 2016 16:20:21 -0400 Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) In-Reply-To: <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> References: <20160309145613.21201iprt1y9upy5@support.scinet.utoronto.ca><201603092017.u29KH7hm013719@d06av08.portsmouth.uk.ibm.com> <20160309163349.686071llaq6b36il@support.scinet.utoronto.ca> <201603162003.u2GK3DFj027660@d03av03.boulder.ibm.com> Message-ID: <20160316162021.57513mzxykk7semd@support.scinet.utoronto.ca> OK, that is good to know. I'll give it a try with snapshot then. We already have 3.5 almost everywhere, and planing for 4.2 upgrade (reading the posts with interest) Thanks Jaime Quoting Yuri L Volobuev : > >> Under both 3.2 and 3.3 mmbackup would always lock up our cluster when >> using snapshot. I never understood the behavior without snapshot, and >> the lock up was intermittent in the carved-out small test cluster, so >> I never felt confident enough to deploy over the larger 4000+ clients >> cluster. > > Back then, GPFS code had a deficiency: migrating very large files didn't > work well with snapshots (and some operation mm commands). In order to > create a snapshot, we have to have the file system in a consistent state > for a moment, and we get there by performing a "quiesce" operation. This > is done by flushing all dirty buffers to disk, stopping any new incoming > file system operations at the gates, and waiting for all in-flight > operations to finish. This works well when all in-flight operations > actually finish reasonably quickly. That assumption was broken if an > external utility, e.g. mmapplypolicy, used gpfs_restripe_file API on a very > large file, e.g. to migrate the file's blocks to a different storage pool. > The quiesce operation would need to wait for that API call to finish, as > it's an in-flight operation, but migrating a multi-TB file could take a > while, and during this time all new file system ops would be blocked. 
This > was solved several years ago by changing the API and its callers to do the > migration one block range at a time, thus making each individual syscall > short and allowing quiesce to barge in and do its thing. All currently > supported levels of GPFS have this fix. I believe mmbackup was affected by > the same GPFS deficiency and benefited from the same fix. > > yuri > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From duersch at us.ibm.com Wed Mar 16 20:25:23 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Wed, 16 Mar 2016 16:25:23 -0400 Subject: [gpfsug-discuss] cross-cluster mounting different versions of gpfs In-Reply-To: References: Message-ID: Please see question 2.10 in our faq. http://www.ibm.com/support/knowledgecenter/api/content/nl/en-us/STXKQY/gpfsclustersfaq.pdf We only support clusters that are running release n and release n-1 and release n+1. So 4.1 is supported to work with 3.5 and 4.2. Release 4.2 is supported to work with 4.1, but not with gpfs 3.5. It may indeed work, but it is not supported. Steve Duersch Spectrum Scale (GPFS) FVTest 845-433-7902 IBM Poughkeepsie, New York >>Message: 1 >>Date: Wed, 16 Mar 2016 18:07:59 +0000 >>From: Damir Krstic >>To: gpfsug main discussion list >>Subject: Re: [gpfsug-discuss] cross-cluster mounting different >> versions of gpfs >>Message-ID: >> >>Content-Type: text/plain; charset="utf-8" >> >>Sven, >> >>For us, at least, at this point in time, we have to create new filesystem >>with version flag. The reason is we can't take downtime to upgrade all of >>our 500+ compute nodes that will cross-cluster mount this new storage. We >>can take downtime in June and get all of the nodes up to 4.2 gpfs version >>but we have users today that need to start using the filesystem. >> >>So at this point in time, we either have ESS built with 4.1 version and >>cross mount its filesystem (also built with --version flag I assume) to our >>3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems >>with --version flag and then in June when we get all of our clients upgrade >>we run =latest gpfs command and then mmchfs -V to get filesystem back up to >>4.2 features. >> >>It's unfortunate that we are in this bind with the downtime of the compute >>cluster. If we were allowed to upgrade our compute nodes before June, we >>could proceed with 4.2 build without having to worry about filesystem >>versions. >> >>Thanks for your reply. 
>> >>Damir From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 03/16/2016 02:08 PM Subject: gpfsug-discuss Digest, Vol 50, Issue 47 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: cross-cluster mounting different versions of gpfs (Damir Krstic) ---------------------------------------------------------------------- Message: 1 Date: Wed, 16 Mar 2016 18:07:59 +0000 From: Damir Krstic To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] cross-cluster mounting different versions of gpfs Message-ID: Content-Type: text/plain; charset="utf-8" Sven, For us, at least, at this point in time, we have to create new filesystem with version flag. The reason is we can't take downtime to upgrade all of our 500+ compute nodes that will cross-cluster mount this new storage. We can take downtime in June and get all of the nodes up to 4.2 gpfs version but we have users today that need to start using the filesystem. So at this point in time, we either have ESS built with 4.1 version and cross mount its filesystem (also built with --version flag I assume) to our 3.5 compute cluster, or...we proceed with 4.2 ESS and build filesystems with --version flag and then in June when we get all of our clients upgrade we run =latest gpfs command and then mmchfs -V to get filesystem back up to 4.2 features. It's unfortunate that we are in this bind with the downtime of the compute cluster. If we were allowed to upgrade our compute nodes before June, we could proceed with 4.2 build without having to worry about filesystem versions. Thanks for your reply. Damir On Wed, Mar 16, 2016 at 12:18 PM Sven Oehme wrote: > while this is all correct people should think twice about doing this. > if you create a filesystem with older versions, it might prevent you from > using some features like data-in-inode, encryption, adding 4k disks to > existing filesystem, etc even if you will eventually upgrade to the latest > code. > > for some customers its a good point in time to also migrate to larger > blocksizes compared to what they run right now and migrate the data. i have > seen customer systems gaining factors of performance improvements even on > existing HW by creating new filesystems with larger blocksize and latest > filesystem layout (that they couldn't before due to small file waste which > is now partly solved by data-in-inode). while this is heavily dependent on > workload and environment its at least worth thinking about. > > sven > > > > On Wed, Mar 16, 2016 at 4:20 PM, Marc A Kaplan > wrote: > >> The key point is that you must create the file system so that is "looks" >> like a 3.5 file system. See mmcrfs ... --version. Tip: create or find a >> test filesystem back on the 3.5 cluster and look at the version string. >> mmslfs xxx -V. Then go to the 4.x system and try to create a file system >> with the same version string.... 
>> >> >> [image: Marc A Kaplan] >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160316/58097bbf/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160316/58097bbf/attachment.gif > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 50, Issue 47 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Mar 16 21:52:34 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Mar 2016 16:52:34 -0500 Subject: [gpfsug-discuss] cross-cluster mounting different versions ofgpfs In-Reply-To: References: <201603161522.u2GFMnvo009299@d03av02.boulder.ibm.com><56E9AA2A.3010108@buzzard.me.uk> Message-ID: <201603162152.u2GLqfvD032745@d03av03.boulder.ibm.com> Considering the last few appends from Yuri and Sven, it seems you might want to (re)consider using Samba and/or NFS... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Thu Mar 17 11:14:03 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 17 Mar 2016 11:14:03 +0000 Subject: [gpfsug-discuss] Perfileset df explanation In-Reply-To: References: Message-ID: (Sorry, just found this in drafts, thought I'd sent it yesterday!) Cheers Luke. Sorry, I wasn't actually wanting to get over-provisioning stats (although it would be great!) just that I thought that might be what it does. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Luke Raimbach Sent: 16 March 2016 16:06 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Perfileset df explanation Hi Richard, I don't think mmdf will tell you the answer you're looking for. If you use df within the fileset, or for the share over NFS, you will get the free space reported for that fileset, not the whole file system. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 16 March 2016 16:03 To: 'gpfsug-discuss at spectrumscale.org' > Subject: [gpfsug-discuss] Perfileset df explanation All, Can someone explain that this means? :: --filesetdf Displays a yes or no value indicating whether filesetdf is enabled; if yes, the mmdf command reports numbers based on the quotas for the fileset and not for the total file system. 
What this means, as in the output I would expect to see from mmdf with this option set to Yes, and No? I don't think it's supposed to give any indication of over-provision and cursory tests suggest it doesn't. Thanks Richard The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Mar 17 16:03:59 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 17 Mar 2016 16:03:59 +0000 Subject: [gpfsug-discuss] Experiences with Alluxio/Tachyon ? Message-ID: <18C8D317-16BE-4351-AD8D-0E165FB60511@nuance.com> Anyone have experience with Alluxio? http://www.alluxio.org/ Also http://ibmresearchnews.blogspot.com/2015/08/tachyon-for-ultra-fast-big-data.html Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Fri Mar 18 16:39:42 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 18 Mar 2016 11:39:42 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> <56E9B5FB.2050105@genome.wustl.edu> Message-ID: <56EC2F4E.6010203@genome.wustl.edu> upgrading to 4.2.2 fixed the dependency issue. I now get Unable to access CES shared root. # /usr/lpp/mmfs/bin/mmlsconfig | grep 'cesSharedRoot' cesSharedRoot /vol/system On 3/16/16 2:51 PM, Simon Thompson (Research Computing - IT Services) wrote: > Have you got a half updated system maybe? > > You cant have: > libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 > > I.e. 0.7.3-5 and 0.7.5-4 > > I cant check right now, but are ibm shipping libcap-Ng as part of their package? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] > Sent: 16 March 2016 19:37 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] 4.2 installer > > any help here? > ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 > Error: Multilib version problems found. This often means that the root > cause is something else and multilib version checking is just > pointing out that there is a problem. Eg.: > > 1. You have an upgrade for libcap-ng which is missing some > dependency that another package requires. Yum is trying to > solve this by installing an older version of libcap-ng of the > different architecture. If you exclude the bad architecture > yum will tell you what the root cause is (which package > requires what). You can try redoing the upgrade with > --exclude libcap-ng.otherarch ... this should give you an error > message showing the root cause of the problem. > > 2. You have multiple architectures of libcap-ng installed, but > yum can only see an upgrade for one of those architectures. > If you don't want/need both architectures anymore then you > can remove the one with the missing update and everything > will work. > > 3. You have duplicate versions of libcap-ng installed already. > You can use "yum check" to get yum show these errors. 
> > ...you can also use --setopt=protected_multilib=false to remove > this checking, however this is almost never the correct thing to > do as something else is very likely to go wrong (often causing > much more problems). > > Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 > > > On 3/16/16 12:40 PM, Oesterlin, Robert wrote: > My first suggestion is: Don't deploy the CES nodes manually - way too many package dependencies. Get those setup right and the installer does a good job. > > If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I've run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn't give you a whole lot of control over what it does - give it a try and it may work well for you. But run it in a test cluster first or on a limited set of nodes. > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > > Reply-To: gpfsug main discussion list > > Date: Wednesday, March 16, 2016 at 12:36 PM > To: "gpfsug-discuss at spectrumscale.org" > > Subject: Re: [gpfsug-discuss] 4.2 installer > > We have multiple clusters with thousands of nsd's surely there is an > upgrade path. Are you all saying just continue to manually update nsd > servers and manage them as we did previously. Is the installer not > needed if there are current setups. Just deploy CES manually? > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
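A rough sketch of the checks behind the two problems described above, for anyone hitting the same thing. The package and attribute names are taken from the thread; the example path /gpfs/ces is an assumption, not the poster's actual configuration, and the exact procedure for changing cesSharedRoot should be confirmed against the Spectrum Scale documentation for your release.

# Show which architectures of libcap-ng are installed (the multilib clash yum complains about)
rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' libcap-ng
# If the stray i686 package is not required by anything else, removing it usually clears the multilib error
yum remove libcap-ng.i686
# Check what cesSharedRoot points at and whether that path sits on a mounted GPFS file system
/usr/lpp/mmfs/bin/mmlsconfig cesSharedRoot
/usr/lpp/mmfs/bin/mmlsmount all -L
# cesSharedRoot must live on a GPFS file system mounted on all protocol nodes; it can be
# repointed with mmchconfig (typically with CES stopped on the protocol nodes first),
# e.g. to the assumed path /gpfs/ces
/usr/lpp/mmfs/bin/mmchconfig cesSharedRoot=/gpfs/ces
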
From mweil at genome.wustl.edu Fri Mar 18 16:54:51 2016 From: mweil at genome.wustl.edu (Matt Weil) Date: Fri, 18 Mar 2016 11:54:51 -0500 Subject: [gpfsug-discuss] 4.2 installer In-Reply-To: <56EC2F4E.6010203@genome.wustl.edu> References: <56E992D0.3050603@genome.wustl.edu> <56E9999A.7030902@genome.wustl.edu> <34AA5362-F31C-4292-AB99-BB91ECC6159E@nuance.com> <56E9B5FB.2050105@genome.wustl.edu> <56EC2F4E.6010203@genome.wustl.edu> Message-ID: <56EC32DB.1000108@genome.wustl.edu> Fri Mar 18 11:50:43 CDT 2016: mmcesop: /vol/system/ found but is not on a GPFS filesystem On 3/18/16 11:39 AM, Matt Weil wrote: > upgrading to 4.2.2 fixed the dependency issue. I now get Unable to > access CES shared root. > > # /usr/lpp/mmfs/bin/mmlsconfig | grep 'cesSharedRoot' > cesSharedRoot /vol/system > > On 3/16/16 2:51 PM, Simon Thompson (Research Computing - IT Services) wrote: >> Have you got a half updated system maybe? >> >> You cant have: >> libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 >> >> I.e. 0.7.3-5 and 0.7.5-4 >> >> I cant check right now, but are ibm shipping libcap-Ng as part of their package? >> >> Simon >> ________________________________________ >> From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Matt Weil [mweil at genome.wustl.edu] >> Sent: 16 March 2016 19:37 >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] 4.2 installer >> >> any help here? >> ~]# yum -d0 -e0 -y install spectrum-scale-object-4.2.0-0 >> Error: Multilib version problems found. This often means that the root >> cause is something else and multilib version checking is just >> pointing out that there is a problem. Eg.: >> >> 1. You have an upgrade for libcap-ng which is missing some >> dependency that another package requires. Yum is trying to >> solve this by installing an older version of libcap-ng of the >> different architecture. If you exclude the bad architecture >> yum will tell you what the root cause is (which package >> requires what). You can try redoing the upgrade with >> --exclude libcap-ng.otherarch ... this should give you an error >> message showing the root cause of the problem. >> >> 2. You have multiple architectures of libcap-ng installed, but >> yum can only see an upgrade for one of those architectures. >> If you don't want/need both architectures anymore then you >> can remove the one with the missing update and everything >> will work. >> >> 3. You have duplicate versions of libcap-ng installed already. >> You can use "yum check" to get yum show these errors. >> >> ...you can also use --setopt=protected_multilib=false to remove >> this checking, however this is almost never the correct thing to >> do as something else is very likely to go wrong (often causing >> much more problems). >> >> Protected multilib versions: libcap-ng-0.7.3-5.el7.i686 != libcap-ng-0.7.5-4.el7.x86_64 >> >> >> On 3/16/16 12:40 PM, Oesterlin, Robert wrote: >> My first suggestion is: Don?t deploy the CES nodes manually ? way to many package dependencies. Get those setup right and the installer does a good job. >> >> If you go through and define your cluster nodes to the installer, you can do a GPFS upgrade that way. I?ve run into some issues, especially with clone OS versions of RedHat. (ie, CentOS) It doesn?t give you a whole lot of control over what it does ? give it a ty and it may work well for you. But run it in a test cluster first or on a limited set of nodes. 
>> >> Bob Oesterlin >> Sr Storage Engineer, Nuance HPC Grid >> 507-269-0413 >> >> >> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Matt Weil > >> Reply-To: gpfsug main discussion list > >> Date: Wednesday, March 16, 2016 at 12:36 PM >> To: "gpfsug-discuss at spectrumscale.org" > >> Subject: Re: [gpfsug-discuss] 4.2 installer >> >> We have multiple clusters with thousands of nsd's surely there is an >> upgrade path. Are you all saying just continue to manually update nsd >> servers and manage them as we did previously. Is the installer not >> needed if there are current setups. Just deploy CES manually? >> >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. From martin.gasthuber at desy.de Tue Mar 22 09:45:30 2016 From: martin.gasthuber at desy.de (Martin Gasthuber) Date: Tue, 22 Mar 2016 10:45:30 +0100 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server Message-ID: Hi, we're looking for a powerful (and cost efficient) machine config to optimally support the new CES services, especially Ganesha. In more detail, we're wondering if somebody has already got some experience running these services on machines with HAWC and/or LROC enabled HW, resulting in a clearer understanding of the benefits of that config. We will have ~300 client boxes accessing GPFS via NFS and planning for 2 nodes initially. 
best regards, Martin From S.J.Thompson at bham.ac.uk Tue Mar 22 10:05:05 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 22 Mar 2016 10:05:05 +0000 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server In-Reply-To: References: Message-ID: Hi Martin, We have LROC enabled on our CES protocol nodes for SMB: # mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A0A001755E9634D#/dev/sdb;0A0A001755E96350#/dev/sdc;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 486370 MB, currently in use: 1323 MB Statistics from: Thu Feb 25 11:18:25 2016 Total objects stored 338690236 (2953113 MB) recalled 336905443 (1326912 MB) objects failed to store 0 failed to recall 94 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 338719563 (3114191 MB) Inode objects stored 336876572 (1315923 MB) recalled 336884262 (1315948 MB) = 100.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 336910469 (1316052 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 2896 (115 MB) recalled 564 (29 MB) = 19.48 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 2857 (725 MB) Directory objects failed to store 0 failed to recall 2 failed to query 0 failed to inval 0 Data objects stored 1797127 (1636968 MB) recalled 16057 (10907 MB) = 0.89 % Data objects queried 0 (0 MB) = 0.00 % invalidated 1805234 (1797405 MB) Data objects failed to store 0 failed to recall 92 failed to query 0 failed to inval 0 agent inserts=389305528, reads=337261110 response times (usec): insert min/max/avg=1/47705/11 read min/max/avg=1/3145728/54 ssd writeIOs=5906506, writePages=756033024 readIOs=44692016, readPages=44692610 response times (usec): write min/max/avg=3072/1117534/3253 read min/max/avg=56/3145728/364 So mostly it is inode objects being used from the cache. Whether this is small data-in-inode or plain inode (stat) type operations, pass. We don't use HAWC on our protocol nodes, the HAWC pool needs to exist in the cluster where the NSD data is written and we multi-cluster to the protocol nodes (technically this isn't supported, but works fine for us). On HAWC, we did test it out in another of our clusters using SSDs in the nodes, but we er, had a few issues when we shut a rack of kit down which included all the HAWC devices which were in nodes. You probably want to think a bit carefully about how HAWC is implemented in your environment. We are about to implement in one of our clusters, but that will be HAWC devices available to the NSD servers rather than on client nodes. Simon On 22/03/2016, 09:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Martin Gasthuber" wrote: >Hi, > > we're looking for a powerful (and cost efficient) machine config to >optimally support the new CES services, especially Ganesha. In more >detail, we're wondering if somebody has already got some experience >running these services on machines with HAWC and/or LROC enabled HW, >resulting in a clearer understanding of the benefits of that config. We >will have ~300 client boxes accessing GPFS via NFS and planning for 2 >nodes initially. 
> >best regards, > Martin > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Tue Mar 22 12:44:57 2016 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Tue, 22 Mar 2016 12:44:57 +0000 Subject: [gpfsug-discuss] HAWC/LROC in Ganesha server In-Reply-To: References: Message-ID: <4eec1651b22f40418104a5a44f424b8d@mbxtoa1.winmail.deshaw.com> It's worth sharing that we have seen two problems with CES providing NFS via ganesha in a similar deployment: 1. multicluster cache invalidation: ganesha's FSAL upcall for invalidation of its file descriptor cache by GPFS doesn't appear to work for remote GPFS filesystems. As mentioned by Simon, this is unsupported, though the problem can be worked around with some effort though by disabling ganesha's FD cache entirely. 2. Readdir bad cookie bug: an interaction we're still providing info to IBM about between certain linux NFS clients and ganesha in which readdir calls may sporadically return empty results for directories containing files, without any corresponding error result code. Given our multicluster requirements and the problems associated with the readdir bug, we've reverted to using CNFS for now. Thx Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, March 22, 2016 6:05 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] HAWC/LROC in Ganesha server Hi Martin, We have LROC enabled on our CES protocol nodes for SMB: # mmdiag --lroc === mmdiag: lroc === LROC Device(s): '0A0A001755E9634D#/dev/sdb;0A0A001755E96350#/dev/sdc;' status Running Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0 Max capacity: 486370 MB, currently in use: 1323 MB Statistics from: Thu Feb 25 11:18:25 2016 Total objects stored 338690236 (2953113 MB) recalled 336905443 (1326912 MB) objects failed to store 0 failed to recall 94 failed to inval 0 objects queried 0 (0 MB) not found 0 = 0.00 % objects invalidated 338719563 (3114191 MB) Inode objects stored 336876572 (1315923 MB) recalled 336884262 (1315948 MB) = 100.00 % Inode objects queried 0 (0 MB) = 0.00 % invalidated 336910469 (1316052 MB) Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 Directory objects stored 2896 (115 MB) recalled 564 (29 MB) = 19.48 % Directory objects queried 0 (0 MB) = 0.00 % invalidated 2857 (725 MB) Directory objects failed to store 0 failed to recall 2 failed to query 0 failed to inval 0 Data objects stored 1797127 (1636968 MB) recalled 16057 (10907 MB) = 0.89 % Data objects queried 0 (0 MB) = 0.00 % invalidated 1805234 (1797405 MB) Data objects failed to store 0 failed to recall 92 failed to query 0 failed to inval 0 agent inserts=389305528, reads=337261110 response times (usec): insert min/max/avg=1/47705/11 read min/max/avg=1/3145728/54 ssd writeIOs=5906506, writePages=756033024 readIOs=44692016, readPages=44692610 response times (usec): write min/max/avg=3072/1117534/3253 read min/max/avg=56/3145728/364 So mostly it is inode objects being used form the cache. Whether this is small data-in-inode or plain inode (stat) type operations, pass. 
We don't use HAWC on our protocol nodes, the HAWC pool needs to exist in the cluster where the NSD data is written and we multi-cluster to the protocol nodes (technically this isn't supported, but works fine for us). On HAWC, we did test it out in another of our clusters using SSDs in the nodes, but we er, had a few issues when we should a rack of kit down which included all the HAWC devices which were in nodes. You probably want to think a bit carefully about how HAWC is implemented in your environment. We are about to implement in one of our clusters, but that will be HAWC devices available to the NSD servers rather than on client nodes. Simon On 22/03/2016, 09:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Martin Gasthuber" > wrote: >Hi, > > we're looking for a powerful (and cost efficient) machine config to >optimally support the new CES services, especially Ganesha. In more >detail, we're wondering if somebody has already got some experience >running these services on machines with HAWC and/or LROC enabled HW, >resulting in a clearer understanding of the benefits of that config. We >will have ~300 client boxes accessing GPFS via NFS and planning for 2 >nodes initially. > >best regards, > Martin > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From secretary at gpfsug.org Wed Mar 23 11:31:45 2016 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 23 Mar 2016 11:31:45 +0000 Subject: [gpfsug-discuss] Places are filling up fast! Message-ID: <50eb8657d660d1c8d7714a14b6d69864@webmail.gpfsug.org> Dear members, We've had a fantastic response to the registrations for the next meeting in May. So good in fact that there are only 22 spaces left! If you are thinking of attending I would recommend doing so as soon as you can to avoid missing out. The link to register is: http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 [1] Also, we really like to hear from members on their experiences and are looking for volunteers for a short 15-20 minute presentation on their Spectrum Scale/GPFS installation, the highs and lows of it! If you're interested, please let Simon (chair at spectrumscaleug.org) or I know. Thanks and we look forward to seeing you in May. Claire -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org Links: ------ [1] http://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-spring-2016-tickets-21724951916 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jan.finnerman at load.se Tue Mar 29 23:04:26 2016 From: jan.finnerman at load.se (Jan Finnerman Load) Date: Tue, 29 Mar 2016 22:04:26 +0000 Subject: [gpfsug-discuss] Joined GPFS alias Message-ID: Hi All, I just joined the alias and want to give this short introduction of myself in GPFS terms. I work as a consultant at Load System, an IBM Business Partner based in Sweden. We work mainly in the Media and Finance markets. I support and do installs of GPFS at two customers in the media market in Sweden. Currently, I?m involved in a new customer install with Spectrum Scale 4.2/Red Hat 7.1/PowerKVM/Power 8. 
This is a customer in the south of Sweden that does scientific research in elementary particle physics. My office location is Kista outside of Stockholm in Sweden. Brgds ///Jan [cid:7674672D-7E3F-417F-96F9-89737A1F6AEE] Jan Finnerman Senior Technical consultant [CertTiv_sm] [cid:4D49557E-099B-4799-AD7E-0A103EB45735] Kista Science Tower 164 51 Kista Mobile: +46 (0)70 631 66 26 Office: +46 (0)8 633 66 00/26 jan.finnerman at load.se -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: F1EE9474-7BCC-41E6-8237-D949E9DC35D3[9].png Type: image/png Size: 5565 bytes Desc: F1EE9474-7BCC-41E6-8237-D949E9DC35D3[9].png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: E895055E-B11B-47C3-BA29-E12D29D394FA[9].png Type: image/png Size: 8584 bytes Desc: E895055E-B11B-47C3-BA29-E12D29D394FA[9].png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CertPowerSystems_sm[1][9].png Type: image/png Size: 6664 bytes Desc: CertPowerSystems_sm[1][9].png URL: